Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Senior algorithm expert at Alibaba Group working on large language models, multimodal large language models, and diffusion models.

Researcher on Alibaba's Qwen team focused on large language models and NLP, with public research profiles listing a Nankai University background.

Author on Alibaba's Qwen reports whose DBLP record includes the Qwen2.5-VL and Qwen technical reports on multimodal and large language models.

Author on Alibaba's Qwen technical reports whose DBLP profile lists an Alibaba Group affiliation.

Senior research scientist in Tongyi Lab whose official profile highlights post-training, AI for science, evaluation and alignment, multimodal reasoning, and large language model reasoning.

Researcher focused on large language models and multimodal learning, with public profiles linking Keqin Chen to Beihang University and to Qwen vision-language model work.

Xuejing Liu is a researcher whose public OpenReview profile includes the Qwen2-VL and Qwen2.5-VL technical report papers.

Research scientist in Tongyi Lab and contributor to Qwen2-VL, with public work on multimodal large language models.

Research scientist in Tongyi Lab and maintainer of Qwen-VL, with public work on vision-language models.

Jun Tang works on multimodal foundation models, open-source language models, and agent systems. His personal site highlights work on Qwen and Qwen3-VL alongside related multimodal research.

Research scientist in Tongyi Lab and a major contributor to Qwen2-VL, with public work on multimodal foundation models.

Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.

Mingkun Yang works on multimodal large language models, embodied AI, and robotics. His public profile says he is a postdoc at Zhejiang University and a research scientist at Qwen.

Research scientist in Tongyi Lab and technical lead of Qwen2-VL, with public work on vision-language models.

Research scientist in Alibaba DAMO Academy's Tongyi Lab working on multimodal learning, vision-language models, and embodied AI; author on the Qwen2-VL and Qwen2.5-VL technical reports.

Research scientist in Alibaba DAMO Academy's Tongyi Lab working on machine learning, computer vision, and multimodal large language models; author on the Qwen2-VL and Qwen2.5-VL technical reports.

Research scientist at Alibaba working on multimodal learning and generation; previously a postdoctoral researcher at Carnegie Mellon University.

Tongyi Lab researcher working on large language models, vision-language models, and reinforcement learning; public profiles connect Zheren Fu to the Qwen2-VL technical report.

Researcher at OpenAI whose homepage highlights work on document understanding, coding agents, and computer-use agents.

Research scientist in Tongyi Lab whose public homepage and OpenReview profile describe work on large language models, multimodal learning, and visual grounding. His public profiles also list affiliations with Alibaba Group and East China Normal University.

Xi Zhang works on multimodal and vision-language model research. Public profiles connect him to Qwen2-VL and related open research projects.

Research scientist on the Qwen team at Alibaba Group, focusing on foundation models and language agents. He received a PhD in computer science from the University of Illinois Urbana-Champaign.

Qwen researcher and author on the Qwen2-VL and Qwen2.5-VL technical reports, with public profiles linking his work to multimodal and vision-language systems.

Researcher at Alibaba Group working on multimodal large language models; his public profile and publications connect Hang Zhang to the Qwen2-VL technical report.

Shuai Bai

Kai Dang

Wenbin Ge

Peng Wang

Shijie Wang

Keqin Chen

Xuejing Liu

Jialin Wang

Sibo Song

Jun Tang

Humen Zhong

Yuanzhi Zhu

Mingkun Yang

Zhaohai Li

Jianqiang Wan

Pengfei Wang

Wei Ding

Zheren Fu

Yiheng Xu

Jiabo Ye

Xi Zhang

Tianbao Xie

Zesen Cheng

Hang Zhang