Vision-Language Models
Researchers connected to this field in the public atlas.
Daya Guo
DeepSeek / Moonshot AI
AI researcher at DeepSeek working on natural language processing, code intelligence, and large language model reasoning.
Radu Soricut
Google Gemini
Radu Soricut is a Distinguished Scientist at Google DeepMind working on natural language processing and machine learning, with earlier Google Research and Google Translate work.
Haoyu Lu
DeepSeek / Moonshot AI
Haoyu Lu is a Ph.D. student at Renmin University of China working on multimodal foundation models and video understanding. His homepage highlights papers and code including DeepSeek-VL, UniAdapter, and VDT.
Wanli Ouyang
MiniMax
Wanli Ouyang is a professor at Shanghai AI Laboratory. His homepage says he is also with MMlab and the SIGMA lab, obtained a PhD from the Chinese University of Hong Kong, and works on AI4Science, computer vision, and pattern recognition.
Yue Cao
DeepSeek
CEO of Sand AI. His homepage describes prior work leading multimodal and vision research at BAAI and serving as a senior researcher at Microsoft Research Asia.
Jifeng Dai
Shanghai AI Laboratory
Jifeng Dai is a tenured associate professor in electronic engineering at Tsinghua University and founder of Fundamental Vision. His research spans computer vision, deep learning, multimodal learning, and autonomous driving. He previously worked at Microsoft Research Asia and SenseTime Research, and he received both his bachelor's and PhD degrees from Tsinghua University.
Junyang Lin
Alibaba Qwen
Junyang Lin (Justin Lin) is a researcher and open-source maintainer known for the Qwen family of models. His public profiles list interests in LLMs, AI agents, multimodal learning, long-horizon reasoning, world models, and reinforcement learning; multiple March 2026 news reports said he stepped down from the Qwen tech lead role.
Jifeng Dai
DeepSeek / MiniMax
Jifeng Dai is a tenured associate professor in the Department of Electronic Engineering at Tsinghua University. His homepage says his current research focuses on agentic AI and continual learning, and lists prior roles at Shanghai AI Lab, SenseTime Research, and Microsoft Research Asia.
Xingcheng Yao
Moonshot AI
Xingcheng Yao is a research scientist at Moonshot AI. His public profile notes prior work as a research engineer at Tencent AI Lab, a PhD in computer science from the University of Southern California, and research interests spanning NLP, multimodal systems, and AI agents.
Xinyi Chen
Google Gemini
Xinyi Chen is a PhD candidate in computer science at Princeton University and concurrently a research scientist at Google DeepMind. Her public homepage says she works at the intersection of machine learning, optimization, and dynamical systems, focusing on robust and efficient methods for sequential decision-making and control, and that she previously completed undergraduate studies in mathematics at Princeton.
Hao Zhang
Moonshot AI / NVIDIA
Researcher at NVIDIA Research. Previously a PhD student in Computer Science and Engineering at HKUST, with earlier internships at International Digital Economy Academy and Microsoft Research.
Luke Zettlemoyer
Ai2
Luke Zettlemoyer works on empirical methods for natural language semantics, machine learning, new tasks and datasets, and self-supervision for pre-training.
Pengyu Cheng
Moonshot AI
Pengyu Cheng is a researcher at Alibaba Group leading reinforcement-learning training for the Qwen large-model application team. His homepage also lists prior work with Moonshot AI and Tencent's Hunyuan large-model team.
Hao Yang
DeepSeek / Moonshot AI
Hao Yang works on multimodal data infrastructure at Moonshot.ai. He previously worked at ByteDance ICVG and Microsoft Research Asia, and received BS and PhD degrees from Tsinghua University.
Yiheng Xu
Alibaba Qwen
Researcher at OpenAI whose homepage highlights work on document understanding, coding agents, and computer-use agents.
Runxin Xu
DeepSeek
Researcher at DeepSeek whose public homepage describes work on DeepSeek R1, V1, V2, V3, Math, Coder, and mixture-of-experts systems.
Jiahui Yu
Google Gemini
Jiahui Yu is a Research Lead at OpenAI leading the Perception team. His homepage notes prior co-leadership on Gemini Multimodal at Google DeepMind and work on deep learning and high-performance computing.
Shuai Bai
Alibaba Qwen
Senior algorithm expert at Alibaba Group working on large language models, multimodal large language models, and diffusion models.
Jingren Zhou
MiniMax / Moonshot AI
Jingren Zhou is Chief Technology Officer of Alibaba Cloud. Public speaker biographies describe him as a computer scientist and entrepreneur whose work includes large-scale AI and cloud systems.
Jian Yang
Alibaba Qwen
Jian Yang is an Associate Professor at Beihang University whose research focuses on code intelligence, large language models, and AI agents. He worked with Alibaba Qwen from 2023 to July 2025.
Huazuo Gao
DeepSeek
Researcher at DeepSeek AI working on decision-making and post-training for large language models.
Jiabo Ye
Alibaba Qwen
Research scientist in Tongyi Lab whose public homepage and OpenReview profile describe work on large language models, multimodal learning, and visual grounding. His public profiles also list affiliations with Alibaba Group and East China Normal University.
Jinze Bai
Alibaba Qwen
PhD student at The Hong Kong University of Science and Technology (Guangzhou) whose research interests include large language models, vision-language models, AI agents, and multimodal retrieval.
Yulun Du
Moonshot AI
Yulun Du is a Moonshot AI-affiliated researcher. Public profiles also show prior work and study at Carnegie Mellon University, including a Master of Language Technologies completed in 2020.
Siyuan Li
Google Gemini / NVIDIA
Siyuan Li is a research scientist at NVIDIA working on large language models, multimodal foundation models, and reinforcement learning. His homepage says he received a PhD in computer science from the University of Toronto in 2024 and previously worked at Meta AI, Microsoft Research, and Mila.
Flood Sung
Moonshot AI
Researcher and engineer focused on reinforcement learning and embodied intelligence; his public profile lists work spanning Huawei Noah's Ark Lab, Momenta, Moonshot AI, and XVI Robotics, and he is credited on Moonshot AI technical reports.
Liang Chen
Moonshot AI
Research scientist at Moonshot AI working on foundation models, multimodal large language models, and agents; previously worked at Huawei Noah's Ark Lab and studied at the Chinese University of Hong Kong.
Wei Ding
Alibaba Qwen
Research scientist at Alibaba working on multimodal learning and generation; previously a postdoctoral researcher at Carnegie Mellon University.
Yixiao Ge
Shanghai AI Laboratory
Yixiao Ge is a Research Scientist at Shanghai AI Laboratory and OpenGVLab. His work focuses on multimodal large language models, computer vision, efficient deep learning, and vision-language understanding.
Congcong Wang
Moonshot AI
Research scientist at Moonshot AI focused on large multimodal models and large language model post-training.
Deyao Zhu
DeepSeek
Researcher focused on AGI, multimodal models, and reasoning. Coauthor of Janus and JanusFlow.
Dongliang Wang
Moonshot AI
Dongliang Wang is a research scientist at Moonshot AI whose public profiles highlight multimodal large language models. His homepage also notes earlier PhD work at Shanghai AI Lab and Shanghai Jiao Tong University.
Huabin Zheng
Moonshot AI
Huabin Zheng is a research scientist at Moonshot AI. His homepage says he works on large language models, multi-agent systems, code generation, and game agents.
Jun Tang
Alibaba Qwen
Jun Tang works on multimodal foundation models, open-source language models, and agent systems. His personal site highlights work on Qwen and Qwen3-VL alongside related multimodal research.
Junzhe Pan
DeepSeek
PhD student at Tsinghua University focusing on multimodal large language models, reasoning, and reinforcement learning.
Keqin Chen
Alibaba Qwen
Researcher focused on large language models and multimodal learning, with public profiles linking Keqin Chen to Beihang University and to Qwen vision-language model work.
Tao Yu
Moonshot AI
Assistant Professor of Computer Science at the University of Hong Kong and director of XLANG Lab, focusing on natural language processing and embodied AI agents.
Xiaoqian Shen
DeepSeek
PhD student at Tsinghua University focusing on LLM reasoning, RLHF, and multimodal large language models; research intern at DeepSeek.
Zesen Cheng
Alibaba Qwen
Qwen researcher and author on the Qwen2-VL and Qwen2.5-VL technical reports, with public profiles linking his work to multimodal and vision-language systems.
Andrea Steiner
Google Gemini
Research scientist at Google DeepMind working on multimodal generative models, visual generation, and image editing; previously completed a PhD at TU Munich.
Jiahao Liu
Alibaba Qwen
Jiahao Liu works on multimodal large language models, reasoning systems, and continual learning. His public profiles connect him to the Qwen2.5-VL technical report and related open research work.
Sangho Lee
Ai2
Researcher at the Allen Institute for AI (Ai2) working on vision-language and multimodal AI, with a focus on reliable reasoning and understanding beyond text.
Xi Zhang
Alibaba Qwen
Xi Zhang works on multimodal and vision-language model research. Public profiles connect him to Qwen2-VL and related open research projects.
Noah A. Smith
Ai2
Noah A. Smith is a computer scientist and professor at the University of Washington, where he serves as Vice Provost for Artificial Intelligence and co-directs the OLMo open language modeling effort with Ai2. His research focuses on natural language processing, machine learning, and evaluation methodology.
Caiming Xiong
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of the xLAM-2 Technical Report.
Mingkun Yang
Alibaba Qwen
Mingkun Yang works on multimodal large language models, embodied AI, and robotics. His public profile says he is a postdoc at Zhejiang University and a research scientist at Qwen.
Angang Du
Moonshot AI
Research Scientist at Moonshot AI whose public work focuses on large language models, multimodal models, and embodied AI; he previously earned a PhD from Zhejiang University and was a visiting student at Oxford.
Jianlin Su
Moonshot AI
Research scientist and writer behind Scientific Spaces whose public profile lists work on large language models and service on the Kimi team at Moonshot AI.
Jianqiang Wan
Alibaba Qwen
Research scientist in Alibaba DAMO Academy's Tongyi Lab working on multimodal learning, vision-language models, and embodied AI; author on the Qwen2-VL and Qwen2.5-VL technical reports.
Lucas Beyer
Google Gemini
Lucas Beyer is an ML researcher at Google DeepMind in Zurich. His public homepage highlights prior work at Google Brain and a PhD at ETH Zurich.
Maarten Sap
Ai2
Maarten Sap is an assistant professor at the University of Washington and a senior research scientist at the Allen Institute for AI. His work focuses on human-centered language technologies and social NLP.
Nikolay Savinov
Google Gemini
Research scientist at Google DeepMind on the Gemini team, working on multimodal AI.
Qing Yu
DeepSeek
Researcher at DeepSeek and a first-year computer science PhD student at the University of Science and Technology of China; works on multimodal reasoning and world models; coauthor of Janus.
Shaowei Liu
Moonshot AI
Researcher working on multimodal learning and vision-language systems, with public academic work on visual question answering and related topics.
Y. Charles
Moonshot AI
Research scientist at Moonshot AI focused on multimodal large language models.
Yunfei Chu
Alibaba Qwen
Algorithm expert at Alibaba Group working on computer vision, multimodal learning, and large language models.
Yuqi Wang
DeepSeek
Research scientist at DeepSeek and PhD student at the University of Illinois Urbana-Champaign working on multimodal foundation models, large language models, and embodied AI.
Zaida Zhou
Moonshot AI
Associate research scientist at Moonshot AI based in Beijing, China; previously worked as a postdoctoral researcher.
Zhibo Yang
Alibaba Qwen
Zhibo Yang works on multimodal and vision-language systems. Public profiles connect him to the Qwen2.5-VL technical report and to an individual GitHub account that links back to his personal site.
Zhiqi Huang
Moonshot AI
Machine learning researcher at Moonshot AI and incoming assistant professor at Shanghai Jiao Tong University.
Junxiao Song
DeepSeek
DeepSeek report author whose DBLP record includes DeepSeek LLM, DeepSeekMath, DeepSeek-Coder-V2, DeepSeek-V3, DeepSeek-R1, Janus, and JanusFlow work.
Haowei Zhang
DeepSeek
DeepSeek report author whose DBLP-linked publication record includes DeepSeek LLM, DeepSeek-Coder-V2, Janus, DeepSeek-V3, and DeepSeek-R1 work.
Dejian Yang
DeepSeek
DeepSeek team member and co-author of the DeepSeek-V3, DeepSeek-V2, and DeepSeek LLM technical reports.
Peng Wang
Alibaba Qwen
Alibaba Qwen report author whose DBLP profile identifies an Alibaba Group affiliation and Qwen technical report authorship.
Wenbin Ge
Alibaba Qwen
Alibaba Qwen report author whose DBLP record includes Qwen2.5-VL and Qwen technical report work on multimodal and large language models.
Yonghui Wu
Google Gemini
Google researcher whose official profile says he joined Google in September 2008 and has been with Google Brain since January 2015, with research interests spanning information retrieval, machine learning, machine translation, and natural language processing.
Yejin Choi
Ai2
Dieter Schwarz Foundation Professor and Senior Fellow in Stanford Computer Science and HAI. Her public homepage notes previous roles as professor at the University of Washington and senior director at Ai2.
Leonardo Beyer
Google Gemini
Leonardo Beyer is a research scientist at Google DeepMind. His public homepage highlights work across representation learning, multimodal models, and large-scale machine learning systems.
Nuo Xu
Moonshot AI
Multimodal and omni-model engineer whose public profile lists Moonshot AI experience and Kimi-VL among recent projects.
Yufei Zhang
DeepSeek
Researcher at the University of Illinois Urbana-Champaign focused on vision-language models, multimodal large language models, and physical AI.
Yuqing Wang
DeepSeek
Research intern at DeepSeek and PhD student at Princeton University whose research interests include large language models and multimodal foundation models.
Zhengyang Wang
DeepSeek
Research intern at DeepSeek and master's student at Renmin University of China working on multimodal large language models and AI agents.
Zheren Fu
Alibaba Qwen
Tongyi Lab researcher working on large language models, vision-language models, and reinforcement learning; public profiles connect Zheren Fu to the Qwen2-VL technical report.
Ali Farhadi
Ai2
CEO of the Allen Institute for AI and professor of computer science at the University of Washington. His work spans computer vision, multimodal learning, reasoning, and embodied AI.
An Yang
Alibaba Qwen
Alibaba researcher working on large language models and multimodal pretraining; public research profiles connect An Yang to Qwen-related work and earlier study at Peking University.
Kai Dang
Alibaba Qwen
Researcher on Alibaba's Qwen team focused on large language models and NLP, with public research profiles listing a Nankai University background.
Koray Kavukcuoglu
Google Gemini
Chief Technology Officer at Google DeepMind, with work spanning machine learning and reinforcement learning.
Xinlong Wang
DeepSeek
Xinlong Wang is a researcher working across computer vision, embodied AI, robotics, and machine learning. Public profiles link him to OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.
Chenzhuang Du
Moonshot AI
Technical staff member at Moonshot AI whose public profile highlights work on web and app agents, multimodal systems, reinforcement learning, and LLMs.
Dikang Du
Moonshot AI
Dikang Du is a research scientist at Moonshot AI. His homepage says he received a Ph.D. from Cornell University and works on natural language processing, machine learning, and multimodal learning.
Hao Hu
Moonshot AI
Technical staff member at Moonshot AI working on general AI agents, reinforcement learning, and multimodal foundation models.
Haoning Wu
Moonshot AI
PhD student in computer science at the University of Hong Kong working in vision and machine intelligence.
Lin Sui
Moonshot AI
Researcher in computer vision and multimodal learning. Public profile lists PhD study in computer science and engineering at HKUST under Qifeng Chen.
Christopher Clark
Ai2
Christopher Clark is a researcher working on language models, efficient inference, and trustworthy NLP systems. His public profile highlights work at the intersection of NLP, efficiency, and model evaluation.
Tianbao Xie
Alibaba Qwen
Research scientist on the Qwen team at Alibaba Group, focusing on foundation models and language agents. He received a PhD in computer science from the University of Illinois Urbana-Champaign.
Wenfeng Liang
DeepSeek
Wenfeng Liang, also known as Liang Wenfeng, is linked to DeepSeek technical reports in LLMpeople and is identified in public references as the founder and CEO of DeepSeek.
Fei Huang
Alibaba Qwen
Alibaba Qwen report author listed on Qwen, Qwen2.5, Qwen2.5-1M, Qwen3, Qwen3 Embedding, QwQ-32B, and Qwen-VL reports, with report-backed work on large language models, embeddings, reranking, and multimodal models.
Zeyu Cui
Alibaba Qwen
Zeyu Cui is listed as an author of the Qwen3 Technical Report.
Yang Fan
Alibaba Qwen
Alibaba Qwen report author listed on Qwen, Qwen2.5, Qwen3, Qwen-VL, and Qwen-Image technical reports, with report-backed work on large language models, vision-language models, and image generation.
Xiaodong Deng
Alibaba Qwen
Research scientist in Tongyi Lab whose official profile highlights post-training and multimodal large language models.
Antonio Torralba
Google Gemini
Antonio Torralba is the Delta Electronics Professor in the EECS Department at MIT and a member of CSAIL whose research focuses on computer vision, visual learning, and scene understanding.
Jinbo Zhao
Alibaba Qwen
PhD student in CSLT at Tsinghua University working on large language models, multimodal large language models, and speech-language models; publication context connects Jinbo Zhao to the Qwen2.5-VL technical report.
Wenhai Wang
DeepSeek
Wenhai Wang is a researcher working on visual perception foundation models, efficient learning, and multimodal large models. Public profiles list him with OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.
Yuanzhi Zhu
Alibaba Qwen
Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.
Jena D. Hwang
Ai2
Research scientist at the Allen Institute for AI (Ai2) whose work focuses on natural language understanding and commonsense reasoning.
Pradeep Dasigi
Ai2
Research scientist on the AllenNLP team at the Allen Institute for AI, focused on post-training language models.
Bowen Wang
Moonshot AI
PhD student at the University of Hong Kong who worked as a research intern at Moonshot AI in 2025 and studies digital agents, computer-use agents, and multimodal intelligence.
Cheng Chen
Moonshot AI
Research scientist at Moonshot AI with public profiles covering large language models, diffusion models, and generative AI.
Jiaqi Deng
Moonshot AI
Computer science graduate from the University of Hong Kong who worked as a research intern at Moonshot AI on general-purpose computer-use agents.
Jin Xie
Moonshot AI
Researcher at Moonshot AI with public homepage and GitHub profiles under the name Xixia Zhong.
Kun Ouyang
Moonshot AI
Technical staff at Moonshot AI working on large language model reasoning, agents, and multimodal large models.
Matthieu Devin
Google Gemini
Research scientist at Google DeepMind based in Paris, focused on deep learning and computer vision.
Weixin Xu
Moonshot AI
Research scientist at Moonshot AI with public GitHub and Google Scholar profiles covering efficient inference and multimodal systems.
Xiaodong Zhu
DeepSeek
Research intern at DeepSeek and master's student at Tsinghua University working on large language models, multimodal models, and reinforcement learning.
Xiaohua Zhai
Google Gemini
Xiaohua Zhai is a researcher on the Google Research team in Zurich whose work focuses on large multimodal models and efficient deep learning.
Xiaokun Yuan
Moonshot AI
AI researcher at Moonshot AI with a public homepage and Google Scholar profile spanning robust AI, computer vision, and multimodal systems.
Yibo Miao
Moonshot AI
Moonshot AI researcher working on large language models, coding agents, and multimodal safety; his public homepage also documents earlier study at Shanghai Jiao Tong University and Huazhong University of Science and Technology.
Yiqin Wang
Moonshot AI
Researcher at Moonshot AI with a personal homepage and GitHub profile covering machine learning research.
Yuzhi Wang
Moonshot AI
Researcher at Moonshot AI focused on large language models, computational photography, and low-level computer vision; previously worked at Megvii and completed a PhD and postdoc at Tsinghua University.
Zhaowei Li
Moonshot AI
Research scientist at Moonshot AI working on multimodal AI agents, large multimodal models, video generation, speech, machine learning systems, and AI for science.
Yuzi Yan
Moonshot AI
PhD student in Computer Science and Technology at Tsinghua University with public research interests in machine learning, natural language processing, and large language models.
Sifan Zhou
DeepSeek
DeepSeek report author listed on DeepSeek-VL2, with report-backed work on mixture-of-experts vision-language models and multimodal understanding.
Han Zhu
Moonshot AI
Machine learning researcher with a public homepage and GitHub profile covering AI research and engineering projects.
Hao Fei
Shanghai AI Laboratory
Research scientist at Shanghai AI Laboratory working on NLP and multimodal AI, and a co-author of InternLM-XComposer2.5.
Jiaming Guo
DeepSeek
PhD student at The Chinese University of Hong Kong focused on multimodal reasoning, optical character recognition, and document parsing; coauthor of DeepSeek-VL.
Liangtao Shi
Ai2
Research scientist at the Allen Institute for AI working on multimodal large language models, embodied agents, and reasoning for robots and games.
Matt Deitke
Ai2
Matt Deitke is a researcher at Ai2 whose public homepage and Google Scholar profile highlight work on multimodal learning, vision-language models, embodied AI, and open models.
Molly S. Lewis
Ai2
Molly S. Lewis is an Assistant Professor of Psychology at Princeton University whose research examines how language is shaped by social and cultural structure.
Roi Reichart
Ai2
Professor at the Technion and head of the CHIA Lab, with research spanning natural language processing, machine learning, and social-good applications.
Xiuye Gu
Google Gemini
Xiuye Gu is a researcher whose public work focuses on vision-language modeling and machine learning systems.
Yan Zhong
Moonshot AI
Research scientist at Kimi AI (Moonshot AI). Previously completed a PhD in computer science at the University of Wisconsin-Madison.
Yao Lu
DeepSeek / Google Gemini
Yao Lu is listed as an author of the Google technical report Gemini Robotics: Bringing AI into the Physical World.
Zheng Zhang
Moonshot AI
Publicly available Moonshot AI technical reports list Zheng Zhang as a coauthor on Kimi-VL and Kimi K2. The available public evidence supports research authorship on language and multimodal systems, but not a separately verified individual employer profile.
Chang Zhou
Alibaba Qwen
Qwen researcher and co-lead whose work focuses on pretraining and post-training, multimodal models, agent systems, and large-scale model infrastructure.
Shijie Wang
Alibaba Qwen
Senior research scientist in Tongyi Lab whose official profile highlights post-training, AI for science, evaluation and alignment, multimodal reasoning, and large language model reasoning.
Maxwell Collins
Google Gemini
Maxwell Collins is a Research Scientist at Google DeepMind.
Dahua Lin
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.
Jiaqi Wang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.
Yu Qiao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.
Shujie Wang
DeepSeek
First-year PhD student at Shanghai Jiao Tong University focused on multimodal large language models, text-to-image generation, and image/video generation; coauthor of DeepSeek-VL2.
Yonggang Zhang
DeepSeek
Yonggang Zhang is a researcher whose public OpenReview profile includes the DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding paper.
Jiaqi Gao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.
Niket Tandon
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of the XGen-7B Technical Report.
Prasad Reddy Yadati
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of the XGen-7B Technical Report.
Shan Lu
DeepSeek
Shan Lu is listed as an author of the DeepSeek technical report JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation.
Xiaogang Wang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Xinxing Zu
Moonshot AI
Xinxing Zu is listed as an author of the Moonshot AI technical report Kimi K2.5: Visual Agentic Intelligence.
Yafei Wen
MiniMax
Yafei Wen is listed as an author of MiniMax-Text-01, a MiniMax technical report in the LLMpeople catalog.
Yijia Shao
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of the XGen-7B Technical Report.
Yuhang Zang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Zhenguo Li
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Zhe Wang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.
Zhongyue Zhang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Ziyu Shao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Adam Koepke
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Ailin Qiu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Akshay Gupta
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Alon Albalak
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Ankush Garg
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Chien-Sheng Wu
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Chris Alberti
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Chun-Liang Li
Apple
Chun-Liang Li is listed as a core author of the FastVLM paper, with Apple affiliation and an @apple.com contact address in the report HTML.
Chunping Li
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Conghui He
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
David Yang
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Dian Shen
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Fengyun Rao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Haidong Duan
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Han Zhou
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Hongshan Yu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Hu Xu
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Jaeson Jang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jiahao Huang
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Jiajin Wu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jianbin Jiao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jiangning Zhang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Jian Guo
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jiarui Wang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jie Tang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jingren Zhou
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jingyao Ye
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Jinpeng Wang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Jonathan Young
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Jun Liu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Justin Wang
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of XGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Kai Chen
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of XGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Kai Chen
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Kaipeng Zhang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Kaizheng Wang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Kiyoung Song
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of XGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Likun Wang
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Lintao Zhang
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Mauro Caccia
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Meng Liao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Mingqi Gao
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Minlie Huang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Nuan Wen
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of XGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Pan Zhang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Peng Gao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Peter Henderson
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Ping Luo
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Qinglin Lu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Sebastian Borgeaud
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Shaodong Wang
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Shijie Cao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Shu Liu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Siyuan Yang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Tianning Zhao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Wangtianyu Luo
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Weijie Liu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Wei Liu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Weizhu Chen
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of XGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Wenbo Chen
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Wenhai Wang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Wenyan Cong
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Xiangyu Yue
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Xiaohan Ding
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Xiaowen Zhang
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Xiaoyi Dong
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Xinyu Gao
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Xin Zhang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Xunliang Cai
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Yabin Zhang
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Yao Zhu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Yimeng Zhu
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Yingjie Chen
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Yiping Wang
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Yiwen Lu
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Yiwen Luo
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Yongqiang Ma
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Yucheng Zou
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Yuchen Zhou
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Yuchong Xiao
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Yujia Qin
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
Yulong Chen
Salesforce AI Research
Researcher at Salesforce AI Research and coauthor of XGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
Yuntao Liu
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Yutaka Matsuo
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Yutao Yue
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory and coauthor of InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
Zehuan Yuan
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Zhaoye Yang
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Zihan Dong
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Zihan Wei
ByteDance Seed
Researcher at ByteDance Seed and coauthor of the Seed1.5-VL Technical Report.
Yu Qiao
MiniMax
Yu Qiao is listed as an author of the MiniMax technical report MiniMax-01: Scaling Foundation Models with Lightning Attention.
Zihao Huang
Moonshot AI
Zihao Huang is listed as an author of Moonshot AI's Kimi-VL Technical Report.
Aman Singh
DeepSeek
Research intern at DeepSeek and PhD student at Stanford University working on generative vision-language models, large language models, and large-scale training.
Bohong Yin
Moonshot AI
Research scientist at Moonshot AI focused on machine learning systems; public profiles note prior PhD study at the Max Planck Institute and Technical University of Munich.
Bowei Xing
Moonshot AI
Technical Staff at Moonshot AI.
Bowen Qu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Chu Wei
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Dehao Zhang
Moonshot AI
Technical staff member at Moonshot AI and machine learning researcher; public profiles note prior study at the Gaoling School of AI at Renmin University of China.
Enming Yuan
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and generative models.
Enzhe Lu
Moonshot AI
PhD student in Computer Science at the University of Hong Kong. His research interests include multimodal large language models and embodied AI, and he co-authored the Kimi-VL technical report.
Fang Li
Moonshot AI
Research Scientist at Moonshot AI.
Guokun Lai
Moonshot AI
Research scientist at Moonshot AI whose work focuses on large foundation models and multimodal models.
Haiyang Xu
Alibaba Qwen
Independent researcher focused on multimodal learning, document intelligence, and efficient training; coauthor of Qwen2.5-VL and mPLUG-related vision-language systems.
Hang Zhang
Alibaba Qwen
Researcher at Alibaba Group working on multimodal large language models; his public profile and publications link him to the Qwen2-VL technical report.
Hao Ding
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and computer vision.
Haotian Yao
Moonshot AI
Research scientist at Moonshot AI who previously studied at Tsinghua University and works on large foundation models.
Hongcheng Gao
Moonshot AI
Generative AI researcher at Moonshot AI with public work spanning computational imaging and AI systems.
Jialin Wang
Alibaba Qwen
Research scientist in Tongyi Lab and contributor to Qwen2-VL, with public work on multimodal large language models.
Jiezhong Qiu
Moonshot AI
Researcher at Moonshot AI with public GitHub and scholarly profiles covering machine learning and AI systems.
Jinhong Wang
Moonshot AI
Technical Staff at Moonshot AI.
Junjie Yan
Moonshot AI
Research scientist at Moonshot AI with public work on computer vision and multimodal models.
Longhui Yu
Moonshot AI
Research Scientist at Moonshot AI.
Mengfan Dong
Moonshot AI
Technical Staff at Moonshot AI.
Mengnan Dong
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Nan Ding
Google Gemini
Researcher at Google Research whose public work includes multimodal and vision-language modeling, with arXiv publications tied to PaliGemma and related transfer work.
Pengfei Wang
Alibaba Qwen
Research scientist in Alibaba DAMO Academy's Tongyi Lab working on machine learning, computer vision, and multimodal large language models; author on the Qwen2-VL and Qwen2.5-VL technical reports.
Qizheng Gu
Moonshot AI
Research scientist at Moonshot AI with public work on language models and reasoning.
Rui Hu
DeepSeek
PhD student at the University of Science and Technology of China focused on machine learning and multimodal understanding and generation; coauthor of Janus.
Runjie Zhou
Moonshot AI
Research scientist at Moonshot AI and PhD student at Shanghai Jiao Tong University whose homepage highlights multimodal understanding, generation, large language models, and agents.
Sibo Song
Alibaba Qwen
Research scientist in Tongyi Lab and maintainer of Qwen-VL, with public work on vision-language models.
Tianhui Song
Moonshot AI
Research Scientist at Moonshot AI.
Tongtong Bai
Moonshot AI
Research scientist at Moonshot AI and the University of Wisconsin-Madison with public work on large language models and reasoning.
Weiran He
Moonshot AI
Research scientist at Moonshot AI whose public GitHub profile highlights work on multimodal large language models and agents.
Weixiao Huang
Moonshot AI
Research Scientist at Moonshot AI.
Xinhao Li
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal and long-context model research.
Xinyuan Wang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Xinyu Luo
DeepSeek
PhD student at Shanghai Jiao Tong University working on multimodal large language models and image understanding and generation; coauthor of Janus.
Xinyu Zhou
Moonshot AI
Technical Staff at Moonshot AI.
Xuejing Liu
Alibaba Qwen
Xuejing Liu is a researcher whose public OpenReview profile includes the Qwen2-VL and Qwen2.5-VL technical reports.
Yang Li
Moonshot AI
Co-founder and chief executive officer of Moonshot AI.
Yangyang Hu
Moonshot AI
Researcher at Moonshot AI with a public GitHub profile and work spanning machine learning systems.
Yanru Chen
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yejie Wang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yibo Liu
Moonshot AI
Research scientist at Moonshot AI whose public profile highlights work on multimodal generation, multimodal large language models, and efficient LLMs.
Yimin Chen
Moonshot AI
Research scientist at Moonshot AI with public GitHub projects spanning language models and multimodal systems.
Yiping Bao
Moonshot AI
Researcher at Moonshot AI with a public GitHub profile covering AI systems work.
Yuanxin Liu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yu Han
Alibaba Qwen
Researcher listed on Google Scholar as affiliated with Alibaba Group and coauthor of the Qwen technical report.
Yuhao Dong
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yukang Chen
DeepSeek
PhD student at The University of Hong Kong focused on large multimodal models and data-centric AI, especially multimodal understanding and generation; coauthor of Janus.
Yuxin Wu
Moonshot AI
Researcher at Moonshot AI with public GitHub projects spanning AI systems.
Yuxuan Cao
DeepSeek
Research assistant at The University of Hong Kong focused on multimodal reasoning and generation, large language models, and embodied AI; coauthor of Janus.
Zhejun Jiang
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and generative models.
Zhilin Yang
Moonshot AI
Co-founder and CTO of Moonshot AI, and co-author of the Kimi-VL and Kimi K2.5 technical reports.
Zhiyuan Ruan
DeepSeek
PhD student at The University of Hong Kong focused on multimodal large language models, image and video understanding, generation, and editing; coauthor of Janus.
Zijia Zhao
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on language and multimodal models.
Ziwei Chen
Moonshot AI
Research scientist at Moonshot AI with public work on multimodal learning and language models.
Andrew Shen
Google Gemini
Andrew Shen is listed as an author of the Google technical report PaliGemma 2: A Family of Versatile VLMs for Transfer.
Jiangxin Wang
Ai2
Jiangxin Wang is listed as an author of the Ai2 technical report Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models.
Matias Mazzocconi
Google Gemini
Matias Mazzocconi is listed as an author of the Google technical report PaliGemma 2: A Family of Versatile VLMs for Transfer.
Mikhail Ryabinin
Google Gemini
Mikhail Ryabinin is listed as an author of the Google technical report PaliGemma 2: A Family of Versatile VLMs for Transfer.
Siddhartha Srinivasa
Ai2
Siddhartha Srinivasa is listed as an author of the Ai2 technical report Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models.
Wei Xiong
DeepSeek
Wei Xiong is listed as an author of the DeepSeek technical report DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding.
Yuxuan Ren
DeepSeek
Yuxuan Ren is listed as an author of the DeepSeek technical report DeepSeek-VL: Towards Real-World Vision-Language Understanding.
Dieter Fox
Ai2
Senior director of embodied AI at Ai2 and professor at the University of Washington working in robotics, computer vision, and machine learning.
Alexander Kolesnikov
Google Gemini
Alexander Kolesnikov is a Research Scientist at Google DeepMind exploring multimodal general intelligence.
Alyssa Sellitto
Ai2
Research scientist at Ai2 focused on multimodal machine learning, vision-language models, and understanding human-centered image variation.
Andrea Dafoe
Google Gemini
Andrea Dafoe is a senior research scientist at Google DeepMind whose work focuses on frontier AI risks, international governance, and the societal impacts of advanced AI.
Bilal Mustafa
Google Gemini
Senior research scientist at Google DeepMind.
Chenlin Zhang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report.
Guangda Wei
Moonshot AI
Research scientist at Moonshot AI with public publications on multimodal learning and efficient large-model systems.
Heng Wang
Moonshot AI
Research Scientist at Moonshot AI.
Humen Zhong
Alibaba Qwen
Research scientist in Tongyi Lab and a major contributor to Qwen2-VL, with public work on multimodal foundation models.
Jiaming Li
Moonshot AI
Research Scientist at Moonshot AI.
Jianzhou Wang
Moonshot AI
Research scientist at Moonshot AI working on multimodal large models and point-cloud perception and generation.
Jingyuan Liu
Moonshot AI
Research scientist at Moonshot AI with a public homepage covering prior academic work and research projects.
Olivier Henaff
Google Gemini
Research scientist at Google DeepMind working on deep learning, reinforcement learning, self-supervised learning, and robotics.
Rohit Saxena
Google Gemini
Rohit Saxena is a Research Scientist at Google DeepMind working on visual perception, multimodal learning, and language understanding.
Roman Shapovalov
Salesforce AI Research
Research scientist at Salesforce AI Research working on multimodal and vision-language models.
Ronak Mandlekar
Ai2
PhD student at Stanford and research scientist at the Allen Institute for AI working on robotics, multimodal models, and embodied AI.
Sihan Cao
Moonshot AI
Researcher listed on Google Scholar as affiliated with Moonshot AI and coauthor of the Kimi-VL technical report.
Sipeng Zhang
DeepSeek
PhD student at The University of Hong Kong focused on large multimodal models, image and video generation, and multimodal understanding; coauthor of Janus.
Wei Song
Moonshot AI
Researcher at Moonshot AI. Public profile notes prior PhD study in computer science at the Chinese University of Hong Kong.
Weiyi Su
Shanghai AI Laboratory
Researcher at Shanghai AI Laboratory focused on multimodal large language models, with public publications including InternVL 1.5, Video-LLaVA, and VCD.
William Kolesnikov
Google Gemini
Staff software engineer at Google DeepMind working on post-training, alignment, multimodal models, and data filtering. He previously worked on hardware and software co-design for machine learning.
Xingzhe Wu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report.
Xinyu Chen
DeepSeek
Research intern at NUS and Nanjing University working on machine learning and multimodal large language models; coauthor of DeepSeek-VL2.
Yang Cao
Shanghai AI Laboratory
Researcher working on open multimodal models, including InternVL3.
Yanxia Cui
DeepSeek
Researcher working on multimodal and vision-language models, including DeepSeek-VL2 and related model optimization work.
Yidao Qin
Moonshot AI
Research Scientist at Moonshot AI.
Yongsheng Kang
Moonshot AI
Research Scientist at Moonshot AI.
Yuanhang Zhang
Alibaba Qwen
Research scientist in Tongyi Lab and major contributor to Qwen2.5-VL, with public work on multimodal large language models.
Yuhang Zheng
ByteDance Seed
Researcher working on multimodal and embodied agents, including Seed1.5-VL and related planning work.
Yusuke Iwasawa
Shanghai AI Laboratory
Project associate professor at the University of Tokyo whose work spans deep learning, artificial intelligence, and machine learning for medicine and healthcare.
Yuyi Wang
Alibaba Qwen
Research intern in Tongyi Lab whose public profile highlights work on multimodal large language models and video understanding.
Zhaohai Li
Alibaba Qwen
Research scientist in Tongyi Lab and technical lead of Qwen2-VL, with public work on vision-language models.
Zhenyu Yang
Alibaba Qwen
PhD student at Nanjing University and research intern at Alibaba Tongyi Lab working on multimodal large language models and visual understanding; coauthor of Qwen2.5-VL.
Zihan Liu
DeepSeek
Zihan Liu is a research scientist at DeepSeek. His public homepage highlights work in multimodal learning, vision-language models, and large-scale machine learning.
Zongyu Lin
Moonshot AI
Technical Staff at Moonshot AI.