Vision-Language Models
Researchers connected to this field in the public atlas.
Radu Soricut
Google Gemini
Research scientist focused on machine learning and natural language understanding, with work spanning machine translation, semantic parsing, and large-scale language modeling.
Jifeng Dai
DeepSeek / MiniMax
Researcher focused on computer vision, multimodal learning, and generative AI. His public homepage says he is currently with Stepfun, after serving as a principal scientist at SenseTime Research and a researcher at Microsoft Research Asia, and that he earned a PhD in computer science from Tsinghua University.
Sifan Zhou
DeepSeek
Researcher at DeepSeek AI interested in generative models, large language models, multimodal learning, and computer vision. He is pursuing a PhD in electrical engineering at Stanford University after earning a bachelor's degree from Tsinghua University, and has also worked at Meta AI and Google.
Junyang Lin
Alibaba Qwen
Junyang Lin (Justin Lin) is a researcher and open-source maintainer known for the Qwen family of models. His public profiles list interests in LLMs, AI agents, multimodal learning, long-horizon reasoning, world models, and reinforcement learning; multiple March 2026 news reports said he stepped down from the Qwen tech lead role.
Haoyu Lu
DeepSeek / Moonshot AI
PhD student at Renmin University of China whose homepage highlights multimodal foundation models and video understanding, with public projects and papers including DeepSeek-VL and Kimi-VL.
Renjie Zheng
ByteDance Seed
Renjie Zheng is a researcher at ByteDance. His public OpenReview profile lists prior research experience at Baidu Research and earlier study at Oregon State University and Tongji University, with public work spanning NLP, language models, and reasoning-related research.
Yuwen Xiong
ByteDance Seed
Yuwen Xiong is a Research Scientist at ByteDance Seed in the Bay Area. He received a Ph.D. from the University of Toronto's Machine Learning Group and previously worked at Waabi and Uber ATG.
Yiheng Xu
Alibaba Qwen
Yiheng Xu is a research scientist focused on multimodal AI, coding agents, and reasoning systems. His public profiles link him to Qwen research and later work at OpenAI, with publications spanning vision-language models and code generation.
Feida Zhu
ByteDance Seed
Feida Zhu is a ByteDance researcher in computer vision. Public sources show prior roles at Tencent Youtu Lab and Nanyang Technological University, and a PhD in Computer Science from The University of Hong Kong.
Can Huang
ByteDance Seed
Public sources identify Can Huang as a ByteDance researcher. OpenReview lists prior research work at Intsig Information Co. Ltd., master's study at Shanghai Jiao Tong University, and undergraduate study at Xi'an Jiaotong University.
Weihao Yu
ByteDance Seed
Weihao Yu is a researcher in computer vision, multimodal AI, and neural architecture design. Public profiles show PhD and postdoctoral work at the National University of Singapore, a 2025 Research Scientist role at ByteDance Seed, and a 2026 appointment at Peking University Shenzhen Graduate School.
Hao Zhang
Moonshot AI / NVIDIA
Research scientist at Moonshot AI studying scaling in reinforcement learning and large language models.
Jianhui Duan
ByteDance Seed
Jianhui Duan is an algorithm researcher on the ByteDance Seed LLM team whose public homepage highlights work on pretraining data, training optimization, and distribution-shift mitigation for large language models.
Xinxing Zu
Moonshot AI
Xinxing Zu is a research scientist at Moonshot AI. His public profile notes a PhD in computer science from New York University, prior work as an AI scientist at Amazon, and research interests in AI agents, reasoning, and multimodality.
Rui Qian
ByteDance Seed
Rui Qian is a researcher at ByteDance Seed. Qian received a Ph.D. from The Chinese University of Hong Kong and a bachelor's degree from Shanghai Jiao Tong University in 2021.
Yujia Qin
ByteDance Seed
Yujia Qin focuses on LLM/VLM-based agents. His official homepage lists a Ph.D. in Computer Science from Tsinghua University (2020-2024), a B.E. in Electronic Information Science and Technology from Tsinghua University (2016-2020), and a role at ByteDance Seed beginning in July 2024.
Xingcheng Yao
Moonshot AI
Xingcheng Yao is a research scientist at Moonshot AI. His public profile notes prior work as a research engineer at Tencent AI Lab, a PhD in computer science from the University of Southern California, and research interests spanning NLP, multimodal systems, and AI agents.
Yifan Du
ByteDance Seed
Yifan Du is a Ph.D. student at Renmin University of China advised by Wayne Xin Zhao. His public homepage lists research interests in multimodal large language models, visual instruction tuning, long video understanding, and complex visual reasoning, and it lists a VLM post-training internship at ByteDance Seed.
Chenwei Lou
ByteDance Seed
Chenwei Lou is a researcher at ByteDance Seed. Public profiles indicate earlier research experience at Tencent, an MS period at Harbin Institute of Technology, and earlier undergraduate study at Jilin University.
Jianhua Zhu
ByteDance Seed
MS student at Peking University's Wangxuan Institute of Computer Technology working on visual reasoning, vision-language models, OCR, and handwritten mathematical expression recognition.
Chenhui Gou
ByteDance Seed
Chenhui Gou is a PhD student at Monash University in the Vision & Language for Autonomous AI Group. Public profiles list his research in computer vision, visual perception, and reasoning, following earlier master's study at the Australian National University.
Tenglong Ao
ByteDance Seed
Tenglong Ao is a computer science researcher whose public work focuses on character animation, co-speech gesture generation, music-driven dance generation, and multimodal agent behavior. Public sources indicate he received a Ph.D. in computer science from Peking University and a B.S. in electronic engineering from Tsinghua University.
Zilong Huang
ByteDance Seed
Zilong Huang is a principal researcher at Tencent Singapore whose work focuses on multimodal interaction, multimodal agents, multimodal pretraining, and efficient model architectures. He previously worked at ByteDance and received his Ph.D. and B.E. from Huazhong University of Science and Technology.
Bencheng Liao
ByteDance Seed
Bencheng Liao is a PhD student at the Institute of Artificial Intelligence, Huazhong University of Science and Technology. Public profiles list research interests including autonomous driving, object detection, and visual perception.
Joya Chen
ByteDance Seed
Joya Chen is a final-year Ph.D. candidate at Show Lab, National University of Singapore, advised by Mike Zheng Shou. Her research focuses on large multimodal models for video, including data scaling, model architecture, pre-training, post-training, and benchmarking.
Renrui Zhang
ByteDance Seed
Public sources connect Renrui Zhang to multimodal large language models, 3D point cloud analysis, and efficient model adaptation.
Rui Yang
ByteDance Seed
Rui Yang is a PhD student at The University of Hong Kong. Public OpenReview and DBLP records link him to research on multimodal large language models, language agents, and vision-language systems, including GPT4Tools, HaploVL, and Visual Spatial Tuning.
Yining Ye
ByteDance Seed
Yining Ye (叶奕宁) is a master's student in the THUNLP Lab at Tsinghua University advised by Zhiyuan Liu. Public profiles describe research on LLM/VLM agents, tool learning, and unified language models.
Yurui Ren
ByteDance Seed
Yurui Ren is a researcher working on image/video synthesis and image enhancement. Public sources identify him as a Ph.D. student at Peking University (2017-2022), advised by Ge Li, with a B.S. from Dalian University of Technology.
Zewei Sun
ByteDance Seed
Public sources associate Zewei Sun with ByteDance and list Nanjing University education (BS 2013-2017, MS 2017-2020), with research spanning machine translation and later large language model work.
Yufeng Yuan
ByteDance Seed
Yufeng Yuan is listed on OpenReview as staff at ByteDance Seed, previously at Tencent AI Lab, with an MS in Computing Science from the University of Alberta.
Xuefeng Xiao
ByteDance Seed
Public records identify Xuefeng Xiao as a ByteDance-affiliated researcher whose work centers on model compression, optimization, and acceleration; his OpenReview profile also lists M.S. study at South China University of Technology.
Junting Lu
ByteDance Seed
Junting Lu is a final-year M.S. student at the Institute for Software Engineering, Peking University. His homepage lists research interests in tool learning, OS agents, and native vision-language agents.
Xinchen Zhang
ByteDance Seed
Xinchen Zhang is a master's student at Tsinghua University and a research intern at ByteDance Seed. His public profiles describe research on multimodal large language models, reinforcement learning for multimodal post-training, and earlier work on diffusion-based text-to-image generation.
Shuai Bai
Alibaba Qwen
Senior algorithm expert at Alibaba Group working on large language models, multimodal large language models, and diffusion models.
Jianyu Jiang
ByteDance Seed
Jianyu Jiang is a tech lead at ByteDance Seed working on large-scale AI foundation model training and efficient AI training frameworks.
Yanwei Li
ByteDance Seed
Yanwei Li is a Research Scientist at ByteDance Seed in San Jose working on foundation models for vision and language. He received his PhD from The Chinese University of Hong Kong under Jiaya Jia and previously interned at NVIDIA Research and MEGVII.
Kai Hua
ByteDance Seed
Public profile information identifies Kai Hua as a researcher at ByteDance Seed with prior research experience at Kuaishou MMU and a master's background in EECS at Peking University. His listed research areas include deep learning, NLP, language modeling, and multimodal AI.
Bairen Yi
ByteDance Seed
Bairen Yi is a researcher at ByteDance's Seed Foundation Team. Public publication records link him to work on efficient LLM inference, KV-cache compression, and model/model-expert merging.
Runxin Xu
DeepSeek
Researcher at DeepSeek whose public homepage describes work on DeepSeek R1, V1, V2, V3, Math, Coder, and mixture-of-experts systems.
Hao Yang
DeepSeek / Meta AI
Researcher at Moonshot AI working on multimodal large language models; previously a key member of Alibaba's Qwen team and author of work including Kimi-VL, DeepSeek-VL, and Qwen technical reports.
Chunyuan Li
ByteDance Seed
Chunyuan Li is a multimodal AI researcher whose public homepage highlights work on large-scale language and vision training, including LLaVA, GroundingDINO, GLIP, GLIGEN, Florence, and Oscar.
Haoqi Fan
ByteDance Seed
Haoqi Fan is a computer vision researcher working on multimodal foundation models. His public homepage says he graduated from Carnegie Mellon University's Robotics Institute, and his OpenReview profile lists him as a researcher at ByteDance Seed.
Lin Yan
ByteDance Seed
Lin Yan is a ByteDance Seed researcher whose public OpenReview profile lists expertise in NLP and LLMs, with earlier work in recommender systems.
Kunchang Li
ByteDance Seed
Kunchang Li is a PhD student at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. Public sources list his research interests as multimodal large language models, video understanding, action recognition, and lightweight networks; he is also listed as an author on ByteDance Seed publications including the Seed1.5-VL technical report.
Yunhao Fang
ByteDance Seed
Public sources identify Yunhao Fang as a multimodal AI researcher with a master's degree from UC San Diego and a B.Eng. from Zhejiang University; his public GitHub activity includes the ByteDance-Seed AHN repository.
Zhengzhuo Xu
ByteDance Seed
Public profiles list Zhengzhuo Xu as a PhD student at Tsinghua University (2023-2026), after an MS at Tsinghua (2020-2023). His publication record includes work on chart understanding, visual language modeling, and long-tailed recognition.
Hongxiang Hao
ByteDance Seed
Hongxiang Hao is a ByteDance researcher whose OpenReview profile, active since November 2024, lists prior research experience at Baidu, an M.S. from the University of the Chinese Academy of Sciences, and undergraduate study at Nankai University; public authorship also ties him to the Seed1.5-VL report and the Goku image-video generation project.
Jiabo Ye
Alibaba Qwen
Research scientist in Tongyi Lab whose public homepage and OpenReview profile describe work on large language models, multimodal learning, and visual grounding. His public profiles also list affiliations with Alibaba Group and East China Normal University.
Chenggang Li
ByteDance Seed
Publicly listed as a ByteDance researcher focused on large language models, with an education history at Tsinghua University, and as an author on ByteDance Seed's Seed1.5-VL and Seed1.5-Thinking reports.
Yulun Du
Moonshot AI
Yulun Du is a Moonshot AI-affiliated researcher. Public profiles also show prior work and study at Carnegie Mellon University, including a Master of Language Technologies completed in 2020.
Heng Ji
ByteDance Seed
Heng Ji is a professor of computer science at the University of Illinois Urbana-Champaign and founding director of the Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE). Her research focuses on natural language processing, information extraction, knowledge-enhanced large language models, and vision-language models.
Flood Sung
Moonshot AI
Researcher and engineer focused on reinforcement learning and embodied intelligence; his public profile lists work spanning Huawei Noah's Ark Lab, Momenta, Moonshot AI, and XVI Robotics, and he is credited on Moonshot AI technical reports.
Wei Ding
Alibaba Qwen
Research scientist at Alibaba working on multimodal learning and generation; previously a postdoctoral researcher at Carnegie Mellon University.
Congcong Wang
Moonshot AI
Research scientist at Moonshot AI focused on large multimodal models and large language model post-training.
Huabin Zheng
Moonshot AI
Huabin Zheng is a research scientist at Moonshot AI. His homepage says he works on large language models, multi-agent systems, code generation, and game agents.
Jun Tang
Alibaba Qwen
Jun Tang works on multimodal foundation models, open-source language models, and agent systems. His personal site highlights work on Qwen and Qwen3-VL alongside related multimodal research.
Junzhe Pan
DeepSeek
PhD student at Tsinghua University focusing on multimodal large language models, reasoning, and reinforcement learning.
Tao Yu
Moonshot AI
Assistant Professor of Computer Science at the University of Hong Kong and director of XLANG Lab, focusing on natural language processing and embodied AI agents.
Xiaoqian Shen
DeepSeek
PhD student at Tsinghua University focusing on LLM reasoning, RLHF, and multimodal large language models; research intern at DeepSeek.
Yue Cao
DeepSeek
Yue Cao is a researcher working on multimodal large language models and computer vision. His public homepage lists prior roles at DeepSeek and Apple and links to work including DeepSeek-VL2.
Zesen Cheng
Alibaba Qwen
Qwen researcher and author on the Qwen2-VL and Qwen2.5-VL technical reports, with public profiles linking his work to multimodal and vision-language systems.
Andrea Steiner
Google Gemini
Research scientist at Google DeepMind working on multimodal generative models, visual generation, and image editing; previously completed a PhD at TU Munich.
Jiahao Liu
Alibaba Qwen
Jiahao Liu works on multimodal large language models, reasoning systems, and continual learning. His public profiles connect him to the Qwen2.5-VL technical report and related open research work.
Pengyu Cheng
Moonshot AI
Pengyu Cheng is a research scientist at Moonshot AI. His homepage says he works on multimodal large language models, reinforcement learning, and agents, and previously interned at Microsoft Research.
Sangho Lee
Ai2
Researcher at the Allen Institute for AI (Ai2) working on vision-language and multimodal AI, with a focus on reliable reasoning and understanding beyond text.
Wei Xiong
DeepSeek
Research scientist at DeepSeek working on large language models, multimodal learning, and machine learning systems. He was previously an applied scientist at AWS AI Labs and earned a PhD in computer science from Johns Hopkins University.
Xi Zhang
Alibaba Qwen
Xi Zhang works on multimodal and vision-language model research. Public profiles connect him to Qwen2-VL and related open research projects.
Yuxuan Ren
DeepSeek
Researcher focused on multimodal generative models and reinforcement learning; currently at ByteDance Seed and previously at DeepSeek.
Haibin Lin
ByteDance Seed
Haibin Lin works on LLM infrastructure at ByteDance Seed, focusing on training systems for large language and multimodal models from pre-training to post-training.
Chengzhi Wei
ByteDance Seed
Chengzhi Wei is credited on the Seed1.5-VL and Seed1.5-Thinking technical reports. An OpenReview profile under the reordered name 'Wei Chengzhi' lists a ByteDance researcher role starting in 2023 and MS study at the University of California, Los Angeles from 2020 to 2022.
Daya Guo
DeepSeek / Moonshot AI
DeepSeek researcher focused on NLP, code intelligence, and LLM reasoning, with public work spanning DeepSeek-Coder, DeepSeekMath, DeepSeek-V2, DeepSeek-V3, and DeepSeek-R1.
Jiahui Yu
Google Gemini
Jiahui Yu is a research scientist at Google DeepMind working on multimodal learning and large language models.
Noah A. Smith
Ai2
Noah A. Smith is a computer scientist and professor at the University of Washington, where he serves as Vice Provost for Artificial Intelligence and co-directs the OLMo open language modeling effort with Ai2. His research focuses on natural language processing, machine learning, and evaluation methodology.
Huazuo Gao
DeepSeek
Researcher at DeepSeek AI working on decision-making and post-training for large language models.
Jihao Liu
ByteDance Seed
Public sources identify Jihao Liu as a researcher working on computer vision, multimodal learning, and vision-language topics. A public GitHub profile under `jihaonew` lists "CUHK, MMLab" and a personal website, and DBLP shows publications from 2020 to 2024 across image modeling, vision pretraining, and multimodal alignment.
Kaiyuan Zhang
ByteDance Seed
Kaiyuan Zhang has a public research record in AI security and privacy. DBLP lists publications on backdoor attacks, mitigation, and model security across venues including IEEE S&P, NDSS, CVPR, ECCV, NeurIPS, and ICLR, and OpenReview lists a PhD affiliation with Purdue University Computer Science.
Zhipeng Chen
ByteDance Seed
Public sources identify Zhipeng Chen as a ByteDance engineer with earlier Baidu experience and PhD training in Electronic Engineering at Tsinghua University. Speech recognition is listed as an expertise area, and he is also listed as an author on the Seed1.5-VL technical report.
Zhuofan Zheng
ByteDance Seed
Zhuofan Zheng is a ByteDance artificial intelligence researcher publicly listed on OpenReview, with undergraduate study at Tsinghua University from 2017 to 2021 and MS study there from 2021 to 2024, and is also listed as an author of the Seed1.5-VL Technical Report.
Zihao Huang
ByteDance Seed / Moonshot AI
Researcher at Moonshot AI. Public profile notes computer science and engineering study at HKUST from 2021 to 2025.
Mingkun Yang
Alibaba Qwen
Mingkun Yang works on multimodal large language models, embodied AI, and robotics. His public profile says he is a postdoc at Zhejiang University and a research scientist at Qwen.
Lucas Beyer
Google Gemini
Lucas Beyer is an ML researcher at Google DeepMind in Zurich. His public homepage highlights prior work at Google Brain and a PhD at ETH Zurich.
Maarten Sap
Ai2
Maarten Sap is an assistant professor at the University of Washington and a senior research scientist at the Allen Institute for AI. His work focuses on human-centered language technologies and social NLP.
Y. Charles
Moonshot AI
Research scientist at Moonshot AI focused on multimodal large language models.
Yejin Choi
Ai2
Senior director of natural language processing at the Allen Institute for AI and professor at the University of Washington. Her public profile highlights work in natural language processing, machine learning, computer vision, and AI ethics.
Yunfei Chu
Alibaba Qwen
Algorithm expert at Alibaba Group working on computer vision, multimodal learning, and large language models.
Zaida Zhou
Moonshot AI
Associate research scientist at Moonshot AI based in Beijing, China; previously worked as a postdoctoral researcher.
Zhibo Yang
Alibaba Qwen
Zhibo Yang works on multimodal and vision-language systems. Public profiles connect him to the Qwen2.5-VL technical report and to a GitHub account that links back to his personal site.
Zhiqi Huang
Moonshot AI
Machine learning researcher at Moonshot AI and incoming assistant professor at Shanghai Jiao Tong University.
Liangqiang Chen
ByteDance Seed
Public sources identify Liangqiang Chen as a researcher at ByteDance AI Foundation. His OpenReview profile lists a confirmed @bytedance.com email and links to the GitHub account chenup; he is also listed as an author on the Seed1.5-VL and Seed1.5-Thinking technical reports.
Yuntao Li
ByteDance Seed
OpenReview lists Yuntao Li as a Researcher at ByteDance Inc. and a PhD student at Peking University from 2017 to 2022.
Xijin Zhang
ByteDance Seed
Public sources identify Xijin Zhang as a researcher with a confirmed @bytedance.com OpenReview profile. That profile lists MS studies at Tsinghua University from 2014 to 2017 and undergraduate studies at Xi'an University of Electronic Science and Technology from 2010 to 2014.
Jiashi Feng
ByteDance Seed
Jiashi Feng is a research lead at ByteDance whose public profiles and project pages span computer vision, deep learning, vision-language work, and 3D content creation.
Jian Yang
Alibaba Qwen
Jian Yang is an Associate Professor at Beihang University whose research focuses on code intelligence, large language models, and AI agents. He worked with Alibaba Qwen from 2023 to July 2025.
Yang Fan
Alibaba Qwen
Yang Fan is a research scientist at Alibaba Group. His homepage says he works on large language model post-training and deployment.
Ru Zhang
ByteDance Seed
Ru Zhang is a researcher publicly linked to ByteDance through technical-report authorship and an OpenReview profile with a confirmed @bytedance.com email.
Xiaoying Jia
ByteDance Seed
Listed on DBLP as Xiaoying Jia 0005 with a ByteDance affiliation and an attached ORCID, with 2025 publications on high-performance LLM serving, model merging in LLM pre-training, and Seed Diffusion; also listed as an author on the Seed1.5-VL and Seed1.5-Thinking technical reports.
Dejian Yang
DeepSeek
DeepSeek team member and co-author of the DeepSeek-V3, DeepSeek-V2, and DeepSeek LLM technical reports.
Zeyu Cui
Alibaba Qwen
Research scientist at Meta in New York City and research advisor at the UCLA NLP group; previously completed a PhD in computer science at UCLA.
Jinze Bai
Alibaba Qwen
PhD student at The Hong Kong University of Science and Technology (Guangzhou) whose research interests include large language models, vision-language models, AI agents, and multimodal retrieval.
Ali Farhadi
Ai2
CEO of the Allen Institute for AI and professor of computer science at the University of Washington. His work spans computer vision, multimodal learning, reasoning, and embodied AI.
Koray Kavukcuoglu
Google Gemini
Chief Technology Officer at Google DeepMind, with work spanning machine learning and reinforcement learning.
Xinlong Wang
DeepSeek
Xinlong Wang is a researcher working across computer vision, embodied AI, robotics, and machine learning. Public profiles link him to OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.
Mengfei Du
ByteDance Seed
Public evidence links Mengfei Du to EmbSpatial-Bench, a benchmark for spatial understanding in embodied large vision-language models, and to authorship on ByteDance Seed's Seed1.5-VL Technical Report.
Shuai Peng
ByteDance Seed
Public author records associate Shuai Peng with work on mathematical OCR and formula recognition, logical reasoning data synthesis, and multimodal reasoning, and also list him as an author of the Seed1.5-VL technical report.
Youbin Wu
ByteDance Seed
Public OpenReview activity for Youbin Wu shows work on large language models, multimodal reasoning, and visual generation, consistent with the Seed1.5-VL report authorship.
Yonghui Wu
ByteDance Seed / Google Gemini
Google researcher whose public profile says he joined Google in September 2008 and has been with the Google Brain team since January 2015, with interests spanning information retrieval, learning to rank, machine learning, machine translation, and natural language processing.
Liang Chen
Moonshot AI
Research scientist at Moonshot AI working on foundation models, multimodal large language models, and agents; previously worked at Huawei Noah's Ark Lab and studied at the Chinese University of Hong Kong.
Dongliang Wang
Moonshot AI
Dongliang Wang is a research scientist at Moonshot AI whose public profiles highlight multimodal large language models. His homepage also notes earlier PhD work at Shanghai AI Lab and Shanghai Jiao Tong University.
Keqin Chen
Alibaba Qwen
Researcher focused on large language models and multimodal learning, with public profiles linking Keqin Chen to Beihang University and to Qwen vision-language model work.
Chenzhuang Du
Moonshot AI
Technical staff member at Moonshot AI whose public profile highlights work on web and app agents, multimodal systems, reinforcement learning, and LLMs.
Dikang Du
Moonshot AI
Dikang Du is a research scientist at Moonshot AI. His homepage says he received a Ph.D. from Cornell University and works on natural language processing, machine learning, and multimodal learning.
Hao Hu
Moonshot AI
Technical staff member at Moonshot AI working on general AI agents, reinforcement learning, and multimodal foundation models.
Haoning Wu
Moonshot AI
PhD student in computer science at the University of Hong Kong working in vision and machine intelligence.
Lin Sui
Moonshot AI
Researcher in computer vision and multimodal learning. Public profile lists PhD study in computer science and engineering at HKUST under Qifeng Chen.
Shu Zhong
ByteDance Seed
Public sources identify Shu Zhong as a researcher at ByteDance Inc. and list him as an author of the Seed1.5-VL Technical Report.
Christopher Clark
Ai2
Christopher Clark is a researcher working on language models, efficient inference, and trustworthy NLP systems. His public profile highlights work at the intersection of NLP, efficiency, and model evaluation.
Tianbao Xie
Alibaba Qwen
Research scientist on the Qwen team at Alibaba Group, focusing on foundation models and language agents. He received a PhD in computer science from the University of Illinois Urbana-Champaign.
Zheng Zhang
ByteDance Seed / Moonshot AI
Publicly listed coauthor of the Kimi-VL, Kimi K2.5, and Seed1.5-Thinking technical reports, with public work spanning vision-language, multimodal, and reasoning models.
Huixia Li
ByteDance Seed
Publicly listed as a coauthor on the Seed1.5-VL technical report and ByteDance Seed's ICML 2025 paper "Polybasic Speculative Decoding Through a Theoretical Perspective."
Peibin Chen
ByteDance Seed
Peibin Chen is publicly identifiable via an OpenReview profile with a confirmed @bytedance.com email. Public sources link him to the MM 2024 poster "SimpliGuard: Robust Mesh Simplification In the Wild" and to authorship on the Seed1.5-VL Technical Report.
Antonio Torralba
Google Gemini
Antonio Torralba is the Delta Electronics Professor in the EECS Department at MIT and a member of CSAIL whose research focuses on computer vision, visual learning, and scene understanding.
Jinbo Zhao
Alibaba Qwen
PhD student in CSLT at Tsinghua University working on large language models, multimodal large language models, and speech-language models; publication context connects Jinbo Zhao to the Qwen2.5-VL technical report.
Wenhai Wang
DeepSeek
Wenhai Wang is a researcher working on visual perception foundation models, efficient learning, and multimodal large models. Public profiles list him with OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.
Chaoyi Deng
ByteDance Seed
Chaoyi Deng is listed on Mingsheng Long's official Tsinghua University page as a Tsinghua undergraduate from 2019 to 2023 and a master's student from 2023 to 2026. Public sources also list Deng as a coauthor of the NeurIPS 2023 paper "Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning" and the ByteDance Seed "Seed1.5-VL Technical Report."
Jingren Zhou
Moonshot AI / Alibaba Qwen
Alibaba senior technology leader and researcher associated with Qwen. Public profiles list him with Alibaba Group, and official Alibaba Cloud coverage identifies him as chief technology officer leading its large-model work.
Kai Dang
Alibaba Qwen
Researcher on Alibaba's Qwen team focused on large language models and NLP, with public research profiles listing a Nankai University background.
An Yang
Alibaba Qwen
Alibaba researcher working on large language models and multimodal pretraining; public research profiles connect An Yang to Qwen-related work and earlier study at Peking University.
Fei Huang
Alibaba Qwen
Researcher at Alibaba Group working on natural language processing and multimodal AI.
Ke Shen
ByteDance Seed
Public OpenReview information lists Ke Shen as a researcher at Seed, ByteDance Inc. with expertise in language models, pretraining, and information retrieval.
Shipeng Yan
ByteDance Seed
Shipeng Yan is a co-author on ByteDance Seed's Seed1.5-Thinking and Seed1.5-VL technical reports. DBLP records additional publications on large-scale LLM training, self-rewarding preference optimization, and continual vision-language pretraining.
Shuangzhi Wu
ByteDance Seed
Shuangzhi Wu is listed as an author on ByteDance Seed technical reports and has a public DBLP publication record spanning NLP topics including machine translation, summarization, translation quality estimation, and related language modeling work.
Wanjun Zhong
ByteDance Seed
Wanjun Zhong is a researcher in large language models whose public bibliography includes work on AGIEval, MemoryBank, tool-use benchmarking, knowledge editing, and code-generation reliability. Zhong is also listed as a coauthor on the Seed1.5-VL and Seed1.5-Thinking technical reports.
Zuquan Song
ByteDance Seed
Zuquan Song is publicly listed as a ByteDance Seed author on Seed technical reports and on ByteDance Seed's OSDI 2025 publication "Understanding Stragglers in Large Model Training Using What-if Analysis." DBLP also attributes 2024-2025 publications on LLM training infrastructure, checkpointing, faulty-machine detection, and GPU communication overlap to Zuquan Song.
Jena D. Hwang
Ai2
Research scientist at the Allen Institute for AI (Ai2) whose work focuses on natural language understanding and commonsense reasoning.
Xiaodong Deng
Alibaba Qwen
Research scientist in Tongyi Lab whose official profile highlights post-training and multimodal large language models.
Yuanzhi Zhu
Alibaba Qwen
Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.
Donghong Zhong
ByteDance Seed
Publicly documented as a co-author of the Seed1.5-VL Technical Report and of other DBLP-indexed work spanning multimodal reasoning and computer vision.
Jingqun Tang
ByteDance Seed
Jingqun Tang is a computer vision and multimodal AI researcher whose public publication record includes work on vision-language models, document image parsing, OCR and text-centric benchmarks, scene text recognition, and diffusion-based image editing.
Ke Wang
ByteDance Seed
Ke Wang is a ByteDance Seed researcher and corresponding author of WideSearch, a benchmark for broad information-seeking agents. He is also listed as an author of the Seed1.5-VL Technical Report.
Qingyao Shuai
ByteDance Seed
Publicly indexed publications for Qingyao Shuai include ICCV 2023 work on unsupervised 3D object reconstruction and a 2024 large language model reasoning paper; Qingyao Shuai is also listed as an author on the Seed1.5-VL Technical Report.
Sihang Yuan
ByteDance Seed
Sihang Yuan is publicly listed as a coauthor on ByteDance Seed's Seed1.5-VL Technical Report and on DBLP-indexed 2025 multimodal research papers.
Tianheng Cheng
ByteDance Seed
Tianheng Cheng is a computer vision researcher listed as a coauthor of the Seed1.5-VL Technical Report and on DBLP for prior vision publications including work on instance segmentation and open-vocabulary detection.
Xuehan Xiong
ByteDance Seed
Public records link Xuehan Xiong to research in computer vision and video understanding, with DBLP-listed publications from 2010 to 2025 on face alignment, video recognition, and video localization, as well as authorship on the Seed1.5-VL Technical Report.
Angang Du
Moonshot AI
Research Scientist at Moonshot AI whose public work focuses on large language models, multimodal models, and embodied AI; he previously earned a PhD from Zhejiang University and was a visiting student at Oxford.
Jianlin Su
Moonshot AI
Research scientist and writer behind Scientific Spaces whose public profile lists work on large language models and service on the Kimi team at Moonshot AI.
Jianqiang Wan
Alibaba Qwen
Research scientist in Alibaba DAMO Academy's Tongyi Lab working on multimodal learning, vision-language models, and embodied AI; author on the Qwen2-VL and Qwen2.5-VL technical reports.
Nikolay Savinov
Google Gemini
Research scientist at Google DeepMind on the Gemini team, working on multimodal AI.
Qing Yu
DeepSeek
Researcher at DeepSeek and a first-year computer science PhD student at the University of Science and Technology of China; works on multimodal reasoning and world models; coauthor of Janus.
Shaowei Liu
Moonshot AI
Researcher working on multimodal learning and vision-language systems, with public academic work on visual question answering and related topics.
Yuqi Wang
DeepSeek
Research scientist at DeepSeek and PhD student at the University of Illinois Urbana-Champaign working on multimodal foundation models, large language models, and embodied AI.
Bowen Wang
Moonshot AI
PhD student at the University of Hong Kong who worked as a research intern at Moonshot AI in 2025 and studies digital agents, computer-use agents, and multimodal intelligence.
Cheng Chen
Moonshot AI
Research scientist at Moonshot AI with public profiles covering large language models, diffusion models, and generative AI.
Jiaqi Deng
Moonshot AI
Computer science graduate from the University of Hong Kong who worked as a research intern at Moonshot AI on general-purpose computer-use agents.
Jin Xie
Moonshot AI
Researcher at Moonshot AI with public homepage and GitHub profiles under the name Xixia Zhong.
Kun Ouyang
Moonshot AI
Technical staff at Moonshot AI working on large language model reasoning, agents, and multimodal large models.
Matthieu Devin
Google Gemini
Research scientist at Google DeepMind based in Paris, focused on deep learning and computer vision.
Weixin Xu
Moonshot AI
Research scientist at Moonshot AI with public GitHub and Google Scholar profiles covering efficient inference and multimodal systems.
Xiaodong Zhu
DeepSeek
Research intern at DeepSeek and master's student at Tsinghua University working on large language models, multimodal models, and reinforcement learning.
Xiaohua Zhai
Google Gemini
Xiaohua Zhai is a researcher on the Google Research team in Zurich whose work focuses on large multimodal models and efficient deep learning.
Xiaokun Yuan
Moonshot AI
AI researcher at Moonshot AI with a public homepage and Google Scholar profile spanning robust AI, computer vision, and multimodal systems.
Yibo Miao
Moonshot AI
Moonshot AI researcher working on large language models, coding agents, and multimodal safety; his public homepage also documents earlier study at Shanghai Jiao Tong University and Huazhong University of Science and Technology.
Yiqin Wang
Moonshot AI
Researcher at Moonshot AI with a personal homepage and GitHub profile covering machine learning research.
Yuzhi Wang
Moonshot AI
Researcher at Moonshot AI focused on large language models, computational photography, and low-level computer vision; previously worked at Megvii and completed a PhD and postdoc at Tsinghua University.
Yuzi Yan
Moonshot AI
Research scientist at Moonshot AI. Previously, he was a PhD student at Princeton and interned at Google and Amazon.
Zhaowei Li
Moonshot AI
Research scientist at Moonshot AI working on multimodal AI agents, large multimodal models, video generation, speech, machine learning systems, and AI for science.
Mingxuan Wang
ByteDance Seed / Mistral AI
Mingxuan Wang is a researcher at ByteDance Seed. Public ByteDance Seed sources identify him as a Senior Researcher on the Doubao Seed Team, and official Seed publications list him as an author on reasoning and multimodal technical reports.
Yu Liu
ByteDance Seed / MiniMax
Yu Liu is publicly listed as a co-author on the 2025 Seed1.5-VL, MiniMax-01, and MiniMax-M1 technical reports.
Leonardo Beyer
Google Gemini
Leonardo Beyer is a research scientist at Google DeepMind. His public homepage highlights work across representation learning, multimodal models, and large-scale machine learning systems.
Nuo Xu
Moonshot AI
Multimodal and omni-model engineer whose public profile lists Moonshot AI experience and Kimi-VL among recent projects.
Yufei Zhang
DeepSeek
Researcher at the University of Illinois Urbana-Champaign focused on vision-language models, multimodal large language models, and physical AI.
Yuqing Wang
DeepSeek
Research intern at DeepSeek and PhD student at Princeton University whose research interests include large language models and multimodal foundation models.
Zhengyang Wang
DeepSeek
Research intern at DeepSeek and master's student at Renmin University of China working on multimodal large language models and AI agents.
Zheren Fu
Alibaba Qwen
Tongyi Lab researcher working on large language models, vision-language models, and reinforcement learning; public profiles connect Zheren Fu to the Qwen2-VL technical report.
Yue Ling
ByteDance Seed
Yue Ling's public OpenReview profile lists a researcher role at ByteDance, a confirmed @bytedance.com email, and expertise in large language models and VLMs. Yue Ling is also listed as an author of the Seed1.5-VL Technical Report on arXiv.
Han Zhu
Moonshot AI
Machine learning researcher with a public homepage and GitHub profile covering AI research and engineering projects.
Jiaming Guo
DeepSeek
PhD student at The Chinese University of Hong Kong focused on multimodal reasoning, optical character recognition, and document parsing; coauthor of DeepSeek-VL.
Liangtao Shi
Ai2
Research scientist at the Allen Institute for AI working on multimodal large language models, embodied agents, and reasoning for robots and games.
Matt Deitke
Ai2
Matt Deitke is a researcher at Ai2 whose public homepage and Google Scholar profile highlight work on multimodal learning, vision-language models, embodied AI, and open models.
Molly S. Lewis
Ai2
Molly S. Lewis is an Assistant Professor of Psychology at Princeton University whose research examines how language is shaped by social and cultural structure.
Roi Reichart
Ai2
Professor at the Technion and head of the CHIA Lab, with research spanning natural language processing, machine learning, and social-good applications.
Xiuye Gu
Google Gemini
Xiuye Gu is a researcher whose public work focuses on vision-language modeling and machine learning systems.
Yan Zhong
Moonshot AI
Research scientist at Kimi AI (Moonshot AI). Previously completed a PhD in computer science at the University of Wisconsin-Madison.
Junxiao Song
DeepSeek
Member of Technical Staff at DeepSeek.
Haowei Zhang
DeepSeek
Research scientist at DeepSeek with public GitHub work on language models and AI systems.
Peng Wang
Alibaba Qwen
Researcher affiliated with the Qwen team at Alibaba Group on Google Scholar and coauthor of the Qwen and Qwen3 technical reports.
Wenbin Ge
Alibaba Qwen
Research scientist in Tongyi Lab whose official profile highlights work on efficient reinforcement learning, generalization, inference-time scaling, and reasoning for large language models.
Luke Zettlemoyer
Ai2
Professor at the University of Washington whose public research spans NLP, machine learning, and semantic parsing; arXiv author results include OLMo and OLMo 2.
Guang Shi
ByteDance Seed
Guang Shi is publicly listed as a coauthor on ByteDance Seed papers including the Seed1.5-VL Technical Report, Seed1.5-Thinking, Emerging Properties in Unified Multimodal Pretraining, and UI-TARS, indicating work centered on vision-language models, multimodal pretraining, and GUI agents.
Jiaze Chen
ByteDance Seed
Jiaze Chen is publicly listed as a coauthor on ByteDance Seed's 2025 Seed-Thinking-v1.5 publication, the Seed1.5-VL technical report, and the Seed-Coder repository/report, indicating contributions across reasoning, vision-language, and code-focused foundation model work.
Liang Xiang
ByteDance Seed
ByteDance Seed publicly identified Liang Xiang in December 2024 as Head of the Doubao Foundation Model Team. Public Seed and DBLP records also list him as a coauthor on work about LLM pre-training model merging and ByteDance LLM training infrastructure.
Shen Yan
ByteDance Seed
Shen Yan is publicly listed as a co-author on ByteDance Seed research publications including Seed1.5-VL, Seed1.5-Thinking, MME-CoT, and TC-MoE.
Yanghua Peng
ByteDance Seed
Public sources list Yanghua Peng as a coauthor on ByteDance Seed technical reports and on ByteDance engineering research artifacts related to distributed training and checkpointing for large models.
Zeyu Wang
ByteDance Seed / MiniMax
Public sources list Zeyu Wang as an author of ByteDance Seed's Seed1.5-VL Technical Report and the Seed publication "Emerging Properties in Unified Multimodal Pretraining" (BAGEL), and as a coauthor of the MiniMax-M1 technical report.
Maxwell Collins
Google Gemini
Maxwell Collins is a Research Scientist at Google DeepMind.
Xinyu Yang
ByteDance Seed / DeepSeek
Researcher credited as a co-author on DeepSeek-V2, DeepSeek-V3, and Seed1.5-VL technical reports, and listed among the authors of the Nature paper on DeepSeek-R1.
Chang Zhou
Alibaba Qwen
Qwen researcher and co-lead whose work focuses on pretraining and post-training, multimodal models, agent systems, and large-scale model infrastructure.
Pradeep Dasigi
Ai2
Research scientist at Ai2 whose work focuses on natural language processing, semantic parsing, grounded language understanding, and question answering.
Shijie Wang
Alibaba Qwen
Senior research scientist in Tongyi Lab whose official profile highlights post-training, AI for science, evaluation and alignment, multimodal reasoning, and large language model reasoning.
Shujie Wang
DeepSeek
First-year PhD student at Shanghai Jiao Tong University focused on multimodal large language models, text-to-image generation, and image/video generation; coauthor of DeepSeek-VL2.
Yonggang Zhang
DeepSeek
Yonggang Zhang is a researcher whose public OpenReview profile includes the paper "DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding."
Chaorui Deng
ByteDance Seed
Chaorui Deng is publicly credited as an author on ByteDance Seed multimodal research publications, including the Seed1.5-VL technical report and the BAGEL paper on unified multimodal pretraining.
Jiahao Li
ByteDance Seed
Jiahao Li is publicly listed as a coauthor on ByteDance Seed's Seed1.5-VL Technical Report and on the official ByteDance Seed publication page for UI-TARS, indicating work related to multimodal and GUI-agent research.
Shihao Liang
ByteDance Seed
Public sources list Shihao Liang as a coauthor on ByteDance's UI-TARS project and the Seed1.5-VL technical report.
Aman Singh
DeepSeek
Research intern at DeepSeek and PhD student at Stanford University working on generative vision-language models, large language models, and large-scale training.
Bohong Yin
Moonshot AI
Research scientist at Moonshot AI focused on machine learning systems; public profiles note prior PhD study at the Max Planck Institute and Technical University of Munich.
Bowei Xing
Moonshot AI
Technical Staff at Moonshot AI.
Bowen Qu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Chu Wei
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Dehao Zhang
Moonshot AI
Technical staff member at Moonshot AI and machine learning researcher; public profiles note prior study at the Gaoling School of AI at Renmin University of China.
Enming Yuan
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and generative models.
Enzhe Lu
Moonshot AI
PhD student in Computer Science at the University of Hong Kong. His research interests include multimodal large language models and embodied AI, and he co-authored the Kimi-VL technical report.
Fang Li
Moonshot AI
Research Scientist at Moonshot AI.
Guokun Lai
Moonshot AI
Research scientist at Moonshot AI whose work focuses on large foundation models and multimodal models.
Haiyang Xu
Alibaba Qwen
Independent researcher focused on multimodal learning, document intelligence, and efficient training; coauthor of Qwen2.5-VL and mPLUG-related vision-language systems.
Hang Zhang
Alibaba Qwen
Researcher at Alibaba Group working on multimodal large language models; public profile and publication context connect Hang Zhang to the Qwen2-VL technical report.
Hao Ding
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and computer vision.
Haotian Yao
Moonshot AI
Research scientist at Moonshot AI who previously studied at Tsinghua University and works on large foundation models.
Hongcheng Gao
Moonshot AI
Generative AI researcher at Moonshot AI with public work spanning computational imaging and AI systems.
Jialin Wang
Alibaba Qwen
Research scientist in Tongyi Lab and contributor to Qwen2-VL, with public work on multimodal large language models.
Jiezhong Qiu
Moonshot AI
Researcher at Moonshot AI with public GitHub and scholarly profiles covering machine learning and AI systems.
Jinhong Wang
Moonshot AI
Technical Staff at Moonshot AI.
Junjie Yan
Moonshot AI
Research scientist at Moonshot AI with public work on computer vision and multimodal models.
Longhui Yu
Moonshot AI
Research Scientist at Moonshot AI.
Mengfan Dong
Moonshot AI
Technical Staff at Moonshot AI.
Mengnan Dong
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Nan Ding
Google Gemini
Researcher at Google Research whose public work includes multimodal and vision-language modeling, with arXiv publications tied to PaliGemma and related transfer work.
Pengfei Wang
Alibaba Qwen
Research scientist in Alibaba DAMO Academy's Tongyi Lab working on machine learning, computer vision, and multimodal large language models; author on the Qwen2-VL and Qwen2.5-VL technical reports.
Qizheng Gu
Moonshot AI
Research scientist at Moonshot AI with public work on language models and reasoning.
Rui Hu
DeepSeek
PhD student at the University of Science and Technology of China focused on machine learning and multimodal understanding and generation; coauthor of Janus.
Runjie Zhou
Moonshot AI
Research scientist at Moonshot AI and PhD student at Shanghai Jiao Tong University whose homepage highlights multimodal understanding, generation, large language models, and agents.
Shan Lu
DeepSeek
Research scientist focused on multimodal representation learning, self-supervised learning, and diffusion models; coauthor of Janus and JanusFlow.
Sibo Song
Alibaba Qwen
Research scientist in Tongyi Lab and maintainer of Qwen-VL, with public work on vision-language models.
Tianhui Song
Moonshot AI
Research Scientist at Moonshot AI.
Tongtong Bai
Moonshot AI
Research scientist at Moonshot AI and the University of Wisconsin-Madison with public work on large language models and reasoning.
Weiran He
Moonshot AI
Research scientist at Moonshot AI whose public GitHub profile highlights work on multimodal large language models and agents.
Weixiao Huang
Moonshot AI
Research Scientist at Moonshot AI.
Xinhao Li
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal and long-context model research.
Xinyuan Wang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Xinyu Luo
DeepSeek
PhD student at Shanghai Jiao Tong University working on multimodal large language models and image understanding and generation; coauthor of Janus.
Xinyu Zhou
Moonshot AI
Technical Staff at Moonshot AI.
Xuejing Liu
Alibaba Qwen
Xuejing Liu is a researcher whose public OpenReview profile includes the Qwen2-VL and Qwen2.5-VL technical report papers.
Yang Li
Moonshot AI
Co-founder and chief executive officer of Moonshot AI.
Yangyang Hu
Moonshot AI
Researcher at Moonshot AI with a public GitHub profile and work spanning machine learning systems.
Yanru Chen
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yejie Wang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yibo Liu
Moonshot AI
Research scientist at Moonshot AI whose public profile highlights work on multimodal generation, multimodal large language models, and efficient LLMs.
Yimin Chen
Moonshot AI
Research scientist at Moonshot AI with public GitHub projects spanning language models and multimodal systems.
Yiping Bao
Moonshot AI
Researcher at Moonshot AI with a public GitHub profile covering AI systems work.
Yuanxin Liu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yu Han
Alibaba Qwen
Researcher affiliated with Alibaba Group on Google Scholar and coauthor of the Qwen technical report.
Yuhao Dong
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yukang Chen
DeepSeek
PhD student at The University of Hong Kong focused on large multimodal models and data-centric AI, especially multimodal understanding and generation; coauthor of Janus.
Yuxin Wu
Moonshot AI
Researcher at Moonshot AI with public GitHub projects spanning AI systems.
Yuxuan Cao
DeepSeek
Research assistant at The University of Hong Kong focused on multimodal reasoning and generation, large language models, and embodied AI; coauthor of Janus.
Zhejun Jiang
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and generative models.
Zhilin Yang
Moonshot AI
Co-founder and CTO of Moonshot AI, and co-author of the Kimi-VL and Kimi K2.5 technical reports.
Zhiyuan Ruan
DeepSeek
PhD student at The University of Hong Kong focused on multimodal large language models, image and video understanding, generation, and editing; coauthor of Janus.
Zijia Zhao
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on language and multimodal models.
Ziwei Chen
Moonshot AI
Research scientist at Moonshot AI with public work on multimodal learning and language models.
Xiaohan Ding
ByteDance Seed
Xiaohan Ding is publicly credited as a coauthor on the Seed1.5-VL Technical Report and InternVL3 technical report, both focused on multimodal and vision-language models.
Shuaishuai Cao
ByteDance Seed
Publicly listed as a coauthor on the Seed1.5-VL Technical Report and the 2025 CoRR paper "OVERLORD" on scaling data loading for large foundation model training.
Alexander Kolesnikov
Google Gemini
Alexander Kolesnikov is a Research Scientist at Google DeepMind exploring multimodal general intelligence.
Alyssa Sellitto
Ai2
Research scientist at Ai2 focused on multimodal machine learning, vision-language models, and understanding human-centered image variation.
Andrea Dafoe
Google Gemini
Andrea Dafoe is a senior research scientist at Google DeepMind whose work focuses on frontier AI risks, international governance, and the societal impacts of advanced AI.
Bilal Mustafa
Google Gemini
Senior research scientist at Google DeepMind.
Chenlin Zhang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report.
Dieter Fox
Ai2
Senior director of embodied AI at Ai2 and professor at the University of Washington working in robotics, computer vision, and machine learning.
Guangda Wei
Moonshot AI
Research scientist at Moonshot AI with public publications on multimodal learning and efficient large-model systems.
Heng Wang
Moonshot AI
Research Scientist at Moonshot AI.
Humen Zhong
Alibaba Qwen
Research scientist in Tongyi Lab and a major contributor to Qwen2-VL, with public work on multimodal foundation models.
Jiaming Li
Moonshot AI
Research Scientist at Moonshot AI.
Jianzhou Wang
Moonshot AI
Research scientist at Moonshot AI working on multimodal large models and point-cloud perception and generation.
Jingyuan Liu
Moonshot AI
Research scientist at Moonshot AI with a public homepage covering prior academic work and research projects.
Olivier Henaff
Google Gemini
Research scientist at Google DeepMind working on deep learning, reinforcement learning, self-supervised learning, and robotics.
Rohit Saxena
Google Gemini
Rohit Saxena is a Research Scientist at Google DeepMind working on visual perception, multimodal learning, and language understanding.
Ronak Mandlekar
Ai2
PhD student at Stanford and research scientist at the Allen Institute for AI working on robotics, multimodal models, and embodied AI.
Sihan Cao
Moonshot AI
Researcher affiliated with Moonshot AI on Google Scholar and coauthor of the Kimi-VL technical report.
Sipeng Zhang
DeepSeek
PhD student at The University of Hong Kong focused on large multimodal models, image and video generation, and multimodal understanding; coauthor of Janus.
Wei Song
Moonshot AI
Researcher at Moonshot AI. Public profile notes prior PhD study in computer science at the Chinese University of Hong Kong.
William Kolesnikov
Google Gemini
Staff software engineer at Google DeepMind working on post-training, alignment, multimodal models, and data filtering. He previously worked on hardware and software co-design for machine learning.
Xingzhe Wu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report.
Xinyu Chen
DeepSeek
Research intern at NUS and Nanjing University working on machine learning and multimodal large language models; coauthor of DeepSeek-VL2.
Yanxia Cui
DeepSeek
Researcher working on multimodal and vision-language models, with contributions including DeepSeek-VL2 and related model optimization work.
Yidao Qin
Moonshot AI
Research Scientist at Moonshot AI.
Yongsheng Kang
Moonshot AI
Research Scientist at Moonshot AI.
Yuanhang Zhang
Alibaba Qwen
Research scientist in Tongyi Lab and major contributor to Qwen2.5-VL, with public work on multimodal large language models.
Yuyi Wang
Alibaba Qwen
Research intern in Tongyi Lab whose public profile highlights work on multimodal large language models and video understanding.
Zhaohai Li
Alibaba Qwen
Research scientist in Tongyi Lab and technical lead of Qwen2-VL, with public work on vision-language models.
Zhenyu Yang
Alibaba Qwen
PhD student at Nanjing University and research intern at Alibaba Tongyi Lab working on multimodal large language models and visual understanding; coauthor of Qwen2.5-VL.
Zihan Liu
DeepSeek
Zihan Liu is a research scientist at DeepSeek. His public homepage highlights work in multimodal learning, vision-language models, and large-scale machine learning.
Zongyu Lin
Moonshot AI
Technical Staff at Moonshot AI.