Vision-Language Models
Researchers connected to this field in the public atlas.
Radu Soricut
Google Gemini
Research scientist focused on machine learning and natural language understanding, with work spanning machine translation, semantic parsing, and large-scale language modeling.
Jifeng Dai
DeepSeek / MiniMax
Researcher focused on computer vision, multimodal learning, and generative AI. His public homepage says he is currently with Stepfun, after serving as a principal scientist at SenseTime Research and a researcher at Microsoft Research Asia, and that he earned a PhD in computer science from Tsinghua University.
Sifan Zhou
DeepSeek
Researcher at DeepSeek AI interested in generative models, large language models, multimodal learning, and computer vision. He is pursuing a PhD in electrical engineering at Stanford University after earning a bachelor's degree from Tsinghua University, and has also worked at Meta AI and Google.
Junyang Lin
Alibaba Qwen
Junyang Lin (Justin Lin) is a researcher and open-source maintainer known for the Qwen family of models. His public profiles list interests in LLMs, AI agents, multimodal learning, long-horizon reasoning, world models, and reinforcement learning; multiple March 2026 news reports said he stepped down from the Qwen tech lead role.
Haoyu Lu
DeepSeek / Moonshot AI
PhD student at Renmin University of China whose homepage highlights multimodal foundation models and video understanding, with public projects and papers including DeepSeek-VL and Kimi-VL.
Renjie Zheng
ByteDance Seed
Renjie Zheng is a researcher at ByteDance. His public OpenReview profile lists prior research experience at Baidu Research and earlier study at Oregon State University and Tongji University, with public work spanning NLP, language models, and reasoning-related research.
Yuwen Xiong
ByteDance Seed
Yuwen Xiong is a Research Scientist at ByteDance Seed in the Bay Area. He received a Ph.D. from the University of Toronto's Machine Learning Group and previously worked at Waabi and Uber ATG.
Yiheng Xu
Alibaba Qwen
Yiheng Xu is a research scientist focused on multimodal AI, coding agents, and reasoning systems. His public profiles link him to Qwen research and later work at OpenAI, with publications spanning vision-language models and code generation.
Feida Zhu
ByteDance Seed
Feida Zhu is a ByteDance researcher in computer vision. Public sources show prior roles at Tencent Youtu Lab and Nanyang Technological University, and a PhD in Computer Science from The University of Hong Kong.
Can Huang
ByteDance Seed
Public sources identify Can Huang as a ByteDance researcher. OpenReview lists prior research work at Intsig Information Co. Ltd., master's study at Shanghai Jiao Tong University, and undergraduate study at Xi'an Jiaotong University.
Weihao Yu
ByteDance Seed
Weihao Yu is a researcher in computer vision, multimodal AI, and neural architecture design. Public profiles show PhD and postdoctoral work at the National University of Singapore, a 2025 Research Scientist role at ByteDance Seed, and a 2026 appointment at Peking University Shenzhen Graduate School.
Hao Zhang
Moonshot AI / NVIDIA
Research scientist at Moonshot AI studying scaling in reinforcement learning and large language models.
Jianhui Duan
ByteDance Seed
Jianhui Duan is an algorithm researcher on the ByteDance Seed LLM team whose public homepage highlights work on pretraining data, training optimization, and distribution-shift mitigation for large language models.
Xinxing Zu
Moonshot AI
Xinxing Zu is a research scientist at Moonshot AI. His public profile notes a PhD in computer science from New York University, prior work as an AI scientist at Amazon, and research interests in AI agents, reasoning, and multimodality.
Rui Qian
ByteDance Seed
Rui Qian is a researcher at ByteDance Seed. Qian received a Ph.D. from The Chinese University of Hong Kong and a bachelor's degree from Shanghai Jiao Tong University in 2021.
Yujia Qin
ByteDance Seed
Yujia Qin focuses on LLM/VLM-based agents. His official homepage lists a Ph.D. in Computer Science from Tsinghua University (2020-2024), a B.E. in Electronic Information Science and Technology from Tsinghua University (2016-2020), and a role at ByteDance Seed beginning in July 2024.
Xingcheng Yao
Moonshot AI
Xingcheng Yao is a research scientist at Moonshot AI. His public profile notes prior work as a research engineer at Tencent AI Lab, a PhD in computer science from the University of Southern California, and research interests spanning NLP, multimodal systems, and AI agents.
Yifan Du
ByteDance Seed
Yifan Du is a Ph.D. student at Renmin University of China advised by Wayne Xin Zhao. His public homepage lists research interests in multimodal large language models, visual instruction tuning, long video understanding, and complex visual reasoning, and it lists a VLM post-training internship at ByteDance Seed.
Chenwei Lou
ByteDance Seed
Chenwei Lou is a researcher at ByteDance Seed. Public profiles indicate earlier research experience at Tencent, an MS period at Harbin Institute of Technology, and earlier undergraduate study at Jilin University.
Jianhua Zhu
ByteDance Seed
MS student at Peking University's Wangxuan Institute of Computer Technology working on visual reasoning, vision-language models, OCR, and handwritten mathematical expression recognition.
Chenhui Gou
ByteDance Seed
Chenhui Gou is a PhD student at Monash University in the Vision & Language for Autonomous AI Group. Public profiles list his research in computer vision, visual perception, and reasoning, following earlier master's study at the Australian National University.
Tenglong Ao
ByteDance Seed
Tenglong Ao is a computer science researcher whose public work focuses on character animation, co-speech gesture generation, music-driven dance generation, and multimodal agent behavior. Public sources indicate he received a Ph.D. in computer science from Peking University and a B.S. in electronic engineering from Tsinghua University.
Zilong Huang
ByteDance Seed
Zilong Huang is a principal researcher at Tencent Singapore whose work focuses on multimodal interaction, multimodal agents, multimodal pretraining, and efficient model architectures. He previously worked at ByteDance and received his Ph.D. and B.E. from Huazhong University of Science and Technology.
Bencheng Liao
ByteDance Seed
Bencheng Liao is a PhD student at the Institute of Artificial Intelligence, Huazhong University of Science and Technology. Public profiles list research interests including autonomous driving, object detection, and visual perception.
Joya Chen
ByteDance Seed
Joya Chen is a final-year Ph.D. candidate at Show Lab, National University of Singapore, advised by Mike Zheng Shou. Her research focuses on large multimodal models for video, including data scaling, model architecture, pre-training, post-training, and benchmarking.
Renrui Zhang
ByteDance Seed
Public sources connect Renrui Zhang to multimodal large language models, 3D point cloud analysis, and efficient model adaptation.
Rui Yang
ByteDance Seed
Rui Yang is a PhD student at The University of Hong Kong. Public OpenReview and DBLP records link him to research on multimodal large language models, language agents, and vision-language systems, including GPT4Tools, HaploVL, and Visual Spatial Tuning.
Yining Ye
ByteDance Seed
Yining Ye (叶奕宁) is a master's student in the THUNLP Lab at Tsinghua University advised by Zhiyuan Liu. Public profiles describe research on LLM/VLM agents, tool learning, and unified language models.
Yurui Ren
ByteDance Seed
Yurui Ren is a researcher working on image/video synthesis and image enhancement. Public sources identify him as a Ph.D. student at Peking University (2017-2022), advised by Ge Li, with a B.S. from Dalian University of Technology.
Zewei Sun
ByteDance Seed
Public sources associate Zewei Sun with ByteDance and list Nanjing University education (BS 2013-2017, MS 2017-2020), with research spanning machine translation and later large language model work.
Yufeng Yuan
ByteDance Seed
Yufeng Yuan is listed on OpenReview as staff at ByteDance Seed, previously at Tencent AI Lab, with an MS in Computing Science from the University of Alberta.
Xuefeng Xiao
ByteDance Seed
Public records identify Xuefeng Xiao as a ByteDance-affiliated researcher whose work centers on model compression, optimization, and acceleration; his OpenReview profile also lists M.S. study at South China University of Technology.
Junting Lu
ByteDance Seed
Junting Lu is a final-year M.S. student at the Institute for Software Engineering, Peking University. His homepage lists research interests in tool learning, OS agents, and native vision-language agents.
Xinchen Zhang
ByteDance Seed
Xinchen Zhang is a master's student at Tsinghua University and a research intern at ByteDance Seed. His public profiles describe research on multimodal large language models, reinforcement learning for multimodal post-training, and earlier work on diffusion-based text-to-image generation.
Shuai Bai
Alibaba Qwen
Senior algorithm expert at Alibaba Group working on large language models, multimodal large language models, and diffusion models.
Jianyu Jiang
ByteDance Seed
Jianyu Jiang is a tech lead at ByteDance Seed working on large-scale AI foundation model training and efficient AI training frameworks.
Yanwei Li
ByteDance Seed
Yanwei Li is a Research Scientist at ByteDance Seed in San Jose working on foundation models for vision and language. He received his PhD from The Chinese University of Hong Kong under Jiaya Jia and previously interned at NVIDIA Research and MEGVII.
Kai Hua
ByteDance Seed
Public profile information identifies Kai Hua as a researcher at ByteDance Seed with prior research experience at Kuaishou MMU and a master's background in EECS at Peking University. His listed research areas include deep learning, NLP, language modeling, and multimodal AI.
Bairen Yi
ByteDance Seed
Bairen Yi is a researcher at ByteDance's Seed Foundation Team. Public publication records link him to work on efficient LLM inference, KV-cache compression, and model/model-expert merging.
Runxin Xu
DeepSeek
Researcher at DeepSeek whose public homepage describes work on DeepSeek R1, V1, V2, V3, Math, Coder, and mixture-of-experts systems.
Hao Yang
DeepSeek / Meta AI
Researcher at Moonshot AI working on multimodal large language models; previously a key member of Alibaba's Qwen team and author of work including Kimi-VL, DeepSeek-VL, and Qwen technical reports.
Chunyuan Li
ByteDance Seed
Chunyuan Li is a multimodal AI researcher whose public homepage highlights work on large-scale language and vision training, including LLaVA, GroundingDINO, GLIP, GLIGEN, Florence, and Oscar.
Haoqi Fan
ByteDance Seed
Haoqi Fan is a computer vision researcher working on multimodal foundation models. His public homepage says he graduated from Carnegie Mellon University's Robotics Institute, and his OpenReview profile lists him as a researcher at ByteDance Seed.
Lin Yan
ByteDance Seed
Lin Yan is a ByteDance Seed researcher whose public OpenReview profile lists expertise in NLP and LLMs, with earlier work in recommender systems.
Kunchang Li
ByteDance Seed
Kunchang Li is a PhD student at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. Public sources list his research interests as multimodal large language models, video understanding, action recognition, and lightweight networks; he is also listed as an author on ByteDance Seed publications including the Seed1.5-VL technical report.
Yunhao Fang
ByteDance Seed
Public sources identify Yunhao Fang as a multimodal AI researcher with a master's degree from UC San Diego and a B.Eng. from Zhejiang University; his public GitHub activity includes the ByteDance-Seed AHN repository.
Zhengzhuo Xu
ByteDance Seed
Public profiles list Zhengzhuo Xu as a PhD student at Tsinghua University (2023-2026), after an MS at Tsinghua (2020-2023). His publication record includes work on chart understanding, visual language modeling, and long-tailed recognition.
Hongxiang Hao
ByteDance Seed
Hongxiang Hao is a ByteDance researcher whose OpenReview profile, active since November 2024, lists prior research experience at Baidu, an M.S. from the University of the Chinese Academy of Sciences, and undergraduate study at Nankai University; public authorship also ties him to the Seed1.5-VL report and the Goku image-video generation project.
Jiabo Ye
Alibaba Qwen
Research scientist in Tongyi Lab whose public homepage and OpenReview profile describe work on large language models, multimodal learning, and visual grounding. His public profiles also list affiliations with Alibaba Group and East China Normal University.
Chenggang Li
ByteDance Seed
Publicly listed as a ByteDance researcher focused on large language models, with an education history at Tsinghua University, and as an author on ByteDance Seed's Seed1.5-VL and Seed1.5-Thinking reports.
Yulun Du
Moonshot AI
Yulun Du is a Moonshot AI-affiliated researcher. Public profiles also show prior work and study at Carnegie Mellon University, including a Master of Language Technologies completed in 2020.
Heng Ji
ByteDance Seed
Heng Ji is a professor of computer science at the University of Illinois Urbana-Champaign and founding director of the Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE). Her research focuses on natural language processing, information extraction, knowledge-enhanced large language models, and vision-language models.
Flood Sung
Moonshot AI
Researcher and engineer focused on reinforcement learning and embodied intelligence; his public profile lists work spanning Huawei Noah's Ark Lab, Momenta, Moonshot AI, and XVI Robotics, and he is credited on Moonshot AI technical reports.
Wei Ding
Alibaba Qwen
Research scientist at Alibaba working on multimodal learning and generation; previously a postdoctoral researcher at Carnegie Mellon University.
Congcong Wang
Moonshot AI
Research scientist at Moonshot AI focused on large multimodal models and large language model post-training.
Huabin Zheng
Moonshot AI
Huabin Zheng is a research scientist at Moonshot AI. His homepage says he works on large language models, multi-agent systems, code generation, and game agents.
Jun Tang
Alibaba Qwen
Jun Tang works on multimodal foundation models, open-source language models, and agent systems. His personal site highlights work on Qwen and Qwen3-VL alongside related multimodal research.
Junzhe Pan
DeepSeek
PhD student at Tsinghua University focusing on multimodal large language models, reasoning, and reinforcement learning.
Tao Yu
Moonshot AI
Assistant Professor of Computer Science at the University of Hong Kong and director of XLANG Lab, focusing on natural language processing and embodied AI agents.
Xiaoqian Shen
DeepSeek
PhD student at Tsinghua University focusing on LLM reasoning, RLHF, and multimodal large language models; research intern at DeepSeek.
Yue Cao
DeepSeek
Yue Cao is a researcher working on multimodal large language models and computer vision. His public homepage lists prior roles at DeepSeek and Apple and links to work including DeepSeek-VL2.
Zesen Cheng
Alibaba Qwen
Qwen researcher and author on the Qwen2-VL and Qwen2.5-VL technical reports, with public profiles linking his work to multimodal and vision-language systems.
Andrea Steiner
Google Gemini
Research scientist at Google DeepMind working on multimodal generative models, visual generation, and image editing; previously completed a PhD at TU Munich.
Jiahao Liu
Alibaba Qwen
Jiahao Liu works on multimodal large language models, reasoning systems, and continual learning. His public profiles connect him to the Qwen2.5-VL technical report and related open research work.
Pengyu Cheng
Moonshot AI
Pengyu Cheng is a research scientist at Moonshot AI. His homepage says he works on multimodal large language models, reinforcement learning, and agents, and previously interned at Microsoft Research.
Sangho Lee
Ai2
Researcher at the Allen Institute for AI (Ai2) working on vision-language and multimodal AI, with a focus on reliable reasoning and understanding beyond text.
Wei Xiong
DeepSeek
Research scientist at DeepSeek working on large language models, multimodal learning, and machine learning systems. He was previously an applied scientist at AWS AI Labs and earned a PhD in computer science from Johns Hopkins University.
Xi Zhang
Alibaba Qwen
Xi Zhang works on multimodal and vision-language model research. Public profiles connect him to Qwen2-VL and related open research projects.
Yuxuan Ren
DeepSeek
Researcher focused on multimodal generative models and reinforcement learning; currently at ByteDance Seed and previously at DeepSeek.
Haibin Lin
ByteDance Seed
Haibin Lin works on LLM infrastructure at ByteDance Seed, focusing on training systems for large language and multimodal models from pre-training to post-training.
Chengzhi Wei
ByteDance Seed
Chengzhi Wei is credited on the Seed1.5-VL and Seed1.5-Thinking technical reports. An OpenReview profile under the reordered name 'Wei Chengzhi' lists a ByteDance researcher role starting in 2023 and MS study at the University of California, Los Angeles from 2020 to 2022.
Daya Guo
DeepSeek / Moonshot AI
DeepSeek researcher focused on NLP, code intelligence, and LLM reasoning, with public work spanning DeepSeek-Coder, DeepSeekMath, DeepSeek-V2, DeepSeek-V3, and DeepSeek-R1.
Jiahui Yu
Google Gemini
Jiahui Yu is a research scientist at Google DeepMind working on multimodal learning and large language models.
Noah A. Smith
Ai2
Noah A. Smith is a computer scientist and professor at the University of Washington, where he serves as Vice Provost for Artificial Intelligence and co-directs the OLMo open language modeling effort with Ai2. His research focuses on natural language processing, machine learning, and evaluation methodology.
Huazuo Gao
DeepSeek
Researcher at DeepSeek AI working on decision-making and post-training for large language models.
Jihao Liu
ByteDance Seed
Public sources identify Jihao Liu as a researcher working on computer vision, multimodal learning, and vision-language topics. A public GitHub profile under `jihaonew` lists "CUHK, MMLab" and a personal website, and DBLP shows publications from 2020 to 2024 across image modeling, vision pretraining, and multimodal alignment.
Kaiyuan Zhang
ByteDance Seed
Kaiyuan Zhang has a public research record in AI security and privacy. DBLP lists publications on backdoor attacks, mitigation, and model security across venues including IEEE S&P, NDSS, CVPR, ECCV, NeurIPS, and ICLR, and OpenReview lists a PhD affiliation with Purdue University Computer Science.
Zhipeng Chen
ByteDance Seed
Public sources identify Zhipeng Chen as a ByteDance engineer with earlier Baidu experience and PhD training in Electronic Engineering at Tsinghua University. Speech recognition is listed as an expertise area, and he is also listed as an author on the Seed1.5-VL technical report.
Zhuofan Zheng
ByteDance Seed
Zhuofan Zheng is a ByteDance artificial intelligence researcher publicly listed on OpenReview, with undergraduate study at Tsinghua University from 2017 to 2021 and MS study there from 2021 to 2024, and is also listed as an author of the Seed1.5-VL Technical Report.
Zihao Huang
ByteDance Seed / Moonshot AI
Researcher at Moonshot AI. Public profile notes computer science and engineering study at HKUST from 2021 to 2025.
Mingkun Yang
Alibaba Qwen
Mingkun Yang works on multimodal large language models, embodied AI, and robotics. His public profile says he is a postdoc at Zhejiang University and a research scientist at Qwen.
Lucas Beyer
Google Gemini
Lucas Beyer is an ML researcher at Google DeepMind in Zurich. His public homepage highlights prior work at Google Brain and a PhD at ETH Zurich.
Maarten Sap
Ai2
Maarten Sap is an assistant professor at the University of Washington and a senior research scientist at the Allen Institute for AI. His work focuses on human-centered language technologies and social NLP.
Y. Charles
Moonshot AI
Research scientist at Moonshot AI focused on multimodal large language models.
Yejin Choi
Ai2
Senior director of natural language processing at the Allen Institute for AI and professor at the University of Washington. Her public profile highlights work in natural language processing, machine learning, computer vision, and AI ethics.
Yunfei Chu
Alibaba Qwen
Algorithm expert at Alibaba Group working on computer vision, multimodal learning, and large language models.
Zaida Zhou
Moonshot AI
Associate research scientist at Moonshot AI based in Beijing, China; previously worked as a postdoctoral researcher.
Zhibo Yang
Alibaba Qwen
Zhibo Yang works on multimodal and vision-language systems. Public profiles connect him to the Qwen2.5-VL technical report and to a GitHub account that links back to his personal site.
Zhiqi Huang
Moonshot AI
Machine learning researcher at Moonshot AI and incoming assistant professor at Shanghai Jiao Tong University.
Liangqiang Chen
ByteDance Seed
Public sources identify Liangqiang Chen as a researcher at ByteDance AI Foundation. His OpenReview profile lists a confirmed @bytedance.com email and links to the GitHub account chenup; he is also listed as an author on the Seed1.5-VL and Seed1.5-Thinking technical reports.
Yuntao Li
ByteDance Seed
OpenReview lists Yuntao Li as a Researcher at ByteDance Inc. and a PhD student at Peking University from 2017 to 2022.
Xijin Zhang
ByteDance Seed
Public sources identify Xijin Zhang as a researcher with a confirmed @bytedance.com OpenReview profile. That profile lists MS studies at Tsinghua University from 2014 to 2017 and undergraduate studies at Xi'an University of Electronic Science and Technology from 2010 to 2014.
Jiashi Feng
ByteDance Seed
Jiashi Feng is a research lead at ByteDance whose public profiles and project pages span computer vision, deep learning, vision-language work, and 3D content creation.
Jian Yang
Alibaba Qwen
Jian Yang is an Associate Professor at Beihang University whose research focuses on code intelligence, large language models, and AI agents. He worked with Alibaba Qwen from 2023 to July 2025.
Yang Fan
Alibaba Qwen
Yang Fan is a research scientist at Alibaba Group. His homepage says he works on large language model post-training and deployment.
Ru Zhang
ByteDance Seed
Ru Zhang is a researcher publicly linked to ByteDance through technical-report authorship and an OpenReview profile with a confirmed @bytedance.com email.
Xiaoying Jia
ByteDance Seed
Listed on DBLP as Xiaoying Jia 0005 with a ByteDance affiliation and an attached ORCID, with 2025 publications on high-performance LLM serving, model merging in LLM pre-training, and Seed Diffusion; also listed as an author on the Seed1.5-VL and Seed1.5-Thinking technical reports.
Dejian Yang
DeepSeek
DeepSeek team member and co-author of the DeepSeek-V3, DeepSeek-V2, and DeepSeek LLM technical reports.
Zeyu Cui
Alibaba Qwen
Research scientist at Meta in New York City and research advisor at the UCLA NLP group; previously completed a PhD in computer science at UCLA.
Jinze Bai
Alibaba Qwen
PhD student at The Hong Kong University of Science and Technology (Guangzhou) whose research interests include large language models, vision-language models, AI agents, and multimodal retrieval.
Ali Farhadi
Ai2
CEO of the Allen Institute for AI and professor of computer science at the University of Washington. His work spans computer vision, multimodal learning, reasoning, and embodied AI.
Koray Kavukcuoglu
Google Gemini
Chief Technology Officer at Google DeepMind, with work spanning machine learning and reinforcement learning.
Xinlong Wang
DeepSeek
Xinlong Wang is a researcher working across computer vision, embodied AI, robotics, and machine learning. Public profiles link him to OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.
Mengfei Du
ByteDance Seed
Public evidence links Mengfei Du to EmbSpatial-Bench, a benchmark for spatial understanding in embodied large vision-language models, and to authorship on ByteDance Seed's Seed1.5-VL Technical Report.
Shuai Peng
ByteDance Seed
Public author records associate Shuai Peng with work on mathematical OCR and formula recognition, logical reasoning data synthesis, and multimodal reasoning, and also list him as an author of the Seed1.5-VL technical report.
Youbin Wu
ByteDance Seed
Public OpenReview activity for Youbin Wu shows work on large language models, multimodal reasoning, and visual generation, consistent with the Seed1.5-VL report authorship.
Yonghui Wu
ByteDance Seed / Google Gemini
Google researcher whose public profile says he joined Google in September 2008 and has been with the Google Brain team since January 2015, with interests spanning information retrieval, learning to rank, machine learning, machine translation, and natural language processing.
Liang Chen
Moonshot AI
Research scientist at Moonshot AI working on foundation models, multimodal large language models, and agents; previously worked at Huawei Noah's Ark Lab and studied at the Chinese University of Hong Kong.
Dongliang Wang
Moonshot AI
Dongliang Wang is a research scientist at Moonshot AI whose public profiles highlight multimodal large language models. His homepage also notes earlier PhD work at Shanghai AI Lab and Shanghai Jiao Tong University.
Keqin Chen
Alibaba Qwen
Researcher focused on large language models and multimodal learning, with public profiles linking Keqin Chen to Beihang University and to Qwen vision-language model work.
Chenzhuang Du
Moonshot AI
Technical staff member at Moonshot AI whose public profile highlights work on web and app agents, multimodal systems, reinforcement learning, and LLMs.
Dikang Du
Moonshot AI
Dikang Du is a research scientist at Moonshot AI. His homepage says he received a Ph.D. from Cornell University and works on natural language processing, machine learning, and multimodal learning.
Hao Hu
Moonshot AI
Technical staff member at Moonshot AI working on general AI agents, reinforcement learning, and multimodal foundation models.
Haoning Wu
Moonshot AI
PhD student in computer science at the University of Hong Kong working in vision and machine intelligence.
Lin Sui
Moonshot AI
Researcher in computer vision and multimodal learning. Public profile lists PhD study in computer science and engineering at HKUST under Qifeng Chen.
Shu Zhong
ByteDance Seed
Public sources identify Shu Zhong as a researcher at ByteDance Inc. and list him as an author of the Seed1.5-VL Technical Report.
Christopher Clark
Ai2
Christopher Clark is a researcher working on language models, efficient inference, and trustworthy NLP systems. His public profile highlights work at the intersection of NLP, efficiency, and model evaluation.
Tianbao Xie
Alibaba Qwen
Research scientist on the Qwen team at Alibaba Group, focusing on foundation models and language agents. He received a PhD in computer science from the University of Illinois Urbana-Champaign.
Zheng Zhang
ByteDance Seed / Moonshot AI
Publicly listed coauthor of the Kimi-VL, Kimi K2.5, and Seed1.5-Thinking technical reports, with public work spanning vision-language, multimodal, and reasoning models.
Huixia Li
ByteDance Seed
Publicly listed as a coauthor on the Seed1.5-VL technical report and ByteDance Seed's ICML 2025 paper "Polybasic Speculative Decoding Through a Theoretical Perspective."
Peibin Chen
ByteDance Seed
Peibin Chen is publicly identifiable via an OpenReview profile with a confirmed @bytedance.com email. Public sources link him to the MM 2024 poster "SimpliGuard: Robust Mesh Simplification In the Wild" and to authorship on the Seed1.5-VL Technical Report.
Antonio Torralba
Google Gemini
Antonio Torralba is the Delta Electronics Professor in the EECS Department at MIT and a member of CSAIL whose research focuses on computer vision, visual learning, and scene understanding.
Jinbo Zhao
Alibaba Qwen
PhD student in CSLT at Tsinghua University working on large language models, multimodal large language models, and speech-language models; publication context connects Jinbo Zhao to the Qwen2.5-VL technical report.
Wenhai Wang
DeepSeek
Wenhai Wang is a researcher working on visual perception foundation models, efficient learning, and multimodal large models. Public profiles list him with OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.
Chaoyi Deng
ByteDance Seed
Chaoyi Deng is listed on Mingsheng Long's official Tsinghua University page as a Tsinghua undergraduate from 2019 to 2023 and a master's student from 2023 to 2026. Public sources also list Deng as a coauthor of the NeurIPS 2023 paper "Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning" and the ByteDance Seed "Seed1.5-VL Technical Report."
Jingren Zhou
Moonshot AI / Alibaba Qwen
Alibaba senior technology leader and researcher associated with Qwen. Public profiles list him with Alibaba Group, and official Alibaba Cloud coverage identifies him as chief technology officer leading its large-model work.
Kai Dang
Alibaba Qwen
Researcher on Alibaba's Qwen team focused on large language models and NLP, with public research profiles listing a Nankai University background.
An Yang
Alibaba Qwen
Alibaba researcher working on large language models and multimodal pretraining; public research profiles connect An Yang to Qwen-related work and earlier study at Peking University.
Fei Huang
Alibaba Qwen
Researcher at Alibaba Group working on natural language processing and multimodal AI.
Ke Shen
ByteDance Seed
Public OpenReview information lists Ke Shen as a researcher at Seed, ByteDance Inc. with expertise in language models, pretraining, and information retrieval.
Shipeng Yan
ByteDance Seed
Shipeng Yan is a co-author on ByteDance Seed's Seed1.5-Thinking and Seed1.5-VL technical reports. DBLP records additional publications on large-scale LLM training, self-rewarding preference optimization, and continual vision-language pretraining.
Shuangzhi Wu
ByteDance Seed
Shuangzhi Wu is listed as an author on ByteDance Seed technical reports and has a public DBLP publication record spanning NLP topics including machine translation, summarization, translation quality estimation, and related language modeling work.
Wanjun Zhong
ByteDance Seed
Wanjun Zhong is a researcher in large language models whose public bibliography includes work on AGIEval, MemoryBank, tool-use benchmarking, knowledge editing, and code-generation reliability. Zhong is also listed as a coauthor on the Seed1.5-VL and Seed1.5-Thinking technical reports.
Zuquan Song
ByteDance Seed
Zuquan Song is publicly listed as a ByteDance Seed author on Seed technical reports and on ByteDance Seed's OSDI 2025 publication "Understanding Stragglers in Large Model Training Using What-if Analysis." DBLP also attributes 2024-2025 publications on LLM training infrastructure, checkpointing, faulty-machine detection, and GPU communication overlap to Zuquan Song.
Jena D. Hwang
Ai2
Research scientist at the Allen Institute for AI (Ai2) whose work focuses on natural language understanding and commonsense reasoning.
Xiaodong Deng
Alibaba Qwen
Research scientist in Tongyi Lab whose official profile highlights post-training and multimodal large language models.
Yuanzhi Zhu
Alibaba Qwen
Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.
Donghong Zhong
ByteDance Seed
Publicly documented as a co-author of the Seed1.5-VL Technical Report and of other DBLP-indexed work spanning multimodal reasoning and computer vision.
Jingqun Tang
ByteDance Seed
Jingqun Tang is a computer vision and multimodal AI researcher whose public publication record includes work on vision-language models, document image parsing, OCR and text-centric benchmarks, scene text recognition, and diffusion-based image editing.
Ke Wang
ByteDance Seed
Ke Wang is a ByteDance Seed researcher and corresponding author of WideSearch, a benchmark for broad information-seeking agents. He is also listed as an author of the Seed1.5-VL Technical Report.
Qingyao Shuai
ByteDance Seed
Publicly indexed publications for Qingyao Shuai include ICCV 2023 work on unsupervised 3D object reconstruction and a 2024 large language model reasoning paper; Qingyao Shuai is also listed as an author on the Seed1.5-VL Technical Report.
Sihang Yuan
ByteDance Seed
Sihang Yuan is publicly listed as a coauthor on ByteDance Seed's Seed1.5-VL Technical Report and on DBLP-indexed 2025 multimodal research papers.
Tianheng Cheng
ByteDance Seed
Tianheng Cheng is a computer vision researcher listed as a coauthor of the Seed1.5-VL Technical Report and on DBLP for prior vision publications including work on instance segmentation and open-vocabulary detection.
Xuehan Xiong
ByteDance Seed
Public records link Xuehan Xiong to research in computer vision and video understanding, with DBLP-listed publications from 2010 to 2025 on face alignment, video recognition, and video localization, as well as authorship on the Seed1.5-VL Technical Report.
Angang Du
Moonshot AI
Research Scientist at Moonshot AI whose public work focuses on large language models, multimodal models, and embodied AI; he previously earned a PhD from Zhejiang University and was a visiting student at Oxford.
Jianlin Su
Moonshot AI
Research scientist and writer behind Scientific Spaces whose public profile lists work on large language models and service on the Kimi team at Moonshot AI.
Jianqiang Wan
Alibaba Qwen
Research scientist in Alibaba DAMO Academy's Tongyi Lab working on multimodal learning, vision-language models, and embodied AI; author on the Qwen2-VL and Qwen2.5-VL technical reports.
Nikolay Savinov
Google Gemini
Research scientist at Google DeepMind on the Gemini team, working on multimodal AI.
Qing Yu
DeepSeek
Researcher at DeepSeek and a first-year computer science PhD student at the University of Science and Technology of China; works on multimodal reasoning and world models; coauthor of Janus.
Shaowei Liu
Moonshot AI
Researcher working on multimodal learning and vision-language systems, with public academic work on visual question answering and related topics.
Yuqi Wang
DeepSeek
Research scientist at DeepSeek and PhD student at the University of Illinois Urbana-Champaign working on multimodal foundation models, large language models, and embodied AI.
Bowen Wang
Moonshot AI
PhD student at the University of Hong Kong who worked as a research intern at Moonshot AI in 2025 and studies digital agents, computer-use agents, and multimodal intelligence.
Cheng Chen
Moonshot AI
Research scientist at Moonshot AI with public profiles covering large language models, diffusion models, and generative AI.
Jiaqi Deng
Moonshot AI
Computer science graduate from the University of Hong Kong who worked as a research intern at Moonshot AI on general-purpose computer-use agents.
Jin Xie
Moonshot AI
Researcher at Moonshot AI with public homepage and GitHub profiles under the name Xixia Zhong.
Kun Ouyang
Moonshot AI
Technical staff at Moonshot AI working on large language model reasoning, agents, and multimodal large models.
Matthieu Devin
Google Gemini
Research scientist at Google DeepMind based in Paris, focused on deep learning and computer vision.
Weixin Xu
Moonshot AI
Research scientist at Moonshot AI with public GitHub and Google Scholar profiles covering efficient inference and multimodal systems.
Xiaodong Zhu
DeepSeek
Research intern at DeepSeek and master's student at Tsinghua University working on large language models, multimodal models, and reinforcement learning.
Xiaohua Zhai
Google Gemini
Xiaohua Zhai is a researcher on the Google Research team in Zurich whose work focuses on large multimodal models and efficient deep learning.
Xiaokun Yuan
Moonshot AI
AI researcher at Moonshot AI with a public homepage and Google Scholar profile spanning robust AI, computer vision, and multimodal systems.
Yibo Miao
Moonshot AI
Moonshot AI researcher working on large language models, coding agents, and multimodal safety; his public homepage also documents earlier study at Shanghai Jiao Tong University and Huazhong University of Science and Technology.
Yiqin Wang
Moonshot AI
Researcher at Moonshot AI with a personal homepage and GitHub profile covering machine learning research.
Yuzhi Wang
Moonshot AI
Researcher at Moonshot AI focused on large language models, computational photography, and low-level computer vision; previously worked at Megvii and completed a PhD and postdoc at Tsinghua University.
Yuzi Yan
Moonshot AI
Research scientist at Moonshot AI. Previously, he was a PhD student at Princeton and interned at Google and Amazon.
Zhaowei Li
Moonshot AI
Research scientist at Moonshot AI working on multimodal AI agents, large multimodal models, video generation, speech, machine learning systems, and AI for science.
Mingxuan Wang
ByteDance Seed / Mistral AI
Mingxuan Wang is a researcher at ByteDance Seed. Public ByteDance Seed sources identify him as a Senior Researcher on the Doubao Seed Team, and official Seed publications list him as an author on reasoning and multimodal technical reports.
Yu Liu
ByteDance Seed / MiniMax
Yu Liu is publicly listed as a co-author on the 2025 Seed1.5-VL, MiniMax-01, and MiniMax-M1 technical reports.
Leonardo Beyer
Google Gemini
Leonardo Beyer is a research scientist at Google DeepMind. His public homepage highlights work across representation learning, multimodal models, and large-scale machine learning systems.
Nuo Xu
Moonshot AI
Multimodal and omni-model engineer whose public profile lists Moonshot AI experience and Kimi-VL among recent projects.
Yufei Zhang
DeepSeek
Researcher at the University of Illinois Urbana-Champaign focused on vision-language models, multimodal large language models, and physical AI.
Yuqing Wang
DeepSeek
Research intern at DeepSeek and PhD student at Princeton University whose research interests include large language models and multimodal foundation models.
Zhengyang Wang
DeepSeek
Research intern at DeepSeek and master's student at Renmin University of China working on multimodal large language models and AI agents.
Zheren Fu
Alibaba Qwen
Tongyi Lab researcher working on large language models, vision-language models, and reinforcement learning; public profiles connect Zheren Fu to the Qwen2-VL technical report.
Yue Ling
ByteDance Seed
Yue Ling's public OpenReview profile lists a researcher role at ByteDance, a confirmed @bytedance.com email, and expertise in large language models and VLMs. Yue Ling is also listed as an author of the Seed1.5-VL Technical Report on arXiv.
Han Zhu
Moonshot AI
Machine learning researcher with a public homepage and GitHub profile covering AI research and engineering projects.
Jiaming Guo
DeepSeek
PhD student at The Chinese University of Hong Kong focused on multimodal reasoning, optical character recognition, and document parsing; coauthor of DeepSeek-VL.
Liangtao Shi
Ai2
Research scientist at the Allen Institute for AI working on multimodal large language models, embodied agents, and reasoning for robots and games.
Matt Deitke
Ai2
Matt Deitke is a researcher at Ai2 whose public homepage and Google Scholar profile highlight work on multimodal learning, vision-language models, embodied AI, and open models.
Molly S. Lewis
Ai2
Molly S. Lewis is an Assistant Professor of Psychology at Princeton University whose research examines how language is shaped by social and cultural structure.
Roi Reichart
Ai2
Professor at the Technion and head of the CHIA Lab, with research spanning natural language processing, machine learning, and social-good applications.
Xiuye Gu
Google Gemini
Xiuye Gu is a researcher whose public work focuses on vision-language modeling and machine learning systems.
Yan Zhong
Moonshot AI
Research scientist at Kimi AI (Moonshot AI). Previously completed a PhD in computer science at the University of Wisconsin-Madison.
Junxiao Song
DeepSeek
Member of Technical Staff at DeepSeek.
Haowei Zhang
DeepSeek
Research scientist at DeepSeek with public GitHub work on language models and AI systems.
Peng Wang
Alibaba Qwen
Researcher affiliated with the Qwen team at Alibaba Group on Google Scholar and coauthor of the Qwen and Qwen3 technical reports.
Wenbin Ge
Alibaba Qwen
Research scientist in Tongyi Lab whose official profile highlights work on efficient reinforcement learning, generalization, inference-time scaling, and reasoning for large language models.
Luke Zettlemoyer
Ai2
Professor at the University of Washington whose public research spans NLP, machine learning, and semantic parsing; arXiv author results include OLMo and OLMo 2.
Guang Shi
ByteDance Seed
Guang Shi is publicly listed as a coauthor on ByteDance Seed papers including the Seed1.5-VL Technical Report, Seed1.5-Thinking, Emerging Properties in Unified Multimodal Pretraining, and UI-TARS, indicating work centered on vision-language models, multimodal pretraining, and GUI agents.
Jiaze Chen
ByteDance Seed
Jiaze Chen is publicly listed as a coauthor on ByteDance Seed's 2025 Seed-Thinking-v1.5 publication, the Seed1.5-VL technical report, and the Seed-Coder repository/report, indicating contributions across reasoning, vision-language, and code-focused foundation model work.
Liang Xiang
ByteDance Seed
ByteDance Seed publicly identified Liang Xiang in December 2024 as Head of the Doubao Foundation Model Team. Public Seed and DBLP records also list him as a coauthor on work about LLM pre-training model merging and ByteDance LLM training infrastructure.
Shen Yan
ByteDance Seed
Shen Yan is publicly listed as a co-author on ByteDance Seed research publications including Seed1.5-VL, Seed1.5-Thinking, MME-CoT, and TC-MoE.
Yanghua Peng
ByteDance Seed
Public sources list Yanghua Peng as a coauthor on ByteDance Seed technical reports and on ByteDance engineering research artifacts related to distributed training and checkpointing for large models.
Zeyu Wang
ByteDance Seed / MiniMax
Public sources list Zeyu Wang as an author of ByteDance Seed's Seed1.5-VL Technical Report and the Seed publication "Emerging Properties in Unified Multimodal Pretraining" (BAGEL), and as a coauthor of the MiniMax-M1 technical report.
Maxwell Collins
Google Gemini
Maxwell Collins is a Research Scientist at Google DeepMind.
Xinyu Yang
ByteDance Seed / DeepSeek
Researcher credited as a co-author on DeepSeek-V2, DeepSeek-V3, and Seed1.5-VL technical reports, and listed among the authors of the Nature paper on DeepSeek-R1.
Chang Zhou
Alibaba Qwen
Qwen researcher and co-lead whose work focuses on pretraining and post-training, multimodal models, agent systems, and large-scale model infrastructure.
Pradeep Dasigi
Ai2
Research scientist at Ai2 whose work focuses on natural language processing, semantic parsing, grounded language understanding, and question answering.
Shijie Wang
Alibaba Qwen
Senior research scientist in Tongyi Lab whose official profile highlights post-training, AI for science, evaluation and alignment, multimodal reasoning, and large language model reasoning.
Shujie Wang
DeepSeek
First-year PhD student at Shanghai Jiao Tong University focused on multimodal large language models, text-to-image generation, and image/video generation; coauthor of DeepSeek-VL2.
Yonggang Zhang
DeepSeek
Yonggang Zhang is a researcher whose public OpenReview profile includes the paper "DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding."
Chaorui Deng
ByteDance Seed
Chaorui Deng is publicly credited as an author on ByteDance Seed multimodal research publications, including the Seed1.5-VL technical report and the BAGEL paper on unified multimodal pretraining.
Jiahao Li
ByteDance Seed
Jiahao Li is publicly listed as a coauthor on ByteDance Seed's Seed1.5-VL Technical Report and on the official ByteDance Seed publication page for UI-TARS, indicating work related to multimodal and GUI-agent research.
Shihao Liang
ByteDance Seed
Public sources list Shihao Liang as a coauthor on ByteDance's UI-TARS project and the Seed1.5-VL technical report.
Aman Singh
DeepSeek
Research intern at DeepSeek and PhD student at Stanford University working on generative vision-language models, large language models, and large-scale training.
Bohong Yin
Moonshot AI
Research scientist at Moonshot AI focused on machine learning systems; public profiles note prior PhD study at the Max Planck Institute and Technical University of Munich.
Bowei Xing
Moonshot AI
Technical Staff at Moonshot AI.
Bowen Qu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Chu Wei
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Dehao Zhang
Moonshot AI
Technical staff member at Moonshot AI and machine learning researcher; public profiles note prior study at the Gaoling School of AI at Renmin University of China.
Enming Yuan
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and generative models.
Enzhe Lu
Moonshot AI
PhD student in Computer Science at the University of Hong Kong. His research interests include multimodal large language models and embodied AI, and he co-authored the Kimi-VL technical report.
Fang Li
Moonshot AI
Research Scientist at Moonshot AI.
Guokun Lai
Moonshot AI
Research scientist at Moonshot AI whose work focuses on large foundation models and multimodal models.
Haiyang Xu
Alibaba Qwen
Independent researcher focused on multimodal learning, document intelligence, and efficient training; coauthor of Qwen2.5-VL and mPLUG-related vision-language systems.
Hang Zhang
Alibaba Qwen
Researcher at Alibaba Group working on multimodal large language models; public profile and publication context connect Hang Zhang to the Qwen2-VL technical report.
Hao Ding
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and computer vision.
Haotian Yao
Moonshot AI
Research scientist at Moonshot AI who previously studied at Tsinghua University and works on large foundation models.
Hongcheng Gao
Moonshot AI
Generative AI researcher at Moonshot AI with public work spanning computational imaging and AI systems.
Jialin Wang
Alibaba Qwen
Research scientist in Tongyi Lab and contributor to Qwen2-VL, with public work on multimodal large language models.
Jiezhong Qiu
Moonshot AI
Researcher at Moonshot AI with public GitHub and scholarly profiles covering machine learning and AI systems.
Jinhong Wang
Moonshot AI
Technical Staff at Moonshot AI.
Junjie Yan
Moonshot AI
Research scientist at Moonshot AI with public work on computer vision and multimodal models.
Longhui Yu
Moonshot AI
Research Scientist at Moonshot AI.
Mengfan Dong
Moonshot AI
Technical Staff at Moonshot AI.
Mengnan Dong
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Nan Ding
Google Gemini
Researcher at Google Research whose public work includes multimodal and vision-language modeling, with arXiv publications tied to PaliGemma and related transfer work.
Pengfei Wang
Alibaba Qwen
Research scientist in Alibaba DAMO Academy's Tongyi Lab working on machine learning, computer vision, and multimodal large language models; author on the Qwen2-VL and Qwen2.5-VL technical reports.
Qizheng Gu
Moonshot AI
Research scientist at Moonshot AI with public work on language models and reasoning.
Rui Hu
DeepSeek
PhD student at the University of Science and Technology of China focused on machine learning and multimodal understanding and generation; coauthor of Janus.
Runjie Zhou
Moonshot AI
Research scientist at Moonshot AI and PhD student at Shanghai Jiao Tong University whose homepage highlights multimodal understanding, generation, large language models, and agents.
Shan Lu
DeepSeek
Research scientist focused on multimodal representation learning, self-supervised learning, and diffusion models; coauthor of Janus and JanusFlow.
Sibo Song
Alibaba Qwen
Research scientist in Tongyi Lab and maintainer of Qwen-VL, with public work on vision-language models.
Tianhui Song
Moonshot AI
Research Scientist at Moonshot AI.
Tongtong Bai
Moonshot AI
Research scientist at Moonshot AI and the University of Wisconsin-Madison with public work on large language models and reasoning.
Weiran He
Moonshot AI
Research scientist at Moonshot AI whose public GitHub profile highlights work on multimodal large language models and agents.
Weixiao Huang
Moonshot AI
Research Scientist at Moonshot AI.
Xinhao Li
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal and long-context model research.
Xinyuan Wang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Xinyu Luo
DeepSeek
PhD student at Shanghai Jiao Tong University working on multimodal large language models and image understanding and generation; coauthor of Janus.
Xinyu Zhou
Moonshot AI
Technical Staff at Moonshot AI.
Xuejing Liu
Alibaba Qwen
Xuejing Liu is a researcher whose public OpenReview profile includes the Qwen2-VL and Qwen2.5-VL technical report papers.
Yang Li
Moonshot AI
Co-founder and chief executive officer of Moonshot AI.
Yangyang Hu
Moonshot AI
Researcher at Moonshot AI with a public GitHub profile and work spanning machine learning systems.
Yanru Chen
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yejie Wang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yibo Liu
Moonshot AI
Research scientist at Moonshot AI whose public profile highlights work on multimodal generation, multimodal large language models, and efficient LLMs.
Yimin Chen
Moonshot AI
Research scientist at Moonshot AI with public GitHub projects spanning language models and multimodal systems.
Yiping Bao
Moonshot AI
Researcher at Moonshot AI with a public GitHub profile covering AI systems work.
Yuanxin Liu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yu Han
Alibaba Qwen
Researcher affiliated with Alibaba Group on Google Scholar and coauthor of the Qwen technical report.
Yuhao Dong
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report and Kimi K2.5: Visual Agentic Intelligence.
Yukang Chen
DeepSeek
PhD student at The University of Hong Kong focused on large multimodal models and data-centric AI, especially multimodal understanding and generation; coauthor of Janus.
Yuxin Wu
Moonshot AI
Researcher at Moonshot AI with public GitHub projects spanning AI systems.
Yuxuan Cao
DeepSeek
Research assistant at The University of Hong Kong focused on multimodal reasoning and generation, large language models, and embodied AI; coauthor of Janus.
Zhejun Jiang
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal learning and generative models.
Zhilin Yang
Moonshot AI
Co-founder and CTO of Moonshot AI, and co-author of the Kimi-VL and Kimi K2.5 technical reports.
Zhiyuan Ruan
DeepSeek
PhD student at The University of Hong Kong focused on multimodal large language models, image and video understanding, generation, and editing; coauthor of Janus.
Zijia Zhao
Moonshot AI
Research scientist at Moonshot AI with public scholarly work on language and multimodal models.
Ziwei Chen
Moonshot AI
Research scientist at Moonshot AI with public work on multimodal learning and language models.
Xiaohan Ding
ByteDance Seed
Xiaohan Ding is publicly credited as a coauthor on the Seed1.5-VL Technical Report and InternVL3 technical report, both focused on multimodal and vision-language models.
Shuaishuai Cao
ByteDance Seed
Publicly listed as a coauthor on the Seed1.5-VL Technical Report and the 2025 CoRR paper "OVERLORD" on scaling data loading for large foundation model training.
Alexander Kolesnikov
Google Gemini
Alexander Kolesnikov is a Research Scientist at Google DeepMind exploring multimodal general intelligence.
Alyssa Sellitto
Ai2
Research scientist at Ai2 focused on multimodal machine learning, vision-language models, and understanding human-centered image variation.
Andrea Dafoe
Google Gemini
Andrea Dafoe is a senior research scientist at Google DeepMind whose work focuses on frontier AI risks, international governance, and the societal impacts of advanced AI.
Bilal Mustafa
Google Gemini
Senior research scientist at Google DeepMind.
Chenlin Zhang
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report.
Dieter Fox
Ai2
Senior director of embodied AI at Ai2 and professor at the University of Washington working in robotics, computer vision, and machine learning.
Guangda Wei
Moonshot AI
Research scientist at Moonshot AI with public publications on multimodal learning and efficient large-model systems.
Heng Wang
Moonshot AI
Research Scientist at Moonshot AI.
Humen Zhong
Alibaba Qwen
Research scientist in Tongyi Lab and a major contributor to Qwen2-VL, with public work on multimodal foundation models.
Jiaming Li
Moonshot AI
Research Scientist at Moonshot AI.
Jianzhou Wang
Moonshot AI
Research scientist at Moonshot AI working on multimodal large models and point-cloud perception and generation.
Jingyuan Liu
Moonshot AI
Research scientist at Moonshot AI with a public homepage covering prior academic work and research projects.
Olivier Henaff
Google Gemini
Research scientist at Google DeepMind working on deep learning, reinforcement learning, self-supervised learning, and robotics.
Rohit Saxena
Google Gemini
Rohit Saxena is a Research Scientist at Google DeepMind working on visual perception, multimodal learning, and language understanding.
Ronak Mandlekar
Ai2
PhD student at Stanford and research scientist at the Allen Institute for AI working on robotics, multimodal models, and embodied AI.
Sihan Cao
Moonshot AI
Researcher affiliated with Moonshot AI on Google Scholar and coauthor of the Kimi-VL technical report.
Sipeng Zhang
DeepSeek
PhD student at The University of Hong Kong focused on large multimodal models, image and video generation, and multimodal understanding; coauthor of Janus.
Wei Song
Moonshot AI
Researcher at Moonshot AI. Public profile notes prior PhD study in computer science at the Chinese University of Hong Kong.
William Kolesnikov
Google Gemini
Staff software engineer at Google DeepMind working on post-training, alignment, multimodal models, and data filtering. He previously worked on hardware and software co-design for machine learning.
Xingzhe Wu
Moonshot AI
Researcher and co-author of the Kimi-VL Technical Report.
Xinyu Chen
DeepSeek
Research intern at NUS and Nanjing University working on machine learning and multimodal large language models; coauthor of DeepSeek-VL2.
Yanxia Cui
DeepSeek
Researcher working on multimodal and vision-language models, with contributions including DeepSeek-VL2 and related model optimization work.
Yidao Qin
Moonshot AI
Research Scientist at Moonshot AI.
Yongsheng Kang
Moonshot AI
Research Scientist at Moonshot AI.
Yuanhang Zhang
Alibaba Qwen
Research scientist in Tongyi Lab and major contributor to Qwen2.5-VL, with public work on multimodal large language models.
Yuyi Wang
Alibaba Qwen
Research intern in Tongyi Lab whose public profile highlights work on multimodal large language models and video understanding.
Zhaohai Li
Alibaba Qwen
Research scientist in Tongyi Lab and technical lead of Qwen2-VL, with public work on vision-language models.
Zhenyu Yang
Alibaba Qwen
PhD student at Nanjing University and research intern at Alibaba Tongyi Lab working on multimodal large language models and visual understanding; coauthor of Qwen2.5-VL.
Zihan Liu
DeepSeek
Zihan Liu is a research scientist at DeepSeek. His public homepage highlights work in multimodal learning, vision-language models, and large-scale machine learning.
Zongyu Lin
Moonshot AI
Technical Staff at Moonshot AI.