Multimodal Models | Field

Radu Soricut is a Distinguished Scientist at Google DeepMind working on natural language processing and machine learning, with earlier Google Research and Google Translate work.

Xinyun Chen's homepage identifies her as an AI research scientist at Meta Superintelligence Labs, previously a staff research scientist at Google DeepMind. It also lists a PhD in Computer Science from UC Berkeley and a BS in Computer Science from Shanghai Jiao Tong University.

DeepMind researcher working on machine learning, computer vision, and structured learning from video and language.

Research Principal at Meta Superintelligence Labs. He previously led the strategic explorations team at OpenAI and is known for foundational work on score-based diffusion models.

David Dohan is a computer scientist at OpenAI studying scalable alignment of language models and generally intelligent reasoning systems. His personal site also notes prior work at Google Brain on foundation model programs, code generation, protein engineering, and scientific reasoning.

Chuanqi Tan's homepage says he received a PhD from Tsinghua University in July 2019, is currently focused on LLM research and applications, and is also a postdoctoral fellow at the University of Hong Kong.

Kevin Robinson is a research engineer at Google Research working on evaluations of language models and NLP systems. His Google Research profile says he previously worked as a special education teacher, a software engineer building visualization and analytics systems, and a researcher in K12 computer science education.

Jiahui Yu is a Research Lead at OpenAI leading the Perception team. His homepage notes prior co-leadership on Gemini Multimodal at Google DeepMind and work on deep learning and high-performance computing.

OpenAI's GPT-4 contributions page credits Ben Wang as attention architecture lead for long context. Public profiles identify him as a University of Pennsylvania undergraduate and an OpenAI researcher from 2021 to 2022.

Google researcher and founding member of the Gemini core team. Public pages say he previously oversaw the Algorithms and Reasoning teams at OpenAI and earlier founded Kemvi, which was acquired by HubSpot.

Clément Farabet's homepage says he is building AI at Google DeepMind. It also describes prior leadership in AI infrastructure at NVIDIA and earlier deep learning platform work at Twitter.

Jingren Zhou is Chief Technology Officer of Alibaba Cloud. Public speaker biographies describe him as a computer scientist and entrepreneur whose work includes large-scale AI and cloud systems.

Research scientist in Tongyi Lab whose public homepage and OpenReview profile describe work on large language models, multimodal learning, and visual grounding. His public profiles also list affiliations with Alibaba Group and East China Normal University.

Aakanksha Chowdhery is a machine learning researcher based in New York City. She works on large-scale machine learning across pre-training, post-training, inference, and system efficiency, and is known for contributions such as PaLM, Pathways, and Gemini.

Yuntian Deng is a machine learning researcher whose public work spans language modeling, reasoning, and large multimodal systems.

Yale Song is an assistant professor in artificial intelligence at Yonsei University and is also affiliated with the Stanford AI Lab while working part-time with Adobe Research.

Furu Wei is a Distinguished Scientist and Chief Scientist of Microsoft Research Asia. He is listed on Microsoft Research, and LLMpeople connects him to Microsoft technical reports including Kosmos, VALL-E, BitNet, and Multilingual E5.

Research scientist and engineer focused on machine learning, computer vision, and natural language processing.

Research scientist working on large language models, reasoning, agents, and reinforcement learning.

Li Dong is a Microsoft Research principal researcher focused on human language technologies and machine intelligence.

Mingkun Yang works on multimodal large language models, embodied AI, and robotics. His public profile says he is a postdoc at Zhejiang University and a research scientist at Qwen.

Shaohan Huang is a senior researcher in the General Artificial Intelligence Group at Microsoft Research Asia in Beijing. OpenReview lists him as a Microsoft researcher and a former master's student at Beihang University.

Research scientist at Google DeepMind working on post-training, large language model evaluation, and multimodal alignment.

Research Scientist at Google DeepMind in London working on large multimodal models, evaluation, agents, and computer vision; he completed a PhD at the University of Tuebingen and MPI for Intelligent Systems.

Olivier Bachem is a director and research scientist at Google DeepMind working on reinforcement learning from human feedback, language model post-training, and machine learning at scale. He earned his PhD at ETH Zurich, where he studied coresets and sampling methods for large-scale machine learning.

Alibaba Qwen report author whose DBLP profile identifies an Alibaba Group affiliation and Qwen technical report authorship.

Google researcher whose official profile says he joined Google in September 2008 and has been with Google Brain since January 2015, with research interests spanning information retrieval, machine learning, machine translation, and natural language processing.

Co-author of the BitNet b1.58 2B4T Technical Report; the paper's author note states that S. Ma is with Microsoft Research.

Chengzheng Xu is a research scientist at Google DeepMind whose public homepage highlights work on vision-language models, multimodal learning, and efficient large-scale machine learning.

Computer scientist and reinforcement learning researcher, Professor at University College London, and former Principal Research Scientist at DeepMind.

Senior Staff Research Scientist at Google DeepMind working on machine learning, with a focus on efficient inference and training algorithms for large language and vision-language models.

Research Scientist at Google DeepMind focused on agents, memory, and reasoning; completed a PhD at Stanford advised by Percy Liang.

Qazi Irfan is a research scientist at Google DeepMind. His public homepage highlights work spanning multimodal learning, visual reasoning, and efficient large-scale machine learning.

Rishabh Singh is a research scientist at Google DeepMind working on human-centered AI, programming systems, and AI for software and problem solving. His work spans program synthesis, code intelligence, education, and interactive AI systems.

Founding engineer at Anysphere, previously at Google Brain, UC Berkeley, and Scale AI, interested in machine learning, statistics, and systems.

Professor at ISTA working on cryptography and machine learning, with interests including privacy-preserving machine learning, large language models, and algorithmic fairness.

Zhifeng Chen's public homepage describes him as a distinguished software engineer at Google Brain focused on large-scale computer systems and machine learning applications.

Chief Technology Officer at Google DeepMind, with work spanning machine learning and reinforcement learning.

Member of Technical Staff at Google DeepMind working on machine learning, natural language processing, and large language models.

VP of Research at Google DeepMind working on robotics and embodied intelligence, with expertise in machine learning, reinforcement learning, neuroscience, and computer vision.

Associate professor at the University of Virginia and Qwen contributor whose research focuses on personalization and recommender systems, online advertising, and AI systems.

Distinguished Scientist at Google Research and one of the inventors of the transformer architecture; his work also includes language models, speech recognition, and multi-agent reinforcement learning.

Researcher whose public homepage focuses on computer vision, multimodal foundation models, and embodied AI; publication context connects Shilong Liu to the Qwen2.5-Omni technical report.

Research director at Google working on music AI, multimodal generation, and human-AI interaction. He co-founded the Magenta project and has led widely used work on music generation with neural networks.

Jianwei Niu is a tenure-track research assistant professor in the School of Data Science at Lingnan University, Hong Kong. His research focuses on multimodal learning, computer vision, and embodied AI.

Public report authorship links Brennan Saeta to Google's technical report Gemma 2: Improving Open Language Models at a Practical Size.

Public report authorship links Jason Wei to Google's Gemma 3n Technical Report.

Qinyu Chen is listed as an author of DeepSeek's DeepSeek-V3 Technical Report.

Google Gemini report author listed on Gemini, Gemini 1.5, RecurrentGemma, and CodeGemma technical reports, with report-backed work on multimodal models, long-context models, efficient architectures, and code models.

Z.ai report author listed on GLM-Z1, GLM-4.5, GLM-4.1V/4.5V, and GLM-5 materials, with report-backed work on reasoning, coding, agentic, and multimodal models.

Z.ai report author listed on GLM-Z1, GLM-4.5, GLM-4.1V/4.5V, and GLM-5 materials, with report-backed work on reasoning, agentic, and multimodal models.

Aidan Clark's OpenReview profile shows publications on compute-optimal large language model training and high-fidelity speech synthesis. The profile also lists undergraduate studies at the University of California, Berkeley.

Research scientist at Google DeepMind interested in large language models and mathematical reasoning. He earned a Ph.D. in mathematics from Columbia University.

Carrie Cai is a machine learning researcher with interests in generative modeling, reinforcement learning, and deep learning theory.

Professor of Linguistics and, by courtesy, Computer Science at Stanford University whose research spans natural language semantics, pragmatics, and AI; he directs CSLI.

Johan Schalkwyk is a speech and language researcher whose public profile highlights work on speech recognition, multilingual systems, conversational AI, and large language models.

Research scientist at Google DeepMind and PhD student at Imperial College London. His public site highlights interests in deep learning, reinforcement learning, and multimodal models, with work spanning Gemini, large-scale reinforcement learning, and self-driving.

Research scientist at Google DeepMind working on multimodal large language models. He completed a PhD at Tsinghua University and was a visiting PhD student at UC San Diego.

Yun-Hsuan Sung is a machine learning researcher focused on multimodal learning, robotics, and representation learning.

Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.

Research scientist at Google DeepMind.

Aäron van den Oord is a Google DeepMind researcher known for generative and sequence-model research.

HyoukJoong Lee is a research scientist at Google DeepMind. His public work includes long-context and multimodal model research, including Gemini 1.5 and Gemini Diffusion.

Z.ai researcher focused on multimodal large language models and computer vision, with interests in large-model training and post-training.

Researcher whose public Google Scholar profile lists Google Research affiliation and publications on multimodal, generative, and reasoning-focused models.

Research scientist at Google DeepMind working on efficient, adaptive systems that learn on the job and collaborate with people. She completed a Ph.D. in machine learning at Mila and Université de Montréal.

Staff research scientist at Google DeepMind and associate professor at Purdue University working on sequential decision making, machine learning, and algorithms.

Research scientist at Google DeepMind.

Professor of computing science at the University of Alberta and Canada CIFAR AI Chair with public work on reinforcement learning, optimization, and scalable machine learning.

Eugene N. Ie is a Google DeepMind researcher with public work on machine learning and multimodal language models.

Research scientist at Google DeepMind working on machine learning and large-scale multimodal models.

Tony G. Cai is a researcher at Google DeepMind and a computer science PhD student at Columbia University. His public research interests include large language models, reinforcement learning, optimization, and robotics.

Junnan Li is a report-backed author in the LLMpeople atlas, connected through 3 technical reports.

Co-author of "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"; the paper's author notes list Wenhui Wang with Microsoft Research.

Xi Chen is listed as an author of the Z.ai technical report GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.

Researcher at Moonshot AI and co-author of the Kimi K2.5 report on visual agentic intelligence.

Co-author of GLM-4.1V-Thinking and GLM-4.5V, multimodal reasoning models trained with scalable reinforcement learning.

Research scientist at Google DeepMind in Mountain View working on machine learning, reinforcement learning, and robotics.

Co-founder and CEO of Google DeepMind, leading AI research and product development; his work spans AI, neuroscience, game playing, and structural biology.

Research Director at Google working on machine learning, production systems, and sociotechnical AI.

Chief Scientist at Google DeepMind and Vice President of Research leading Gemini, with work spanning scalable sequence learning, large language models, games, and robotics.

Rohan Anil is a research scientist at Google DeepMind. His public homepage highlights work on large language models, efficient machine learning systems, and multimodal AI.

Research scientist at Google DeepMind in London working on agentic reasoning, efficient inference, and large-scale post-training, with a background in high-dimensional statistics and theory.

Senior Staff Research Scientist at Google DeepMind and CTO of the Gemini app, with work spanning speech, language, vision, and large-scale AI systems.

Vice President of Engineering and Research at Google and site lead for the Google Center in Israel; he also leads Search, Research, and AI for Crisis Response.

Dongxu Li is a report-backed author in the LLMpeople atlas, connected through 2 technical reports.

Steven Hoi is a report-backed author in the LLMpeople atlas, connected through 2 technical reports.

Yaru Hao is a report-backed author in the LLMpeople atlas, connected through 2 technical reports.

Public report authorship links Zhang Zhang to Z.ai's report GLM-5: Thinking, Coding, and Agentic Intelligence.

Zhiliang Peng is a report-backed author in the LLMpeople atlas, connected through 2 technical reports.

Researcher at BIGAI and coauthor of Emu3: Next-Token Prediction is All You Need.

Senior staff research scientist at Google working on algorithms for decision making under uncertainty and online learning.

Professor of Computer Science at the Hebrew University of Jerusalem and Visiting Faculty Researcher at Google, with work spanning algorithms, algorithmic economics, and AI-related decision systems.

Researcher at Alibaba Group working on multimodal large language models; public profile and publication context connect Hang Zhang to the Qwen2-VL technical report.

Research scientist at Z.ai focused on multimodal understanding and generation, reinforcement learning, AI agents, and end-to-end models. He received a bachelor's degree from Tsinghua University and a master's degree from Peking University.

James Manyika is a Google leader whose public work focuses on research, technology, and society.

Linjie Li is a research scientist at Alibaba Group and a contributor to the Qwen2.5-Omni Technical Report.

Senior Staff Research Scientist at Google DeepMind working on language modeling, speech recognition, machine translation, and multimodal understanding.

Researcher at Google Research whose public work includes multimodal and vision-language modeling, with arXiv publications tied to PaliGemma and related transfer work.

Second-year PhD student at Peking University focused on audio-language foundation models, trustworthy AI, and embodied AI; coauthor of Qwen2-Audio.

VP at Google DeepMind working on deep learning, computer vision, and language understanding.

Research engineer at Alibaba Group working on audio and multimodal foundation models, multimodal RL, and speech processing; coauthor of Qwen2.5-Omni.

Machine learning engineer and researcher interested in large language models and multimodal audio-language systems; coauthor of Qwen2-Audio.

Research scientist at Z.ai with research interests in multimodal understanding and generation, large language models, and reinforcement learning. He received a bachelor's degree from the University of Science and Technology of China and a master's degree from Tsinghua University.

VP of Research at Google DeepMind and Professor of Information Engineering at the University of Cambridge, known for work in probabilistic machine learning and Bayesian statistics.

Alexander Vladymyrov is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Alex Beutel is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Anthony Meng Huat Tiong is a report-backed author in the LLMpeople atlas, connected through InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

Astra Sharma is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Boyang Li is a report-backed author in the LLMpeople atlas, connected through InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

Chen Li is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Gabriel Murphy is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Geng Ji is listed as an author of the Google technical report Gemini: A Family of Highly Capable Multimodal Models.

Jason Choi is listed as an author of the Google technical report Gemini: A Family of Highly Capable Multimodal Models.

Jiaxuan Wang is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Jinguo Zhu is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Junqi Zhao is a report-backed author in the LLMpeople atlas, connected through InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

Kate Lee is listed as an author of the Google technical report Gemini: A Family of Highly Capable Multimodal Models.

Kun Yi is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Limin Wu is listed as an author of the Google technical report Gemini: A Family of Highly Capable Multimodal Models.

Lin Song is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Lisa Luu is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Mandy Guo is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Pascale Fung is a report-backed author in the LLMpeople atlas, connected through InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

Rhomni St. John is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Robin Lester is listed as an author of the Google technical report Gemini: A Family of Highly Capable Multimodal Models.

Shibo Wang is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Sijie Zhao is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Weisheng Wang is a report-backed author in the LLMpeople atlas, connected through InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

Wenhu Chen is a report-backed author in the LLMpeople atlas, connected through Kosmos-G: Generating Images in Context with Multimodal Large Language Models.

Wenjing Li is listed as an author of the Google technical report Gemini: A Family of Highly Capable Multimodal Models.

Wenliang Dai is a report-backed author in the LLMpeople atlas, connected through InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

Xiaohan Ding is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Xichen Pan is a report-backed author in the LLMpeople atlas, connected through Kosmos-G: Generating Images in Context with Multimodal Large Language Models.

Yilin Wu is listed as an author of the Google technical report Gemini: A Family of Highly Capable Multimodal Models.

Ying Shan is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Yixiao Ge is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Yu Sun is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Yuying Ge is a report-backed author in the LLMpeople atlas, connected through SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

Zhenkai Zhu is listed as an author of the Google technical report Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Research scientist at Google DeepMind working on the theoretical foundations of machine learning.

Adrian Ibarz is a Google DeepMind researcher whose public work spans machine learning, reasoning, and large multimodal models.

Research scientist and founder of rabinovich.ai, with work spanning multimodal generative models, visual perception, and immersive experiences.

Anmol Kalra is a research scientist at Google DeepMind. His public homepage presents his work and publications in machine learning and AI systems.

Research scientist at Google DeepMind working on large language models and natural language processing.

Software engineer at Google DeepMind and PhD student at Cornell working on machine learning for social impact, with interests in LLMs, generative models, and optimization.

Chris McLeavey is a research scientist at Google DeepMind working on generalist multimodal models at the intersection of language and vision.

Research scientist at Google DeepMind working on large language models and multimodal models. He earned a PhD in computer science from Stanford University.

Assistant Professor in the Department of Computer Science and Technology at the University of Cambridge, with research focused on making AI systems safer, more efficient, and more robust.

Research scientist at Google DeepMind focused on human-computer interaction, accessibility, and interfaces for AI systems.

Research scientist at Google DeepMind working on multimodal language models and long-context machine learning systems.

Jianfei Chen is an assistant professor at Monash University. His research spans computer vision, machine learning, multimodality, and trustworthy AI.

Director of Product Management at Google DeepMind leading ML and AI platforms, model developer experiences, and workflows that power the Gemini app and API.

Principal statistician at Google DeepMind whose work spans causal inference, statistics, and machine learning.

Distinguished professor emeritus of electrical engineering, computer science, and mathematics at UC Berkeley. His research focuses on numerical linear algebra, parallel computing, and communication-avoiding algorithms.

Julien Perolat is a research scientist at Google DeepMind whose public homepage highlights work on game theory, multi-agent learning, reinforcement learning, and responsible AI.

Research scientist at Google DeepMind working on large-scale multimodal models.

Google researcher whose publications include the Gemini technical report.

Researcher working on multimodal foundation models, including Qwen3-Omni and related speech-language systems.

Postdoctoral researcher at UC Berkeley and Berkeley AI Research interested in natural language processing, machine learning, and human-computer interaction.

Research scientist in Tongyi Lab and technical lead of Qwen2.5-Omni, with public work on end-to-end speech understanding and generation.

Mahesh Shanmugam is a research scientist at Google DeepMind whose public homepage highlights work on multimodal representation learning, self-supervised learning, and generative models.

Mark Bosma is a senior research scientist at Google DeepMind. His public homepage highlights work in machine learning, reinforcement learning, and neural networks.

Research scientist at Google DeepMind in Switzerland working on large multimodal models and generative AI.

Researcher and engineer focused on machine learning, distributed systems, and applied algorithms; his personal site also highlights interests in psychology, neuroscience, and evolutionary biology.

Research scientist at Google DeepMind and PhD student at Stanford University. His homepage highlights work on machine learning, reinforcement learning, language models, and recommendation systems.

Assistant Professor of Computer Science at the University of Southern California and incoming part-time Visiting Faculty Researcher at Google DeepMind; her research combines linguistic structure and machine learning for natural language processing.

Research scientist at Google DeepMind working on large language models, multimodal language models, and computer vision, according to his public OpenReview profile.

MohammadHassan Moghimi is a senior staff software engineer at Google DeepMind whose work focuses on multimodal models for vision and natural language, including parameter-efficient tuning, adaptation, and evaluation.

Research scientist at Google DeepMind working on multimodal machine learning, reinforcement learning, and mathematical optimization.

Research scientist at Google DeepMind in London. He completed a PhD at the University of Oxford, where his work focused on natural language processing and computational argumentation.

Ravi Seethapathy is a research scientist at Google DeepMind. His public homepage presents work at the intersection of machine learning, science, and large-scale AI systems.

Senior research scientist at Google DeepMind with public work on machine learning evaluation, uncertainty, and reliability.

Research scientist at Google DeepMind working on large language models and multimodal models. He completed a PhD in computer vision and machine learning at the University of Oxford.

Research scientist at Google DeepMind working on multimodal, multilingual, and efficient machine learning.

Research scientist at Google DeepMind in New York working on vision-language and multimodal large language models. He is completing a PhD in computer science at Carnegie Mellon University.

Research scientist at Google DeepMind in Mountain View working on large multimodal foundation models and agents. He received a PhD from the Chinese University of Hong Kong.

Radu Soricut

Xinyun Chen

Jean-Baptiste Alayrac

Yang Song

David Dohan

Chuanqi Tan

Kevin Robinson

Jiahui Yu

Ben Wang

Vedant Misra

Clément Farabet

Jingren Zhou

Jiabo Ye

Aakanksha Chowdhery

Yuntian Deng

Yale Song

Furu Wei

Mohammad Norouzi

Yuhuai Wu

Li Dong

Mingkun Yang

Shaohan Huang

Yueting Wang

Matthias Minderer

Olivier Bachem

Peng Wang

Yonghui Wu

Shuming Ma

Chengzheng Xu

David Silver

Hanie Sedghi

Kelvin Guu

Qazi Irfan

Rishabh Singh

Will Isaacs

Vladislav Kolesnikov

Zhifeng Chen

Koray Kavukcuoglu

Yifeng Lu

Raia Hadsell

Hongning Wang

Noam Shazeer

Shilong Liu

Douglas Eck

Jianwei Niu

Brennan Saeta

Jason Wei

Qinyu Chen

Donald W. McFadden

Yipeng Wang

Zihan Jiang

Aidan Clark

Aitor Lewkowycz

Carrie Cai

Christopher Potts

Johan Schalkwyk

Sharat Muralidharan

Yiang Gu

Yun-Hsuan Sung

Yuanzhi Zhu

Sebastian Faust

Aäron van den Oord

HyoukJoong Lee

Ling Chen

Yuan Cao

Amelie Royer

Branislav Kveton

C. Le Lan

Dale Schuurmans

Eugene N. Ie

Tiago Cai

Tony G. Cai

Junnan Li

Wenhui Wang

Xi Chen

Yichi Zhang

Yuxuan Hu

Andrew M. Dai

Demis Hassabis

D. Sculley