Multimodal Models
Researchers connected to this field in the public atlas.
Radu Soricut
Google Gemini
Research scientist focused on machine learning and natural language understanding, with work spanning machine translation, semantic parsing, and large-scale language modeling.
Jiabo Ye
Alibaba Qwen
Research scientist in Tongyi Lab whose public homepage and OpenReview profile describe work on large language models, multimodal learning, and visual grounding. His public profiles also list affiliations with Alibaba Group and East China Normal University.
Aakanksha Chowdhery
Google Gemini
Aakanksha Chowdhery is a machine learning researcher based in New York City. She works on large-scale machine learning across pre-training, post-training, inference, and system efficiency, and is known for contributions such as PaLM, Pathways, and Gemini.
Yuntian Deng
Google Gemini
Yuntian Deng is a machine learning researcher whose public work spans language modeling, reasoning, and large multimodal systems.
Yale Song
Alibaba Qwen
Yale Song is an assistant professor in artificial intelligence at Yonsei University who is also affiliated with the Stanford AI Lab and works part-time with Adobe Research.
Chuanqi Tan
Alibaba Qwen / Z.ai
Chuanqi Tan works on LLM research and applications. His personal homepage says he earned a PhD from Tsinghua University in 2019 and is also a postdoctoral fellow at The University of Hong Kong.
Mohammad Norouzi
Google Gemini
Research scientist and engineer focused on machine learning, computer vision, and natural language processing.
Yuhuai Wu
Google Gemini
Research scientist working on large language models, reasoning, agents, and reinforcement learning.
Jiahui Yu
Google Gemini
Jiahui Yu is a research scientist at Google DeepMind working on multimodal learning and large language models.
Xinyun Chen
Google Gemini / Meta AI
Xinyun Chen is a machine learning researcher at Meta FAIR. Her work spans large language models, code generation, mathematical reasoning, and efficient inference.
Mingkun Yang
Alibaba Qwen
Mingkun Yang works on multimodal large language models, embodied AI, and robotics. His public profile says he is a postdoc at Zhejiang University and a research scientist at Qwen.
Yueting Wang
Google Gemini
Research scientist at Google DeepMind working on post-training, large language model evaluation, and multimodal alignment.
Olivier Bachem
Google Gemini
Olivier Bachem is a director and research scientist at Google DeepMind working on reinforcement learning from human feedback, language model post-training, and machine learning at scale. He earned his PhD at ETH Zurich, where he studied coresets and sampling methods for large-scale machine learning.
David Silver
Google Gemini
Computer scientist and reinforcement learning researcher, Professor at University College London, and former Principal Research Scientist at DeepMind.
Hanie Sedghi
Google Gemini
Senior Staff Research Scientist at Google DeepMind working on machine learning, with a focus on efficient inference and training algorithms for large language and vision-language models.
Rishabh Singh
Google Gemini
Rishabh Singh is a research scientist at Google DeepMind working on human-centered AI, programming systems, and AI for software and problem solving. His work spans program synthesis, code intelligence, education, and interactive AI systems.
Will Isaacs
Google Gemini
Founding engineer at Anysphere, previously at Google Brain, UC Berkeley, and Scale AI, interested in machine learning, statistics, and systems.
Vladislav Kolesnikov
Google Gemini
Professor at ISTA working on cryptography and machine learning, with interests including privacy-preserving machine learning, large language models, and algorithmic fairness.
Jean-Baptiste Alayrac
Google Gemini / Meta AI
Research scientist at Meta working on multimodal machine learning and AI. Previously worked on multimodal learning at Google DeepMind and earned a PhD in computer vision and machine learning from École des Ponts ParisTech.
Koray Kavukcuoglu
Google Gemini
Chief Technology Officer at Google DeepMind, with work spanning machine learning and reinforcement learning.
Yifeng Lu
Google Gemini
Member of Technical Staff at Google DeepMind working on machine learning, natural language processing, and large language models.
Raia Hadsell
Google Gemini
VP of Research at Google DeepMind working on robotics and embodied intelligence, with expertise in machine learning, reinforcement learning, neuroscience, and computer vision.
Yonghui Wu
ByteDance Seed / Google Gemini
Google researcher whose public profile says he joined Google in September 2008 and has been with the Google Brain team since January 2015, with interests spanning information retrieval, learning to rank, machine learning, machine translation, and natural language processing.
Hongning Wang
Alibaba Qwen
Associate professor at the University of Virginia and Qwen contributor whose research focuses on personalization and recommender systems, online advertising, and AI systems.
Noam Shazeer
Google Gemini
Distinguished Scientist at Google Research and one of the inventors of the transformer architecture; his work also includes language models, speech recognition, and multi-agent reinforcement learning.
Shilong Liu
Alibaba Qwen
Researcher whose public homepage focuses on computer vision, multimodal foundation models, and embodied AI; publication context connects Shilong Liu to the Qwen2.5-Omni technical report.
Douglas Eck
Google Gemini
Research director at Google working on music AI, multimodal generation, and human-AI interaction. He co-founded the Magenta project and has led widely used work on music generation with neural networks.
Jianwei Niu
Alibaba Qwen
Jianwei Niu is a tenure-track research assistant professor in the School of Data Science at Lingnan University, Hong Kong. His research focuses on multimodal learning, computer vision, and embodied AI.
Aitor Lewkowycz
Google Gemini
Research scientist at Google DeepMind interested in large language models and mathematical reasoning. He earned a Ph.D. in mathematics from Columbia University.
Jingren Zhou
Moonshot AI / Alibaba Qwen
Alibaba senior technology leader and researcher associated with Qwen. Public profiles list him with Alibaba Group, and official Alibaba Cloud coverage identifies him as a chief technology officer leading large-model work.
Carrie Cai
Google Gemini
Carrie Cai is a machine learning researcher with interests in generative modeling, reinforcement learning, and deep learning theory.
Christopher Potts
Google Gemini
Professor of Linguistics and, by courtesy, Computer Science at Stanford University whose research spans natural language semantics, pragmatics, and AI; he directs CSLI.
Johan Schalkwyk
Google Gemini
Johan Schalkwyk is a speech and language researcher whose public profile highlights work on speech recognition, multilingual systems, conversational AI, and large language models.
Sharat Muralidharan
Google Gemini
Research scientist at Google DeepMind and PhD student at Imperial College London. His public site highlights interests in deep learning, reinforcement learning, and multimodal models, with work spanning Gemini, large-scale reinforcement learning, and self-driving.
Yiang Gu
Google Gemini
Research scientist at Google DeepMind working on multimodal large language models. He completed a PhD at Tsinghua University and was a visiting PhD student at UC San Diego.
Yun-Hsuan Sung
Google Gemini
Yun-Hsuan Sung is a machine learning researcher focused on multimodal learning, robotics, and representation learning.
Zhifeng Chen
Google Gemini / Z.ai
Distinguished software engineer at Google Brain focused on large-scale computer systems and machine learning applications.
Sebastian Faust
Google Gemini
Research scientist at Google DeepMind.
Yuanzhi Zhu
Alibaba Qwen
Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.
Matthias Minderer
Google Gemini
Research Scientist at Google DeepMind in London working on large multimodal models, evaluation, agents, and computer vision; he completed a PhD at the University of Tübingen and MPI for Intelligent Systems.
Aäron van den Oord
Google Gemini
Aäron van den Oord is a Google DeepMind researcher known for generative and sequence-model research.
HyoukJoong Lee
Google Gemini
HyoukJoong Lee is a research scientist at Google DeepMind. His public work includes long-context and multimodal model research, including Gemini 1.5 and Gemini Diffusion.
Ling Chen
Z.ai
Z.ai researcher focused on multimodal large language models and computer vision, with interests in large-model training and post-training.
Chengzheng Xu
Google Gemini
Chengzheng Xu is a research scientist at Google DeepMind whose public homepage highlights work on vision-language models, multimodal learning, and efficient large-scale machine learning.
Kelvin Guu
Google Gemini
Research Scientist at Google DeepMind focused on agents, memory, and reasoning; completed a PhD at Stanford advised by Percy Liang.
Qazi Irfan
Google Gemini
Qazi Irfan is a research scientist at Google DeepMind. His public homepage highlights work spanning multimodal learning, visual reasoning, and efficient large-scale machine learning.
Vedant Misra
Google Gemini
Research engineer focused on frontier multimodal AI systems; a founding member of Google's Gemini core team who previously helped start OpenAI's multimodal team.
Alexander Vladymyrov
Google Gemini
Senior Staff Research Scientist at Google DeepMind working on AI for science, imaging, geometry processing, and differentiable simulation; previously at Adobe and NVIDIA.
Alex Beutel
Google Gemini
Research scientist at Google with public work on machine learning systems, recommendation, fairness, and safety.
Amelie Royer
Google Gemini
Research scientist at Google DeepMind working on efficient, adaptive systems that learn on the job and collaborate with people. She completed a Ph.D. in machine learning at Mila and Université de Montréal.
Branislav Kveton
Google Gemini
Staff research scientist at Google DeepMind and associate professor at Purdue University working on sequential decision making, machine learning, and algorithms.
C. Le Lan
Google Gemini
Research scientist at Google DeepMind.
Clement Farabet
Google Gemini
Research scientist at Google DeepMind whose public profile also lists prior AI infrastructure leadership at NVIDIA, founding Mesosphere, and earlier research roles at FAIR.
Dale Schuurmans
Google Gemini
Professor of computing science at the University of Alberta and Canada CIFAR AI Chair with public work on reinforcement learning, optimization, and scalable machine learning.
Eugene N. Ie
Google Gemini
Eugene N. Ie is a Google DeepMind researcher with public work on machine learning and multimodal language models.
Tiago Cai
Google Gemini
Research scientist at Google DeepMind working on machine learning and large-scale multimodal models.
Tony G. Cai
Google Gemini
Tony G. Cai is a researcher at Google DeepMind and a computer science PhD student at Columbia University. His public research interests include large language models, reinforcement learning, optimization, and robotics.
Peng Wang
Alibaba Qwen
Researcher affiliated with the Qwen team at Alibaba Group on Google Scholar and coauthor of the Qwen and Qwen3 technical reports.
Brennan Saeta
Google Gemini
Research scientist at Google DeepMind working on language model reasoning and AI safety. He previously worked as a member of technical staff at Anthropic.
Donald W. McFadden
Google Gemini
Research scientist at Google DeepMind focused on long-context and memory-augmented models. His public profile describes work on architectures and algorithms that help models reason over real-world data.
Jason Wei
Google Gemini / OpenAI
Researcher at OpenAI working on reasoning and scalable oversight, with prior work on chain-of-thought prompting, instruction tuning, and aligning language models with human preferences.
Qinyu Chen
DeepSeek / Alibaba Qwen
Research scientist at Qwen, Alibaba, whose public OpenReview profile lists work on vision-language models and large language models.
Yipeng Wang
Z.ai
Research scientist at Z.ai focused on multimodal understanding and generation, large language models, and reinforcement learning. He works on pre-training, post-training, and evaluation of multimodal models.
Zihan Jiang
Z.ai
Research scientist at Z.ai focused on multimodal understanding and generation, reinforcement learning, AI agents, and end-to-end models. He received a bachelor's degree from Tsinghua University and a master's degree from UCLA.
Andrew M. Dai
Google Gemini
Research scientist at Google DeepMind in Mountain View working on machine learning, reinforcement learning, and robotics.
David Dohan
Google Gemini / OpenAI
Research engineer focused on large-scale machine learning and AI systems. He has worked at OpenAI and publishes writing and projects on his personal website.
Demis Hassabis
Google Gemini
Founder and CEO of Google DeepMind, leading AI research and product development; his work spans AI, neuroscience, game playing, and structural biology.
D. Sculley
Google Gemini
Research Director at Google working on machine learning, production systems, and sociotechnical AI.
Oriol Vinyals
Google Gemini
Chief Scientist at Google DeepMind and Vice President of Research leading Gemini, with work spanning scalable sequence learning, large language models, games, and robotics.
Rohan Anil
Google Gemini
Rohan Anil is a research scientist at Google DeepMind. His public homepage highlights work on large language models, efficient machine learning systems, and multimodal AI.
Sebastian Borgeaud
Google Gemini
Research scientist at Google DeepMind in London working on agentic reasoning, efficient inference, and large-scale post-training, with a background in high-dimensional statistics and theory.
Vincent Vanhoucke
Google Gemini
Senior Staff Research Scientist at Google DeepMind and CTO of the Gemini app, with work spanning speech, language, vision, and large-scale AI systems.
Yossi Matias
Google Gemini
Vice President of Engineering and Research at Google and site lead for the Google Center in Israel; he also leads Search, Research, and AI for Crisis Response.
Alina Beygelzimer
Google Gemini
Senior staff research scientist at Google working on algorithms for decision making under uncertainty and online learning.
Avinatan Hassidim
Google Gemini
Professor of Computer Science at the Hebrew University of Jerusalem and Visiting Faculty Researcher at Google, with work spanning algorithms, algorithmic economics, and AI-related decision systems.
Ben Wang
Google Gemini / OpenAI
Researcher at OpenAI working on frontier AI across coding, reasoning, post-training, and multimodal systems, with earlier work in accelerated computing and machine learning.
Hang Zhang
Alibaba Qwen
Researcher at Alibaba Group working on multimodal large language models; public profile and publication context connect Hang Zhang to the Qwen2-VL technical report.
Hao Xu
Z.ai
Research scientist at Z.ai focused on multimodal understanding and generation, reinforcement learning, AI agents, and end-to-end models. He received a bachelor's degree from Tsinghua University and a master's degree from Peking University.
James Manyika
Google Gemini
James Manyika is a Google leader whose public work focuses on research, technology, and society.
Linjie Li
Alibaba Qwen
Linjie Li is a research scientist at Alibaba Group and a contributor to the Qwen2.5-Omni Technical Report.
Melvin Johnson
Google Gemini
Senior Staff Research Scientist at Google DeepMind working on language modeling, speech recognition, machine translation, and multimodal understanding.
Nan Ding
Google Gemini
Researcher at Google Research whose public work includes multimodal and vision-language modeling, with arXiv publications tied to PaliGemma and related transfer work.
Qingyang Zhang
Alibaba Qwen
Second-year PhD student at Peking University focused on audio-language foundation models, trustworthy AI, and embodied AI; coauthor of Qwen2-Audio.
Quoc V. Le
Google Gemini
VP at Google DeepMind working on deep learning, computer vision, and language understanding.
Xiaoyu Hu
Alibaba Qwen
Research engineer at Alibaba Group working on audio and multimodal foundation models, multimodal RL, and speech processing; coauthor of Qwen2.5-Omni.
Yang Song
OpenAI / Alibaba Qwen
Research scientist and GPT-4 coauthor known for work on generative modeling, diffusion methods, and machine learning systems.
Yinghao Li
Alibaba Qwen
Machine learning engineer and researcher interested in large language models and multimodal audio-language systems; coauthor of Qwen2-Audio.
Yuan Cao
Google Gemini / Moonshot AI
Research scientist at Moonshot AI with public scholarly work on multimodal and generative models.
Zhenyang Wu
Z.ai
Research scientist at Z.ai with research interests in multimodal understanding and generation, large language models, and reinforcement learning. He received a bachelor's degree from the University of Science and Technology of China and a master's degree from Tsinghua University.
Zoubin Ghahramani
Google Gemini
VP of Research at Google DeepMind and Professor of Information Engineering at the University of Cambridge, known for work in probabilistic machine learning and Bayesian statistics.
Adam Lewkowycz
Google Gemini
Research scientist at Google DeepMind working on the theoretical foundations of machine learning.
Adrian Ibarz
Google Gemini
Adrian Ibarz is a Google DeepMind researcher whose public work spans machine learning, reasoning, and large multimodal models.
Aidan Clark
Google Gemini
Research scientist at Anthropic focused on building trustworthy AI systems and studying the effects of scaling; his publication list includes the Gemini technical report and related frontier-model work.
Alan Rabinovich
Google Gemini
Research scientist and founder of rabinovich.ai, with work spanning multimodal generative models, visual perception, and immersive experiences.
Anmol Kalra
Google Gemini
Anmol Kalra is a research scientist at Google DeepMind. His public homepage presents his work and publications in machine learning and AI systems.
Arun Narayanan
Google Gemini
Research scientist at Google DeepMind working on large language models and natural language processing.
Bhupendra Gupta
Google Gemini
Software engineer at Google DeepMind and PhD student at Cornell working on machine learning for social impact, with interests in LLMs, generative models, and optimization.
Chris McLeavey
Google Gemini
Chris McLeavey is a research scientist at Google DeepMind working on generalist multimodal models at the intersection of language and vision.
Danny Zhou
Google Gemini
Research scientist at Google DeepMind working on large language models and multimodal models. He earned a PhD in computer science from Stanford University.
David Krueger
Google Gemini
Assistant Professor in the Department of Computer Science and Technology at the University of Cambridge, with research focused on making AI systems safer, more efficient, and more robust.
David Uthus
Google Gemini
Research scientist at Google DeepMind focused on human-computer interaction, accessibility, and interfaces for AI systems.
Hao Ma
Google Gemini
Research scientist at Google DeepMind working on multimodal language models and long-context machine learning systems.
Jianfei Chen
Alibaba Qwen
Jianfei Chen is an assistant professor at Monash University. His research spans computer vision, machine learning, multimodality, and trustworthy AI.
Joaquin Ferrer
Google Gemini
Director of Product Management at Google DeepMind leading ML and AI platforms, model developer experiences, and workflows that power the Gemini app and API.
Jonathan Toulis
Google Gemini
Principal statistician at Google DeepMind whose work spans causal inference, statistics, and machine learning.
Joseph W. Demmel
Google Gemini
Distinguished professor emeritus of electrical engineering, computer science, and mathematics at UC Berkeley. His research focuses on numerical linear algebra, parallel computing, and communication-avoiding algorithms.
Julien Perolat
Google Gemini
Julien Perolat is a research scientist at Google DeepMind whose public homepage highlights work on game theory, multi-agent learning, reinforcement learning, and responsible AI.
Kelly Huang
Google Gemini
Research scientist at Google DeepMind working on large-scale multimodal models.
Kenrick Cato
Google Gemini
Google researcher whose publications include the Gemini technical report.
Kexuan Wei
Alibaba Qwen
Researcher working on multimodal foundation models, including Qwen3-Omni and related speech-language systems.
Lehou Cheng
Google Gemini
Postdoctoral researcher at UC Berkeley and Berkeley AI Research interested in natural language processing, machine learning, and human-computer interaction.
Linjun Yang
Alibaba Qwen
Research scientist in Tongyi Lab and technical lead of Qwen2.5-Omni, with public work on end-to-end speech understanding and generation.
Mahesh Shanmugam
Google Gemini
Mahesh Shanmugam is a research scientist at Google DeepMind whose public homepage highlights work on multimodal representation learning, self-supervised learning, and generative models.
Mark Bosma
Google Gemini
Mark Bosma is a senior research scientist at Google DeepMind. His public homepage highlights work in machine learning, reinforcement learning, and neural networks.
Mateusz Malinowski
Google Gemini
Research scientist at Google DeepMind in Switzerland working on large multimodal models and generative AI.
Matt Hoffman
Google Gemini
Researcher and engineer focused on machine learning, distributed systems, and applied algorithms; his personal site also highlights interests in psychology, neuroscience, and evolutionary biology.
Miao Du
Google Gemini
Research scientist at Google DeepMind and PhD student at Stanford University. His homepage highlights work on machine learning, reinforcement learning, language models, and recommendation systems.
Mina Lee
Google Gemini
Assistant Professor of Computer Science at the University of Southern California and incoming part-time Visiting Faculty Researcher at Google DeepMind; her research combines linguistic structure and machine learning for natural language processing.
Moe Drammeh
Google Gemini
Research scientist at Google DeepMind working on large language models, multimodal language models, and computer vision, according to his public OpenReview profile.
MohammadHassan Moghimi
Google Gemini
MohammadHassan Moghimi is a senior staff software engineer at Google DeepMind whose work focuses on multimodal models for vision and natural language, including parameter-efficient tuning, adaptation, and evaluation.
Natalie Bergas
Google Gemini
Research scientist at Google DeepMind working on multimodal machine learning, reinforcement learning, and mathematical optimization.
Paul Welbl
Google Gemini
Research scientist at Google DeepMind in London. He completed a PhD at the University of Oxford, where his work focused on natural language processing and computational argumentation.
Ravi Seethapathy
Google Gemini
Ravi Seethapathy is a research scientist at Google DeepMind. His public homepage presents work at the intersection of machine learning, science, and large-scale AI systems.
Rebecca Roelofs
Google Gemini
Senior research scientist at Google DeepMind with public work on machine learning evaluation, uncertainty, and reliability.
Rory Pilgrim
Google Gemini
Research scientist at Google DeepMind working on large language models and multimodal models. He completed a PhD in computer vision and machine learning at the University of Oxford.
Sasha Seneviratne
Google Gemini
Research scientist at Google DeepMind working on multimodal, multilingual, and efficient machine learning.
Siyao Guo
Google Gemini
Research scientist at Google DeepMind in New York working on vision-language and multimodal large language models. He is completing a PhD in computer science at Carnegie Mellon University.
Yong Cheng
Google Gemini
Research scientist at Google DeepMind in Mountain View working on large multimodal foundation models and agents. He received a PhD from the Chinese University of Hong Kong.
Zhenkai Zhu
Google Gemini
Postdoctoral scholar at Stanford HAI and incoming Assistant Professor at the University of Southern California, with research on large-scale machine learning, multimodal reasoning, and efficient training.