Multimodal Large Language Models
Researchers connected to this field in the public atlas.
Radu Soricut
Google Gemini
Research scientist focused on machine learning and natural language understanding, with work spanning machine translation, semantic parsing, and large-scale language modeling.
Jifeng Dai
DeepSeek / MiniMax
Researcher focused on computer vision, multimodal learning, and generative AI. His public homepage says he is currently with Stepfun, after serving as a principal scientist at SenseTime Research and a researcher at Microsoft Research Asia, and that he earned a PhD in computer science from Tsinghua University.
Asterios Katsamanis
Meta AI
Researcher at Apple working on speech, audio, and multimodal machine learning; previously a senior research scientist at SRI International and also worked at Google.
Ksenia Konyushkova
Google Gemini
Ksenia Konyushkova is a research scientist at Google DeepMind in London working on computer vision and embodied AI. Before DeepMind, she co-founded Microverses and held postdoctoral roles at Imperial College London and UC Berkeley. She earned her PhD from EPFL under Pascal Fua.
Sebastian Gehrmann
Google Gemini / Mistral AI
Sebastian Gehrmann is a staff research scientist at Mistral AI working on language generation, information extraction, and the evaluation of generative systems.
Hao Yang
DeepSeek / Meta AI
Researcher at Moonshot AI working on multimodal large language models; previously a key member of Alibaba's Qwen team and author of work including Kimi-VL, DeepSeek-VL, and Qwen technical reports.
Yuchen Ge
Google Gemini
Yuchen Ge is a research scientist at Google DeepMind whose work focuses on vision-language models and multimodal machine learning.
Yoon Kim
Mistral AI
Research scientist at Mistral AI working on natural language processing and large language models; previously an assistant professor at MIT.
Lechao Xiao
Mistral AI
Research scientist at Mistral AI in Paris focused on multimodal and embodied AI. His public homepage lists prior research roles at Google DeepMind and Meta AI.
Jiahui Yu
Google Gemini
Jiahui Yu is a research scientist at Google DeepMind working on multimodal learning and large language models.
Huazuo Gao
DeepSeek
Researcher at DeepSeek AI working on decision-making and post-training for large language models.
Armand Joulin
Google Gemini
Armand Joulin is a research scientist at Google whose work spans natural language processing, machine learning, deep learning, and computer vision.
Pablo Sprechmann
Google Gemini
Pablo Sprechmann is a research scientist at Google DeepMind whose work spans representation learning, reinforcement learning, and machine learning for football tactics. Before DeepMind, he was a postdoctoral researcher at New York University working with Yann LeCun. He previously completed doctoral research under Guillermo Sapiro.
Mingze Li
Meta AI / Alibaba Qwen
Researcher at Alibaba Group exploring the math and science of large language models; incoming assistant professor at Nanyang Technological University.
Chenguang Zhu
Meta AI
Research scientist at Meta AI focused on vision-language models, large language models, and agents; public work includes the multimodal foundation model Chameleon.
Jun-Hyuk Ahn
Mistral AI
Jun-Hyuk Ahn is a Research Scientist at Mistral AI working on multimodal and video generation models. Previously he was a research engineer at DeepMind, where his public work focused on generative models and world models.
Karen Simonyan
Google Gemini
Karen Simonyan is VP of Research at Google DeepMind in London, where he leads the multimodal generative AI team. He is known for major contributions including the VGG network and work spanning systems such as AlphaGo, WaveNet, AlphaStar, and Gemini.
Xiangkun Wang
DeepSeek
Research intern at DeepSeek and undergraduate student at Tsinghua University focusing on multimodal large language models, agents, and embodied AI.
Zezhou Wang
DeepSeek
Research intern at DeepSeek and master's student at Tsinghua University working on large language models, reinforcement learning, and multimodal understanding and generation.
Armand Joulin
Meta AI
Armand Joulin is a researcher and the cofounder and chief scientist of Mistral AI. Public arXiv records also list him as an author of LLaMA: Open and Efficient Foundation Language Models.
Jaehoon Lee
Google Gemini
Jaehoon Lee is a researcher at Google DeepMind. His work covers practical and foundational aspects of large language models, together with deep learning theory and reinforcement learning.
Corey Lynch
Google Gemini
Corey Lynch is a research scientist at Google DeepMind working on embodied AI and robotics. He previously cofounded Ikonos.
Jonathan Tompson
Google Gemini
Jonathan Tompson is a research scientist working on robotics, perception, and embodied AI. His public profile highlights work on computer vision, simulation, reinforcement learning, and robot intelligence.
Lewis Houghton
Google Gemini
Software engineer at Google DeepMind working on model architecture and engineering for general-purpose language models.
Vivek Natarajan
Google Gemini
Research scientist at Google DeepMind working on multimodal medical AI and personalized health applications.
Jinghong Yuan
DeepSeek
PhD student at UC San Diego researching reasoning, planning, and multimodal foundation models; publication records link Jinghong Yuan to Janus-Pro.
Nicholas Crane
Meta AI
Research scientist at Meta working on computer vision and multimodal foundation models with an emphasis on robustness, trustworthiness, and alignment.
Jiaxuan Fan
DeepSeek
Jiaxuan Fan is a machine learning researcher at DeepSeek. Her interests include data-centric AI, model efficiency, and multimodal learning.
Angela Fan
Meta AI / Mistral AI
Generative AI Research Scientist at Meta and founder of Baobab AI Labs, with work spanning multilingual models, open-source LLMs, machine translation, and story generation.
Binyuan Hui
DeepSeek / Alibaba Qwen
AI researcher whose public work includes large language models, vision-language models, and multimodal systems. His public profile notes prior work as a senior algorithm expert at Alibaba and co-authorship of Qwen technical reports.
Matthias Minderer
Google Gemini
Research Scientist at Google DeepMind in London working on large multimodal models, evaluation, agents, and computer vision; he completed a PhD at the University of Tuebingen and MPI for Intelligent Systems.
Julian Schrittwieser
Google Gemini
Julian Schrittwieser is a Google DeepMind researcher known for reinforcement learning and game-playing systems.
Mike Lewis
Meta AI
Mike Lewis is a natural language processing researcher whose public work includes multimodal language modeling and large-scale pretraining.
Xiaoze Liu
DeepSeek
Research intern at DeepSeek and PhD student at Carnegie Mellon University interested in machine learning, agents, language, vision, robotics, and healthcare.
Xinyu Li
DeepSeek
Research intern at DeepSeek and undergraduate student at Tsinghua University working on vision-language models, inference-time scaling, and reinforcement learning.
Zhihuan Liu
DeepSeek
Research intern at DeepSeek and PhD student at Shanghai Jiao Tong University working on large language models, reasoning, agents, and reinforcement learning.
Alaaeldin El-Nouby
Meta AI
Alaaeldin El-Nouby is a machine learning researcher whose public work includes multimodal and vision-language models.
Christopher Pal
Meta AI
Christopher Pal is a professor and AI researcher whose public work spans deep learning, multimodal learning, and large language models.
Hongxia Yang
DeepSeek
External advisor at DeepSeek and former Corporate Vice President and Chief Scientist at Microsoft Research Asia.
Michael Uthus
Google Gemini
Michael Uthus works on frontier model safety and evaluation at Google DeepMind.
Piotr Padlewski
Google Gemini
Piotr Padlewski is a researcher working on efficient language and multimodal models, with publications including Gemma 3n and EdgeMark.
Sami Stigzelius
Google Gemini
Machine learning researcher at Google DeepMind focused on multimodal foundation models and post-training.
Srujana Merugu
Meta AI
Research scientist at Meta AI focused on multimodal and embodied AI, with interests in computer vision, deep learning, and decision making.
Xiaodong Zhang
Mistral AI
Research scientist at Mistral AI whose homepage highlights work on large language models, agents, multimodal understanding, and scaling.
Jie Zhou
DeepSeek / Moonshot AI
Jie Zhou is a Moonshot AI contributor and co-author of Kimi k1.5: Scaling Reinforcement Learning with LLMs.
Jason Wei
Google Gemini / OpenAI
Researcher at OpenAI working on reasoning and scalable oversight, with prior work on chain-of-thought prompting, instruction tuning, and aligning language models with human preferences.
Khaled Saeed
Meta AI
Khaled Saeed is a Research Scientist at Meta working on efficient multimodal reasoning and AI systems.
Saswato R. Das
Mistral AI
Saswato R. Das is a postdoctoral researcher at Mistral AI working on computer vision and multimodal foundation models.
Fei Xia
Google Gemini / Mistral AI
Senior Research Scientist at Google DeepMind working on Gemini, embodied AI, and multimodal foundation models for robotics and perception.
David Dohan
Google Gemini / OpenAI
Research engineer focused on large-scale machine learning and AI systems. He has worked at OpenAI and publishes writing and projects on his personal website.
Louis Martin
Meta AI / Mistral AI
Louis Martin is a scientist at Meta AI and a PhD student at McGill University and Mila. His research spans natural language processing and machine learning.
Orhan Firat
Google Gemini
Research scientist at Google Research whose public work spans multilingual and large-scale language modeling; his arXiv author listing includes the PaLM paper.
Sebastian Borgeaud
Google Gemini
Research scientist at Google DeepMind in London working on agentic reasoning, efficient inference, and large-scale post-training, with a background in high-dimensional statistics and theory.
Vincent Vanhoucke
Google Gemini
Senior Staff Research Scientist at Google DeepMind and CTO of the Gemini app, with work spanning speech, language, vision, and large-scale AI systems.
Adam Casson
Google Gemini
Research scientist at Google DeepMind working on large language models in London. His public site lists interests in efficient inference, evaluation, multi-agent systems, and interpretability, and notes earlier work on code intelligence at Graphcore.
Amjad Almahairi
Meta AI / Mistral AI
Research scientist at Meta working on machine learning, with interests in generative AI, multimodality, and helping models develop world understanding.
Andy Brohan
Google Gemini
Research scientist at Google DeepMind working on general robotic intelligence, robot learning, and real-world datasets for improved robot dexterity and understanding.
Azade Nova
Google Gemini
Research scientist at Meta whose interests include natural language processing, multimodal AI, social computing, and computational social science.
Dylan Cope
Google Gemini
Research engineer at Google DeepMind focused on large language models, long-context systems, and efficient inference; previously worked on speech and generative models.
Faisal Azhar
Meta AI
Faisal Azhar is a PhD candidate in computer science at Stanford University. His work focuses on multimodal systems that unify text, image, and speech, together with efficient training and inference for large-scale machine learning.
Fang Xia
Google Gemini
Fang Xia is a Research Scientist at Google DeepMind working on bringing AI into the physical world through robotics and embodied intelligence.
Myungjae Ahn
Google Gemini
Myungjae Ahn is a postdoctoral researcher at Google DeepMind whose work focuses on multimodal AI, including language, speech, vision, and robotics.
Peter Florence
Google Gemini
Research scientist at Google DeepMind and co-founder of Waypoint, working on robot learning.
Rishabh Kabra
Google Gemini
Rishabh Kabra is a research scientist at Google DeepMind. His public homepage highlights work on machine learning systems and large-scale language model research.
Shang Yang
DeepSeek / MiniMax
Researcher focused on reinforcement learning, large language model reasoning, and multimodal foundation models; coauthor of Janus-Pro and MiniMax-M1.
Xiangyu Yue
Google Gemini
Research scientist at Google DeepMind working on multimodal large language models and efficient language modeling.
Abhijit Guha Roy
Google Gemini
Google researcher whose publications include the Gemma 3n technical report.
Alberto Mario Cadeddu
Meta AI
Senior AI research scientist at Meta and affiliate researcher at MIT working on computer vision and machine learning.
Aleks Hartholz
Google Gemini
Google researcher whose publications include the Gemma 3n, Gemma 3, and Gemma 2 technical reports.
Anelia Angelova
Google Gemini
Anelia Angelova works on robotics, computer vision, and machine learning. Her public bio notes that she spent more than four years at Google DeepMind before becoming VP of AI at Humane.
Carl Vondrick
Google Gemini
Professor of computer science at Columbia University whose public research focuses on computer vision, video understanding, robotics, and machine learning.
Denis Kocisky
Mistral AI
Research Scientist at Mistral AI whose work spans language models and multimodal systems; previously held research roles at Meta and DeepMind.
Elizabeth Cole
Google Gemini
Researcher at Google DeepMind with public publications on language modeling, multimodal systems, and speech generation, including Gemma 3n, CT5, and ELLA.
Fei-Fei Li
Meta AI
Computer scientist known for work in computer vision, machine learning, and human-centered AI.
Frederik Ebert
Google Gemini
Google researcher whose publications include the Gemma 3n technical report.
Geneviève Dorkenwald
Meta AI
Research scientist at FAIR working on multimodal systems.
Graham Neubig
Mistral AI
Computer scientist at Carnegie Mellon University whose work spans machine learning, natural language processing, and human language technologies. His public homepage lists recent work including Pixtral and collaborations with Mistral AI.
Jascha Sohl-Dickstein
Mistral AI
Chief scientist at Mistral AI whose work focuses on variational methods, generative models, diffusion, and large language models.
Jon Lamprecht
Mistral AI
Research scientist at Mistral AI focused on multimodal representation learning, multimodal language models, and efficient inference; previously worked at Google and DeepMind.
Kira Radinsky
Google Gemini
Researcher working on multimodal and large language models, including Gemma 3n.
Luke M. Zettlemoyer
Meta AI
Professor in computer science and engineering at the University of Washington, scientist at the Allen Institute for Artificial Intelligence, and co-director of the UW NLP group.
Madhu Krishna
Meta AI
Research scientist at Meta working on multimodal reasoning, vision-language models, multimodal generation, and compression. His homepage highlights a background spanning machine learning, computer vision, and NLP.
Nathan Schuh
Google Gemini
Research scientist at Google DeepMind focused on scaling frontier models and advancing the Gemma family of open models.
Nat Levine
Google Gemini
Research scientist at Google DeepMind interested in reasoning and multimodal understanding in machine learning and AI systems.
Pankaj Doshi
Google Gemini
Research scientist at Google DeepMind whose work spans sequential decision making, multiagent systems, and responsible AI.
Rogerio Feris
Mistral AI
Rogerio Feris is an AI researcher at Mistral AI. His work spans computer vision, multimodal AI, distributed machine learning, and AI systems.
Saurav Belkhale
Google Gemini
Saurav Belkhale is a researcher at Google DeepMind working on dexterous robot manipulation at the intersection of control, computer vision, and machine learning.
Sébastien Bubeck
Meta AI
Vice president of GenAI at Microsoft AI and a long-time machine learning researcher known for work on the foundations of reinforcement learning and bandits.
Sergio de Cesare
Google Gemini
Researcher working on multimodal foundation models, including Gemma 3n.
Szymon Migacz
Mistral AI
Szymon Migacz is a software engineer at Mistral AI working on large-scale AI systems, with prior experience at NVIDIA.
Tao Ge
Google Gemini
Research scientist at Google DeepMind working on large language models, machine translation, and natural language processing.
Tianhe Yu
Meta AI
Research scientist at Meta working on embodied AI, robotics, and reinforcement learning.
Udit Sodhi
Meta AI
Research scientist at Meta whose public work covers embodied AI, language agents, and multimodal systems; his arXiv author listing includes the Chameleon multimodal model paper.
Urvashi Khandelwal
Mistral AI
Senior research scientist at Mistral AI working on domain knowledge, factuality, efficiency, and personalization in large language models.
Young-Min Kim
Google Gemini
Staff research scientist at Google DeepMind in Mountain View working on multimodal language model pretraining and post-training.
Zhitao Ying
Google Gemini
Zhitao Ying is a Research Scientist at Google DeepMind.