Multimodal Large Language Models | Field

Radu Soricut is a Distinguished Scientist at Google DeepMind working on natural language processing and machine learning, with earlier Google Research and Google Translate work.

Tulsee Doshi is a Senior Director of Product Management at Google DeepMind and currently leads product for the Gemini models. She previously served as Head of Product for Responsible AI at Google and holds both an M.S. and a Ph.D. in Symbolic Systems from Stanford.

Jifeng Dai is a tenured associate professor in the Department of Electronic Engineering at Tsinghua University. His homepage says his current research focuses on agentic AI and continual learning, and lists prior roles at Shanghai AI Lab, SenseTime Research, and Microsoft Research Asia.

Ksenia Konyushkova is a research scientist at Google DeepMind in London working on computer vision, embodied AI, and reinforcement learning. Her personal homepage describes earlier roles at Google Research in Zurich and postdoctoral work after EPFL.

Staff research scientist at Alibaba's Qwen Team and initiator of OpenDevin, focused on foundation models, reasoning models, coding agents, and computer-use agents.

Research scientist at Meta AI working on natural language processing and AI safety. His homepage says he completed a PhD at Facebook AI Research and Inria focused on text simplification and accessibility.

Hao Yang works on multimodal data infrastructure at Moonshot.ai. He previously worked at ByteDance ICVG and Microsoft Research Asia, and received BS and PhD degrees from Tsinghua University.

David Dohan is a computer scientist at OpenAI studying scalable alignment of language models and generally intelligent reasoning systems. His personal site also notes prior work at Google Brain on foundation model programs, code generation, protein engineering, and scientific reasoning.

Vahid Noroozi is an applied research scientist at NVIDIA. His NVIDIA author profile says his work focuses on deep learning for speech and natural language processing and that he received a PhD in computer science from the University of Illinois Chicago. His homepage says he previously worked on post-training large language models at Google DeepMind after earlier multimedia and neuroscience research at TU Delft and the Max Planck Institute for Biological Cybernetics.

Kevin Robinson is a research engineer at Google Research working on evaluations of language models and NLP systems. His Google Research profile says he previously worked as a special education teacher, a software engineer building visualization and analytics systems, and a researcher in K12 computer science education.

Amjad Almahairi is a researcher at Anyscale. His OpenReview profile lists work spanning LLMs, VLLMs, generative models, and deep learning, with earlier roles at Facebook and Element AI.

Sebastian Gehrmann leads Responsible AI in the office of the CTO at Bloomberg and works on natural language generation, model evaluation, and interpretability.

Principal scientist and senior manager at IBM Research's MIT-IBM Watson AI Lab. His public homepage emphasizes computer vision, multimodal AI, and augmenting large language models with memory for enterprise use.

Staff Research Scientist at Google DeepMind. Public Google profiles describe earlier work at Google Brain and Microsoft Research and research spanning machine learning, graph mining, and unstructured data analytics.

Jiahui Yu is a Research Lead at OpenAI leading the Perception team. His homepage notes prior co-leadership on Gemini Multimodal at Google DeepMind and work on deep learning and high-performance computing.

Bhuwan Dhingra is an associate professor of computer science at Duke University and is also affiliated with Google DeepMind. His public Duke and lab profiles say he leads the AI for Language Technologies lab, co-directs Pratt at TUNL, and is a member of Duke AI Health. The same profiles describe his work on natural language processing, multimodal learning, and trustworthy AI, and note a PhD in computer science from Carnegie Mellon University in 2019.

ATHENA Research Center's profile describes Athanasios (Nassos) Katsamanis as a principal researcher there since 2019, focusing on multimodal speech processing, multimodal human-computer interaction, and human behavior analysis.

Lechao Xiao's OpenReview profile lists him as a researcher at Google DeepMind. His homepage says his current focus is scaling-centric machine learning and lists interests in deep learning theory, generalization, optimization, training dynamics, kernels, and Gaussian processes.

Researcher at DeepSeek AI working on decision-making and post-training for large language models.

Corey Lynch is a research scientist at Google DeepMind working on embodied AI and robotics. He previously cofounded Ikonos.

Yuchen Ge is a research scientist at Google DeepMind whose work focuses on vision-language models and multimodal machine learning.

Recent public bios describe Angela Fan as a researcher at Meta working on large language models, machine translation, multilingual generation, and story generation.

Research scientist at Mistral AI working on natural language processing and large language models; previously an assistant professor at MIT.

Senior Staff Research Scientist and Tech Lead Manager at Google DeepMind Robotics, focused on embodied agents and foundation models for robot decision-making.

Luyao Yuan is a research scientist at FAIR at Meta. Her homepage says her research aims to build AI systems that can see, learn, reason, and interact like humans, and that she completed a PhD in EECS at MIT advised by Antonio Torralba after earlier research with Song Han at MIT and Jiajun Wu at Stanford.

Szymon Migacz is a researcher at NVIDIA. His OpenReview profile lists NVIDIA as his affiliation since 2015, identifies deep learning as his expertise, and records University of Warsaw degrees in computer science.

Pablo Sprechmann is a research scientist at Google DeepMind whose work spans representation learning, reinforcement learning, and machine learning for football tactics. Before DeepMind, he was a postdoctoral researcher at New York University working with Yann LeCun. He previously completed doctoral research under Guillermo Sapiro.

Research Scientist at Google DeepMind in London working on large multimodal models, evaluation, agents, and computer vision; he completed a PhD at the University of Tuebingen and MPI for Intelligent Systems.

Public research profiles show Armand Joulin as an author on work in natural language processing, information retrieval, and computer vision.

Research scientist at Meta AI focused on vision-language models, large language models, and agents; public work includes the multimodal foundation model Chameleon.

Research intern at DeepSeek and undergraduate student at Tsinghua University focusing on multimodal large language models, agents, and embodied AI.

Research intern at DeepSeek and PhD student at Carnegie Mellon University interested in machine learning, agents, language, vision, robotics, and healthcare.

Research intern at DeepSeek and undergraduate student at Tsinghua University working on vision-language models, inference-time scaling, and reinforcement learning.

Research intern at DeepSeek and master's student at Tsinghua University working on large language models, reinforcement learning, and multimodal understanding and generation.

Research intern at DeepSeek and PhD student at Shanghai Jiao Tong University working on large language models, reasoning, agents, and reinforcement learning.

Jascha Sohl-Dickstein is a member of the technical staff at Anthropic. His public site highlights work on diffusion models, overparameterized neural networks, learned optimizers, and large language models, and notes prior roles at Google Brain and Google DeepMind.

Jaehoon Lee is a researcher at Google DeepMind. His work covers practical and foundational aspects of large language models, together with deep learning theory and reinforcement learning.

Jonathan Tompson is a research scientist working on robotics, perception, and embodied AI. His public profile highlights work on computer vision, simulation, reinforcement learning, and robot intelligence.

Software engineer at Google DeepMind working on model architecture and engineering for general-purpose language models.

Research scientist at Google DeepMind working on multimodal medical AI and personalized health applications.

Public report authorship links Jie Zhou to the MiniMax technical report MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention.

Armand Joulin is listed as an author of the Meta AI technical report Llama 2: Open Foundation and Fine-Tuned Chat Models.

Public report authorship links Jason Wei to the Gemma 3n Technical Report at Google.

PhD student at UC San Diego researching reasoning, planning, and multimodal foundation models; publication context connects Jinghong Yuan to Janus-Pro.

Research scientist at Meta working on computer vision and multimodal foundation models with an emphasis on robustness, trustworthiness, and alignment.

Jiaxuan Fan is a machine learning researcher at DeepSeek. Her interests include data-centric AI, model efficiency, and multimodal learning.

Sébastien Bubeck's public homepage says he works on AI at OpenAI, after earlier work on convex optimization, online algorithms, and adversarial robustness.

Julian Schrittwieser is a Google DeepMind researcher known for reinforcement learning and game-playing systems.

Mike Lewis is a natural language processing researcher whose public work includes multimodal language modeling and large-scale pretraining.

Karén Simonyan is Chief Scientist at Microsoft AI. Public Microsoft sources describe him as a co-founder and former Chief Scientist of Inflection and credit him on recent Microsoft AI model work.

Kanishka Rao is listed in the author list for the Google DeepMind report 'Gemini Robotics: Bringing AI into the Physical World.'

Alaaeldin El-Nouby is a machine learning researcher whose public work includes multimodal and vision-language models.

Christopher Pal is a professor and AI researcher whose public work spans deep learning, multimodal learning, and large language models.

Michael Uthus works on frontier model safety and evaluation at Google DeepMind.

Piotr Padlewski is a researcher working on efficient language and multimodal models, with publications including Gemma 3n and EdgeMark.

Machine learning researcher at Google DeepMind focused on multimodal foundation models and post-training.

Research scientist at Meta AI focused on multimodal and embodied AI, with interests in computer vision, deep learning, and decision making.

Research scientist at Mistral AI whose homepage highlights work on large language models, agents, multimodal understanding, and scaling.

Khaled Saeed is a Research Scientist at Meta working on efficient multimodal reasoning and AI systems.

Saswato R. Das is a postdoctoral researcher at Mistral AI working on computer vision and multimodal foundation models.

Research scientist at Google Research whose public work spans multilingual and large-scale language modeling; arXiv author results include the PaLM paper.

Research scientist at Google DeepMind in London working on agentic reasoning, efficient inference, and large-scale post-training, with a background in high-dimensional statistics and theory.

Senior Staff Research Scientist at Google DeepMind and CTO of the Gemini app, with work spanning speech, language, vision, and large-scale AI systems.

Jiaxuan Li is listed as an author of Google's Gemma 3n Technical Report.

Public report authorship links Mingxing Zhang to the Gemma 3n Technical Report at Google.

Mingze Li is listed as an author of the Qwen team's Qwen3 Technical Report.

Public report authorship links Sebastian Goodman to the Gemma 3n Technical Report at Google.

Public report authorship links Shang Yang to the MiniMax technical report MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention.

Public report authorship links Su Wang to the Gemma 3n Technical Report at Google.

Research scientist at Google DeepMind working on large language models in London. His public site lists interests in efficient inference, evaluation, multi-agent systems, and interpretability, and notes earlier work on code intelligence at Graphcore.

Research scientist at Google DeepMind working on general robotic intelligence, robot learning, and real-world datasets for improved robot dexterity and understanding.

Research engineer at Google DeepMind focused on large language models, long-context systems, and efficient inference; previously worked on speech and generative models.

Faisal Azhar is a PhD candidate in computer science at Stanford University. His work focuses on multimodal systems that unify text, image, and speech, together with efficient training and inference for large-scale machine learning.

Fang Xia is a Research Scientist at Google DeepMind working on bringing AI into the physical world through robotics and embodied intelligence.

Myungjae Ahn is a postdoctoral researcher at Google DeepMind whose work focuses on multimodal AI, including language, speech, vision, and robotics.

Research scientist at Google DeepMind and co-founder of Waypoint, working on robot learning.

Rishabh Kabra is a research scientist at Google DeepMind. His public homepage highlights work on machine learning systems and large-scale language model research.

Research scientist at Google DeepMind working on multimodal large language models and efficient language modeling.

Public report authorship links Albert Webson to the Gemma 3n Technical Report at Google.

Public report authorship links Andrew Webb to the Gemma 3n Technical Report at Google.

Public report authorship links Ankur Handa to the Gemma 3n Technical Report at Google.

Public report authorship links Arezoo Rajabi to the Gemma 3n Technical Report at Google.

Bruno Lefaudeux is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Public report authorship links Caroline Pantofaru to the Gemma 3n Technical Report at Google.

Public report authorship links David Hong to the Gemma 3n Technical Report at Google.

Denis Kocisky is listed as an author of the Mistral AI technical report Pixtral 12B.

Duc Pham is listed as an author of the Mistral AI technical report Pixtral 12B.

Public report authorship links Elad Segal to the Gemma 3n Technical Report at Google.

Public report authorship links Eric Chu to the Gemma 3n Technical Report at Google.

Public report authorship links Fei Xia to the Gemma 3n Technical Report at Google.

Hongxia Yang is listed as an author of the DeepSeek technical report Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling.

Public report authorship links Huiyu Wang to the Gemma 3n Technical Report at Google.

Jaime Carbonell is listed as an author of the Mistral AI technical report Pixtral 12B.

Public report authorship links Jeffrey Ding to the Gemma 3n Technical Report at Google.

Public report authorship links Jing Yu Koh to the Gemma 3n Technical Report at Google.

Jon Lamprecht is listed as an author of the Mistral AI technical report Pixtral 12B.

Jules Ponce is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Jun-Hyuk Ahn is listed as an author of the Mistral AI technical report Pixtral 12B.

Public report authorship links Kevin Albrecht to the Gemma 3n Technical Report at Google.

Public report authorship links Laurent Mouchere to the Gemma 3n Technical Report at Google.

Public report authorship links Limin Zhu to the Gemma 3n Technical Report at Google.

Public report authorship links Livio Baldini Soares to the Gemma 3n Technical Report at Google.

Public report authorship links Maciej Abramczyk to the Gemma 3n Technical Report at Google.

Public report authorship links Marcus Hutter to the Gemma 3n Technical Report at Google.

Public report authorship links Michal Matena to the Gemma 3n Technical Report at Google.

Mingyang Chen is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Public report authorship links Mohammad Sadegh Sharifi to the Gemma 3n Technical Report at Google.

Public report authorship links Noor Alabdulmohsin to the Gemma 3n Technical Report at Google.

Public report authorship links Oliver Groth to the Gemma 3n Technical Report at Google.

Public report authorship links Olivia Watkins to the Gemma 3n Technical Report at Google.

Public report authorship links Oscar Klimovskikh to the Gemma 3n Technical Report at Google.

Paul A. Crook is listed as an author of the Mistral AI technical report Pixtral 12B.

Public report authorship links Philip Torr to the Gemma 3n Technical Report at Google.

Public report authorship links Pooja Rao to the Gemma 3n Technical Report at Google.

Public report authorship links Po-Sen Huang to the Gemma 3n Technical Report at Google.

Public report authorship links Qiaochu Chen to the Gemma 3n Technical Report at Google.

Public report authorship links Qimin Chen to the Gemma 3n Technical Report at Google.

Public report authorship links Roman Ring to the Gemma 3n Technical Report at Google.

Public report authorship links Sai Praneeth Karimireddy to the Gemma 3n Technical Report at Google.

Public report authorship links Samy Bengio to the Gemma 3n Technical Report at Google.

Public report authorship links Shakti Sharma to the Gemma 3n Technical Report at Google.

Public report authorship links Sid Mittal to the Gemma 3n Technical Report at Google.

Public report authorship links Stephanie Houde to the Gemma 3n Technical Report at Google.

Public report authorship links Stephan Rabanser to the Gemma 3n Technical Report at Google.

Public report authorship links Sunita Chandrasekaran to the Gemma 3n Technical Report at Google.

Public report authorship links Surabhi Swaroop to the Gemma 3n Technical Report at Google.

Public report authorship links Vikas Sindhwani to the Gemma 3n Technical Report at Google.

Public report authorship links Vinitha Jeyakumar to the Gemma 3n Technical Report at Google.

Public report authorship links Weixuan Wang to the Gemma 3n Technical Report at Google.

Public report authorship links Wenxin Zou to the Gemma 3n Technical Report at Google.

Wesley H. Tiong is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Xin Wang is listed as an author of the Mistral AI technical report Pixtral 12B.

Public report authorship links Yinghui Xu to the Gemma 3n Technical Report at Google.

Yuchen Yang is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Google researcher whose publications include the Gemma 3n technical report.

Senior AI research scientist at Meta and affiliate researcher at MIT working on computer vision and machine learning.

Google researcher whose publications include the Gemma 3n, Gemma 3, and Gemma 2 technical reports.

Anelia Angelova works on robotics, computer vision, and machine learning, and her public bio notes more than four years at Google DeepMind before becoming VP of AI at Humane.

Professor of computer science at Columbia University whose public research focuses on computer vision, video understanding, robotics, and machine learning.

Researcher at Google DeepMind with public publications on language modeling, multimodal systems, and speech generation, including Gemma 3n, CT5, and ELLA.

Computer scientist known for work in computer vision, machine learning, and human-centered AI.

Google researcher whose publications include the Gemma 3n technical report.

Research scientist at FAIR working on multimodal systems.

Computer scientist at Carnegie Mellon University whose work spans machine learning, natural language processing, and human language technologies. His public homepage lists recent work including Pixtral and collaborations with Mistral AI.

Researcher working on multimodal and large language models, including Gemma 3n.

Professor in computer science and engineering at the University of Washington, scientist at the Allen Institute for Artificial Intelligence, and co-director of the UW NLP group.

Research scientist at Meta working on multimodal reasoning, vision-language models, multimodal generation, and compression. His homepage highlights a background spanning machine learning, computer vision, and NLP.

Research scientist at Google DeepMind focused on scaling frontier models and advancing the Gemma family of open models.

Research scientist at Google DeepMind interested in reasoning and multimodal understanding in machine learning and AI systems.

Research scientist at Google DeepMind whose work spans sequential decision making, multiagent systems, and responsible AI.

Saurav Belkhale is a researcher at Google DeepMind working on dexterous robot manipulation at the intersection of control, computer vision, and machine learning.

Researcher working on multimodal foundation models, including Gemma 3n.

Research scientist at Google DeepMind working on large language models, machine translation, and natural language processing.

Research scientist at Meta working on embodied AI, robotics, and reinforcement learning.

Research scientist at Meta whose public work covers embodied AI, language agents, and multimodal systems; his arXiv author results include the Chameleon multimodal model paper.

Senior research scientist at Mistral AI working on domain knowledge, factuality, efficiency, and personalization in large language models.

Staff research scientist at Google DeepMind in Mountain View working on multimodal language model pretraining and post-training.

Zhitao Ying is a Research Scientist at Google DeepMind.

Radu Soricut

Tulsee Doshi

Jifeng Dai

Ksenia Konyushkova

Binyuan Hui

Louis Martin

Hao Yang

David Dohan

Vahid Noroozi

Kevin Robinson

Amjad Almahairi

Sebastian Gehrmann

Rogerio Feris

Azade Nova

Jiahui Yu

Bhuwan Dhingra

Athanasios (Nassos) Katsamanis

Lechao Xiao

Huazuo Gao

Corey Lynch

Yuchen Ge

Angela Fan

Yoon Kim

Fei Xia

Luyao Yuan

Szymon Migacz

Pablo Sprechmann

Matthias Minderer

Armand Joulin

Chenguang Zhu

Xiangkun Wang

Xiaoze Liu

Xinyu Li

Zezhou Wang

Zhihuan Liu

Jascha Sohl-Dickstein

Jaehoon Lee

Jonathan Tompson

Lewis Houghton

Vivek Natarajan

Jie Zhou

Armand Joulin

Jason Wei

Jinghong Yuan

Nicholas Crane

Jiaxuan Fan

Sébastien Bubeck

Julian Schrittwieser

Mike Lewis

Karén Simonyan

Kanishka Rao

Alaaeldin El-Nouby

Christopher Pal

Michael Uthus

Piotr Padlewski

Sami Stigzelius

Srujana Merugu

Xiaodong Zhang

Khaled Saeed

Saswato R. Das

Orhan Firat

Sebastian Borgeaud

Vincent Vanhoucke

Jiaxuan Li

Mingxing Zhang

Mingze Li

Sebastian Goodman

Shang Yang

Su Wang

Adam Casson

Andy Brohan

Dylan Cope

Faisal Azhar

Fang Xia

Myungjae Ahn

Peter Florence

Rishabh Kabra

Xiangyu Yue

Albert Webson

Andrew Webb