Multimodal Language Models
Researchers connected to this field in the public atlas.
Ruoming Pang
Apple
Research scientist working on large-scale deep learning for speech translation, spoken language understanding, question answering, healthcare, and multimodal models.
Yinfei Yang
Apple
Research scientist at Apple focused on natural language processing and machine learning.
Yann LeCun
Apple
Yann LeCun is Chief AI Scientist at Meta, Silver Professor at New York University, and a pioneer of modern deep learning and computer vision.
Haoxuan You
Apple
Research scientist on Apple Foundation Models whose work focuses on machine learning systems, multimodal foundation models, and AI agents.
Max Schwarzer
Apple
Max Schwarzer is a reinforcement learning researcher whose work focuses on scaling and sample-efficient RL. He completed a PhD at Mila, later interned in Apple's machine learning research group, and was an author on Apple's MM1 multimodal pre-training report.
Samia Touileb
Cohere
Associate Professor in Natural Language Processing at the University of Bergen whose work focuses on bias and fairness in NLP, information extraction, summarization, and under-resourced languages.
Jiahui Yu
Google Gemini
Jiahui Yu is a research scientist at Google DeepMind working on multimodal learning and large language models.
Danny Driess
Google Gemini
Danny Driess is a research scientist at Google DeepMind whose work focuses on general AI, robot learning, and multimodal foundation models.
Awni Hannun
Apple
Researcher and engineer working on machine learning, software, and hardware systems.
Sachin Kumar
Apple
Sachin Kumar is a researcher at Apple and incoming assistant professor at UC San Diego. His work focuses on natural language processing, efficient and multilingual language models, and machine learning systems.
Arianna Bisazza
Cohere
Associate professor of natural language processing at the University of Groningen and research scientist at Cohere Labs, with work spanning machine translation, multilingual models, and multimodal language understanding.
Ari Holtzman
Cohere
Assistant professor of computer science at the University of Chicago studying language generation, dialogue systems, and aligning models with human preferences.
Jitendra Malik
Apple
Jitendra Malik is a computer vision and machine learning researcher at UC Berkeley whose public homepage and Google Scholar profile highlight work on image understanding, robotics, and foundation models.
Jonas Geiping
Apple
Machine learning researcher at Apple Machine Learning Research working at the intersection of optimization, privacy, and security.
Munsina Sundaram
Cohere
Machine learning engineer and researcher based in the San Francisco Bay Area whose interests include multilingual and multimodal machine learning, responsible AI, and applications in healthcare and education.
Teddy Karrer
Google Gemini
Teddy Karrer is a research scientist working on embodied AI, multimodal reasoning, and machine learning for interactive systems. His public profile highlights robotics, decision making, and intelligent agents.
Yuyin Zhou
NVIDIA
Assistant Professor of Computer Science and Engineering at UC Santa Cruz working on multimodal learning, computer vision, and medical image analysis.
Zhaowen Wang
NVIDIA
Research manager at NVIDIA working on large-scale distributed pretraining, synthetic data, multimodal LLMs, and computer vision.
Floris Weers
Apple
Research scientist at Apple working on efficient and multilingual language modeling, speech and language systems, and large language models.
Andy Zeng
Google Gemini
Andy Zeng is a Research Scientist at Google DeepMind. His public research interests include robot learning, computer vision, graphics, and personalized 3D content generation.
Brandon McKinzie
Apple
Senior research scientist at Apple working on large multimodal foundation models, with prior work on large language models at MosaicML.
Neil Houlsby
Apple
Neil Houlsby works on adaptation of large language models, transfer learning, parameter-efficient fine-tuning, and inference efficiency.
Scott Reed
Google Gemini
Research scientist at Google DeepMind working on language, vision, action, and robotics; previously on the Google Brain team and a co-creator of the first text-to-image GAN.
Angela Fan
Apple
Research scientist at Apple working at the intersection of natural language processing, machine learning, and AI, with a focus on building more intelligent, robust, and reliable systems.
Jianfeng Gao
NVIDIA
Jianfeng Gao is a researcher in natural language processing and multimodal foundation models whose public homepage and Google Scholar profile highlight work on dialogue systems, retrieval, reasoning, and vision-language models.
Nicolas Heess
Google Gemini
Nicolas Heess is a research scientist at Google DeepMind whose work focuses on machine learning, reinforcement learning, and robotics.
Peter Grasch
Apple
Research scientist at Apple focused on state-of-the-art machine learning and computer vision methods.
Zirui Wang
Apple
Senior researcher at Apple working on large models, multimodal learning, and speech processing, according to his personal site.
Raza Habib
Cohere
Research scientist and engineer focused on multimodal and multilingual language models, with public work on translation, retrieval, and agent systems.
Yusuke M. Asano
Google Gemini
Research scientist at Google whose work spans computer vision, multimodal learning, and large embodied models, including PaLM-E.
Aida Amini
Cohere
Researcher focused on grounded language understanding, question answering, semantic parsing, and natural language inference.
Ishan Misra
Apple
Ishan Misra is a Research Scientist at Apple whose work spans computer vision, multimodal learning, and large foundation models. He has contributed to Apple Intelligence foundation model research.
Johnny Mao
Google Gemini
Senior research scientist at Google DeepMind working on machine learning.
Marc G. Bellemare
Google Gemini
Principal research scientist at Google DeepMind and professor of computer science at McGill University.
Masoud Alizadeh
Apple
Research scientist at Apple specializing in multilingual and multimodal generative models.
Montserrat Gonzalez Arenas
Google Gemini
Montserrat Gonzalez Arenas is a research engineer at Google Research whose public work focuses on robot learning and mobile manipulation, including robotic table wiping, waste sorting, and RT-Trajectory for robot task generalization.
Nan Du
Apple
Research scientist at Apple Foundation Models working on large language models and multimodal systems; previously a research scientist at Google and Meta.
Paria Hafezi
Apple
Research scientist and engineer at Apple working on foundation models for speech and language, with interests in explainability and interpretability.
Thaddeus Culhane
NVIDIA
Research scientist at NVIDIA working on multimodal AI, especially language and vision models.
Zhengfeng Lai
Apple
Zhengfeng Lai is an AI/ML engineer at Apple working on generative AI and multimodal learning. He is also a PhD student at Cornell University whose interests include multimodal learning, model reasoning, and interpretability, and he has previously interned at Apple, Google, and Meta.
Fei Xia
Google Gemini / Mistral AI
Senior Research Scientist at Google DeepMind working on Gemini, embodied AI, and multimodal foundation models for robotics and perception.
Tom Gunter
Apple
Research scientist at Apple Intelligence working on computer vision, machine learning, and natural language processing.
Xiang Kong
Apple
Distinguished scientist at Apple working on large language models and multimodal foundation models; previously held research roles at ByteDance AI Lab and MBZUAI.
Alex Beutel
Apple
Machine learning researcher at Apple focused on responsible and human-centered AI.
Bing Ren
Apple
Researcher working on on-device and foundation language models, including Apple Intelligence models.
Bowen Zhang
Apple
Research scientist at Apple working on large language models, vision-language models, and model scaling.
Dhruti Shah
Apple
Researcher working on machine learning, vision and language, computer vision, diffusion, and generative AI.
Harsha Nori
Apple
Harsha Nori is a Senior Principal Researcher at Microsoft Research whose work focuses on machine learning, artificial intelligence, and healthcare applications.
Jean-Philippe Fauconnier
Apple
Research scientist at Apple Foundation Models working on generative AI, large language models, and multimodal models.
Jin Xu
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report.
Lukas Haas
Apple
Research scientist at Apple working on machine learning and AI, previously at Google Brain and Stanford, and a co-author of the Apple Intelligence Foundation Language Models reports.
Ming Lei
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report.
Philipp Dufter
Apple
Research scientist at Apple Foundation Models with interests in natural language processing, structured generation, controllable generation, and algorithmic efficiency.
Raman Chopra
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report.
Tengyun Huang
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report.
Wojciech Zaremba
Apple
Wojciech Zaremba is an AI researcher and entrepreneur, and a co-founder of OpenAI.
Xianzhi Du
Apple
Research scientist at Apple working on language and vision-language modeling, AI agents, and post-training.
Yash Jernite
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report.
Zhe Gan
Apple
Machine learning researcher at Apple working on large multimodal foundation models, video generation, and vision-language systems.
Aditya Siddhant
Cohere
Member of Technical Staff at Cohere Labs working on multilingual and multimodal language technologies.
Afshin Dehghan
Apple
Research scientist at Apple focused on computer vision, multimodal learning, and robotics.
Aleksei Timofeev
Apple
Research scientist whose public OpenReview profile lists work on multimodal representation learning, speech synthesis, and personalized voice generation.
Alexander Toshev
Apple
Computer vision and machine learning scientist at Apple whose work includes multimodal understanding and robotics, following earlier leadership roles at Google.
Amin Jalali
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report.
Andy Yao
NVIDIA
Research scientist at NVIDIA with public publications on multimodal language models and visual instruction tuning, including NVLM, VILA, and Video2Flow.
Anton Belyi
Apple
Research scientist at Apple and adjunct professor at MIPT working in computer vision, image processing, and machine learning.
Caiming Xiong
NVIDIA
Vice President of AI Research and General Manager of AI Platforms at NVIDIA.
Cengiz Oztireli
Google Gemini
Senior staff research scientist at Google DeepMind and affiliated lecturer at Cambridge working on computer vision, machine learning, and computer graphics.
Daria Buchsbaum
Google Gemini
Daria Buchsbaum is a PhD student at Georgia Tech and a Research Scientist Intern at Google DeepMind.
Forrest Huang
Apple
Research scientist at Apple Foundation Models working on efficient training and multimodal language models.
Futang Peng
Apple
Research scientist at Apple focusing on understanding and generating text and images.
Greg Yang
Apple
AI researcher and deep learning theorist whose public work includes tensor programs and maximal update parameterization. He coauthored Apple's 2025 Apple Intelligence foundation language models technical report.
Hong-You Chen
Apple
AI and machine learning engineer at Apple working on multimodal foundation models; previously worked at Snap and the University of Southern California.
Hongyu He
Apple
Research scientist at Apple focused on computer vision, machine learning, and multimodal understanding.
Jose A. Arenas
Google Gemini
Staff software engineer at Google focused on machine learning and systems.
Keen You
Apple
Research scientist at Apple specializing in post-training, reinforcement learning, and AI agents.
Louis Borry
Google Gemini
Louis Borry is a PhD student at Google DeepMind working on embodied language models and grounded language understanding.
Mikel Arza
Google Gemini
Research scientist at Google DeepMind focused on robotics and machine learning, especially reinforcement learning and language models.
Mingfei Gao
Apple
Researcher working on machine learning, optimization, and sequential data.
Qiaozi Gao
Google Gemini
Qiaozi Gao is a Stanford PhD student whose work spans vision and language, machine learning, and robotics, with research internships at Google and Google DeepMind.
Ran Tian
NVIDIA
Research scientist at NVIDIA working on multimodal language models and vision-language research, with public publications including NVLM, VILA, and Visual Role Play.
Rulin Shao
NVIDIA
PhD student at UCLA and research intern at NVIDIA, working on multimodal reasoning, vision-language models, and embodied AI.
Sam Wiseman
Apple
Sam Wiseman is an assistant professor of computer science at New York University whose research focuses on natural language processing and machine learning, including controllable generation, summarization, and learning from human feedback.
Seb Noury
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report and work on the MLX framework.
Sergey Ioffe
Apple
Machine learning researcher whose work spans neural networks and statistics, and a co-author of Apple's Foundation Language Models report.
Shyamal Anadkat
NVIDIA
Research scientist at NVIDIA working on AI agents, multimodal systems, and robotics, including the NVLM project.
Soroosh Mariooryad
Apple
Senior research scientist at Apple with public publications spanning speech, audio, and language modeling, including work on speech language models and MemoryLLM.
Tanmay Shah
Apple
Apple researcher whose publications include the Apple Intelligence Foundation Language Models technical report.
Thomas Blankevoort
Google Gemini
Thomas Blankevoort is a Research Scientist at Google DeepMind whose work focuses on efficient neural networks and machine learning systems.
Tom Small
Apple
Researcher working on foundation language models and efficient inference, including Apple Intelligence models.
Weizhu Chen
NVIDIA
Distinguished scientist and managing director at Microsoft Research working on natural language processing and large language models.