Chameleon: Mixed-Modal Early-Fusion Foundation Models

Hao Yang works on multimodal data infrastructure at Moonshot.ai. He previously worked at ByteDance ICVG and Microsoft Research Asia, and received BS and PhD degrees from Tsinghua University.

Mingze Li is listed as an author of the Qwen3 Technical Report.

Armand Joulin is listed as an author of the Meta AI technical report Llama 2: Open Foundation and Fine-Tuned Chat Models.

Faisal Azhar is a PhD candidate in computer science at Stanford University. His work focuses on multimodal systems that unify text, image, and speech, together with efficient training and inference for large-scale machine learning.

Bruno Lefaudeux is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Alberto Mario Cadeddu is a senior AI research scientist at Meta and an affiliate researcher at MIT working on computer vision and machine learning.

Jules Ponce is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Madhu Krishna is a research scientist at Meta working on multimodal reasoning, vision-language models, multimodal generation, and compression. His homepage highlights a background spanning machine learning, computer vision, and NLP.

Geneviève Dorkenwald is a research scientist at FAIR working on multimodal systems.

Mingyang Chen is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Luke M. Zettlemoyer is a professor in computer science and engineering at the University of Washington, a scientist at the Allen Institute for Artificial Intelligence, and co-director of the UW NLP group.

Srujana Merugu is a research scientist at Meta AI focused on multimodal and embodied AI, with interests in computer vision, deep learning, and decision making.

Christopher Pal is a professor and AI researcher whose public work spans deep learning, multimodal learning, and large language models.

Udit Sodhi is a research scientist at Meta whose public work covers embodied AI, language agents, and multimodal systems; his arXiv author results include the Chameleon multimodal model paper.

Chenguang Zhu is a research scientist at Meta AI focused on vision-language models, large language models, and agents; public work includes the multimodal foundation model Chameleon.

Alaaeldin El-Nouby is a machine learning researcher whose public work includes multimodal and vision-language models.

Tianhe Yu is a research scientist at Meta working on embodied AI, robotics, and reinforcement learning.

Luyao Yuan is a research scientist at FAIR at Meta. Her homepage says her research aims to build AI systems that can see, learn, reason, and interact like humans, and that she completed a PhD in EECS at MIT advised by Antonio Torralba after earlier research with Song Han at MIT and Jiajun Wu at Stanford.

Wesley H. Tiong is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

ATHENA Research Center's profile describes Athanasios (Nassos) Katsamanis as a principal researcher there since 2019, focusing on multimodal speech processing, multimodal human-computer interaction, and human behavior analysis.

Nicholas Crane is a research scientist at Meta working on computer vision and multimodal foundation models, with an emphasis on robustness, trustworthiness, and alignment.

Yuchen Yang is listed as an author of the Meta AI technical report Chameleon: Mixed-Modal Early-Fusion Foundation Models.

Khaled Saeed is a research scientist at Meta working on efficient multimodal reasoning and AI systems.

Fei-Fei Li is a computer scientist known for work in computer vision, machine learning, and human-centered AI.

Hao Yang

Mingze Li

Armand Joulin

Faisal Azhar

Bruno Lefaudeux

Alberto Mario Cadeddu

Jules Ponce

Madhu Krishna

Geneviève Dorkenwald

Mingyang Chen

Luke M. Zettlemoyer

Srujana Merugu

Christopher Pal

Udit Sodhi

Chenguang Zhu

Alaaeldin El-Nouby

Tianhe Yu

Luyao Yuan

Wesley H. Tiong

Asterios Katsamanis

Nicholas Crane

Yuchen Yang

Khaled Saeed

Fei-Fei Li