Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Computer scientist and engineer credited on OpenAI's GPT-4 public contributions page; OpenAI's 2016 team update says he previously led Dropbox's core file sync team after earlier work in Pieter Abbeel's Berkeley robotics lab.

Yeyun Gong is a researcher and engineering leader focused on multimodal large language models, grounding, and large-scale knowledge systems. His homepage lists selected work including Qwen2-Audio.

Research intern at Alibaba Group focused on multimodal understanding and generation, large multimodal models, and reinforcement learning; coauthor of Qwen2-Audio.

Research scientist at Tongyi Lab and technical lead of Qwen2-Audio, with public work on audio-language models.

Yushi Hu is a senior research engineer at Shanghai AI Laboratory and a founding member of OpenMMLab. Public arXiv records also list him as a coauthor of Qwen2-Audio.

Chao Zhang is an applied scientist in the Alibaba Foundation Model team. His public profile notes a PhD in computer science from the University of Illinois Urbana-Champaign and research interests in NLP, large language models, reasoning, and multimodal generation.

Researcher whose arXiv author results include Qwen-Audio and related audio-language modeling work.

Jie Tang

Yeyun Gong

Mingyang Shang

Yaqi Wang

Yushi Hu

Chao Zhang

Hongyin Luo