Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Public profiles say Yonggan Fu completed a Georgia Tech PhD in 2025 after earlier study at Rice and USTC; his current work focuses on bringing frontier AI to everyday devices.

Xin Dong's homepage says he leads a research team on LLM training at Seed at ByteDance. It also states that he earned a Harvard PhD in 2023 and previously worked at NVIDIA, Meta, and Tencent.

Shizhe Diao develops methods to scale post-training and reinforcement learning for large language models and AI agents.

Matthijs Van keirsbilck is a Senior Research Scientist at NVIDIA working on neural network architecture design, structural sparsity, quantization, and training dynamics.

Hanrong Ye is a research scientist at NVIDIA Research in Santa Clara working on multi-task, multi-media, and multimodality models for machine understanding and generation. He earned a Ph.D. from HKUST, a master's degree from Peking University, and a B.S. from Sun Yat-sen University.

NVIDIA Research and Wonmin Byeon's personal site identify him as a researcher at NVIDIA Research in California. Public site materials describe interests in computer vision, robotics, recurrent and state-space models, sequence learning, and spatio-temporal learning.

OpenReview identifies Yashaswi Karnati as a researcher at NVIDIA. His personal homepage describes prior work across intelligent transportation, climate science, data compression, and healthcare, and records completed degrees from the University of Florida and IIT (ISM) Dhanbad.

Lucas Liebenwein works on high-performance LLM inference and AutoDeploy at NVIDIA; he previously led efficient-AI work at OmniML and earned graduate degrees at MIT CSAIL.

Nikolaus Binder is a senior research scientist at NVIDIA whose public research profile focuses on quasi-Monte Carlo methods, photorealistic image synthesis, ray tracing, and rendering algorithms.

NVIDIA's public author page identifies Maksim Khadkevich as a Senior Software Engineering Manager specializing in distributed inference systems and large language models. Public arXiv records also list him as a coauthor of Nemotron-Flash.

NVIDIA Research identifies Alexander Keller as a senior director of research, formerly chief scientist at mental images and previously a professor at Ulm University. His research interests are at the intersection of graphics, communications, and machine learning.

NVIDIA's research page describes Jan Kautz as vice president of Learning and Perception Research, working across computer vision, machine learning, computational photography, and geometric vision.

Official Georgia Tech and NVIDIA DLER pages list Yingyan Celine Lin as a Georgia Tech associate professor and a visiting professor collaborating with NVIDIA's deep learning research group.

Pavlo Molchanov leads deep learning efficiency work at NVIDIA Research, with public profiles covering LLM and VLM efficiency, model compression, adaptive inference, and earlier computer vision research.

Yonggan Fu

Xin Dong

Shizhe Diao

Matthijs Van keirsbilck

Hanrong Ye

Wonmin Byeon

Yashaswi Karnati

Lucas Liebenwein

Nikolaus Binder

Maksim Khadkevich

Alexander Keller

Jan Kautz

Yingyan Celine Lin

Pavlo Molchanov