updated 2 public sources
reinforcement learning · RLHF · LLM post-training · reasoning models

Current frame

Researcher in RLHF, LLM post-training, and reinforcement learning systems

Extended note

Public sources list Chi Zhang as a ByteDance Seed author on Seed1.5-Thinking and tie that authorship to the disambiguated DBLP profile "Chi Zhang 0022" (ORCID 0000-0001-7374-1940). That DBLP record includes "HybridFlow: A Flexible and Efficient RLHF Framework", "Laminar: A Scalable Asynchronous RL Post-Training Framework", and earlier papers on deep reinforcement learning systems.