Alignment and RLHF
Researchers connected to this field in the public atlas.
Amanda Askell
Anthropic / OpenAI
Amanda Askell is a philosopher and AI alignment researcher at Anthropic. Her personal site says she previously worked as a research scientist on the policy team at OpenAI.
Deep Ganguli
Anthropic
Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.
Tom Henighan
Anthropic
Tom Henighan works on large language model interpretability at Anthropic. He previously worked on scaling laws at OpenAI and on machine learning engineering at Beehive AI. He studied physics at Stanford after graduating from Ohio State University in 2010 with a degree in English, mathematics, and philosophy.
Sandipan Kundu
Anthropic
Sandipan Kundu is a member of technical staff at Anthropic. His public profile on The Org says he previously held postdoctoral positions at Johns Hopkins and Cornell, worked and studied at the University of Texas at Austin, and earned a master's degree in physics from the Indian Institute of Technology Kanpur.
Sheer El-Showk
Anthropic
Sheer El-Showk is a member of technical staff at Anthropic. His public profile on The Org says he is also CTO at Lore AI and founder of Nascent AI. It also lists earlier work as a senior software engineer at Coiled, postdoctoral research fellowships at CERN and the CEA, and physics and mathematical physics degrees from UC Berkeley and the University of Amsterdam.
Neel Nanda
Anthropic
Neel Nanda is an independent researcher focused on mechanistic interpretability and understanding neural networks. He previously worked on Anthropic's model diffing team, completed the ML Alignment & Theory Scholars program, and studied mathematics at the University of Cambridge. He is known for interpretability tooling such as TransformerLens.
Saurav Kadavath
Anthropic
Researcher at Anthropic whose public report authorships and scholarly profiles cover language model evaluation, AI safety, and robustness.
Anna Goldie
Anthropic
Founder and CEO of Ricursive Intelligence, a frontier AI lab targeting chip design; previously a Senior Staff Research Scientist at Google DeepMind and an early employee at Anthropic.
Liane Lovitt
Anthropic
Research scientist at Anthropic whose public work includes AI alignment, reinforcement learning from human feedback, and model behavior.
Nelson Elhage
Anthropic
Nelson Elhage is an engineer and researcher at Anthropic, where he works on the pretraining team after earlier work on reverse-engineering large language models. He previously worked at Stripe and Ksplice/Oracle on systems software and is known for open-source systems projects such as livegrep and reptyr.
Kamal Ndousse
Anthropic
Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.
Chris Olah
Anthropic
Christopher Olah is a co-founder of Anthropic whose public writing focuses on interpretability, neural network circuits, and deep learning visualization. His homepage notes earlier work at OpenAI and Google Brain.
Noemi Mercado
Anthropic
Researcher at Anthropic whose public homepage and scholarly profile connect cognitive science research with AI.
Jack Clark
Anthropic / OpenAI
Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI and as a technical journalist, and his authorship of the Import AI newsletter.
Azalia Mirhoseini
Anthropic
Research scientist at Anthropic working on machine learning systems and AI; previously worked on machine learning systems, compilers, and sustainability at Google.
Samuel R. Bowman
Anthropic
Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His public homepage focuses on natural language processing, machine learning, and AI alignment.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Nicholas Schiefer
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Dario Amodei
Anthropic / OpenAI
Co-founder and CEO of Anthropic.
Shauna Kravec
Anthropic
Researcher focused on AI safety, reinforcement learning, and language models, with public work spanning red teaming, adversarial robustness, and model behavior.
Zac Hatfield-Dodds
Anthropic
Staff software engineer at Anthropic building systems for AI safety, reliability, and alignment.
Andy Jones
Anthropic
Anthropic researcher working on machine learning and AI-assisted science; previously built tools for learning from text, images, and tabular data.
Nova DasSarma
Anthropic
Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.
Tom Conerly
Anthropic
Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.
Robert Lasenby
Anthropic
Research scientist at Anthropic working on reasoning and geometry-aware machine learning.
Jared D. Kaplan
Anthropic
Jared D. Kaplan is a co-founder and Chief Science Officer at Anthropic. Anthropic's public materials also identify him as the company's Responsible Scaling Officer.
Yuntao Bai
Anthropic
Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.
Jackson Kernion
Anthropic
Member of Anthropic's Interpretability team, where he works on understanding how large language models work.
Anna Chen
Anthropic
Anthropic report author listed on RLHF, Constitutional AI, Collective Constitutional AI, and Many-shot Jailbreaking reports, with report-backed work on alignment and adversarial evaluation.
Dawn Drain
Anthropic
Dawn Drain is an Anthropic-affiliated researcher in the United States. Public sources list her as a coauthor of Anthropic's helpful and harmless assistant paper and show 2022 software engineering publication credits, including work on code completion and automated repair with large language models.
Jared Mueller
Anthropic
Jared Mueller is affiliated with Anthropic. Public materials list him as an Anthropic participant at the 2023 Economics of Robots Conference, and the linked arXiv paper lists him as a coauthor on Anthropic's Constitutional AI work.
Joshua Landau
Anthropic
Joshua Landau is affiliated with Anthropic. Public Anthropic research materials list him as a coauthor of Measuring Progress on Scalable Oversight for Large Language Models, and the linked arXiv paper lists him as a coauthor of Constitutional AI.
Christopher Olah
Anthropic
Research scientist at Anthropic known for mechanistic interpretability work, including early research on feature visualization and circuits in neural networks.
Carroll Wainwright
Anthropic
Carroll "Max" Wainwright is a founder and AI advisor at Metaculus. Public OpenAI materials also list him among contributors to ChatGPT, GPT-4, and GPT-4o-era work, and a 2025 OpenAI filing describes him as a former OpenAI alignment researcher.
Catherine Olsson
Anthropic
Catherine Olsson is an AI alignment researcher and writer whose public website and Anthropic author page describe work on AI safety, interpretability, and building helpful, harmless assistants.
Ben Mann
Anthropic
A public Anthropic/AWS presentation describes Ben Mann as an Anthropic co-founder and former GPT-3 and API engineer at OpenAI. The previously attached benmann.com homepage is now a parked domain rather than a personal research site.
Timothy Telleen-Lawton
Anthropic
Timothy Telleen-Lawton is an independent researcher focused on inspiring and scaling collective intelligence. His public LessWrong profile says he previously served as Head of Procurement at Anthropic starting in 2021, and also lists prior work at CFAR and GiveWell.
Kamile Lukosiute
Anthropic
AI governance researcher at the Centre for the Governance of AI and former Anthropic resident researcher, with interests in language models, AI safety, scalable oversight, and evaluations.
Sam McCandlish
Anthropic
Sam McCandlish is listed as an author of the Anthropic technical report Collective Constitutional AI: Aligning a Language Model with Public Input.
Nicholas Joseph
Anthropic
Researcher at Anthropic working on the alignment and evaluation of advanced AI systems.
Tom Brown
Anthropic
Research scientist at Anthropic working on model behavior and interpretability.
Scott Johnston
Anthropic
Software engineer at Anthropic working on infrastructure, tooling, model behavior, and multimodal systems.
Stanislav Fort
Anthropic
Member of Technical Staff at Anthropic whose work focuses on understanding, evaluating, and improving large language models, with emphasis on reasoning, safety, and generalization.
Tristan Hume
Anthropic
Member of technical staff at Anthropic working on AI systems and alignment, with published work on RLHF and constitutional methods for harmless assistants.
Josh Landau
Anthropic
Josh Landau is listed as an author of the Anthropic technical report Collective Constitutional AI: Aligning a Language Model with Public Input.
Herbie Bradley
Anthropic
Computer scientist and machine learning researcher with public work spanning AI systems and alignment-related research.
Jamie Kerr
Anthropic
Researcher working on AI safety and alignment, including Constitutional AI.
Sam Ringer
Anthropic
Member of technical staff at Anthropic working on large language model training, evaluation, and interpretability.
Sheer El Showk
Anthropic
Research scientist at Anthropic working on machine learning, causality, and computational biology.
Danny Hernandez
Anthropic
Public Anthropic research pages list Danny Hernandez as a coauthor on alignment and scaling-law work.