Alignment and RLHF | Field

Research scientist at Anthropic whose public work includes AI alignment, reinforcement learning from human feedback, and model behavior.

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (currently on leave). His public homepage focuses on natural language processing, machine learning, and AI alignment.

Researcher at Anthropic whose public homepage and scholarly profile connect cognitive science research with AI.

Research scientist at Anthropic working on machine learning systems and AI; previously worked on machine learning systems, compilers, and sustainability at Google.

Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.

Researcher focused on AI safety, reinforcement learning, and language models, with public work spanning red teaming, adversarial robustness, and model behavior.

Staff software engineer at Anthropic building systems for AI safety, reliability, and alignment.

Anthropic researcher working on machine learning and AI-assisted science; previously built tools for learning from text, images, and tabular data.

Research scientist known for mechanistic interpretability and deep learning visualization, previously at Google Brain and OpenAI.

Research scientist at Anthropic working on reasoning and geometry-aware machine learning.

Alignment researcher at OpenAI working on making AI understandable to and aligned with human values.

Anthropic co-founder and Chief Science Officer. Formerly a physicist at Johns Hopkins, he helped develop scaling laws for neural language models and works on the science and safety of large AI systems.

Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.

Independent researcher working on the theoretical foundations of AI, especially inductive biases, scaling laws, and approximate Bayesian updating. His public homepage notes prior research roles at Anthropic and OpenAI.

Member of Anthropic's Interpretability team, where he studies how large language models work.

Research scientist at Anthropic known for mechanistic interpretability work, including early research on feature visualization and circuits in neural networks.

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Catherine Olsson is an AI alignment researcher and writer whose public website and Anthropic author page describe work on AI safety, interpretability, and building helpful, harmless assistants.

AI governance researcher at the Centre for the Governance of AI and former Anthropic resident researcher, with interests in language models, AI safety, scalable oversight, and evaluations.

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Co-founder and head of alignment science at Anthropic.

CEO and co-founder of Anthropic. Before Anthropic, he served as vice president of research at OpenAI.

Research scientist at Anthropic interested in understanding neural networks and applying that understanding to alignment.

Researcher working on AI safety and adversarial evaluation, including Anthropic many-shot jailbreaking research.

Research scientist at Anthropic interested in understanding and steering AI systems.

Software engineer at Anthropic, previously at Google, with public writing on language models, agents, and reinforcement learning.

Researcher interested in neural networks and their potential to achieve general intelligence. His public homepage notes prior roles as a cofounder at Anthropic, researcher at OpenAI, and member of the startup team at Stripe.

Researcher at Anthropic working on the alignment and evaluation of advanced AI systems.

Research scientist at Anthropic working on model behavior and interpretability.

Software engineer at Anthropic working on infrastructure, tooling, model behavior, and multimodal systems.

Member of Technical Staff at Anthropic whose work focuses on understanding, evaluating, and improving large language models, with emphasis on reasoning, safety, and generalization.

Member of technical staff at Anthropic working on AI systems and alignment, with published work on RLHF and constitutional methods for harmless assistants.

Research scientist working on scalable systems and machine learning; her public homepage notes previous work at Anthropic and current work on Gemini at Google DeepMind.

Research scientist at Anthropic focused on alignment, reasoning, agents, and complex systems.

Anthropic researcher working on the economics of AI and scaling laws.

Computer scientist and machine learning researcher whose public work spans AI systems and alignment.

Researcher working on AI safety and alignment, including Constitutional AI.

Member of technical staff at Anthropic working on large language model training, evaluation, and interpretability.

Research scientist at Anthropic working on machine learning, causality, and computational biology.

Liane Lovitt

Samuel R. Bowman

Noemi Mercado

Azalia Mirhoseini

Jack Clark

Shauna Kravec

Zac Hatfield-Dodds

Andy Jones

Chris Olah

Robert Lasenby

Amanda Askell

Jared D. Kaplan

Yuntao Bai

Sam McCandlish

Jackson Kernion

Christopher Olah

Kamal Ndousse

Catherine Olsson

Kamile Lukosiute

Ethan Perez

Nicholas Schiefer

Deep Ganguli

Dario Amodei

Nova DasSarma

Anna Chen

Saurav Kadavath

Tom Conerly

Ben Mann

Nicholas Joseph

Tom Brown

Scott Johnston

Stanislav Fort

Tristan Hume

Anna Goldie

Carroll Wainwright

Danny Hernandez

Herbie Bradley

Jamie Kerr

Sam Ringer

Sheer El Showk