Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.

Assistant Professor of Philosophy at The University of Hong Kong and Research Fellow at Anthropic, working in ethics, epistemology, and social and political philosophy.

Alignment researcher at OpenAI working on making AI understandable to and aligned with human values.

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Anthropic researcher focused on AI safety, alignment, and auditing hidden objectives in language models.

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Co-founder and head of alignment science at Anthropic.

CEO and co-founder of Anthropic. Before Anthropic, he served as vice president of research at OpenAI.

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.

Jared Kaplan is a researcher at Anthropic known for work on scaling laws and large language models.

Member of technical staff at Anthropic working on deep learning, mechanistic interpretability, and AI safety.

Samuel Marks

Jack Clark

Simon Goldstein

Amanda Askell

Kamal Ndousse

Jan Leike

Ethan Perez

Nicholas Schiefer

Deep Ganguli

Dario Amodei

Alex Tamkin

Beth Barnes

Jared Kaplan

Wes Gurnee