Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Alignment and Safety
Connected researchers
Samuel Marks
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
Jack Clark
Anthropic / OpenAI
Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.
Simon Goldstein
Anthropic
Assistant Professor of Philosophy at The University of Hong Kong and Research Fellow at Anthropic, working in ethics, epistemology, and social and political philosophy.
Amanda Askell
Anthropic / OpenAI
Alignment researcher at Anthropic, previously at OpenAI, working on making AI understandable to and aligned with human values.
Kamal Ndousse
Anthropic
Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.
Jan Leike
Anthropic
Anthropic researcher focused on AI safety, alignment, and auditing hidden objectives in language models.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Nicholas Schiefer
Anthropic
Member of technical staff at Anthropic and co-founder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Deep Ganguli
Anthropic
Research scientist at Anthropic leading work on the societal impacts of AI.
Dario Amodei
Anthropic / OpenAI
CEO and co-founder of Anthropic. Before Anthropic, he served as vice president of research at OpenAI.
Alex Tamkin
Anthropic
Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.
Beth Barnes
Anthropic
President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.
Jared Kaplan
Anthropic
Co-founder and chief science officer of Anthropic, known for work on scaling laws and large language models.
Wes Gurnee
Anthropic
Member of technical staff at Anthropic working on deep learning, mechanistic interpretability, and AI safety.
Tom Henighan
Anthropic
Profile still being enriched.
Aengus Lynch
Anthropic
Profile still being enriched.
Jacob Hilton
Anthropic
Profile still being enriched.
William Saunders
Anthropic
Profile still being enriched.
Will McCrostie
Anthropic
Profile still being enriched.
Yanda Chen
Anthropic
Profile still being enriched.
Avital Oliver
Anthropic
Profile still being enriched.
Cameron Raymond
Anthropic
Profile still being enriched.
Dylan Hadfield-Menell
Anthropic
Profile still being enriched.
Jules Christmann
Anthropic
Profile still being enriched.