
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Alignment and Safety report from Anthropic with 7 connected researchers in the LLMpeople atlas.

Anthropic · Undated · 7 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2501.18837

Canonical link

https://arxiv.org/abs/2501.18837

Connected researchers

Researcher 2 reports

Liane Lovitt

Anthropic

Research scientist at Anthropic whose public work includes AI alignment, reinforcement learning from human feedback, and model behavior.

Researcher 6 reports

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Researcher 8 reports

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Researcher 8 reports

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Researcher 3 reports

Alex Tamkin

Anthropic

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Researcher 2 reports

Beth Barnes

Anthropic

President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.

Researcher 1 report

Alexey Nazarov

Anthropic

Member of technical staff at Anthropic focused on safe and reliable AI.

