LLMpeople


Auditing language models for hidden objectives

Alignment and Safety report from Anthropic with 11 connected researchers in the LLMpeople atlas.

Anthropic · Undated · 11 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2507.11473

Canonical link

https://arxiv.org/abs/2507.11473

Connected researchers

Researcher 5 reports

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His public homepage focuses on natural language processing, machine learning, and AI alignment.

Anthropic
United States
Researcher 7 reports

Amanda Askell

Anthropic / OpenAI

Alignment researcher at OpenAI working on making AI understandable to and aligned with human values.

Anthropic · OpenAI
Researcher 3 reports

David Bau

Anthropic

Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.

Anthropic
United States
Researcher 3 reports

Sören Mindermann

Anthropic

Research scientist at Anthropic working on machine learning and AI safety.

Anthropic
Researcher 2 reports

Jan Leike

Anthropic

Anthropic researcher focused on AI safety, alignment, and auditing hidden objectives in language models.

Anthropic
Researcher 2 reports

Josh Batson

Anthropic

Member of technical staff at Anthropic interested in understanding deep learning and AI safety; previously a research scientist at OpenAI.

Anthropic
Researcher 1 report

Henry Sleight

Anthropic

PhD student at the University of Oxford working on AI safety, including scalable oversight and interpretability.

Anthropic
Researcher 8 reports

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Anthropic
Researcher 8 reports

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Researcher 1 report

Owain Evans

Anthropic

Assistant Professor of Computer Science at the University of Oxford whose research spans generalization, reasoning, and large language model agents.

Anthropic
United Kingdom
Researcher 1 report

Scott Emmons

Anthropic

Member of Technical Staff at Anthropic working on AI control, hidden objectives, alignment, and evaluations, with a background in language models, efficient training, and scientific machine learning.

Anthropic

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.
