LLMpeople
Public Atlas: people first, reports as evidence, organizations as context.

Auditing language models for hidden objectives

Anthropic · Undated · 13 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2507.11473

Canonical link: https://arxiv.org/abs/2507.11473

Connected researchers

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His public homepage focuses on natural language processing, machine learning, and AI alignment.

Country: United States · Reports: 5

Amanda Askell

Anthropic / OpenAI

Alignment researcher at OpenAI working on making AI understandable to and aligned with human values.

Country: Unknown · Reports: 7

David Bau

Anthropic

Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.

Country: United States · Reports: 3

Sören Mindermann

Anthropic

Research scientist at Anthropic working on machine learning and AI safety.

Country: Unknown · Reports: 3

Jan Leike

Anthropic

Anthropic researcher focused on AI safety, alignment, and auditing hidden objectives in language models.

Country: Unknown · Reports: 2

Josh Batson

Anthropic

Member of technical staff at Anthropic interested in understanding deep learning and AI safety; previously a research scientist at OpenAI.

Country: Unknown · Reports: 2

Henry Sleight

Anthropic

PhD student at the University of Oxford working on AI safety, including scalable oversight and interpretability.

Country: Unknown · Reports: 1

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Country: Unknown · Reports: 8

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Country: Unknown · Reports: 8

Owain Evans

Anthropic

Assistant Professor of Computer Science at the University of Oxford whose research spans generalization, reasoning, and large language model agents.

Country: United Kingdom · Reports: 1

Scott Emmons

Anthropic

Member of Technical Staff at Anthropic working on AI control, hidden objectives, alignment, and evaluations, with a background in language models, efficient training, and scientific machine learning.

Country: Unknown · Reports: 1

Benjamin Lermen

Anthropic

Profile still being enriched.

Country: Unknown · Reports: 1

Chenyan Zhang

Anthropic

Profile still being enriched.

Country: Unknown · Reports: 1

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.