Auditing language models for hidden objectives
Alignment and Safety
Connected researchers
Samuel R. Bowman
Anthropic
Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His public homepage lists natural language processing, machine learning, and AI alignment as his areas of focus.
Amanda Askell
Anthropic / OpenAI
Alignment researcher at OpenAI working on making AI understandable to humans and aligned with human values.
David Bau
Anthropic
Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.
Sören Mindermann
Anthropic
Research scientist at Anthropic working on machine learning and AI safety.
Jan Leike
Anthropic
Anthropic researcher focused on AI safety, alignment, and auditing hidden objectives in language models.
Josh Batson
Anthropic
Member of technical staff at Anthropic interested in understanding deep learning and AI safety; previously a research scientist at OpenAI.
Henry Sleight
Anthropic
PhD student at the University of Oxford working on AI safety, including scalable oversight and interpretability.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Nicholas Schiefer
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Owain Evans
Anthropic
Assistant Professor of Computer Science at the University of Oxford whose research spans generalization, reasoning, and large language model agents.
Scott Emmons
Anthropic
Member of Technical Staff at Anthropic working on AI control, hidden objectives, alignment, and evaluations, with a background in language models, efficient training, and scientific machine learning.
Benjamin Lermen
Anthropic
Chenyan Zhang
Anthropic