Interpretability

Samuel Marks
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

David Duvenaud
Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.

Nora Belrose
Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.

David Bau
Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.

Josh Batson
Member of technical staff at Anthropic interested in understanding deep learning and AI safety; previously a research scientist at OpenAI.

Ethan Perez
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Nicholas Schiefer
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Deep Ganguli
Co-founder and head of alignment science at Anthropic.

Alex Tamkin
Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Buck Shlegeris
Buck Shlegeris is a Member of Technical Staff at Anthropic whose public homepage focuses on AI safety, model evaluations, and alignment.

Jared Kaplan
Jared Kaplan is a researcher at Anthropic known for work on scaling laws and large language models.

Alex Turner
Researcher in alignment science at Anthropic focused on AI safety and alignment.

Murray Shanahan
Emeritus Professor of Cognitive Robotics at Imperial College London whose public work focuses on artificial intelligence, robotics, and consciousness.

Pieter Abbeel
Computer scientist and robotics researcher whose public work focuses on reinforcement learning, imitation learning, and large-scale AI systems.

Stephen Casper
Alignment science researcher at Anthropic whose work focuses on black-box evaluations, white-box evaluations, and AI risk.

Yonatan Belinkov
Associate Professor in the Technion Faculty of Data and Decision Sciences and a visiting research professor at Google working on natural language processing and machine learning.