Tracing the thoughts of a large language model
Interpretability
Connected researchers
Samuel Marks
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
David Duvenaud
Anthropic
Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.
Nora Belrose
Anthropic
Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.
David Bau
Anthropic
Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.
Josh Batson
Anthropic
Member of technical staff at Anthropic interested in understanding deep learning and AI safety; previously a research scientist at OpenAI.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Nicholas Schiefer
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Deep Ganguli
Anthropic
Co-founder and head of alignment science at Anthropic.
Alex Tamkin
Anthropic
Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.
Buck Shlegeris
Anthropic
Member of technical staff at Anthropic whose work focuses on AI safety, model evaluations, and alignment.
Jared Kaplan
Anthropic
Jared Kaplan is a researcher at Anthropic known for work on scaling laws and large language models.
Alex Turner
Anthropic
Researcher in alignment science at Anthropic focused on AI safety and alignment.
Murray Shanahan
Anthropic
Emeritus Professor of Cognitive Robotics at Imperial College London whose public work focuses on artificial intelligence, robotics, and consciousness.
Pieter Abbeel
Anthropic
Computer scientist and robotics researcher whose public work focuses on reinforcement learning, imitation learning, and large-scale AI systems.
Aengus Lynch
Anthropic
Nikhil Prakash
Anthropic
Will McCrostie
Anthropic
Andy Zou
Anthropic
Brian C. Smith
Anthropic
Canal Yuen
Anthropic
Carl Vondrick
Anthropic
David Janz
Anthropic
Dion Lampris
Anthropic
Henk Tillman
Anthropic