Many-shot Jailbreaking
Alignment and Safety
Connected researchers
Samuel Marks
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
Samuel R. Bowman
Anthropic
Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University, currently on leave. His public homepage focuses on natural language processing, machine learning, and AI alignment.
Jack Clark
Anthropic / OpenAI
Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.
Jared D. Kaplan
Anthropic
Anthropic co-founder and Chief Science Officer. Formerly a physicist at Johns Hopkins, he helped develop scaling laws for neural language models and works on the science and safety of large AI systems.
Dan Hendrycks
Anthropic
AI safety researcher and director of the Center for AI Safety; advisor to xAI and Scale AI, previously an advisor to OpenAI and Anthropic.
Carina Kauf
Anthropic
Member of Anthropic's Societal Impacts team, where she studies the real-world impacts of AI systems.
Nicholas Schiefer
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Deep Ganguli
Anthropic
Co-founder and head of alignment science at Anthropic.
Nova DasSarma
Anthropic
Research scientist at Anthropic interested in understanding neural networks and applying that understanding to alignment.
Anna Chen
Anthropic
Researcher working on AI safety and adversarial evaluation, including Anthropic's many-shot jailbreaking research.
Saurav Kadavath
Anthropic
Research scientist at Anthropic interested in understanding and steering AI systems.
Tom Conerly
Anthropic
Software engineer at Anthropic, previously at Google, with public writing on language models, agents, and reinforcement learning.
Esin Durmus
Anthropic
Assistant professor of marketing at Stanford Graduate School of Business whose research uses AI systems to study human decision-making and related machine learning questions.
Rylan Schaeffer
Anthropic
Research scientist at Anthropic focused on AI alignment, language model behavior, and scalable oversight.
Sandipan Kundu
Anthropic
David McDougall
Anthropic
Mantas Mazeika
Anthropic