Many-shot Jailbreaking
Alignment and Safety report from Anthropic with 17 connected researchers in the LLMpeople atlas.
Connected researchers
Jack Clark
Anthropic / OpenAI
Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI and as a technical journalist, and his authorship of the Import AI newsletter.
Anna Chen
Anthropic
Anthropic report author listed on the RLHF, Constitutional AI, Collective Constitutional AI, and Many-shot Jailbreaking reports, with work on alignment and adversarial evaluation.
Nova DasSarma
Anthropic
Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.
Deep Ganguli
Anthropic
Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.
Saurav Kadavath
Anthropic
Researcher at Anthropic whose public report authorships and scholarly profiles cover language model evaluation, AI safety, and robustness.
Tom Conerly
Anthropic
Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.
Jared D. Kaplan
Anthropic
Jared D. Kaplan is a co-founder and Chief Science Officer at Anthropic. Anthropic's public materials also identify him as the company's Responsible Scaling Officer.
Sandipan Kundu
Anthropic
Sandipan Kundu is a member of technical staff at Anthropic. His public profile on The Org says he previously held postdoctoral positions at Johns Hopkins and Cornell, worked and studied at the University of Texas at Austin, and earned a master's degree in physics from the Indian Institute of Technology Kanpur.
Samuel R. Bowman
Anthropic
Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University, currently on leave. His public homepage focuses on natural language processing, machine learning, and AI alignment.
Nicholas Schiefer
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Esin Durmus
Anthropic
Assistant professor of marketing at Stanford Graduate School of Business whose research uses AI systems to study human decision-making and related machine learning questions.
Rylan Schaeffer
Anthropic
Research scientist at Anthropic focused on AI alignment, language model behavior, and scalable oversight.
Carina Kauf
Anthropic
Member of Anthropic's Societal Impacts team, where she studies the real-world impacts of AI systems.
Mantas Mazeika
Anthropic
Mantas Mazeika is listed as an author of the Anthropic technical report Many-shot Jailbreaking.
David McDougall
Anthropic
David McDougall is listed as an author of the Anthropic technical report Many-shot Jailbreaking.
Samuel Marks
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
Dan Hendrycks
Anthropic
Dan Hendrycks is the executive director of the Center for AI Safety and an advisor to xAI and Scale AI. His public homepage also says he received his PhD in AI from UC Berkeley and highlights contributions including GELU, robustness benchmarks, and MMLU.