LLMpeople
Public Atlas: People first, reports as evidence, organizations as context.

Many-shot Jailbreaking

Alignment and Safety report from Anthropic with 17 connected researchers in the LLMpeople atlas.

Anthropic / 2024-02-12 / 17 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2402.03206

Canonical link

https://arxiv.org/abs/2402.03206

Connected researchers

Researcher 7 reports

Jack Clark

Anthropic / OpenAI

Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI, as a technical journalist, and as author of the Import AI newsletter.

Anthropic / OpenAI
Researcher 4 reports

Anna Chen

Anthropic

Anthropic report author listed on RLHF, Constitutional AI, Collective Constitutional AI, and Many-shot Jailbreaking reports, with report-backed work on alignment and adversarial evaluation.

Anthropic
Researcher 5 reports

Nova DasSarma

Anthropic

Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.

Anthropic
Researcher 6 reports

Deep Ganguli

Anthropic

Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.

Anthropic
United States
Researcher 4 reports

Saurav Kadavath

Anthropic

Researcher at Anthropic whose public report authorships and scholarly profiles cover language model evaluation, AI safety, and robustness.

Anthropic
Researcher 4 reports

Tom Conerly

Anthropic

Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.

Anthropic
Researcher 6 reports

Jared D. Kaplan

Anthropic

Jared D. Kaplan is a co-founder and Chief Science Officer at Anthropic. Anthropic's public materials also identify him as the company's Responsible Scaling Officer.

Anthropic
Researcher 2 reports

Sandipan Kundu

Anthropic

Sandipan Kundu is a member of technical staff at Anthropic. His public profile on The Org says he previously held postdoctoral positions at Johns Hopkins and Cornell, worked and studied at the University of Texas at Austin, and earned a master's degree in physics from the Indian Institute of Technology Kanpur.

Anthropic
Researcher 5 reports

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor (on leave) of computer science, data science, and linguistics at New York University. His public homepage focuses on natural language processing, machine learning, and AI alignment.

Anthropic
United States
Researcher 8 reports

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Researcher 1 report

Esin Durmus

Anthropic

Assistant professor of marketing at Stanford Graduate School of Business whose research uses AI systems to study human decision-making and related machine learning questions.

Anthropic
Researcher 1 report

Rylan Schaeffer

Anthropic

Research scientist at Anthropic focused on AI alignment, language model behavior, and scalable oversight.

Anthropic
Researcher 1 report

Carina Kauf

Anthropic

Member of Anthropic's Societal Impacts team, where she studies the real-world impacts of AI systems.

Anthropic
Researcher 1 report

Mantas Mazeika

Anthropic

Mantas Mazeika is listed as an author of the Anthropic technical report Many-shot Jailbreaking.

Anthropic
Researcher 1 report

David McDougall

Anthropic

David McDougall is listed as an author of the Anthropic technical report Many-shot Jailbreaking.

Anthropic
Researcher 6 reports

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Anthropic
Researcher 1 report

Dan Hendrycks

Anthropic

Dan Hendrycks is the executive director of the Center for AI Safety and an advisor to xAI and Scale AI. His public homepage also says he received his PhD in AI from UC Berkeley and highlights contributions including GELU, robustness benchmarks, and MMLU.

Anthropic

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.
