LLMpeople
Public atlas: people first, reports as evidence, organizations as context.


Many-shot Jailbreaking

Alignment and Safety report from Anthropic with 14 connected researchers in the LLMpeople atlas.

Anthropic · 2024-02-12 · 14 researchers
Field
Alignment and Safety
Organization
Anthropic
arXiv
2402.03206

Canonical link

https://arxiv.org/abs/2402.03206

Connected researchers

Samuel Marks portrait
Researcher · 6 reports

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Samuel R. Bowman portrait
Researcher · 5 reports

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His research focuses on natural language processing, machine learning, and AI alignment.

United States
Jack Clark portrait
Researcher · 7 reports

Jack Clark

Anthropic / OpenAI

Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.

Jared D. Kaplan portrait
Researcher · 6 reports

Jared D. Kaplan

Anthropic

Anthropic co-founder and Chief Science Officer. Formerly a physicist at Johns Hopkins, he helped develop scaling laws for neural language models and works on the science and safety of large AI systems.

Dan Hendrycks portrait
Researcher · 1 report

Dan Hendrycks

Anthropic

AI safety researcher and director of the Center for AI Safety; advisor to xAI and Scale AI, previously an advisor to OpenAI and Anthropic.

Carina Kauf portrait
Researcher · 1 report

Carina Kauf

Anthropic

Member of Anthropic's Societal Impacts team, where she studies the real-world impacts of AI systems.

Nicholas Schiefer portrait
Researcher · 8 reports

Nicholas Schiefer

Anthropic

Member of technical staff at Anthropic and co-founder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Deep Ganguli portrait
Researcher · 6 reports

Deep Ganguli

Anthropic

Research scientist at Anthropic who leads the Societal Impacts team.

Nova DasSarma portrait
Researcher · 5 reports

Nova DasSarma

Anthropic

Research scientist at Anthropic interested in understanding neural networks and applying that understanding to alignment.

Anna Chen portrait
Researcher · 4 reports

Anna Chen

Anthropic

Researcher working on AI safety and adversarial evaluation, including Anthropic's many-shot jailbreaking research.

Saurav Kadavath portrait
Researcher · 4 reports

Saurav Kadavath

Anthropic

Research scientist at Anthropic interested in understanding and steering AI systems.

Tom Conerly portrait
Researcher · 4 reports

Tom Conerly

Anthropic

Software engineer at Anthropic, previously at Google, with public writing on language models, agents, and reinforcement learning.

Esin Durmus portrait
Researcher · 1 report

Esin Durmus

Anthropic

NLP researcher at Anthropic who studies the societal impacts of language models.

Rylan Schaeffer portrait
Researcher · 1 report

Rylan Schaeffer

Anthropic

Research scientist at Anthropic focused on AI alignment, language model behavior, and scalable oversight.


LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.

Privacy · Terms