LLMpeople
Public Atlas: People first, reports as evidence, organizations as context.

Many-shot Jailbreaking

Alignment and Safety

Anthropic · 2024-02-12 · 17 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2402.03206

Canonical link

https://arxiv.org/abs/2402.03206

Connected researchers

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Anthropic
Unknown 6

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His public homepage focuses on natural language processing, machine learning, and AI alignment.

Anthropic
United States 5

Jack Clark

Anthropic / OpenAI

Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.

Anthropic / OpenAI
Unknown 7

Jared D. Kaplan

Anthropic

Anthropic co-founder and Chief Science Officer. Formerly a physicist at Johns Hopkins, he helped develop scaling laws for neural language models and works on the science and safety of large AI systems.

Anthropic
Unknown 6

Dan Hendrycks

Anthropic

AI safety researcher and director of the Center for AI Safety; advisor to xAI and Scale AI, previously an advisor to OpenAI and Anthropic.

Anthropic
Unknown 1

Carina Kauf

Anthropic

Member of Anthropic's Societal Impacts team, where she studies the real-world impacts of AI systems.

Anthropic
Unknown 1

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and co-founder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Unknown 8

Deep Ganguli

Anthropic

Co-founder and head of alignment science at Anthropic.

Anthropic
Unknown 6

Nova DasSarma

Anthropic

Research scientist at Anthropic interested in understanding neural networks and applying that understanding to alignment.

Anthropic
Unknown 5

Anna Chen

Anthropic

Researcher working on AI safety and adversarial evaluation, including Anthropic's many-shot jailbreaking research.

Anthropic
Unknown 4

Saurav Kadavath

Anthropic

Research scientist at Anthropic interested in understanding and steering AI systems.

Anthropic
Unknown 4

Tom Conerly

Anthropic

Software engineer at Anthropic, previously at Google, with public writing on language models, agents, and reinforcement learning.

Anthropic
Unknown 4

Esin Durmus

Anthropic

Assistant professor of marketing at Stanford Graduate School of Business whose research uses AI systems to study human decision-making and related machine learning questions.

Anthropic
Unknown 1

Rylan Schaeffer

Anthropic

Research scientist at Anthropic focused on AI alignment, language model behavior, and scalable oversight.

Anthropic
Unknown 1

Sandipan Kundu

Anthropic

Profile still being enriched.

Anthropic
Unknown 2

David McDougall

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Mantas Mazeika

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.