LLMpeople
Public atlas: people first, reports as evidence, organizations as context.


Constitutional AI: Harmlessness from AI Feedback

Alignment and RLHF report from Anthropic with 35 connected researchers in the LLMpeople atlas.

Anthropic · 2022-12-15 · 35 researchers
Field
Alignment and RLHF
Organization
Anthropic
arXiv
2212.08073

Canonical link

https://arxiv.org/abs/2212.08073

Connected researchers

Researcher · 5 reports

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University on leave. His public homepage focuses on natural language processing, machine learning, and AI alignment.

Anthropic
United States
Researcher · 1 report

Noemi Mercado

Anthropic

Researcher at Anthropic whose public homepage and scholarly profile connect cognitive science research with AI.

Anthropic
Researcher · 1 report

Azalia Mirhoseini

Anthropic

Research scientist at Anthropic working on machine learning systems and AI; previously worked on machine learning systems, compilers, and sustainability at Google.

Anthropic
Researcher · 7 reports

Jack Clark

Anthropic / OpenAI

Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.

Anthropic · OpenAI
Researcher · 3 reports

Shauna Kravec

Anthropic

Researcher focused on AI safety, reinforcement learning, and language models, with public work spanning red teaming, adversarial robustness, and model behavior.

Anthropic
United States
Researcher · 3 reports

Zac Hatfield-Dodds

Anthropic

Staff software engineer at Anthropic building systems for AI safety, reliability, and alignment.

Anthropic
Researcher · 2 reports

Chris Olah

Anthropic

Research scientist known for mechanistic interpretability and deep learning visualization, previously at Google Brain and OpenAI.

Anthropic
Researcher · 1 report

Robert Lasenby

Anthropic

Research scientist at Anthropic working on reasoning and geometry-aware machine learning.

Anthropic
Researcher · 7 reports

Amanda Askell

Anthropic / OpenAI

Alignment researcher at Anthropic, previously at OpenAI, working on making AI understandable to and aligned with human values.

Anthropic · OpenAI
Researcher · 6 reports

Jared D. Kaplan

Anthropic

Anthropic co-founder and Chief Science Officer. Formerly a physicist at Johns Hopkins, he helped develop scaling laws for neural language models and works on the science and safety of large AI systems.

Anthropic
Researcher · 4 reports

Yuntao Bai

Anthropic

Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.

Anthropic
Researcher · 3 reports

Sam McCandlish

Anthropic

Independent researcher working on the theoretical foundations of AI, especially inductive biases, scaling laws, and approximate Bayesian updating. His public homepage notes prior research roles at Anthropic and OpenAI.

Anthropic
Researcher · 3 reports

Jackson Kernion

Anthropic

Member of Anthropic's Interpretability team, where he works on understanding how large language models work.

Anthropic
Researcher · 1 report

Christopher Olah

Anthropic

Research scientist at Anthropic known for mechanistic interpretability work, including early research on feature visualization and circuits in neural networks.

Anthropic
Researcher · 5 reports

Kamal Ndousse

Anthropic

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Anthropic
Researcher · 1 report

Kamile Lukosuite

Anthropic

AI governance researcher at the Centre for the Governance of AI and former Anthropic resident researcher, with interests in language models, AI safety, scalable oversight, and evaluations.

Anthropic
Researcher · 8 reports

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Anthropic
Researcher · 8 reports

Nicholas Schiefer

Anthropic

Member of technical staff at Anthropic and co-founder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Researcher · 6 reports

Deep Ganguli

Anthropic

Co-founder and head of alignment science at Anthropic.

Anthropic
Researcher · 5 reports

Dario Amodei

Anthropic / OpenAI

CEO and co-founder of Anthropic. Before Anthropic, he served as vice president of research at OpenAI.

Anthropic · OpenAI
Researcher · 5 reports

Nova DasSarma

Anthropic

Research scientist at Anthropic interested in understanding neural networks and applying that understanding to alignment.

Anthropic
Researcher · 4 reports

Anna Chen

Anthropic

Researcher working on AI safety and adversarial evaluation, including Anthropic's many-shot jailbreaking research.

Anthropic
Researcher · 4 reports

Saurav Kadavath

Anthropic

Research scientist at Anthropic interested in understanding and steering AI systems.

Anthropic
Researcher · 4 reports

Tom Conerly

Anthropic

Software engineer at Anthropic, previously at Google, with public writing on language models, agents, and reinforcement learning.

Anthropic

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.

Privacy · Terms