LLMpeople
Public Atlas: People first, reports as evidence, organizations as context.


Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Alignment and RLHF report from Anthropic with 31 connected researchers in the LLMpeople atlas.

Anthropic, 2022-04-12, 31 researchers
Field: Alignment and RLHF
Organization: Anthropic
arXiv: 2204.05862

Canonical link

https://arxiv.org/abs/2204.05862

Connected researchers

Researcher 5 reports

Dario Amodei

Anthropic / OpenAI

Co-founder and CEO of Anthropic.

Anthropic, OpenAI
United States
Researcher 7 reports

Amanda Askell

Anthropic / OpenAI

Amanda Askell is a philosopher and AI alignment researcher at Anthropic. Her personal site says she previously worked as a research scientist on the policy team at OpenAI.

Anthropic, OpenAI
United States
Researcher 7 reports

Jack Clark

Anthropic / OpenAI

Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI, as a technical journalist, and as author of the Import AI newsletter.

Anthropic, OpenAI
Researcher 4 reports

Yuntao Bai

Anthropic

Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.

Anthropic
Researcher 2 reports

Andy Jones

Anthropic

Anthropic researcher working on machine learning and AI-assisted science; previously built tools for learning from text, images, and tabular data.

Anthropic
Researcher 5 reports

Kamal Ndousse

Anthropic

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Anthropic
Researcher 4 reports

Anna Chen

Anthropic

Anthropic report author listed on RLHF, Constitutional AI, Collective Constitutional AI, and Many-shot Jailbreaking reports, with report-backed work on alignment and adversarial evaluation.

Anthropic
Researcher 5 reports

Nova DasSarma

Anthropic

Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.

Anthropic
Researcher 2 reports

Dawn Drain

Anthropic

Dawn Drain is an Anthropic-affiliated researcher in the United States. Public sources list her as a coauthor of Anthropic's helpful and harmless assistant paper and credit her on 2022 software engineering publications, including work on code completion and automated repair with large language models.

Anthropic
Researcher 2 reports

Stanislav Fort

Anthropic

Member of Technical Staff at Anthropic whose work focuses on understanding, evaluating, and improving large language models, with emphasis on reasoning, safety, and generalization.

Anthropic
Researcher 6 reports

Deep Ganguli

Anthropic

Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.

Anthropic
United States
Researcher 3 reports

Tom Henighan

Anthropic

Tom Henighan works on large language model interpretability at Anthropic. He previously worked on scaling laws at OpenAI and machine learning engineering at Beehive AI, and he studied physics at Stanford after graduating from Ohio State University in 2010 with a degree in English, mathematics, and philosophy.

Anthropic
Researcher 3 reports

Nicholas Joseph

Anthropic

Researcher at Anthropic working on the alignment and evaluation of advanced AI systems.

Anthropic
Researcher 4 reports

Saurav Kadavath

Anthropic

Researcher at Anthropic whose public report authorships and scholarly profiles cover language model evaluation, AI safety, and robustness.

Anthropic
Researcher 3 reports

Jackson Kernion

Anthropic

Member of Anthropic's Interpretability team, where he works on understanding how large language models work.

Anthropic
Researcher 4 reports

Tom Conerly

Anthropic

Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.

Anthropic
Researcher 2 reports

Sheer El-Showk

Anthropic

Sheer El-Showk is a member of technical staff at Anthropic. His public profile on The Org says he is also CTO at Lore AI and founder of Nascent AI. It also says he previously worked as a senior software engineer at Coiled, held postdoctoral research fellowships at CERN and the CEA, and earned physics and mathematical physics degrees from UC Berkeley and the University of Amsterdam.

Anthropic
Researcher 3 reports

Nelson Elhage

Anthropic

Nelson Elhage is an engineer and researcher at Anthropic, where he works on the pretraining team after earlier work on reverse-engineering large language models. He previously worked at Stripe and Ksplice/Oracle on systems software and is known for open-source systems projects such as livegrep and reptyr.

Anthropic
Researcher 3 reports

Zac Hatfield-Dodds

Anthropic

Staff software engineer at Anthropic building systems for AI safety, reliability, and alignment.

Anthropic
Researcher 1 report

Danny Hernandez

Anthropic

Public Anthropic research pages list Danny Hernandez as a co-author on alignment and scaling-law work.

Anthropic
Researcher 2 reports

Tristan Hume

Anthropic

Member of technical staff at Anthropic working on AI systems and alignment, with published work on RLHF and constitutional methods for harmless assistants.

Anthropic
Researcher 2 reports

Scott Johnston

Anthropic

Software engineer at Anthropic working on infrastructure, tooling, model behavior, and multimodal systems.

Anthropic
Researcher 3 reports

Shauna Kravec

Anthropic

Researcher focused on AI safety, reinforcement learning, and language models, with public work spanning red teaming, adversarial robustness, and model behavior.

Anthropic
United States
Researcher 2 reports

Liane Lovitt

Anthropic

Research scientist at Anthropic whose public work includes AI alignment, reinforcement learning from human feedback, and model behavior.

Anthropic
