LLMpeople

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Alignment and RLHF

Anthropic, 2022-04-12, 31 researchers
Field
Alignment and RLHF
Organization
Anthropic
arXiv
2204.05862

Canonical link

https://arxiv.org/abs/2204.05862

Connected researchers

Liane Lovitt

Anthropic

Research scientist at Anthropic whose public work includes AI alignment, reinforcement learning from human feedback, and model behavior.

Anthropic
Unknown 2

Jack Clark

Anthropic / OpenAI

Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.

Anthropic / OpenAI
Unknown 7

Shauna Kravec

Anthropic

Researcher focused on AI safety, reinforcement learning, and language models, with public work spanning red teaming, adversarial robustness, and model behavior.

Anthropic
United States 3

Zac Hatfield-Dodds

Anthropic

Staff software engineer at Anthropic building systems for AI safety, reliability, and alignment.

Anthropic
Unknown 3

Andy Jones

Anthropic

Anthropic researcher working on machine learning and AI-assisted science; previously built tools for learning from text, images, and tabular data.

Anthropic
Unknown 2

Chris Olah

Anthropic

Research scientist known for mechanistic interpretability and deep learning visualization, previously at Google Brain and OpenAI.

Anthropic
Unknown 2

Amanda Askell

Anthropic / OpenAI

Alignment researcher at OpenAI working on making AI understandable to and aligned with human values.

Anthropic / OpenAI
Unknown 7

Jared D. Kaplan

Anthropic

Anthropic co-founder and Chief Science Officer. Formerly a physicist at Johns Hopkins, he helped develop scaling laws for neural language models and works on the science and safety of large AI systems.

Anthropic
Unknown 6

Yuntao Bai

Anthropic

Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.

Anthropic
Unknown 4

Sam McCandlish

Anthropic

Independent researcher working on the theoretical foundations of AI, especially inductive biases, scaling laws, and approximate Bayesian updating. His public homepage notes prior research roles at Anthropic and OpenAI.

Anthropic
Unknown 3

Jackson Kernion

Anthropic

Member of Anthropic's Interpretability team, where he works on understanding how large language models work.

Anthropic
Unknown 3

Kamal Ndousse

Anthropic

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Anthropic
Unknown 5

Catherine Olsson

Anthropic

AI alignment researcher and writer whose public website and Anthropic author page describe work on AI safety, interpretability, and building helpful, harmless assistants.

Anthropic
Unknown 2

Deep Ganguli

Anthropic

Co-founder and head of alignment science at Anthropic.

Anthropic
Unknown 6

Dario Amodei

Anthropic / OpenAI

CEO and co-founder of Anthropic. Before Anthropic, he served as vice president of research at OpenAI.

Anthropic / OpenAI
Unknown 5

Nova DasSarma

Anthropic

Research scientist at Anthropic interested in understanding neural networks and applying that understanding to alignment.

Anthropic
Unknown 5

Anna Chen

Anthropic

Researcher working on AI safety and adversarial evaluation, including Anthropic many-shot jailbreaking research.

Anthropic
Unknown 4

Saurav Kadavath

Anthropic

Research scientist at Anthropic interested in understanding and steering AI systems.

Anthropic
Unknown 4

Tom Conerly

Anthropic

Software engineer at Anthropic, previously at Google, with public writing on language models, agents, and reinforcement learning.

Anthropic
Unknown 4

Ben Mann

Anthropic

Researcher interested in neural networks and their potential to achieve general intelligence. His public homepage notes prior roles as a cofounder at Anthropic, researcher at OpenAI, and member of the startup team at Stripe.

Anthropic
Unknown 3

Nicholas Joseph

Anthropic

Researcher at Anthropic working on the alignment and evaluation of advanced AI systems.

Anthropic
Unknown 3

Tom Brown

Anthropic

Research scientist at Anthropic working on model behavior and interpretability.

Anthropic
Unknown 3

Scott Johnston

Anthropic

Software engineer at Anthropic working on infrastructure, tooling, model behavior, and multimodal systems.

Anthropic
Unknown 2

Stanislav Fort

Anthropic

Member of Technical Staff at Anthropic whose work focuses on understanding, evaluating, and improving large language models, with emphasis on reasoning, safety, and generalization.

Anthropic
Unknown 2

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.