LLMpeople


Collective Constitutional AI: Aligning a Language Model with Public Input

Alignment and RLHF report from Anthropic with 25 connected researchers in the LLMpeople atlas.

Anthropic | 2023-10-03 | 25 researchers
Field: Alignment and RLHF
Organization: Anthropic
arXiv: 2310.01835

Canonical link

https://arxiv.org/abs/2310.01835

Connected researchers

Researcher 5 reports

Dario Amodei

Anthropic / OpenAI

Co-founder and CEO of Anthropic.

Anthropic, OpenAI
United States
Researcher 7 reports

Amanda Askell

Anthropic / OpenAI

Amanda Askell is a philosopher and AI alignment researcher at Anthropic. Her personal site says she previously worked as a research scientist on the policy team at OpenAI.

Anthropic, OpenAI
United States
Researcher 7 reports

Jack Clark

Anthropic / OpenAI

Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI and as a technical journalist, and identifies him as the author of the Import AI newsletter.

Anthropic, OpenAI
Researcher 4 reports

Yuntao Bai

Anthropic

Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.

Anthropic
Researcher 2 reports

Andy Jones

Anthropic

Anthropic researcher working on machine learning and AI-assisted science; previously built tools for learning from text, images, and tabular data.

Anthropic
Researcher 5 reports

Kamal Ndousse

Anthropic

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Anthropic
Researcher 4 reports

Anna Chen

Anthropic

Anthropic report author listed on RLHF, Constitutional AI, Collective Constitutional AI, and Many-shot Jailbreaking reports, with report-backed work on alignment and adversarial evaluation.

Anthropic
Researcher 5 reports

Nova DasSarma

Anthropic

Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.

Anthropic
Researcher 3 reports

Nicholas Joseph

Anthropic

Researcher at Anthropic working on the alignment and evaluation of advanced AI systems.

Anthropic
Researcher 4 reports

Saurav Kadavath

Anthropic

Researcher at Anthropic whose public report authorships and scholarly profiles cover language model evaluation, AI safety, and robustness.

Anthropic
Researcher 3 reports

Jackson Kernion

Anthropic

Member of Anthropic's Interpretability team, where he works on understanding how large language models work.

Anthropic
Researcher 4 reports

Tom Conerly

Anthropic

Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.

Anthropic
Researcher 3 reports

Nelson Elhage

Anthropic

Nelson Elhage is an engineer and researcher at Anthropic, where he works on the pretraining team after earlier work on reverse-engineering large language models. He previously worked at Stripe and Ksplice/Oracle on systems software and is known for open-source systems projects such as livegrep and reptyr.

Anthropic
Researcher 3 reports

Zac Hatfield-Dodds

Anthropic

Staff software engineer at Anthropic building systems for AI safety, reliability, and alignment.

Anthropic
Researcher 2 reports

Catherine Olsson

Anthropic

Catherine Olsson is an AI alignment researcher and writer whose public website and Anthropic author page describe work on AI safety, interpretability, and building helpful, harmless assistants.

Anthropic
Researcher 3 reports

Tom Brown

Anthropic

Research scientist at Anthropic working on model behavior and interpretability.

Anthropic
Researcher 3 reports

Sam McCandlish

Anthropic

Sam McCandlish is listed as an author of the Anthropic technical report Collective Constitutional AI: Aligning a Language Model with Public Input.

Anthropic
Researcher 3 reports

Ben Mann

Anthropic

A public Anthropic/AWS presentation describes Ben Mann as an Anthropic co-founder and former GPT-3 and API engineer at OpenAI. The previously attached benmann.com homepage is now a parked domain rather than a personal research site.

Anthropic
Researcher 6 reports

Jared D. Kaplan

Anthropic

Jared D. Kaplan is a co-founder and Chief Science Officer at Anthropic. Anthropic's public materials also identify him as the company's Responsible Scaling Officer.

Anthropic
Researcher 2 reports

Jared Mueller

Anthropic

Jared Mueller is affiliated with Anthropic. Public materials list him as an Anthropic participant at the 2023 Economics of Robots Conference, and the linked arXiv paper lists him as a coauthor on Anthropic's Constitutional AI work.

Anthropic
Researcher 2 reports

Joshua Landau

Anthropic

Joshua Landau is affiliated with Anthropic. Public Anthropic research materials list him as a coauthor of Measuring Progress on Scalable Oversight for Large Language Models, and the linked arXiv paper lists him as a coauthor of Constitutional AI.

Anthropic
Researcher 2 reports

Timothy Telleen-Lawton

Anthropic

Timothy Telleen-Lawton is an independent researcher focused on inspiring and scaling collective intelligence. His public LessWrong profile says he previously served as Head of Procurement at Anthropic starting in 2021, and also lists prior work at CFAR and GiveWell.

Anthropic
Researcher 8 reports

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Researcher 1 report

Herbie Bradley

Anthropic

Computer scientist and machine learning researcher with public work spanning AI systems and alignment-related research.

Anthropic

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.
