LLMpeople
Public Atlas: people first, reports as evidence, organizations as context.


Tom Conerly

Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.

Researcher at Anthropic · 1 organization · 4 reports

Profile status: updated


Trust signals

Profile completeness: 59%
Public sources: 4
Official sources: 1
Last reviewed: Jun 8, 2026
Scholar profile · Structured work
Topics: AI safety · interpretability · language model calibration

Current frame

Researcher at Anthropic

Work

Anthropic (role not listed)

Public links

DBLP

Organizations

Anthropic (core organization)

Reports

Collective Constitutional AI: Aligning a Language Model with Public Input (Alignment and RLHF)
Constitutional AI: Harmlessness from AI Feedback (Alignment and RLHF)
Many-shot Jailbreaking (Alignment and Safety)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (Alignment and RLHF)

Official and primary sources

Tom Conerly DBLP profile (official source · DBLP)

Supporting sources

Scaling Laws and Interpretability of Learning from Repeated Data (supporting source · report · arXiv)
Language Models (Mostly) Know What They Know (supporting source · report · arXiv)
Many-shot Jailbreaking (supporting source · report · arXiv)

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.
