LLMpeople
Public Atlas: people first, reports as evidence, organizations as context.


Nova DasSarma

Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.

Researcher at Anthropic · 1 organization · 5 reports

Profile status: updated


Trust signals

Profile completeness: 59%
Public sources: 3
Official sources: 1
Last reviewed: Jun 8, 2026
Scholar profile · Structured work
AI safety · language model evaluation · model behavior

Current frame

Researcher at Anthropic

Work

Anthropic · role not listed

Public links

DBLP

Organizations

Anthropic (core)

Reports

Collective Constitutional AI: Aligning a Language Model with Public Input · Alignment and RLHF
Constitutional AI: Harmlessness from AI Feedback · Alignment and RLHF
Many-shot Jailbreaking · Alignment and Safety
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · Alignment and Safety
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback · Alignment and RLHF

Official and primary sources

Nova DasSarma DBLP profile · Official source · DBLP

Supporting sources

Discovering Language Model Behaviors with Model-Written Evaluations · Supporting source · report · ACL Anthology
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · Supporting source · report · arXiv

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.
