Anthropic drops flagship AI safety pledge from policy
Anthropic has dropped the central commitment of its Responsible Scaling Policy, eliminating the pledge to halt AI training until safety measures are verified in advance. The new RSP v3 replaces this unconditional guarantee with a conditional system: the company will delay development only if leadership believes it holds a significant lead over competitors and the risks are catastrophic. The shift is a stark retreat from the safety-first positioning that has defined Anthropic since it introduced the RSP in 2023.
Anthropic says the change reflects reality. Chief Science Officer Jared Kaplan stated: "We felt that it wouldn't actually help anyone for us to stop training AI models." The company also acknowledged that its original theory failed in practice—capability thresholds proved "far more ambiguous than anticipated," and government regulation has moved slower than expected. These pressures come as Anthropic has raised $30 billion in funding at a $380 billion valuation and grown revenue 10x, intensifying the company's drive to compete with faster-moving rivals like OpenAI.
The move has drawn concern from AI safety experts. Chris Painter of METR, an AI evaluation organization, warned that the announcement signals Anthropic is "shifting into triage mode," prioritizing speed over precaution. Notably, Anthropic says certain safeguards remain: it continues to block the use of its Claude AI for autonomous weapons and mass surveillance, even as the Pentagon applies pressure.
Sources
- T2