Tech: Anthropic Weakens Its Safety Pledge In The Wake Of The Pentagon's...
Two stories about the Claude maker Anthropic broke on Tuesday that, taken together, arguably paint a chilling picture. First, US Defense Secretary Pete Hegseth is reportedly pressuring Anthropic to relax its AI safeguards and give the military unrestrained access to its Claude AI chatbot. Then, on the same day the Hegseth news broke, the company dropped its centerpiece safety pledge.
On Tuesday, Anthropic said it was modifying its Responsible Scaling Policy (RSP) to lower its safety guardrails. Until now, the company's core pledge had been to stop training new AI models unless specific safeguards could be guaranteed in advance. That policy, which set hard tripwires to halt development, was a big part of Anthropic's pitch to businesses and consumers.
“Two and a half years later, our honest assessment is that some parts of this theory of change have played out as we hoped, but others have not,” Anthropic wrote. The updated policy now treats safety as a matter of relative risk rather than strict red lines.
Anthropic's quotes in an interview with Time sound reasonable enough in a vacuum. "We felt that it wouldn't actually help anyone for us to stop training AI models," Jared Kaplan, Anthropic's chief science officer, told Time. "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments… if competitors are blazing ahead."
But you could also read those quotes as the latest example of a hot startup’s ethics becoming grayer as its valuation rises. (Remember Google’s old “Don’t be evil” mantra that it later removed from its code of conduct?) The latest versions of Claude have drawn widespread praise, especially in coding. In February, Anthropic raised $30 billion in new investments. It now has a valuation of $380 billion. (Speaking of the competition Kaplan referred to, rival OpenAI is currently valued at over $850 billion.)
In place of its previous tripwires, Anthropic will implement new "Risk Reports" and "Frontier Safety Roadmaps." These disclosure mechanisms are designed to provide transparency to the public in place of those hard lines in the sand.
Anthropic says the change was motivated by a "collective action problem" stemming from the competitive AI landscape and the US's anti-regulatory approach. "If one AI developer paused development to implement safety measures while others moved forward training and deploying AI systems without strong mitigations, that could result in a world that is less safe," the company wrote.
Source: Engadget