Major AI Safety Upheaval: OpenAI Dissolves Safety Team as Anthropic Battles Pentagon Over Weapons Guardrails

Key Developments

The AI safety landscape experienced dramatic upheaval this week as two major developments signal a troubling shift in the industry’s approach to safety governance.

OpenAI quietly disbanded its mission alignment team—the group specifically tasked with ensuring AI development remains safe and trustworthy—scattering its members across different divisions. This follows the company’s earlier decision to drop its commitment to safety from its mission statement, along with pledges to remain “unconstrained” by profit motives.

Simultaneously, Anthropic is embroiled in a legal battle with the Pentagon after Defense Secretary Pete Hegseth designated the company as a “national security supply chain risk.” The designation, which prohibits Defense Department contractors from using Anthropic’s technology, came after the company refused to strip safety guardrails preventing its AI from being used for autonomous weapons and mass domestic surveillance. Hearings began today in San Francisco.

In a positive development, OpenAI announced a $1 billion investment in AI safety projects, while Anthropic released its open-source Automated Alignment Agent (A3), which reduces safety failure rates on issues like political bias and jailbreak attempts.

Industry Context

These developments represent a concerning trend among leading AI companies scaling back dedicated safety commitments despite growing capabilities. Anthropic, previously considered the most safety-conscious major lab, recently dropped its central pledge to never train AI systems without guaranteed adequate safety measures.

New research reveals critical vulnerabilities in current safety approaches. Studies show that removing triggering language from harmful prompts increases attack success rates from 5.38% to 86.79%, indicating safety evaluations may be fundamentally flawed.

Practical Implications

For AI builders and users, these changes signal increased uncertainty around safety standards and regulatory compliance. The Pentagon’s aggressive stance suggests governments may take increasingly hardline approaches to AI governance, potentially forcing companies to choose between safety principles and market access.

The dissolution of dedicated safety teams at major labs means safety considerations may become more distributed and potentially diluted across organisations.

Open Questions

How will the Anthropic-Pentagon legal battle resolve, and what precedent will it set for government pressure on AI safety? Can financial commitments like OpenAI’s $1 billion pledge effectively replace dedicated organisational structures? Most critically, are current safety evaluation methods adequate for increasingly capable AI systems?

Source: Multiple sources