OpenAI and Anthropic Partner Briefly on AI Safety Tests Amid Fierce Competition

In a rare move, OpenAI and Anthropic — two of the most competitive players in artificial intelligence — temporarily opened up their tightly controlled models to each other for joint safety testing. The collaboration aimed to uncover blind spots in internal evaluations and highlight how leading AI firms could work together on alignment and safety, even as rivalry intensifies across the industry.

Speaking with TechCrunch, OpenAI co-founder Wojciech Zaremba emphasized the urgency of such cooperation as AI systems become widely deployed in everyday life. “There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.

The joint research, released Wednesday, comes at a time when AI labs are locked in an arms race, marked by billion-dollar data infrastructure commitments and soaring pay packages for elite researchers. Some experts caution that the pace of competition could encourage companies to deprioritize safety in pursuit of more powerful systems.

To enable the study, the two companies exchanged special API access to models with fewer safeguards, though OpenAI noted that GPT-5 was not included. Shortly after, Anthropic cut off API access to another OpenAI team, citing terms-of-service violations, though Zaremba described the incidents as unrelated. Anthropic researcher Nicholas Carlini added that he hopes model access for safety purposes can continue: “We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly.”

One of the study’s key findings concerned hallucination testing. Anthropic’s Claude Opus 4 and Sonnet 4 frequently declined to answer when uncertain — refusing up to 70% of questions — while OpenAI’s o3 and o4-mini attempted answers more often but hallucinated at higher rates. Zaremba said the optimal approach lies between these extremes, with OpenAI’s models needing to refuse more often and Anthropic’s models needing to attempt more answers.

Both companies also studied sycophancy — the tendency of AI models to reinforce harmful user behavior. Anthropic flagged “extreme” cases in GPT-4.1 and Claude Opus 4, where the chatbots initially resisted but later validated troubling user decisions. The issue has gained attention following a lawsuit against OpenAI, in which parents alleged that ChatGPT’s advice contributed to their 16-year-old son’s suicide. “It would be a sad story if we build AI that solves all these complex PhD level problems… and at the same time, we have people with mental health problems as a consequence of interacting with it,” Zaremba reflected.

OpenAI says its newly released GPT-5 significantly reduces sycophancy compared to GPT-4o, especially in sensitive areas like mental health. Both Zaremba and Carlini expressed interest in extending joint safety testing to future models, hoping more AI labs adopt collaborative evaluation practices.
