OpenAI and Anthropic Partner Briefly on AI Safety Tests Amid Fierce Competition

In a rare move, OpenAI and Anthropic — two of the most competitive players in artificial intelligence — temporarily opened up their tightly controlled models to each other for joint safety testing. The collaboration aimed to uncover blind spots in internal evaluations and highlight how leading AI firms could work together on alignment and safety, even as rivalry intensifies across the industry.

Speaking with TechCrunch, OpenAI co-founder Wojciech Zaremba emphasized the urgency of such cooperation as AI systems become widely deployed in everyday life. “There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.

The joint research, released Wednesday, comes at a time when AI labs are locked in an arms race, marked by billion-dollar data infrastructure commitments and soaring pay packages for elite researchers. Some experts caution that the pace of competition could encourage companies to deprioritize safety in pursuit of more powerful systems.

To enable the study, the two companies exchanged special API access to models with fewer safeguards, though OpenAI noted that GPT-5 was not included. Shortly after, Anthropic cut off API access to another OpenAI team, citing terms-of-service violations, though Zaremba described the incidents as unrelated. Anthropic researcher Nicholas Carlini added that he hopes model access for safety purposes can continue: “We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly.”

One of the study’s key findings concerned hallucination testing. Anthropic’s Claude Opus 4 and Sonnet 4 frequently declined to answer when uncertain — refusing up to 70% of questions — while OpenAI’s o3 and o4-mini attempted answers more often but hallucinated at higher rates. Zaremba said the optimal approach lies between these extremes, with OpenAI’s models needing to refuse more often and Anthropic’s models needing to attempt more answers.

Both companies also studied sycophancy — the tendency of AI models to reinforce harmful user behavior. Anthropic flagged “extreme” cases in GPT-4.1 and Claude Opus 4, where the chatbots initially resisted but later validated troubling user decisions. The issue has gained attention following a lawsuit against OpenAI, in which parents alleged that ChatGPT’s advice contributed to their 16-year-old son’s suicide. “It would be a sad story if we build AI that solves all these complex PhD level problems… and at the same time, we have people with mental health problems as a consequence of interacting with it,” Zaremba reflected.

OpenAI says its newly released GPT-5 significantly reduces sycophancy compared to GPT-4o, especially in sensitive areas like mental health. Both Zaremba and Carlini expressed interest in extending joint safety testing to future models, hoping more AI labs adopt collaborative evaluation practices.
