Anthropic Unveils Claude Opus 4.5, Claims World’s Best Performance in Coding, Agents, and Complex Computer-Use Tasks

November 26, 2025

Just days after the rollouts of OpenAI’s GPT-5.1 and Google’s Gemini 3, Anthropic has launched Claude Opus 4.5, positioning it as the world’s strongest AI model for coding-intensive workloads, agentic tasks, and computer-use automation. The company asserts that Opus 4.5 surpasses all current frontier models across multiple real-world evaluations.

A headline result is the model’s 80.9% score on SWE-bench Verified, a demanding benchmark that tests real-world software engineering ability. With this result, Claude Opus 4.5 becomes the first model to cross the 80% threshold on this benchmark. For comparison, Google’s newly released Gemini 3 Pro scored 76.2% and OpenAI’s GPT-5.1 Codex Max reached 77.9%, which puts Anthropic’s model at the top of the leaderboard.

Beyond coding benchmarks, the model also outperforms human candidates on Anthropic’s internal two-hour take-home assessment, which the company uses to evaluate applicants for performance engineering roles. As the company notes, “The take-home test is designed to assess technical ability and judgment under time pressure… But this result, where an AI model outperforms strong candidates on important technical skills, raises questions about how AI will change engineering as a profession.” Although the test does not measure collaboration or long-term instincts, Anthropic suggests that this level of capability signals a shift in how engineering roles may evolve.

Claude Opus 4.5 also leads on τ2-bench, a benchmark that evaluates real-world agentic reasoning and multi-turn task execution. In a scenario designed to test compliance with airline booking rules, Opus 4.5 successfully navigated the policy constraints. Instead of wrongly modifying a basic economy ticket, the model “found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights.” Anthropic says this reflects the model’s ability to reason more strategically than competing systems.

A major highlight of the release is Anthropic’s focus on safety. The company describes Opus 4.5 as its “most robustly aligned model” yet. It also claims significant improvements in defending against prompt-injection attacks, which attempt to insert deceptive commands, and asserts that Opus 4.5 is now “harder to trick… than any other frontier model in the industry.”

Claude Opus 4.5 is available immediately in the Claude mobile apps on Android and iOS and in the Claude web interface, and it is being released to developers at the same time.
