
Just days after the rollout of OpenAI’s GPT-5.1 and Google’s Gemini 3, Anthropic has entered the race with the launch of Claude Opus 4.5, which it positions as the world’s strongest AI model for coding-intensive workloads, agentic tasks, and computer-use automation. The company asserts that Opus 4.5 sets a new industry benchmark, surpassing current frontier models across multiple real-world evaluations.
One of its headline achievements is its 80.9% score on SWE-bench Verified, a demanding benchmark that tests real-world software engineering ability. With this result, Claude Opus 4.5 becomes the first model ever to cross the 80% threshold on this benchmark. For comparison, Google’s newly released Gemini 3 Pro registered 76.2%, while OpenAI’s GPT-5.1 Codex Max reached 77.9%, placing Anthropic’s model firmly at the top of the leaderboard.
Beyond benchmarks, Anthropic says the model also outscored human candidates on its internal two-hour take-home test, which the company uses to evaluate applicants for performance engineering roles. As the company notes, “The take-home test is designed to assess technical ability and judgment under time pressure… But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession.” Although the test does not measure collaboration or instincts developed over time, Anthropic suggests that this level of capability signals a shift in how engineering roles may evolve.
Claude Opus 4.5 also performs strongly on τ2-bench, a benchmark that evaluates real-world agentic reasoning and multi-turn task execution. In one scenario designed to test compliance with airline booking rules, where basic economy tickets cannot be modified, Opus 4.5 worked within the policy constraints rather than breaking them. Instead of wrongly modifying a basic economy ticket, the model “found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights.” Anthropic says this reflects the model’s ability to reason more strategically than competing systems.
A major highlight of the release is Anthropic’s focus on safety. The company describes Opus 4.5 as its “most robustly aligned model” yet. It also claims significant improvements in defending against prompt-injection attacks, which attempt to slip deceptive instructions into the model’s input, and asserts that Opus 4.5 is now “harder to trick… than any other frontier model in the industry.”
Claude Opus 4.5 is available immediately in the Claude mobile apps for Android and iOS and on the Claude web interface, and is being released to developers at the same time.




