Anthropic Launches Claude Opus 4.5: Outperforms GPT-5.1 and Gemini in Coding

Anthropic has launched its latest AI model, Claude Opus 4.5, positioning it as the world's leading AI for coding and computer-related tasks. The announcement comes shortly after competitors OpenAI and Google released GPT-5.1 and Gemini 3, respectively.

Benchmark Dominance in Coding Performance

Claude Opus 4.5 scored 80.9% on SWE-bench Verified, a benchmark that tests AI models on real-world software engineering tasks. That makes Opus 4.5 the first AI model to exceed 80% on this evaluation.

Compared with its main competitors, Anthropic's model holds a clear lead. Google's recently launched Gemini 3 Pro scored 76.2% and OpenAI's GPT-5.1 Codex Max scored 77.9%, leaving Claude Opus 4.5 ahead by 4.7 and 3.0 percentage points respectively and making it the current leader on this benchmark.
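
The gap follows directly from the three reported scores. As a quick check, here is a minimal Python sketch; the function name `lead_over_rivals` and the dictionary layout are illustrative assumptions, and only the scores themselves come from the figures reported above.

```python
# SWE-bench Verified scores (percent) as reported in this article.
scores = {
    "Claude Opus 4.5": 80.9,
    "GPT-5.1 Codex Max": 77.9,
    "Gemini 3 Pro": 76.2,
}


def lead_over_rivals(scores, leader):
    """Return the leader's margin over every other model, in percentage points.

    Hypothetical helper for illustration; rounds to one decimal place to
    avoid floating-point noise in the subtraction.
    """
    top = scores[leader]
    return {
        name: round(top - value, 1)
        for name, value in scores.items()
        if name != leader
    }


margins = lead_over_rivals(scores, "Claude Opus 4.5")
for name, gap in margins.items():
    print(f"{name}: {gap} points behind")
```

Run as written, this reports a 3.0-point lead over GPT-5.1 Codex Max and a 4.7-point lead over Gemini 3 Pro.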

Surpassing Human Candidates and Advanced Agent Capabilities

Perhaps more impressively, Anthropic revealed that the new model outperformed human candidates on the company's own two-hour test for prospective performance engineering hires. The test assesses technical ability and judgment under time pressure, though Anthropic notes that it does not evaluate other crucial skills such as collaboration, communication, or experience-based instincts.

The company emphasized that this development raises important questions about how artificial intelligence will transform the engineering profession in the coming years.

On the agentic side, Claude Opus 4.5 also performs strongly on τ2-bench, which measures how AI agents handle real-world, multi-turn tasks. In one scenario, where the AI had to act as an airline service agent helping a distressed customer, Opus 4.5 found an inventive solution: it suggested upgrading the cabin before modifying the flights, which kept it within airline policy while still addressing the customer's needs.

Enhanced Safety and Availability

Anthropic has also focused on safety improvements with this release. The company describes Claude Opus 4.5 as its most robustly aligned model to date, with substantial progress in defending against prompt injection attacks, which smuggle deceptive instructions into a model's input to trick it into harmful behavior.

The company stated in its official blog post that Opus 4.5 is harder to trick with prompt injection than any other frontier model currently available in the industry.

Claude Opus 4.5 is available immediately through the Claude app on both Android and iOS, as well as on the Claude website. The model is being released to developers at the same time, enabling broader integration and application development.

The launch represents another major step in the intensifying competition among AI giants, with Anthropic establishing a clear performance advantage in coding-related tasks while continuing to prioritize safety and alignment in their model development.