GitHub CTO Apologizes as AI Agent Surge Causes Reliability Crisis

GitHub's Chief Technology Officer, Vlad Fedorov, has issued a rare public apology, acknowledging that a surge in AI-driven development workflows is the primary cause of the platform's worsening reliability issues. The company admits it severely underestimated the capacity required to maintain smooth operations.

Apology Amidst Growing Frustration

In a blog post published on April 28, 2026, Fedorov described two recent incidents as "not acceptable" and detailed a platform under significant strain. Uptime in April has fallen below 85%, far short of the 99.9% guaranteed in the service level agreement. The apology concludes with a blunt two-word statement: "We're sorry."
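To put the gap between those two figures in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustration only: it assumes a 30-day month and treats availability as simple wall-clock uptime, and the function name `allowed_downtime_minutes` is made up for this example. The blog post, as reported here, does not say how GitHub measures the figure.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime in a period implied by an availability percentage.

    Assumes plain wall-clock uptime over `days` days (an assumption for
    illustration, not GitHub's stated measurement method).
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Downtime allowed by the 99.9% SLA target over a 30-day month.
sla_budget = allowed_downtime_minutes(99.9)

# Downtime implied by 85% availability over the same period.
at_85 = allowed_downtime_minutes(85.0)

print(f"99.9% target: {sla_budget:.1f} minutes of downtime per month")
print(f"85% uptime:   {at_85 / 60:.0f} hours of downtime per month")
```

Under those assumptions, a 99.9% target allows roughly 43 minutes of downtime in a month, while 85% availability corresponds to about 108 hours.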

The apology came just hours after Ghostty developer Mitchell Hashimoto, GitHub's 1,299th user (he joined in February 2008), publicly announced that he was removing his popular terminal emulator from the platform. Hashimoto had been keeping a daily journal in which he marked every date on which a GitHub outage disrupted his work, and almost every day carried a mark. "This is no longer a place for serious work," he wrote.

The Unplanned AI Agent Surge

GitHub had anticipated increased traffic, but not at this scale. The company began a 10x capacity expansion plan in October 2025, but by February 2026 it had become clear that the platform needed to be designed for 30 times its current scale. The main driver, according to GitHub, is a sharp acceleration in agentic development workflows since late December 2025. Repository creation, pull request activity, API usage, and large-repository workloads are all growing rapidly and simultaneously.

This simultaneous growth is the critical challenge. It is not one system buckling under pressure but every system at once. A single pull request can touch Git storage, mergeability checks, branch protection, GitHub Actions, search, notifications, permissions, webhooks, APIs, background jobs, caches, and databases. At high scale, small inefficiencies compound quickly.
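A rough way to see why this matters: when a request depends on many components at once, their individual reliability multiplies. The Python sketch below assumes, purely for illustration, that each of the twelve systems listed above fails independently and is available 99.9% of the time. Neither assumption comes from GitHub, and the point about compounding inefficiencies is broader than availability alone.

```python
# Systems a single pull request can touch, as listed in the article.
SYSTEMS = [
    "Git storage", "mergeability checks", "branch protection",
    "GitHub Actions", "search", "notifications", "permissions",
    "webhooks", "APIs", "background jobs", "caches", "databases",
]

# Hypothetical per-system availability, chosen only for illustration.
PER_SYSTEM_AVAILABILITY = 0.999


def combined_availability(per_system: float, count: int) -> float:
    """Availability of a request that needs `count` independent systems,
    each available with probability `per_system`."""
    return per_system ** count


combined = combined_availability(PER_SYSTEM_AVAILABILITY, len(SYSTEMS))
print(f"{len(SYSTEMS)} systems at {PER_SYSTEM_AVAILABILITY:.1%} each "
      f"-> {combined:.2%} combined")
```

Even with every component individually meeting a 99.9% target, the combined figure in this simplified model drops to roughly 98.8%.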

Two Incidents That Led to the Apology

Two specific incidents pushed the platform to a breaking point. On April 23, a merge queue bug caused incorrect commits when a merge group contained more than one pull request, inadvertently reverting changes from previously merged pull requests. A total of 658 repositories and 2,092 pull requests were affected. Although no data was lost, default branches were left in incorrect states that GitHub could not safely repair automatically.

Then, on April 27, GitHub's Elasticsearch cluster became overloaded—likely due to a botnet attack—and stopped returning search results. This broke large parts of the user interface for pull requests, issues, and projects. Fedorov noted that Elasticsearch was one of the systems not yet fully isolated because other higher-risk areas had taken priority. That calculation clearly did not hold.

Immediate and Long-Term Fixes

GitHub has been implementing several fixes. These include moving webhooks out of MySQL, redesigning session caches, overhauling authentication flows to reduce database load, and accelerating the migration of performance-sensitive code from Ruby to Go. The migration to Azure, which some have blamed for the outages, has actually helped by allowing GitHub to spin up compute faster. A multi-cloud architecture is also being developed for longer-term resilience.

GitHub's stated priority order going forward is availability first, then capacity, then new features. The platform has also updated its status page to include live availability numbers and committed to flagging both large and small incidents, so developers no longer have to guess whether the problem is on their end or GitHub's.

The full blog post from GitHub CTO Vlad Fedorov provides further details on the incidents and the company's response. Whether these promises will translate into a platform developers can rely on again remains the critical question.
