Amazon Rolls Out Strict New Coding Rules in Response to Major Service Outages
Amazon is reportedly implementing a comprehensive set of new rules governing how its engineers write, review, and deploy code. The changes are a direct response to a series of outages that have disrupted the company's e-commerce operations over the past few months. Some of those incidents were traced back to Amazon's own AI-powered coding tools.
Details of the 90-Day Safety Reset
Citing internal documents, Business Insider reports that Amazon has launched a temporary 90-day safety reset that supplements its existing policies. The reset targets approximately 335 "Tier-1 systems" (services that can directly affect customers) that have had multiple order-impacting incidents since last year and are owned by VP-level organizations.
Triggering Incidents and Their Impact
The reset was reportedly prompted by "several major" incidents, which Dave Treadwell, Amazon's SVP of e-commerce services, described in an internal note as a "trend of incidents." The failures were costly. On March 2, customers across Amazon's marketplaces were shown incorrect delivery times when adding items to their carts, resulting in nearly 120,000 lost orders and around 1.6 million website errors. Amazon's AI coding assistant, Q, was identified as one of the primary contributors to this incident.
Three days later, on March 5, an outage caused a 99% drop in orders across Amazon's North American marketplaces, leading to an estimated 6.3 million lost orders in a single day. According to an internal document, the root cause was a production change deployed without going through Amazon's formal documentation and approval process.
Key Requirements Under the New Rules
Under the 90-day reset, Amazon engineers working on Tier-1 systems must now adhere to the following strict guidelines:
- Dual Review Mandate: Every code change must be reviewed by two people before it is made. This requirement had apparently lapsed, or been bypassed, in some teams.
- Documentation Tool Usage: They are required to use an internal documentation and approval tool called Modeled Change Management for all production changes.
- Automated Compliance: They must use an automated coding system that strictly follows Amazon's central reliability engineering standards.
An Amazon spokesperson clarified that the policy does not require junior or mid-level engineers to obtain sign-off specifically from senior engineers for AI-assisted changes, providing some flexibility within the stricter framework.
Leadership Oversight and Audits
Meanwhile, Amazon is instructing directors and VP-level leaders who own Tier-1 systems to conduct thorough audits of all production code change activity within their organizations. This top-down review is intended to scrutinize how code has been written, approved, and deployed, and to establish accountability at higher levels of management.
Together, these measures show Amazon tightening its controls on reliability, particularly where AI tools have contributed to failures, as the company works to stabilize its e-commerce platforms and prevent further disruptions.
