In a bold move to advance artificial intelligence, a cluster of Silicon Valley startups has embarked on creating near-perfect digital copies of major websites like Amazon, Gmail, and United Airlines. These shadow sites are not for public use but serve as sophisticated training grounds for AI systems, pushing the industry closer to developing autonomous AI agents capable of performing complex tasks online.
From Chatbots to Autonomous Agents: The Training Grounds
This summer, lawyers at United Airlines discovered a strikingly accurate replica of the airline's official website. The clone had the same buttons and menus for booking flights and hotels, and even carried the company's logo and brand name. A takedown notice alleging copyright infringement promptly followed. However, the creator, Div Garg of the startup AGI, had no intention of infringing on United's intellectual property. He had built the replica solely as a training ground for AI.
Garg's company, along with others like Plato and Matrices, is part of a new wave of startups building these replica environments. Their objective is to turn today's limited chatbots into fully fledged AI agents: systems that can independently book travel, schedule meetings, or manage data. "We want to build training environments that capture entire jobs that people do," said Robert Farlow, co-founder of Plato.
Why Clone a Website? The Data Dilemma
The tech industry's push to advance AI has run into a scarcity of high-quality training data. Having exhausted vast swathes of internet text, companies are now turning to reinforcement learning, a technique in which an AI system learns by trial and error: it attempts a task thousands of times, is rewarded when an attempt succeeds, and gradually learns what works.
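To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, one textbook form of reinforcement learning, on a toy booking flow. Everything in it is an assumption made for illustration: the page names, the click actions, the reward of 1.0 for reaching a confirmation page, and the hyperparameters. It is not the method any of the companies named here has described.

```python
import random

# Hypothetical toy replica of a flight-booking flow (not any real site's API).
PAGES = ["home", "results", "checkout", "confirmed"]
ACTIONS = ["open_search", "pick_flight", "pay", "go_back"]
FORWARD = {
    ("home", "open_search"): "results",
    ("results", "pick_flight"): "checkout",
    ("checkout", "pay"): "confirmed",
}

def step(page, action):
    """Apply one click and return (next_page, reward, done)."""
    if action == "go_back":
        return "home", 0.0, False
    # A click that does nothing on this page leaves the agent where it is.
    next_page = FORWARD.get((page, action), page)
    done = next_page == "confirmed"
    return next_page, (1.0 if done else 0.0), done

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    """Learn action values by repeating the task many times."""
    rng = random.Random(seed)
    q = {(p, a): 0.0 for p in PAGES for a in ACTIONS}
    for _ in range(episodes):
        page = "home"
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random click
            else:
                best = max(q[(page, a)] for a in ACTIONS)
                # exploit: take the best-known click, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(page, a)] == best])
            next_page, reward, done = step(page, action)
            future = 0.0 if done else max(q[(next_page, a)] for a in ACTIONS)
            # Standard Q-learning update toward reward plus discounted future value.
            q[(page, action)] += alpha * (reward + gamma * future - q[(page, action)])
            page = next_page
            if done:
                break
    return q

def greedy_policy(q):
    """Return the highest-valued click for every page that still needs one."""
    return {p: max(ACTIONS, key=lambda a: q[(p, a)]) for p in PAGES if p != "confirmed"}

if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent starts out knowing nothing about the site. Only the reward at the end of a successful booking, propagated backward over many repetitions, teaches it which click to make on each page.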
"When you're doing training, you want to run thousands of AI agents at the same time... If you do that on a real website, you will get blocked," explained Div Garg. Sites like Amazon and Airbnb actively block bots that perform repetitive actions. Therefore, creating controlled replicas—with names like "Omnizon" for Amazon or "Go Mail" for Gmail—allows for unrestricted, accelerated learning without triggering security measures.
Backed by $10 million in funding from Menlo Ventures and others, AGI has cloned several major platforms. This approach marks a shift from training AI on human-generated data to letting AI generate its own learning data through simulated interactions.
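Garg's point about blocking can be illustrated with a small simulation. The sketch below is a deliberately crude model, not a description of how Amazon, Airbnb, or any real site detects bots: the `SimulatedSite` class, its `request_limit` parameter, and the fleet sizes are all hypothetical. A site with a request cap stands in for a production website, and a site without one stands in for a self-hosted replica.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Blocked(Exception):
    """Raised when the simulated site refuses further traffic."""

class SimulatedSite:
    """Stand-in for a website; request_limit=None models a self-hosted replica."""

    def __init__(self, request_limit=None):
        self.request_limit = request_limit
        self.requests = 0
        self._lock = threading.Lock()

    def click(self, agent_id, action):
        # Count every request; refuse once the (assumed) cap is exceeded.
        with self._lock:
            self.requests += 1
            if self.request_limit is not None and self.requests > self.request_limit:
                raise Blocked(f"agent {agent_id} blocked on {action}")
        return "ok"

def run_agent(site, agent_id, clicks=10):
    """Return True if the agent finished its episode without being blocked."""
    try:
        for i in range(clicks):
            site.click(agent_id, f"action-{i}")
        return True
    except Blocked:
        return False

def run_fleet(site, agents=1000):
    """Run many agents concurrently and count how many completed an episode."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(lambda i: run_agent(site, i), range(agents)))
    return sum(results)

if __name__ == "__main__":
    print("capped site:", run_fleet(SimulatedSite(request_limit=500)))
    print("replica:", run_fleet(SimulatedSite()))
```

On the capped site, at most 50 of the 1,000 agents can finish, because each episode needs 10 requests and only 500 are allowed in total. On the replica, every agent completes, which is the property that makes large-scale trial-and-error training practical.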
Legal Grey Areas and the Future of Work
This novel practice ventures into uncharted legal territory. After United's complaint, Garg renamed his site "Fly Unified" and removed the logo, believing this mitigates legal risk. John Qian of Matrices shares a similar view, though he acknowledges the law is struggling to keep pace with AI innovation.
Robin Feldman, a law professor at UC Law San Francisco and author of "AI Versus IP," warns that using these shadow sites could violate copyrights. "These companies are shooting first and asking questions later," she noted, highlighting the potential for legal backlash as the field expands rapidly.
Despite the hype, current AI agent technology remains imperfect. Experimental tools from OpenAI and Anthropic that can shop or take notes often make errors. "There is a big gap between what companies want these agents to do and what they are capable of today," said Rayan Krishnan, CEO of Vals AI. The systems are often too slow or unreliable to be practically useful now.
The ambition, however, is vast. The end goal, as articulated by Robert Farlow, is to "re-create all the software and websites that people use... to train AI to do the jobs and start to do them even better than a human." In this vision, AI agents could eventually automate significant portions of white-collar work, a prospect that continues to fuel both massive investment and serious ethical and legal debate.