In a concerning development for the scientific community, artificial intelligence hallucinations and fabricated citations are increasingly infiltrating academic research, even at prestigious conferences. A recent investigation has revealed that 51 papers accepted at the Conference on Neural Information Processing Systems (NeurIPS) contained fake, AI-generated citations, raising alarms about the integrity of scholarly discourse.
The NeurIPS Findings: A Wake-Up Call
According to a report by AI detection startup GPTZero, more than 100 hallucinated citations were discovered across 51 research papers accepted at NeurIPS 2025. The conference, held in San Diego, California, in December 2025, is one of the most significant events in artificial intelligence and machine learning. GPTZero scanned a total of 4,841 papers from the conference, looking for both AI-generated text and hallucinated citations.
While 51 out of 4,841 papers might seem like a small fraction, NeurIPS has a strict policy that treats any hallucinated citation as grounds for rejection or revocation. This underscores the severity of the issue: these papers had already been accepted, presented, and effectively published, beating out thousands of other submissions in a highly competitive environment.
How GPTZero Detected the Fake Citations
GPTZero employed its proprietary AI tool, Hallucination Check, to scan the citations in over 4,000 NeurIPS papers. The tool flagged citations that could not be found online; human reviewers then manually verified the flagged entries to confirm they were AI-generated fakes. The company refers to such citations as "vibe citations," defined as those likely resulting from generative AI use, excluding common human errors like spelling mistakes or dead URLs.
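The general flag-then-verify workflow described above can be sketched in a few lines of Python. This is not GPTZero's actual pipeline; the reference format, the toy bibliography, and the resolver are all hypothetical stand-ins (a real resolver might wrap a bibliographic search service such as Crossref or Semantic Scholar).

```python
def extract_citations(bibliography: str) -> list[str]:
    """Naive extraction: treat each non-empty line as one reference.
    Real bibliographies need a proper parser (BibTeX, GROBID, etc.)."""
    return [line.strip() for line in bibliography.splitlines() if line.strip()]

def flag_suspect_citations(refs: list[str], resolver) -> list[str]:
    """Return references the resolver cannot find.

    `resolver` is any callable mapping a reference string to True/False,
    e.g. a wrapper around an online bibliographic search. Anything flagged
    here would go to a human reviewer, not be declared fake automatically.
    """
    return [ref for ref in refs if not resolver(ref)]

# Toy resolver standing in for a real online lookup.
KNOWN_WORKS = {
    "Vaswani et al., Attention Is All You Need, NeurIPS 2017",
}

bibliography = """Vaswani et al., Attention Is All You Need, NeurIPS 2017
Doe et al., Imaginary Transformers for Fictional Data, NeurIPS 2024"""

suspects = flag_suspect_citations(
    extract_citations(bibliography),
    resolver=lambda ref: ref in KNOWN_WORKS,
)
print(suspects)  # the unresolvable (possibly fabricated) reference
```

The key design point is the separation of automated flagging from human confirmation: an unresolvable citation may simply be obscure or mistyped, which is why the report excludes ordinary human errors from its "vibe citation" count.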
The Hallucination Check tool is now available for authors to pre-check manuscripts, and GPTZero's AI Detector helps editors and conference chairs identify suspicious content more efficiently.
Broader Implications for Scientific Research
NeurIPS is not alone in facing this challenge. GPTZero also detected over 50 hallucinated citations in papers under review for ICLR 2026, another major AI conference. Moreover, online pre-print repositories like arXiv are seeing a surge in low-quality, AI-generated papers. A report by The Atlantic noted that researchers using LLM-powered tools post about 33% more papers than those who do not, potentially flooding the system with substandard work.
This trend highlights a paradox: even leading AI experts struggle to ensure the accuracy of AI tools they rely on, risking the dilution of scientific rigor. As AI becomes more integrated into research processes, the need for robust detection and verification mechanisms grows.
About NeurIPS and the Peer Review Crisis
Founded in 1987, NeurIPS focuses on neural networks and the interplay of computation, neurobiology, and physics. It has evolved into a premier AI event, with the 2025 edition attracting a record 26,000 attendees. Submissions to NeurIPS more than doubled between 2020 and 2025, from 9,467 to 21,575, straining the peer-review system.
Organizers have had to recruit more reviewers to maintain rigorous standards, but the influx of AI-generated content complicates this mission. Ironically, a paper titled "The AI Conference Peer Review Crisis" predicted this issue months before NeurIPS 2025, identifying fake citations as a potential problem.
As scientific research grapples with the dual challenges of AI adoption and maintaining integrity, conferences like NeurIPS are at the forefront of developing solutions to safeguard scholarly discourse from the creeping menace of AI hallucinations and fake citations.