In a pointed critique from within the discipline itself, a prominent mathematician and logician has declared current artificial intelligence systems fundamentally unsuitable for genuine mathematical research. The critic is not a casual observer but Joel David Hamkins, a senior scholar whose career is built on the rigorous foundations of mathematical logic.
A Career Forged in Precision and Logic
Joel David Hamkins is the John Cardinal O’Hara Professor of Logic at the University of Notre Dame. His career has been devoted to disciplines where absolute precision is non-negotiable. His work spans mathematical logic, philosophical logic, set theory, and computability theory. He is particularly renowned for developing the concept of the set-theoretic multiverse, a framework that questions the idea of a single, absolute universe of sets.
His educational background set the stage for this exacting career. Hamkins earned his Bachelor of Science from the California Institute of Technology and completed his Ph.D. in Mathematics in 1994 at UC Berkeley under the supervision of W. Hugh Woodin. His doctoral dissertation on advanced set theory immersed him in a world of logical consistency and meticulous proof structure, shaping his uncompromising standards for formal correctness.
His academic path has been both distinguished and interdisciplinary. After his doctorate, he joined the City University of New York in 1995, holding roles across mathematics, philosophy, and computer science. His career includes appointments at top global institutions like the University of Oxford, where he became Professor of Logic in 2018, before moving to Notre Dame in January 2022. This trajectory placed him at the crossroads of mathematics, philosophy, and computation long before AI's rise to prominence.
Why AI Fails the Mathematical Trust Test
Hamkins recently elaborated on his scepticism during an appearance on the Lex Fridman podcast. He revealed that despite experimenting with several paid AI models, he has not found them "helpful at all" for his work.
The core of his objection goes beyond the mere fact that AI can make mistakes. For Hamkins, the critical failure lies in how these systems handle error. He reports that when he identifies concrete flaws in their mathematical reasoning, the models often respond with confident reassurances instead of corrections, offering comments like "Oh, it’s totally fine" even when the underlying mathematics is incorrect.
This behaviour, according to Hamkins, violates a basic tenet of mathematical collaboration: trust. In genuine research, colleagues must be able to challenge arguments, pinpoint errors, and revise claims constructively. Hamkins stated that if a human colleague reacted to correction with such unwarranted confidence, he would cease working with them. He summarised AI's mathematical outputs as "garbage answers that are not mathematically correct" and concluded that "as far as mathematical reasoning is concerned, it seems not reliable."
The Gap Between Benchmarks and Research Reality
Hamkins' critique feeds into a growing debate within the mathematics community. While some researchers report using AI to explore problems, others, such as mathematician Terence Tao, have warned that AI can generate polished-looking proofs containing subtle, critical errors that would fail peer review.
Hamkins' experience underscores a crucial distinction: strong performance on standardised AI benchmarks does not equate to dependable reasoning in actual research. The step-by-step, line-by-line scrutiny required for advanced mathematical proofs is a different arena altogether. He acknowledges that future systems may improve but remains unconvinced that current large language models can function as authentic research partners. His assessment highlights a persistent gap between the promise of AI reasoning and the rigorous standards demanded by fields at the pinnacle of logical precision.