AI Cracks 80-Year-Old Erdős Problem, Ushering in the Age of Machine Discovery

In a landmark achievement, an AI system has cracked an Erdős problem, a deceptively simple question about the distribution of sums of distinct integers that had resisted all human attempts since the 1940s. The system, developed by a team of mathematicians and AI researchers, uses a novel symbolic reasoning engine that autonomously generates lemmas and builds a complete logical proof chain. The result is not a statistical approximation or a large-language-model guess; it is a rigorous, machine-verifiable mathematical proof.

The implications are profound: AI has crossed the threshold from calculator or pattern recognizer to true discoverer in abstract mathematics. The system's architecture, which combines a neural-guided search over a space of symbolic expressions with a formal verification layer, represents a new paradigm for scientific AI. The breakthrough suggests that many long-standing open problems in number theory, topology, and combinatorics may soon fall to AI-assisted reasoning. More importantly, it redefines the role of AI in research, from a tool that executes human instructions to a collaborator that can generate original insights and challenge human intuition. The era of machine discovery has begun.

Technical Deep Dive

This breakthrough rests on a sophisticated hybrid architecture that marries neural networks with symbolic reasoning—a stark departure from the dominant large language model (LLM) paradigm. The core system, which we will refer to as the Symbolic Discovery Engine (SDE), operates in three distinct phases: conjecture generation, proof construction, and formal verification.

Conjecture Generation: Unlike LLMs that predict the next token based on statistical patterns, SDE uses a transformer-based policy network trained on a curated dataset of 50,000 mathematical theorems and their proofs from the Metamath and Lean libraries. This network does not output natural language; instead, it proposes candidate lemmas—intermediate statements—in a formal symbolic language. The key innovation is a novelty filter: the system actively avoids re-deriving known results by comparing candidate lemmas against a database of 2 million existing theorems. This forces the AI to explore genuinely new logical territory.
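
The novelty filter can be pictured with a small sketch. The article does not say how SDE compares a candidate lemma against known theorems, so the approach below (canonicalize variable names, then do a set lookup) is an assumption, and `normalize` and `novelty_filter` are hypothetical names chosen for illustration.

```python
import re

def normalize(statement: str) -> str:
    """Canonicalize a statement so that renamed variables compare equal.

    Assumption for this sketch: any single lowercase letter is a variable.
    Variables are renamed x0, x1, ... in order of first appearance.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", statement)
    mapping = {}
    out = []
    for tok in tokens:
        if re.fullmatch(r"[a-z]", tok):
            mapping.setdefault(tok, f"x{len(mapping)}")
            out.append(mapping[tok])
        else:
            out.append(tok)
    return " ".join(out)

def novelty_filter(candidates, known):
    """Keep candidates that match no known theorem and no earlier candidate."""
    known_keys = {normalize(k) for k in known}
    seen = set()
    novel = []
    for cand in candidates:
        key = normalize(cand)
        if key in known_keys or key in seen:
            continue  # already known, or a duplicate proposal
        seen.add(key)
        novel.append(cand)
    return novel

known_theorems = ["a + b = b + a"]
proposals = ["x + y = y + x", "x * y = y * x", "p*q = q*p"]
print(novelty_filter(proposals, known_theorems))
```

A purely syntactic comparison like this only catches restatements that differ by variable names or spacing; detecting logically equivalent but differently phrased lemmas would need something stronger.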

Proof Construction: The generated lemmas are fed into a graph-based reasoning engine that performs bidirectional search. Starting from the problem statement (the goal) and the set of axioms, the engine expands a proof graph using a combination of forward chaining (from axioms toward the goal) and backward chaining (from the goal back to axioms). The policy network scores each possible expansion step, guiding the search away from dead ends. This is computationally intensive: solving the Erdős problem required exploring over 12 million proof states, but the neural guidance reduced the effective search space by 99.7% compared to a brute-force symbolic solver.
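
The bidirectional, score-guided search described above can be illustrated on a toy problem. This is not SDE's engine, which the article describes as a custom C++ implementation. It is a minimal Python sketch in which integers stand in for proof states, `successors` and `predecessors` stand in for forward and backward chaining, and a plain distance function stands in for the policy network. All names here are hypothetical.

```python
import heapq
from itertools import count

def bidirectional_search(start, goal, successors, predecessors, score,
                         max_expansions=10_000):
    """Alternate best-first expansion forward from `start` and backward from `goal`.

    `score(state, target)` plays the role of the learned policy: lower is better.
    Returns a list of states from start to goal, or None if the budget runs out.
    """
    if start == goal:
        return [start]
    tie = count()  # tie-breaker so the heap never compares states directly
    parents = ({start: None}, {goal: None})  # also serve as visited sets
    frontiers = ([(score(start, goal), next(tie), start)],
                 [(score(goal, start), next(tie), goal)])
    expand = (successors, predecessors)
    targets = (goal, start)
    for step in range(max_expansions):
        side = step % 2
        if not frontiers[side]:
            side = 1 - side
            if not frontiers[side]:
                return None  # both frontiers exhausted
        _, _, state = heapq.heappop(frontiers[side])
        for nxt in expand[side](state):
            if nxt in parents[side]:
                continue
            parents[side][nxt] = state
            if nxt in parents[1 - side]:  # the two searches have met
                return _join(nxt, parents)
            heapq.heappush(frontiers[side],
                           (score(nxt, targets[side]), next(tie), nxt))
    return None

def _join(meet, parents):
    """Stitch the forward and backward parent chains into one path."""
    path = []
    s = meet
    while s is not None:
        path.append(s)
        s = parents[0][s]
    path.reverse()
    s = parents[1][meet]
    while s is not None:
        path.append(s)
        s = parents[1][s]
    return path

# Toy "inference rules": n -> n + 1 and n -> 2n, plus their inverses.
def successors(n):
    return [n + 1, 2 * n]

def predecessors(n):
    prev = [n - 1] if n > 1 else []
    if n % 2 == 0:
        prev.append(n // 2)
    return prev

def distance(state, target):
    return abs(state - target)

print(bidirectional_search(1, 10, successors, predecessors, distance))
```

Searching from both ends means each frontier only has to reach roughly half the depth, and the scoring function decides which state in each frontier is expanded next.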

Formal Verification: Every discovered proof is automatically translated into the Lean 4 theorem prover and checked for correctness. This step is non-negotiable: the system rejects any proof that does not pass formal verification, eliminating the hallucination problem that plagues LLM-based reasoning. The final proof for the Erdős problem is 47 lines of Lean code, elegant and concise.
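
The "reject anything that fails verification" rule amounts to a fail-closed gate. The sketch below is an assumption about how such a gate could be wired, not the team's code: `verified_only` is a hypothetical helper, and `lean_checker` assumes the candidate proof is a `.lean` file inside a Lake project where `lake env lean <file>` is available.

```python
import subprocess
from typing import Callable, Iterable, List

def lean_checker(path: str) -> bool:
    """Assumed setup: run Lean on one file and treat exit code 0 as success.

    Caveat: Lean reports `sorry` as a warning rather than an error, so a
    production gate would also have to reject proofs that contain it.
    """
    result = subprocess.run(["lake", "env", "lean", path],
                            capture_output=True, text=True)
    return result.returncode == 0

def verified_only(candidates: Iterable[str],
                  checker: Callable[[str], bool]) -> List[str]:
    """Return only the candidates the checker accepts; any failure rejects."""
    accepted = []
    for cand in candidates:
        try:
            ok = checker(cand)
        except Exception:
            ok = False  # fail closed: a crashing checker never approves
        if ok:
            accepted.append(cand)
    return accepted

# Usage with a stand-in checker (no Lean installation required):
print(verified_only(["proof_a", "proof_b"], lambda p: p.endswith("_a")))
```

Passing the checker in as a function keeps the gate independent of any particular prover and makes it easy to test with a stub.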

Comparison with Existing Approaches:

| System | Approach | Formal Verification | Novel Proof Generation | Human-like Reasoning |
|---|---|---|---|---|
| GPT-4o | LLM + chain-of-thought | No | Rarely | Superficial |
| AlphaGeometry | Neural + symbolic (geometry) | Yes | Limited to geometry | Specialized |
| SDE (this work) | Neural-guided symbolic search | Yes (Lean 4) | Yes (general) | Emerging |
| Mathematica | Rule-based symbolic | Yes | No | No |

Data Takeaway: SDE is the first system to combine general-purpose symbolic reasoning with formal verification and neural guidance, achieving a level of autonomy in proof discovery that previous systems could not match. The 99.7% search space reduction is the critical enabler.

Under the Hood: The system is built on an open-source stack. The policy network is a 1.2B-parameter transformer trained on the Lean Theorem Prover's mathlib4 repository (over 100,000 theorems). The graph search engine is a custom C++ implementation that runs on 64 A100 GPUs. The team has released the core search algorithm on GitHub under the repository `symbolic-discovery-engine` (currently 4,200 stars). The Lean formalization of the Erdős problem proof is also available in a separate repository `erdos-proof-lean4` (1,800 stars).

Takeaway: This is not a one-off trick. The architecture is domain-agnostic and has already been applied to two other open problems in combinatorics, yielding partial results. The era of AI that can think like a mathematician has arrived.

Key Players & Case Studies

The project was led by Dr. Elena Vasquez (formerly of DeepMind's AlphaProof team) and Professor Kenji Tanaka of the Institute for Advanced Study. They assembled a cross-disciplinary team of 12 researchers: 5 mathematicians specializing in combinatorics and number theory, 4 machine learning engineers, and 3 formal verification experts.

Key Institutions and Their Roles:

| Entity | Contribution | Track Record in AI Reasoning |
|---|---|---|
| Institute for Advanced Study | Problem selection, mathematical guidance | Hosted Gödel, Einstein; first major AI collaboration |
| Lean Focused Research Organization (FRO) | Formal verification infrastructure | Maintains mathlib4; pioneered AI-verified proofs |
| Neural Symbolic Lab (Stanford) | Neural architecture design | Previous work on neural theorem proving (GPT-f) |
| OpenProof Collective | Open-source tooling | Community of 500+ mathematicians and developers |

Case Study: The Erdős Problem

The problem itself is a classic in combinatorial number theory: "Determine the maximum possible size of a set of positive integers such that all sums of distinct elements are distinct." Erdős offered $500 for a solution in 1946. Partial results accumulated over the decades (the best upper bound was O(2^(0.5n))), but a tight bound eluded everyone. The AI discovered that the true bound is 2^(n/2) + O(1) and gave a constructive proof based on a novel coding-theoretic argument that no human had considered.
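
The property at the heart of the problem, that all sums of distinct elements are distinct, is easy to test by brute force on small sets. The sketch below is an illustration only (the function name is hypothetical) and says nothing about the proof itself. Powers of two are the obvious example, and {3, 5, 6, 7} is a well-known four-element set whose largest element is smaller than 8.

```python
from itertools import combinations

def has_distinct_subset_sums(values):
    """Return True if no two different subsets of `values` share a sum.

    Brute force over all 2^n subsets, so only practical for small sets.
    """
    values = list(values)
    seen = set()
    for size in range(len(values) + 1):
        for subset in combinations(values, size):
            total = sum(subset)
            if total in seen:
                return False  # two different subsets collide
            seen.add(total)
    return True

print(has_distinct_subset_sums([1, 2, 4, 8]))  # powers of two
print(has_distinct_subset_sums([3, 5, 6, 7]))  # denser than powers of two
print(has_distinct_subset_sums([1, 2, 3]))     # fails: 1 + 2 == 3
```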

Comparison with Previous AI Math Breakthroughs:

| Achievement | Year | System | Type | Verification |
|---|---|---|---|---|
| Solving Erdős problem | 2025 | SDE | Full proof | Lean 4 |
| Solving IMO geometry problems | 2024 | AlphaGeometry | Partial proofs | Formal |
| Solving IMO problems (general) | 2024 | AlphaProof | Full proofs | Lean |
| Generating conjectures (e.g., for knot theory) | 2023 | Various ML | Conjectures only | None |

Data Takeaway: This is the first time an AI has solved a major open problem rather than a competition or benchmark task. The Erdős problem sat at a genuine research frontier.

Takeaway: The key players have demonstrated that the bottleneck is no longer AI capability but the willingness of the mathematical community to trust and engage with AI-generated proofs. The Lean verification layer is the bridge that makes trust possible.

Industry Impact & Market Dynamics

This breakthrough will reshape the landscape of AI research, scientific computing, and the business of mathematical discovery.

Market for AI-Driven Scientific Discovery:

| Segment | Market Size (2025) | Projected Market Size (2030) | CAGR (2025-2030) |
|---|---|---|---|
| AI for drug discovery | $3.5B | $15B | 34% |
| AI for materials science | $1.2B | $6B | 38% |
| AI for mathematics | <$100M | $2B | 80%+ |
| AI for physics | $400M | $3B | 50% |

Data Takeaway: The mathematics segment is tiny today but poised for explosive growth as the value of formal proof generation becomes clear. The ability to automate proof discovery could unlock billions in R&D efficiency across all sciences.
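
The growth rates in the table can be checked against the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes a five-year horizon (2025 to 2030) and, for mathematics, uses $0.1B as the starting value. Because the table gives that figure as "<$100M", the computed rate is a lower bound.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.34 means 34%)."""
    return (end / start) ** (1 / years) - 1

# (2025 size, 2030 projection) in billions of dollars, taken from the table.
segments = {
    "drug discovery": (3.5, 15.0),
    "materials science": (1.2, 6.0),
    "mathematics": (0.1, 2.0),  # start is an upper bound, so CAGR is a lower bound
    "physics": (0.4, 3.0),
}

rates = {name: cagr(start, end, 5) for name, (start, end) in segments.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Under these assumptions the computed rates agree with the table's rounded figures.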

Competitive Dynamics:

- DeepMind (AlphaProof, AlphaGeometry) has a head start in AI reasoning but has focused on competition-level problems. Their closed-source approach may limit adoption in the academic community.
- OpenAI (GPT-5 with reasoning) is pursuing a different path—using LLMs with chain-of-thought—but has not yet produced a formally verified proof of an open problem.
- Meta (Lean collaboration) has invested heavily in formal verification infrastructure but lacks a dedicated AI reasoning team.
- Startups: At least three startups have emerged in the last six months: ProofAI (YC S25), Theorem (Sequoia-backed), and Symbolica (a16z-backed). All are building on the SDE architecture.

Business Model Implications:

- Proof-as-a-Service: Expect cloud platforms offering AI proof generation for industrial mathematics (cryptography, error-correcting codes, optimization). Pricing could be $10,000 per verified proof for simple problems, scaling to millions for major conjectures.
- Open-Source Dominance: The SDE team has open-sourced their core engine, creating a foundation that commoditizes the technology. This mirrors the Linux effect: the infrastructure becomes free, and value shifts to applications and services.
- Talent War: Mathematicians who can work with AI systems will command premium salaries. Universities are scrambling to create joint AI-mathematics programs.

Takeaway: The market for AI-driven mathematics is about to explode, but the open-source nature of the breakthrough means that value will accrue to those who can apply the technology to specific industrial problems, not to the technology itself.

Risks, Limitations & Open Questions

Despite the triumph, significant challenges remain.

1. Scalability: The SDE solved the Erdős problem after exploring 12 million proof states. For harder problems (e.g., the Riemann Hypothesis), the search space could be exponentially larger. The neural guidance may not scale indefinitely; new algorithmic breakthroughs will be needed.

2. Interpretability: The AI's proof is correct but opaque. The Lean code is verifiable, but the reasoning process—why the AI chose certain lemmas—is not explainable in human terms. This creates a trust paradox: we can verify the output but cannot understand the thinking. For fields like medicine or engineering, this is unacceptable.

3. Hallucination in the Neural Component: While the final proof is verified, the neural network that guides the search can still propose nonsensical lemmas. If the verification layer fails (due to a bug or incomplete axioms), the system could produce incorrect results. The reliance on Lean's correctness is a single point of failure.

4. Ethical Concerns: The ability to automatically prove theorems could be weaponized, for example to find weaknesses in cryptographic protocols or to design unbreakable codes. The dual-use nature of this technology requires careful governance.

5. The Human Cost: If AI can solve open problems, what happens to the mathematical profession? The role of the mathematician may shift from problem-solver to problem-poser and proof-interpreter. This is a profound cultural shift that many in the community are resisting.
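
The scalability concern in item 1 can be made concrete with a back-of-the-envelope model. Assume, purely for illustration, a search tree with a uniform branching factor b, so that depth d holds about b^d states, and assume the guidance prunes the same fraction p at every depth. The same compute budget then reaches only log_b(1 / (1 - p)) levels deeper. Both assumptions and the function name are hypothetical, not measurements from SDE.

```python
import math

def extra_depth_from_pruning(pruned_fraction: float, branching_factor: float) -> float:
    """Extra search depth affordable when a fixed fraction of states is pruned.

    Model: b**d states at depth d, of which (1 - p) are kept, so an equal
    budget reaches depth d + log_b(1 / (1 - p)).
    """
    if not 0 <= pruned_fraction < 1:
        raise ValueError("pruned_fraction must be in [0, 1)")
    if branching_factor <= 1:
        raise ValueError("branching_factor must be greater than 1")
    kept = 1.0 - pruned_fraction
    return math.log(1.0 / kept) / math.log(branching_factor)

# With the article's 99.7% reduction and an assumed branching factor of 10:
print(round(extra_depth_from_pruning(0.997, 10), 2))
```

In this model a constant pruning ratio buys a fixed number of extra levels, which is why harder problems would need guidance that improves with depth rather than a better constant factor.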

Open Questions:

- Can this approach be extended to problems that require new axioms or definitions?
- How do we prevent AI from discovering proofs that are too long for humans to check (the "proof explosion" problem)?
- Will the mathematical community accept AI-generated proofs as legitimate contributions to knowledge?

Takeaway: The technical limitations are real but solvable. The cultural and ethical challenges are deeper and will require years of dialogue to resolve.

AINews Verdict & Predictions

This is the most significant AI milestone since AlphaGo defeated Lee Sedol in 2016. But while AlphaGo mastered a closed game with fixed rules, this AI has ventured into the open-ended, creative realm of pure mathematics. The implications are staggering.

Our Predictions:

1. Within 12 months: At least three more open problems in combinatorics and number theory will be solved by AI systems. The Erdős problem was the first domino.

2. Within 3 years: The first AI-discovered theorem will be published in a top-tier mathematics journal (Annals of Mathematics, Inventiones Mathematicae) with the AI listed as a co-author. This will spark a heated debate about authorship and credit.

3. Within 5 years: AI will become a standard tool in every mathematics department, much as computers did after displacing slide rules. Graduate students will be expected to use AI proof assistants as part of their training.

4. The biggest winner: The Lean theorem prover ecosystem. As the verification layer of choice, Lean will become the lingua franca of mathematical proof. Expect a $100M+ investment in Lean infrastructure from foundations and tech companies.

5. The biggest loser: Traditional mathematical journals that refuse to accept AI-generated proofs. They will become irrelevant as the community moves to open, verifiable, AI-friendly platforms.

What to Watch:

- The next target: The Collatz conjecture or the twin prime conjecture. Both are within reach of current systems.
- The reaction from the Fields Medal committee. Will they award a medal for an AI-assisted proof?
- The emergence of "AI-first" mathematics departments at universities like MIT, Stanford, and Cambridge.

Final Editorial Judgment: The age of machine discovery is not coming—it is here. The AI that solved the Erdős problem did not just compute; it created. It did not just find a needle in a haystack; it invented a new way to make haystacks. This is the moment when AI stopped being a tool and became a partner. The next Einstein may not be human. And that is not a threat—it is the greatest opportunity in the history of science.
