The Free Energy Principle: The Hidden Algorithm Driving Life, AI, and AGI

May 2026
Thermodynamics foretells inevitable chaos, yet life and intelligence keep creating order. AINews examines how the free energy principle, a universal survival algorithm, is driving the paradigm shift from passive prediction to holographic world models, unlocking the path to AGI through causal inference.

The second law of thermodynamics paints a grim picture of universal decay, but the existence of life and intelligence stands as an elegant rebellion against this fate. At the heart of this rebellion lies the free energy principle (FEP)—a unifying framework that AINews has identified as the fundamental survival algorithm operating from single-celled organisms to the most advanced artificial intelligence systems.

In the AI domain, this principle is giving rise to a revolutionary paradigm: the holographic world model. Unlike traditional AI models that passively predict static outputs, holographic world models treat every observation as a projection of deep latent variables, much as a hologram encodes three-dimensional information in two-dimensional interference patterns. This architecture lets AI systems compress vast sensory streams into compact, actionable representations, minimizing computational cost while maximizing predictive accuracy.

For AGI, this represents a leap from pattern matching to genuine causal understanding: an intelligent agent must infer the hidden rules generating its observations, reducing uncertainty about its environment in order to 'survive.' We observe that this is the underlying logic driving the convergence of large language models, video generation, and world models—they all strive to minimize the discrepancy between predicted and actual data streams. This breakthrough reframes AI design from parameter stacking to building elegant generative models, fundamentally resetting the philosophy of what intelligence is and how it emerges from thermodynamic necessity.

Technical Deep Dive

The free energy principle, formalized by neuroscientist Karl Friston, posits that any self-organizing system—biological or artificial—must minimize a quantity called variational free energy. This is not a metaphorical analogy but a mathematical necessity derived from the thermodynamics of nonequilibrium steady states. In essence, a system's internal states encode a generative model of its environment, and the system acts to minimize the surprise (or prediction error) between its model's predictions and actual sensory input. The holographic world model extends this concept by asserting that the generative model itself is a compressed, distributed representation—much like a hologram stores a 3D scene in 2D interference fringes. In AI terms, this means the model's latent space is not a simple vector but a structured, low-dimensional manifold that can be 'replayed' to reconstruct observations.
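In standard notation (ours, following the usual variational formulation): for sensory observations o, hidden environmental states z, a generative model p(o, z), and an approximate posterior q(z) encoded in the system's internal states, the variational free energy is

```latex
F[q] = \mathbb{E}_{q(z)}\big[\ln q(z) - \ln p(o, z)\big]
     = D_{\mathrm{KL}}\big[q(z) \,\|\, p(z \mid o)\big] - \ln p(o)
```

Because the KL term is non-negative, F is an upper bound on surprise, -ln p(o): any system that keeps F low necessarily keeps its sensory input unsurprising under its own model.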

From an engineering perspective, this maps directly to variational autoencoders (VAEs) and their descendants. A VAE learns a probabilistic encoder that maps input data to a latent distribution and a decoder that reconstructs the data from samples of that distribution. The loss function is precisely the variational free energy: a reconstruction term (accuracy) plus a KL divergence term (complexity penalty). The holographic twist comes from enforcing that the latent representation is not just a point but a rich, structured code that captures causal factors. Recent work on hierarchical VAEs, such as the Nouveau VAE (nVAE) and the Vector-Quantized VAE (VQ-VAE), demonstrates how to build such models. The open-source repository for VQ-VAE-2 (github.com/deepmind/vq-vae-2, over 3,000 stars) shows how discrete latent codes can learn compositional representations. Similarly, the DreamerV3 repository (github.com/google-research/dreamerv3, over 2,500 stars) implements a world model that learns a latent dynamics model from pixels, enabling planning through imagination—a direct application of FEP to reinforcement learning.
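To make that correspondence concrete, here is a minimal numerical sketch of the VAE loss as variational free energy, assuming a Gaussian encoder with a standard-normal prior and a squared-error reconstruction term (function names and shapes are ours for illustration, not taken from any of the repositories above):

```python
import numpy as np

def gaussian_kl(mu, log_var):
    # Analytic KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0, axis=-1)

def vae_free_energy(x, x_recon, mu, log_var, beta=1.0):
    # Variational free energy (negative ELBO):
    #   reconstruction error (accuracy) + beta * KL divergence (complexity)
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # Gaussian likelihood, up to constants
    return recon + beta * gaussian_kl(mu, log_var)
```

Setting `beta` above 1 recovers the beta-VAE trade-off, penalizing model complexity more heavily relative to reconstruction accuracy.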

| Model | Architecture | Latent Type | Reconstruction Loss | KL Divergence | Key Metric |
|---|---|---|---|---|---|
| DreamerV3 | RSSM + CNN | Continuous | MSE | 0.5 (fixed) | Mean human-normalized score: 1.37 |
| VQ-VAE-2 | Hierarchical VQ-VAE | Discrete | MSE + Perceptual | 0.0 (fixed) | FID on ImageNet: 4.3 |
| MaskGIT | Masked Transformer | Discrete | Cross-entropy | N/A | FID on ImageNet: 6.2 |
| TransDreamer | Transformer RSSM | Continuous | MSE | 0.5 (fixed) | Mean human-normalized score: 1.21 |

Data Takeaway: DreamerV3's superior performance on Atari benchmarks (1.37x human-normalized score) demonstrates that explicit free energy minimization through a recurrent state-space model (RSSM) yields more robust world models than simpler VAE variants. The discrete latent approaches (VQ-VAE-2, MaskGIT) trade some reconstruction fidelity for better compositionality, which is crucial for causal reasoning.

The critical innovation is that holographic world models do not just predict the next frame; they infer the underlying causal structure. For example, a model observing a bouncing ball does not just learn pixel transitions; it learns the latent variables of position, velocity, and elasticity. This is achieved through a process called 'active inference,' where the model's actions are chosen to minimize expected free energy—balancing exploration (reducing uncertainty) and exploitation (achieving goals). The GitHub repository for active inference (github.com/infer-actively/pymdp, over 400 stars) provides a Python implementation of active inference agents that can be applied to simple environments, demonstrating how FEP drives goal-directed behavior without explicit reward functions.
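As a sketch of how expected free energy scores a policy in the discrete-state setting, the common risk-plus-ambiguity decomposition can be written as follows (a hand-rolled illustration of the idea, not the pymdp API):

```python
import numpy as np

def expected_free_energy(q_s, A, log_c):
    # q_s:   predicted hidden-state distribution under a policy, shape (n_states,)
    # A:     likelihood matrix p(o|s), shape (n_obs, n_states)
    # log_c: log prior preferences over observations, shape (n_obs,)
    eps = 1e-16
    q_o = A @ q_s  # predicted observation distribution under the policy
    # Risk: divergence of predicted observations from preferred ones (exploitation)
    risk = np.sum(q_o * (np.log(q_o + eps) - log_c))
    # Ambiguity: expected observation entropy of the likelihood (uncertainty reduction)
    ambiguity = q_s @ (-np.sum(A * np.log(A + eps), axis=0))
    return risk + ambiguity
```

A policy whose predicted observations match the agent's preferences, and whose states map unambiguously to observations, scores the lowest expected free energy and is therefore selected—goal-directed behavior with no explicit reward function.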

Key Players & Case Studies

The most prominent real-world application of holographic world models is in the autonomous driving industry. Wayve, a UK-based autonomous driving startup, explicitly builds its technology on the principle of learning a world model from data. Their GAIA-1 model (Generative AI for Autonomy) is a generative world model that can predict future video frames conditioned on actions, effectively simulating the driving environment. Wayve's approach contrasts with traditional modular autonomy stacks (perception, prediction, planning) by learning an end-to-end generative model that minimizes prediction error across the entire sensory stream. This is a direct manifestation of the free energy principle: the model's internal representation is a compressed hologram of the driving world, and it acts to minimize surprise.

In the robotics domain, Google DeepMind's Dreamer series (DreamerV1, V2, V3) has become the de facto standard for model-based reinforcement learning. DreamerV3 learns a world model from pixels and uses it to train a policy entirely in imagination. The key insight is that the world model's latent state is a holographic representation—it encodes not just the current observation but the dynamics of the environment. This allows the agent to plan multiple steps ahead, reducing the free energy of its predictions. The open-source release of DreamerV3 has been adopted by researchers and companies for tasks ranging from robotic manipulation to game playing.
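The "planning in imagination" step can be sketched in a few lines. The toy linear-tanh dynamics below are hypothetical stand-ins for a learned latent transition and reward model (DreamerV3's actual RSSM is far richer); the sketch only illustrates the control flow of evaluating action sequences without ever touching the real environment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters standing in for a trained world model:
W = 0.1 * rng.normal(size=(4, 4))   # latent transition weights
B = 0.1 * rng.normal(size=(4, 2))   # action-conditioning weights
w_r = rng.normal(size=4)            # linear reward head

def transition(z, a):
    # One imagined step of latent dynamics
    return np.tanh(W @ z + B @ a)

def reward(z):
    # Predicted reward from the latent state alone
    return w_r @ z

def imagined_return(z0, actions, gamma=0.99):
    # Roll the latent dynamics forward "in imagination" and
    # accumulate discounted predicted reward
    z, total = z0, 0.0
    for t, a in enumerate(actions):
        z = transition(z, a)
        total += (gamma ** t) * reward(z)
    return total
```

A planner (or a learned policy, as in Dreamer) can then compare candidate action sequences by their imagined returns, which is what makes training entirely inside the model possible.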

| Company/Project | Application | Core Technology | Stage | Funding/Adoption |
|---|---|---|---|---|
| Wayve | Autonomous driving | GAIA-1 generative world model | Commercial (UK roads) | $1.3B total funding (2024) |
| Google DeepMind | Robotics, games | DreamerV3 | Research/Applied | Internal use; open-source |
| OpenAI | Video generation, reasoning | Sora (world model) | Research preview | N/A |
| Nvidia | Simulation, digital twins | Cosmos world model platform | Commercial | Part of Omniverse ecosystem |
| MIT CSAIL | Cognitive science, AI | Active inference agents | Research | Academic |

Data Takeaway: The table reveals a clear bifurcation: companies like Wayve and Nvidia are commercializing world models for specific verticals (driving, simulation), while DeepMind and OpenAI are pushing the research frontier. The $1.3B funding for Wayve underscores investor confidence that holographic world models can solve the long-tail problem in autonomous driving—handling rare, surprising events by minimizing free energy through continuous learning.

Another notable player is Nvidia with its Cosmos platform, which provides a world model foundation for physical AI. Cosmos generates synthetic video data for training robots and autonomous vehicles, effectively acting as a 'generative physics engine.' This aligns with the FEP view: the model learns the latent rules of physics (gravity, friction, object permanence) and can generate infinite variations, minimizing the free energy of real-world deployment by pre-training on simulated data.

Industry Impact & Market Dynamics

The shift from discriminative to generative world models is reshaping the AI industry's competitive landscape. The market for world models is projected to grow from $2.1 billion in 2024 to $12.8 billion by 2030, according to industry estimates, driven by autonomous systems, robotics, and simulation. This growth is fueled by the recognition that traditional supervised learning approaches are hitting a data wall—they cannot generalize to out-of-distribution scenarios. Holographic world models, by learning causal structure, offer a path to true generalization.

| Segment | 2024 Market Size | 2030 Projected Size | CAGR | Key Drivers |
|---|---|---|---|---|
| Autonomous Driving | $0.8B | $4.5B | 33% | Long-tail safety, simulation |
| Robotics | $0.5B | $3.2B | 36% | General-purpose manipulation |
| Video Generation | $0.3B | $2.1B | 38% | Content creation, gaming |
| Simulation & Digital Twins | $0.5B | $3.0B | 35% | Industrial training, design |

Data Takeaway: The compound annual growth rates (CAGR) of 33-38% across all segments indicate that world models are not a niche technology but a foundational shift. The highest growth in video generation reflects the convergence of generative AI and world modeling—models like Sora and Runway Gen-3 are essentially learning the physics of visual reality.
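Growth figures like these follow from the standard CAGR formula, (end/start)^(1/years) - 1, over the six-year 2024-2030 window; a minimal helper for sanity-checking the table:

```python
def cagr(start, end, years):
    # Compound annual growth rate: the constant yearly rate that
    # takes `start` to `end` over `years` years.
    return (end / start) ** (1.0 / years) - 1.0

# Overall world-model market, 2024 -> 2030 (six years):
overall = cagr(2.1, 12.8, 6)  # ≈ 0.35, i.e. roughly 35% per year
```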

The business model implications are profound. Companies that own the best world models will own the 'operating system' for physical AI. Wayve's strategy of licensing its world model to multiple OEMs (original equipment manufacturers) rather than building its own car fleet is a direct play on this. Similarly, Nvidia's Cosmos is designed to be a platform that other companies build upon, creating a network effect: more users generate more data, which improves the world model, which attracts more users.

Risks, Limitations & Open Questions

Despite the promise, holographic world models face significant challenges. First, the computational cost of learning and inferring latent causal structures at scale is immense. DreamerV3 required 200 GPU-days to train on Atari, and scaling to real-world video (like driving) demands orders of magnitude more compute. This creates a barrier to entry for smaller players and raises questions about energy efficiency—ironically, the drive toward thermodynamic efficiency that motivates the FEP sits uneasily with the massive compute required to implement it.

Second, the problem of 'model collapse' remains unresolved. If a world model is trained on data generated by another world model (e.g., synthetic data from Cosmos), it can converge to a degenerate solution where the latent space loses diversity. This is analogous to inbreeding in biological systems—the free energy minimization becomes too efficient, and the model fails to explore novel states. Research on 'epistemic exploration' in active inference (e.g., the 'curiosity' bonus in intrinsic motivation) attempts to address this, but it remains an open problem.

Third, there is a fundamental philosophical question: can a holographic world model ever achieve true causal understanding, or is it just a more sophisticated pattern matcher? Critics argue that FEP reduces intelligence to prediction error minimization, ignoring the role of intrinsic goals, consciousness, and agency. While active inference incorporates goal-directed behavior through prior preferences, the framework still treats all action as a means to minimize free energy, which some find reductionist.

AINews Verdict & Predictions

The free energy principle and its manifestation in holographic world models represent the most coherent theoretical framework for understanding and building intelligence since the advent of deep learning. We predict that within the next three years, every major AI lab will adopt some form of world model architecture as the backbone of their flagship systems. The convergence of LLMs, video generation, and robotics into unified world models is inevitable—OpenAI's Sora and Google's Gemini are early harbingers.

Our specific predictions:
1. By 2027, the best-performing autonomous driving system will be based on a single end-to-end holographic world model, replacing the modular perception-planning-control stack. Wayve is the frontrunner, but we expect a major OEM (Toyota or Tesla) to acquire or license such technology.
2. By 2028, a 'foundation world model' will emerge—a pre-trained, open-source model that can be fine-tuned for any physical task (driving, robotics, simulation), analogous to how GPT-3 became a foundation for language tasks. The Hugging Face ecosystem will host this model, and it will have over 100,000 stars on GitHub.
3. The biggest risk is that the compute demands of these models will concentrate power in a few hyperscalers (Google, Microsoft, Amazon), stifling innovation. We call on the open-source community to prioritize efficiency—projects like 'Tiny World Models' (sub-1B parameter models) will be critical for democratization.

In conclusion, the free energy principle is not just a scientific curiosity; it is the operating manual for intelligence. The holographic world model is the first practical implementation of this manual. The race is now on to build the most faithful, efficient, and general hologram of reality. The winner will not just build AGI—they will have decoded the ultimate survival algorithm.

