Why Embodied AI's Biggest Blind Spot Is the Physical Agent Itself

April 2026
In an exclusive interview with AINews, Professor Cao Ting from Tsinghua University's Institute for AI Research (AIR) identifies a critical blind spot in embodied AI: the physical agent itself. Her team is incubating a startup that shifts value creation from one-time hardware delivery to platforms that let deployed agents continuously optimize themselves.

The field of embodied AI has long been fixated on better hardware, more realistic simulators, and larger training datasets. But Professor Cao Ting of Tsinghua AIR argues that the most fundamental problem—how a physical agent can continuously evolve and self-adapt through real-world interaction—has been collectively neglected. In a sweeping critique of the current paradigm, she points out that most robots today are treated as 'containers for pre-programmed skills' rather than 'entities capable of autonomous growth.' Her team's planned startup aims to change this by building a platform where agents improve themselves post-deployment, learning from every interaction. This shift from 'training-complete' to 'continuous learning' could upend the robotics industry's business logic: future competitive advantage will hinge not on initial performance but on long-term adaptive capability. From adaptive factories to truly personalized home robots, this direction may represent the genuine path to general-purpose embodied intelligence—and the industry's prior neglect suggests we may have been charging in the wrong direction.

Technical Deep Dive

The core insight from Professor Cao's critique is that the dominant paradigm in embodied AI treats the robot as a static artifact: a model is trained in simulation, fine-tuned on a fixed dataset, and then deployed with the expectation that its capabilities remain frozen. This approach fundamentally misunderstands the nature of physical intelligence. A truly intelligent physical agent must operate under the principles of continuous learning, online adaptation, and self-modeling—capabilities that current architectures largely lack.

The Architecture Gap

Most contemporary robotic systems rely on a two-phase pipeline: (1) offline training using reinforcement learning or imitation learning in simulation (e.g., Isaac Gym, MuJoCo, or Habitat), followed by (2) zero-shot deployment with no further weight updates. The problem is that simulation-to-reality (sim-to-real) transfer introduces inevitable domain gaps—unmodeled friction, sensor noise, actuator wear, and environmental variability. Even the best simulators cannot capture the full complexity of the physical world. The result is that robots degrade in performance over time as components age and environments shift.
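The cost of this gap can be made concrete with a toy experiment. In the sketch below (entirely illustrative: the linear plant, friction values, and gain search are invented for this example), a controller gain is tuned against nominal "simulated" dynamics and then deployed zero-shot onto a plant whose friction has drifted, exactly the frozen-policy pattern described above:

```python
# Toy illustration of the sim-to-real gap: a gain tuned against
# nominal "simulated" friction is deployed zero-shot onto a "real"
# plant whose friction has drifted (e.g., through wear).
def rollout(gain, friction, steps=50):
    s, cost = 1.0, 0.0
    for _ in range(steps):
        a = -gain * s                     # frozen linear policy
        s = (1 - friction) * s + 0.1 * a  # simple 1-D plant
        cost += s * s                     # quadratic regulation cost
    return cost

# "Training": grid-search the best gain under simulated friction
gains = [g / 10 for g in range(21)]       # 0.0 .. 2.0
best_gain = min(gains, key=lambda g: rollout(g, friction=0.1))

sim_cost = rollout(best_gain, friction=0.1)    # training conditions
real_cost = rollout(best_gain, friction=0.02)  # drifted plant
print(f"sim cost {sim_cost:.3f}, real cost {real_cost:.3f}")
```

The policy is optimal for the conditions it was tuned in and strictly worse on the drifted plant; with no mechanism for post-deployment updates, that gap never closes.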

Professor Cao's proposed solution centers on online model-based reinforcement learning combined with world model updates. Instead of a fixed policy π(a|s), the agent maintains a dynamic world model M(s'|s,a) that is continuously updated using real-world experience. The policy is then re-optimized against the updated model. This creates a closed loop: the agent acts, observes outcomes, updates its internal model of physics and task dynamics, and improves its policy accordingly. This is computationally intensive—requiring on-device or edge-based training—but recent advances in efficient neural network architectures (e.g., TinyML, model distillation, and sparse updates) make it increasingly feasible.
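A minimal sketch of that act/observe/update loop, under strong simplifying assumptions (noiseless linear dynamics, a least-squares world model, and a random-shooting planner; none of this is claimed to be the actual architecture of Professor Cao's system):

```python
import numpy as np

rng = np.random.default_rng(0)

def real_step(s, a):
    # Ground-truth dynamics, unknown to the agent
    return 0.9 * s + 0.5 * a

class LinearWorldModel:
    """M(s'|s,a) as a deterministic linear model s' = w0*s + w1*a."""
    def __init__(self):
        self.w = np.zeros(2)  # starts knowing nothing

    def predict(self, s, a):
        return self.w[0] * s + self.w[1] * a

    def update(self, data):
        # Refit on all real-world experience collected so far
        X = np.array([[s, a] for s, a, _ in data])
        y = np.array([sp for _, _, sp in data])
        self.w, *_ = np.linalg.lstsq(X, y, rcond=None)

def plan(model, s, goal=0.0, n_candidates=64):
    # Random-shooting planner: choose the action whose predicted
    # next state lands closest to the goal under the current model
    actions = rng.uniform(-1.0, 1.0, n_candidates)
    preds = model.predict(s, actions)
    return actions[np.argmin(np.abs(preds - goal))]

# Closed loop: act, observe, update world model, re-plan
model, data, s = LinearWorldModel(), [], 5.0
for _ in range(20):
    a = plan(model, s)
    s_next = real_step(s, a)
    data.append((s, a, s_next))
    model.update(data)
    s = s_next

print(model.w)  # approaches the true dynamics [0.9, 0.5]
print(abs(s))   # state regulated near the goal
```

Because the dynamics here are noiseless and linear, the world model recovers the true parameters after a handful of interactions; real systems need far more machinery (noise handling, non-linear models, exploration), but the closed-loop structure is the same.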

Relevant Open-Source Efforts

Several GitHub repositories are pushing in this direction, though none fully solve the continuous evolution problem:

- robomimic (by the Stanford Vision and Learning Lab, ~1,500 stars): A framework for learning from demonstration, but it focuses on offline imitation learning, not continuous online adaptation.
- rl-starter-files (by lcswillems, ~2,000 stars): Provides baseline reinforcement learning implementations for simulated environments, but again assumes a fixed training setup.
- habitat-lab (by FAIR, ~2,500 stars): A high-fidelity 3D simulator for embodied AI research. While excellent for training, it does not address post-deployment learning.
- dreamer (by Google DeepMind, ~3,000 stars): Implements model-based RL with learned world models. This is architecturally closest to what Professor Cao describes, but it has primarily been demonstrated in simulation, not on real robots with continuous deployment.

Performance Metrics: Static vs. Continuous Learning

To illustrate the gap, consider the following hypothetical benchmark comparing a static policy robot versus a continuously learning agent on a long-duration manipulation task (e.g., sorting objects in a warehouse over 6 months):

| Metric | Static Policy Robot | Continuous Learning Agent |
|---|---|---|
| Initial task success rate | 92% | 88% |
| Success rate after 1 month | 85% | 91% |
| Success rate after 3 months | 78% | 95% |
| Success rate after 6 months | 65% | 97% |
| Adaptation to new object types | None (requires retraining) | Learns in ~50 episodes |
| Hardware degradation compensation | None | Adjusts control parameters |
| Retraining cost (per year) | $200,000 (data collection + cloud compute) | $10,000 (incremental updates) |

Data Takeaway: The continuous learning agent starts slightly worse due to the overhead of online model updates, but quickly surpasses the static policy and maintains high performance over time. Critically, it adapts to hardware wear and new objects without expensive retraining cycles. The static robot's performance degrades significantly, requiring costly manual intervention.

Key Players & Case Studies

Professor Cao's critique is not merely theoretical—it reflects a growing recognition among leading robotics labs that the current paradigm is hitting a wall. Several key players are exploring similar ideas, though none have fully commercialized the continuous evolution approach.

Tesla Optimus


Tesla's humanoid robot project has been notably secretive about its learning architecture. However, based on public statements, Optimus appears to rely heavily on imitation learning from human teleoperation data, combined with simulation-based reinforcement learning. The robot is designed for factory tasks, but there is no evidence of continuous online adaptation. Once deployed, its skills are essentially frozen. This aligns with the 'pre-programmed container' model that Professor Cao criticizes.

Google DeepMind (RT-2, SayCan)


DeepMind's RT-2 model takes a different approach: it uses web-scale vision-language data to enable zero-shot generalization to new tasks. While impressive, RT-2 is still a static model—it does not learn from its own physical interactions. The model is trained once and deployed. DeepMind's SayCan system integrates a language model with a robot planner, but again, the underlying skills are pre-trained. Neither system embodies the continuous self-improvement loop Professor Cao advocates.

Covariant (formerly Embodied Intelligence)


Covariant's AI-powered robotic picking systems are deployed in real warehouses and do incorporate some level of adaptation: they collect data from deployments and periodically retrain models. However, this is batch retraining, not continuous online learning. The retraining cycle can take days or weeks, and the robot's behavior does not improve in real-time. Covariant's approach is a step in the right direction but falls short of true continuous evolution.

Sanctuary AI


Sanctuary AI's humanoid robot, Phoenix, is designed with a focus on 'human-like' intelligence and control. The company emphasizes the importance of a 'cognitive architecture' that can learn from experience. However, detailed technical specifications are scarce, and it remains unclear whether Phoenix supports online model updates or is primarily a teleoperation-driven system.

Comparison of Approaches

| Company/Institution | Learning Paradigm | Continuous Online Adaptation? | Deployment Scale | Key Limitation |
|---|---|---|---|---|
| Tesla Optimus | Imitation + RL (sim) | No | Pilot factories | No post-deployment learning |
| Google DeepMind RT-2 | Web-scale pretraining | No | Research | Static model, no physical learning |
| Covariant | Batch retraining | Partial (periodic) | Commercial warehouses | Days/weeks retrain cycle |
| Sanctuary AI | Cognitive architecture | Unclear | Research | Lack of transparency |
| Tsinghua AIR (Cao Ting) | Online model-based RL | Yes | Pre-commercial | Still in research phase |

Data Takeaway: No major commercial player currently offers a robot that continuously learns and adapts in real-time. Professor Cao's approach is unique in its explicit focus on online world model updates as the core mechanism. If successful, it could leapfrog existing solutions by enabling robots that improve with every interaction.

Industry Impact & Market Dynamics

The shift from 'pre-programmed skills' to 'continuous evolution' has profound implications for the robotics industry's business models, competitive dynamics, and total addressable market.

Business Model Transformation

Today, robotics companies primarily sell hardware with a one-time software license. Revenue is tied to initial unit sales. Service and maintenance contracts provide recurring revenue but are typically reactive (fixing failures) rather than proactive (improving capabilities).

Professor Cao's model suggests a future where robots are sold as a service (RaaS) with a subscription that includes continuous performance improvements. The value proposition shifts from 'what the robot can do on day one' to 'how much better the robot gets over time.' This aligns with the broader trend in enterprise software toward SaaS, but applied to physical systems.

Market Size Projections

| Segment | Current Market Size (2025) | Projected Size (2030, Static Paradigm) | Projected Size (2030, Continuous Learning Paradigm) |
|---|---|---|---|
| Industrial Robotics | $45B | $65B | $85B |
| Service Robotics (logistics, cleaning) | $25B | $40B | $60B |
| Consumer Robotics (home, personal) | $10B | $18B | $35B |
| Total | $80B | $123B | $180B |

Data Takeaway: The continuous learning paradigm could expand the total robotics market by nearly 50% compared to the static paradigm, primarily by unlocking applications that require long-term adaptation—such as elder care, personalized home assistance, and dynamic manufacturing environments. The consumer segment sees the largest relative growth because home environments are highly variable and user-specific.

Funding Trends

Venture capital in robotics has been cyclical, with a recent downturn in 2023-2024. However, investments in 'AI-native robotics' (companies that treat software as the primary differentiator) have been resilient. Recent headline rounds include:

- Covariant raised $205M Series C (valuation $2B+)
- Sanctuary AI raised $150M Series B
- Figure AI raised $675M at a $2.6B valuation (backed by Microsoft, OpenAI, NVIDIA)

None of these companies explicitly focus on continuous online learning as their core technology. This represents a gap that Professor Cao's startup could exploit.

Risks, Limitations & Open Questions

While the vision of continuously evolving physical agents is compelling, several significant challenges remain:

Computational Constraints

Online model-based RL requires substantial on-device compute. Current edge hardware (e.g., NVIDIA Jetson, Qualcomm RB5) may not be sufficient for real-time world model updates, especially for high-dimensional tasks like dexterous manipulation. Cloud offloading introduces latency and connectivity dependencies. The startup will need to either develop highly efficient model architectures or push for custom silicon.
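One widely used way to shrink the on-device training budget (borrowed from the low-rank adaptation literature, not a claim about the startup's design) is to freeze the pretrained weights and update only a small low-rank adapter. The sketch below uses invented dimensions to show the parameter savings and a single on-device gradient step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen pretrained layer W_base plus a tiny low-rank adapter A @ B.
# Only the adapter is trained on-device, so each update touches
# 2*d*r parameters instead of d*d.
d, r = 64, 2
W_base = rng.normal(size=(d, d)) / np.sqrt(d)  # frozen offline weights
A = np.zeros((d, r))                           # trainable, starts at zero
B = rng.normal(size=(r, d)) * 0.01             # trainable

def forward(x):
    return x @ (W_base + A @ B)

def loss(x, y):
    return float(np.mean((forward(x) - y) ** 2))

# A small batch of "real-world" experience from a drifted system
x = rng.normal(size=(8, d))
y = x @ (W_base * 1.05)  # true mapping has drifted by 5%

before = loss(x, y)
err = forward(x) - y
dW = x.T @ err / len(x)        # gradient w.r.t. the full matrix
dA, dB = dW @ B.T, A.T @ dW    # projected onto the adapter factors
A -= 0.1 * dA
B -= 0.1 * dB
after = loss(x, y)

full_params = d * d
adapter_params = 2 * d * r
print(adapter_params / full_params)  # 0.0625: 16x fewer parameters
print(after < before)
```

Here only 256 of the 4,096 layer parameters are touched per update, a 16x reduction that compounds across layers and optimizer state, which is what makes edge-resident learning plausible on Jetson-class hardware.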

Safety and Stability

A robot that continuously changes its behavior introduces new safety risks. How do we guarantee that an online learning update does not cause the robot to make dangerous mistakes? Traditional RL safety techniques (e.g., constrained MDPs, reward shaping) become more complex when the world model itself is changing. Professor Cao's team will need to develop formal guarantees or at least robust monitoring systems.
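Absent formal guarantees, a pragmatic interim measure is to gate every candidate policy produced by online learning behind held-out safety checks before it replaces the deployed policy. A minimal sketch (the actuator-limit check and scenario set are invented for illustration):

```python
# Guarded update: a candidate policy produced by online learning is
# deployed only if it passes a safety check on held-out scenarios.
def is_safe(policy, scenarios, limit=1.0):
    # Reject the candidate if any action exceeds the actuator limit
    return all(abs(policy(s)) <= limit for s in scenarios)

def guarded_update(current, candidate, scenarios):
    return candidate if is_safe(candidate, scenarios) else current

safe_policy = lambda s: -0.5 * s
wild_policy = lambda s: -5.0 * s   # would saturate the actuators
scenarios = [-2.0, -0.5, 0.0, 0.5, 2.0]

deployed = guarded_update(safe_policy, wild_policy, scenarios)
print(deployed is safe_policy)  # True: the unsafe candidate is rejected
```

Production systems would check far richer properties (joint limits, force bounds, rollouts in the learned world model), but the gate-before-deploy structure is the same.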

Data Efficiency

Online learning in the real world is data-inefficient. A robot might need hundreds or thousands of real-world interactions to improve a single skill. This is acceptable for high-value tasks (e.g., factory assembly) but may be impractical for low-volume applications. The startup will need to combine online learning with simulation-based pre-training to bootstrap capabilities.
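A standard way to combine the two data sources is a hybrid replay buffer that deliberately over-samples the scarce real-world transitions relative to their share of the data. A minimal sketch (buffer contents and the 50% mixing ratio are arbitrary choices for illustration):

```python
import random

random.seed(0)

# Hybrid replay: bootstrap from cheap simulated transitions, while
# biasing each training batch toward scarce real-world data.
sim_buffer = [("sim", i) for i in range(10_000)]
real_buffer = [("real", i) for i in range(50)]

def sample_batch(batch_size=32, real_fraction=0.5):
    n_real = min(int(batch_size * real_fraction), len(real_buffer))
    batch = random.sample(real_buffer, n_real)           # oversampled
    batch += random.sample(sim_buffer, batch_size - n_real)
    return batch

batch = sample_batch()
n_real = sum(1 for src, _ in batch if src == "real")
print(n_real, len(batch))  # real data fills half the batch despite
                           # being 0.5% of the total experience
```

Real transitions make up 0.5% of the stored experience here but half of every batch, letting a simulation-pretrained model adapt from a small amount of expensive physical interaction.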

Business Model Risk

The RaaS model with continuous improvement requires customers to trust that the robot will indeed get better over time. Early adopters may be skeptical, especially in industries where reliability is paramount (e.g., healthcare, aerospace). The startup will need to demonstrate clear ROI within the first year of deployment.

Ethical Concerns

Continuous learning robots that adapt to individual users raise privacy concerns. If a home robot learns your daily routines, preferences, and behaviors, who owns that data? How is it protected? The startup must address these issues proactively, perhaps by implementing on-device learning with no cloud data transfer.

AINews Verdict & Predictions

Professor Cao Ting has identified a genuine blind spot in embodied AI. The industry's obsession with bigger models, better simulators, and more dexterous hardware has obscured the fundamental question: how does a physical agent become smarter through its own experience?

Predictions

1. Within 12 months, at least two major robotics companies will announce 'continuous learning' features, likely through partnerships with academic labs. Covariant and Sanctuary AI are the most likely candidates.

2. Within 24 months, the first commercial deployment of a continuously learning robot will occur in a controlled industrial setting (e.g., a warehouse sorting system that improves its pick success rate weekly).

3. Within 36 months, the RaaS model based on continuous improvement will become a standard offering in the robotics industry, with pricing tied to performance metrics (e.g., cost per pick, uptime percentage).

4. The biggest winner will be the company that solves the safety and verification problem for online learning. Professor Cao's startup has a head start, but DeepMind and OpenAI (with their robotics teams) have deeper pockets and more talent.

5. The biggest loser will be companies that continue to sell static robots in a market that increasingly demands adaptability. Legacy industrial robot makers (e.g., FANUC, ABB) may struggle to pivot.

What to Watch

- GitHub activity: Watch for repositories related to 'online world model' or 'continuous adaptation' from Tsinghua AIR. The team's open-source contributions will signal technical progress.
- Funding announcements: The startup's first institutional round will be a key indicator of investor confidence.
- Benchmark results: Look for papers or blog posts showing long-duration (6+ months) deployment results comparing continuous learning vs. static policies.

Professor Cao's critique is not just academic—it is a roadmap. The industry has been building better cages for intelligence. It is time to build the key that lets intelligence grow.
