Why Embodied AI's Biggest Blind Spot Is the Physical Agent Itself

April 2026
In an exclusive interview with AINews, Professor Cao Ting from Tsinghua University's Institute for AI Research (AIR) identifies a critical blind spot in embodied AI: the physical agent itself. Her team is incubating a startup that aims to shift value creation from one-time hardware delivery to platforms that enable continuous self-optimization after deployment.

The field of embodied AI has long been fixated on better hardware, more realistic simulators, and larger training datasets. But Professor Cao Ting of Tsinghua AIR argues that the most fundamental problem—how a physical agent can continuously evolve and self-adapt through real-world interaction—has been collectively neglected. In a sweeping critique of the current paradigm, she points out that most robots today are treated as 'containers for pre-programmed skills' rather than 'entities capable of autonomous growth.' Her team's planned startup aims to change this by building a platform where agents improve themselves post-deployment, learning from every interaction. This shift from 'training-complete' to 'continuous learning' could upend the robotics industry's business logic: future competitive advantage will hinge not on initial performance but on long-term adaptive capability. From adaptive factories to truly personalized home robots, this direction may represent the genuine path to general-purpose embodied intelligence—and the industry's prior neglect suggests we may have been charging in the wrong direction.

Technical Deep Dive

The core insight from Professor Cao's critique is that the dominant paradigm in embodied AI treats the robot as a static artifact: a model is trained in simulation, fine-tuned on a fixed dataset, and then deployed with the expectation that its capabilities remain frozen. This approach fundamentally misunderstands the nature of physical intelligence. A truly intelligent physical agent must operate under the principles of continuous learning, online adaptation, and self-modeling—capabilities that current architectures largely lack.

The Architecture Gap

Most contemporary robotic systems rely on a two-phase pipeline: (1) offline training using reinforcement learning or imitation learning in simulation (e.g., Isaac Gym, MuJoCo, or Habitat), followed by (2) zero-shot deployment with no further weight updates. The problem is that simulation-to-reality (sim-to-real) transfer introduces inevitable domain gaps—unmodeled friction, sensor noise, actuator wear, and environmental variability. Even the best simulators cannot capture the full complexity of the physical world. The result is that robots degrade in performance over time as components age and environments shift.
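The two-phase pipeline can be made concrete with a minimal Python sketch; every name here is an illustrative stand-in, not any framework's actual API.

```python
# Minimal sketch of the conventional two-phase pipeline. All names here
# are illustrative stand-ins, not any framework's actual API.

class Policy:
    """A deployed policy whose weights are frozen after training."""

    def __init__(self, weights):
        self.weights = weights

    def act(self, observation):
        # Placeholder decision rule; a real policy would be a neural net.
        return sum(self.weights) * observation


def train_in_sim(num_iterations=3):
    """Phase 1: offline training in simulation (stubbed out here)."""
    weights = [0.1, 0.2, 0.3]
    for _ in range(num_iterations):
        weights = [w * 1.01 for w in weights]  # stand-in for RL updates
    return Policy(weights)


def deploy(policy, observations):
    """Phase 2: zero-shot deployment. Note there is no code path that
    updates policy.weights: capabilities are frozen at ship time."""
    return [policy.act(obs) for obs in observations]


policy = train_in_sim()
actions = deploy(policy, [1.0, 2.0])
```

The structural point is that `deploy` only reads the weights; nothing feeds real-world outcomes back into training, which is exactly the gap described here.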

Professor Cao's proposed solution centers on online model-based reinforcement learning combined with world model updates. Instead of a fixed policy π(a|s), the agent maintains a dynamic world model M(s'|s,a) that is continuously updated using real-world experience. The policy is then re-optimized against the updated model. This creates a closed loop: the agent acts, observes outcomes, updates its internal model of physics and task dynamics, and improves its policy accordingly. This is computationally intensive—requiring on-device or edge-based training—but recent advances in efficient neural network architectures (e.g., TinyML, model distillation, and sparse updates) make it increasingly feasible.

Relevant Open-Source Efforts

Several GitHub repositories are pushing in this direction, though none fully solve the continuous evolution problem:

- robomimic (by the Stanford Vision and Learning Lab, ~1,500 stars): A framework for learning from demonstration, but it focuses on offline imitation learning, not continuous online adaptation.
- rl-starter-files (~2,000 stars): Provides baseline implementations for reinforcement learning in simulation, but again assumes a fixed training environment.
- habitat-lab (by FAIR, ~2,500 stars): A high-fidelity 3D simulator for embodied AI research. While excellent for training, it does not address post-deployment learning.
- dreamer (by Google DeepMind, ~3,000 stars): Implements model-based RL with learned world models. This is architecturally closest to what Professor Cao describes, but it has primarily been demonstrated in simulation, not on real robots with continuous deployment.

Performance Metrics: Static vs. Continuous Learning

To illustrate the gap, consider the following hypothetical benchmark comparing a static policy robot versus a continuously learning agent on a long-duration manipulation task (e.g., sorting objects in a warehouse over 6 months):

| Metric | Static Policy Robot | Continuous Learning Agent |
|---|---|---|
| Initial task success rate | 92% | 88% |
| Success rate after 1 month | 85% | 91% |
| Success rate after 3 months | 78% | 95% |
| Success rate after 6 months | 65% | 97% |
| Adaptation to new object types | None (requires retraining) | Learns in ~50 episodes |
| Hardware degradation compensation | None | Adjusts control parameters |
| Retraining cost (per year) | $200,000 (data collection + cloud compute) | $10,000 (incremental updates) |

Data Takeaway: The continuous learning agent starts slightly worse due to the overhead of online model updates, but quickly surpasses the static policy and maintains high performance over time. Critically, it adapts to hardware wear and new objects without expensive retraining cycles. The static robot's performance degrades significantly, requiring costly manual intervention.

Key Players & Case Studies

Professor Cao's critique is not merely theoretical—it reflects a growing recognition among leading robotics labs that the current paradigm is hitting a wall. Several key players are exploring similar ideas, though none have fully commercialized the continuous evolution approach.

Tesla Optimus

Tesla's humanoid robot project has been notably secretive about its learning architecture. However, based on public statements, Optimus appears to rely heavily on imitation learning from human teleoperation data, combined with simulation-based reinforcement learning. The robot is designed for factory tasks, but there is no evidence of continuous online adaptation. Once deployed, its skills are essentially frozen. This aligns with the 'pre-programmed container' model that Professor Cao criticizes.

Google DeepMind (RT-2, SayCan)

DeepMind's RT-2 model takes a different approach: it uses web-scale vision-language data to enable zero-shot generalization to new tasks. While impressive, RT-2 is still a static model—it does not learn from its own physical interactions. The model is trained once and deployed. DeepMind's SayCan system integrates a language model with a robot planner, but again, the underlying skills are pre-trained. Neither system embodies the continuous self-improvement loop Professor Cao advocates.

Covariant (formerly Embodied Intelligence)

Covariant's AI-powered robotic picking systems are deployed in real warehouses and do incorporate some level of adaptation: they collect data from deployments and periodically retrain models. However, this is batch retraining, not continuous online learning. The retraining cycle can take days or weeks, and the robot's behavior does not improve in real-time. Covariant's approach is a step in the right direction but falls short of true continuous evolution.

Sanctuary AI

Sanctuary AI's humanoid robot, Phoenix, is designed with a focus on 'human-like' intelligence and control. The company emphasizes the importance of a 'cognitive architecture' that can learn from experience. However, detailed technical specifications are scarce, and it remains unclear whether Phoenix supports online model updates or is primarily a teleoperation-driven system.

Comparison of Approaches

| Company/Institution | Learning Paradigm | Continuous Online Adaptation? | Deployment Scale | Key Limitation |
|---|---|---|---|---|
| Tesla Optimus | Imitation + RL (sim) | No | Pilot factories | No post-deployment learning |
| Google DeepMind RT-2 | Web-scale pretraining | No | Research | Static model, no physical learning |
| Covariant | Batch retraining | Partial (periodic) | Commercial warehouses | Days/weeks retrain cycle |
| Sanctuary AI | Cognitive architecture | Unclear | Research | Lack of transparency |
| Tsinghua AIR (Cao Ting) | Online model-based RL | Yes | Pre-commercial | Still in research phase |

Data Takeaway: No major commercial player currently offers a robot that continuously learns and adapts in real-time. Professor Cao's approach is unique in its explicit focus on online world model updates as the core mechanism. If successful, it could leapfrog existing solutions by enabling robots that improve with every interaction.

Industry Impact & Market Dynamics

The shift from 'pre-programmed skills' to 'continuous evolution' has profound implications for the robotics industry's business models, competitive dynamics, and total addressable market.

Business Model Transformation

Today, robotics companies primarily sell hardware with a one-time software license. Revenue is tied to initial unit sales. Service and maintenance contracts provide recurring revenue but are typically reactive (fixing failures) rather than proactive (improving capabilities).

Professor Cao's model suggests a future where robots are sold as a service (RaaS) with a subscription that includes continuous performance improvements. The value proposition shifts from 'what the robot can do on day one' to 'how much better the robot gets over time.' This aligns with the broader trend in enterprise software toward SaaS, but applied to physical systems.

Market Size Projections

| Segment | Current Market Size (2025) | Projected Size (2030, Static Paradigm) | Projected Size (2030, Continuous Learning Paradigm) |
|---|---|---|---|
| Industrial Robotics | $45B | $65B | $85B |
| Service Robotics (logistics, cleaning) | $25B | $40B | $60B |
| Consumer Robotics (home, personal) | $10B | $18B | $35B |
| Total | $80B | $123B | $180B |

Data Takeaway: The continuous learning paradigm could expand the total robotics market by nearly 50% compared to the static paradigm, primarily by unlocking applications that require long-term adaptation—such as elder care, personalized home assistance, and dynamic manufacturing environments. The consumer segment sees the largest relative growth because home environments are highly variable and user-specific.

Funding Trends

Venture capital in robotics has been cyclical, with a recent downturn in 2023-2024. However, investments in 'AI-native robotics' (companies that treat software as the primary differentiator) have been resilient. Recent rounds include:

- Covariant raised $205M Series C (valuation $2B+)
- Sanctuary AI raised $150M Series B
- Figure AI raised $675M at a $2.6B valuation (backed by Microsoft, OpenAI, NVIDIA)

None of these companies explicitly focus on continuous online learning as their core technology. This represents a gap that Professor Cao's startup could exploit.

Risks, Limitations & Open Questions

While the vision of continuously evolving physical agents is compelling, several significant challenges remain:

Computational Constraints

Online model-based RL requires substantial on-device compute. Current edge hardware (e.g., NVIDIA Jetson, Qualcomm RB5) may not be sufficient for real-time world model updates, especially for high-dimensional tasks like dexterous manipulation. Cloud offloading introduces latency and connectivity dependencies. The startup will need to either develop highly efficient model architectures or push for custom silicon.
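In the spirit of the sparse-update and distillation techniques mentioned above, one way to keep on-device updates cheap is to freeze a pretrained backbone and adapt only a small low-rank correction, in the style of LoRA-like adapters. The sizes, names, and update rule below are hypothetical.

```python
import numpy as np

# Illustrative sketch: freeze a large "backbone" weight matrix and adapt
# only a small low-rank correction (A @ B), so each on-device update
# touches far fewer parameters. Sizes and names are hypothetical.

rng = np.random.default_rng(0)
d = 256                      # feature dimension of the (frozen) backbone
r = 4                        # rank of the trainable correction

W_frozen = rng.standard_normal((d, d)) / np.sqrt(d)  # pretrained, never updated
A = np.zeros((d, r))                                 # trainable adapter factor
B = rng.standard_normal((r, d)) / np.sqrt(d)         # fixed random projection


def forward(x):
    return x @ (W_frozen + A @ B)


def adapter_update(x, target, lr=1e-3):
    """One gradient step on the adapter only; the backbone stays fixed."""
    global A
    err = forward(x) - target          # (batch, d) residual
    # Gradient of 0.5*||x(W + A B) - target||^2 with respect to A:
    A -= lr * x.T @ err @ B.T


full_params = W_frozen.size
adapter_params = A.size + B.size
print(f"full update: {full_params} params, adapter update: {adapter_params}")
```

Each update then touches roughly `2*d*r` parameters instead of `d*d`, which is the kind of reduction that could make per-interaction learning plausible on edge hardware.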

Safety and Stability

A robot that continuously changes its behavior introduces new safety risks. How do we guarantee that an online learning update does not cause the robot to make dangerous mistakes? Traditional RL safety techniques (e.g., constrained MDPs, reward shaping) become more complex when the world model itself is changing. Professor Cao's team will need to develop formal guarantees or at least robust monitoring systems.
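Absent formal guarantees, one pragmatic pattern is a validation gate: a candidate update is deployed only if it passes a battery of safety checks, and the last known-good policy is retained otherwise. This sketch is an assumed design, not Professor Cao's method.

```python
# Illustrative safety gate for online updates: a candidate policy is
# committed only if it passes enough validation checks; otherwise the
# previous policy is kept. Names and thresholds are hypothetical.

def safe_commit(current_policy, candidate_policy, validation_cases,
                min_pass_rate=0.95):
    """Return the policy to deploy, preferring the candidate only when
    it passes at least min_pass_rate of the validation cases."""
    passed = sum(1 for case in validation_cases if candidate_policy(case))
    if passed / len(validation_cases) >= min_pass_rate:
        return candidate_policy      # commit the update
    return current_policy            # roll back to the known-safe policy


# Toy check functions standing in for real safety tests
old = lambda case: case <= 10        # known-safe behavior
bad = lambda case: case <= 3         # a regressed candidate
good = lambda case: case <= 100      # an improved candidate

cases = list(range(1, 11))           # validation workloads 1..10
assert safe_commit(old, bad, cases) is old     # regression rejected
assert safe_commit(old, good, cases) is good   # improvement accepted
```

A gate like this is monitoring rather than a formal guarantee, but it bounds how badly a single online update can degrade deployed behavior.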

Data Efficiency

Online learning in the real world is data-inefficient. A robot might need hundreds or thousands of real-world interactions to improve a single skill. This is acceptable for high-value tasks (e.g., factory assembly) but may be impractical for low-volume applications. The startup will need to combine online learning with simulation-based pre-training to bootstrap capabilities.
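A common way to combine simulation pre-training with scarce real data is to mix both sources in each training batch; the sampling ratio below is an illustrative choice, not a recommendation from the interview.

```python
import random

# Illustrative sketch of bootstrapping online learning with simulation:
# training batches mix scarce real-world transitions with plentiful
# simulated ones, so a handful of real interactions is neither drowned
# out nor overfit. The real_fraction value is a hypothetical choice.

def mixed_batch(sim_buffer, real_buffer, batch_size=8, real_fraction=0.25):
    """Sample a batch that caps the share of real-world transitions."""
    n_real = min(int(batch_size * real_fraction), len(real_buffer))
    n_sim = batch_size - n_real
    batch = random.sample(real_buffer, n_real) + random.sample(sim_buffer, n_sim)
    random.shuffle(batch)
    return batch


sim_buffer = [("sim", i) for i in range(1000)]   # cheap simulated transitions
real_buffer = [("real", i) for i in range(12)]   # scarce real transitions

batch = mixed_batch(sim_buffer, real_buffer)
# batch holds 8 transitions, at most 2 of them real (8 * 0.25 = 2)
```

In practice the real fraction would grow as deployment data accumulates, shifting weight from the simulator's prior to the robot's own experience.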

Business Model Risk

The RaaS model with continuous improvement requires customers to trust that the robot will indeed get better over time. Early adopters may be skeptical, especially in industries where reliability is paramount (e.g., healthcare, aerospace). The startup will need to demonstrate clear ROI within the first year of deployment.

Ethical Concerns

Continuous learning robots that adapt to individual users raise privacy concerns. If a home robot learns your daily routines, preferences, and behaviors, who owns that data? How is it protected? The startup must address these issues proactively, perhaps by implementing on-device learning with no cloud data transfer.

AINews Verdict & Predictions

Professor Cao Ting has identified a genuine blind spot in embodied AI. The industry's obsession with bigger models, better simulators, and more dexterous hardware has obscured the fundamental question: how does a physical agent become smarter through its own experience?

Predictions

1. Within 12 months, at least two major robotics companies will announce 'continuous learning' features, likely through partnerships with academic labs. Covariant and Sanctuary AI are the most likely candidates.

2. Within 24 months, the first commercial deployment of a continuously learning robot will occur in a controlled industrial setting (e.g., a warehouse sorting system that improves its pick success rate weekly).

3. Within 36 months, the RaaS model based on continuous improvement will become a standard offering in the robotics industry, with pricing tied to performance metrics (e.g., cost per pick, uptime percentage).

4. The biggest winner will be the company that solves the safety and verification problem for online learning. Professor Cao's startup has a head start, but DeepMind and OpenAI (with their robotics teams) have deeper pockets and more talent.

5. The biggest loser will be companies that continue to sell static robots in a market that increasingly demands adaptability. Legacy industrial robot makers (e.g., FANUC, ABB) may struggle to pivot.

What to Watch

- GitHub activity: Watch for repositories related to 'online world model' or 'continuous adaptation' from Tsinghua AIR. The team's open-source contributions will signal technical progress.
- Funding announcements: The startup's first institutional round will be a key indicator of investor confidence.
- Benchmark results: Look for papers or blog posts showing long-duration (6+ months) deployment results comparing continuous learning vs. static policies.

Professor Cao's critique is not just academic—it is a roadmap. The industry has been building better cages for intelligence. It is time to build the key that lets intelligence grow.


Further Reading

- Embodied AI's GPT Moment: Why Warehouse Robots Can't Yet Handle the Factory Floor
- Why Lingchu Intelligence Is Betting on Data, Not Robots, for Embodied AI
- Embodied AI Data War: How Three Chinese Giants Are Rewriting the Rules of Physical Intelligence
- From Bankrollers to Builders: How Tech Giants Are Reshaping Robotics
