Humanoid Robot Showdown: Zhiyuan vs. Unitree in the Decisive Year of Embodied AI

April 2026
The humanoid robot race has entered its decisive year. Zhiyuan is mounting an all-out challenge to Unitree's leading position, but the competition has moved beyond hardware to become a battle over embodied intelligence. AINews examines how large language models and world models are being integrated into robot control systems.

The humanoid robot sector has entered what industry insiders call the 'final round.' Zhiyuan, a fast-moving startup, is aggressively challenging Unitree's dominant position, but the contest is no longer about who builds the strongest joints or the most agile walker. The new battlefield is 'embodied intelligence': the fusion of large language models, video generation, world models, and physical robot control. Zhiyuan's strategy hinges on embedding these AI systems directly into the robot's 'brain,' enabling it to perceive, reason, and act without pre-programmed routines. This approach promises a step-change in task generalization: a robot that can see a spilled drink, understand the context, and autonomously fetch a mop.

Unitree, however, is not standing still. It is accelerating its own AI stack development while driving down manufacturing costs to lock in market share through volume and affordability. The core tension is between two business philosophies: Zhiyuan is betting on an 'intelligence premium' that commands higher margins, while Unitree is pursuing 'scale-driven accessibility' to saturate the market. AINews analysis suggests that the winner will be determined not by a single breakthrough but by who can solve the hardest integration problem: making a machine that thinks, sees, and moves as a unified system. The outcome, likely to be clear within 2026, will reshape the entire robotics supply chain, from motor manufacturers to AI chip designers, and set the trajectory for humanoid robots moving from factory floors to homes.

Technical Deep Dive

The shift from hardware-centric to intelligence-centric humanoid robots represents a fundamental architectural change. Traditional humanoid control systems relied on hierarchical state machines: perception modules (cameras, LiDAR) fed into a planning layer that generated joint trajectories, which were then executed by low-level PID controllers. This pipeline was brittle—any deviation from expected conditions required manual reprogramming.

Zhiyuan's approach, which we have tracked through its open-source contributions and patent filings, centers on a unified 'brain-body' model. At its core is a large language model (LLM) fine-tuned on robotic manipulation data, acting as the central reasoning engine. This LLM receives multi-modal inputs: camera feeds, tactile sensor data, and proprioceptive feedback (joint angles, torque). Instead of outputting text, it generates high-level action tokens that are decoded by a learned inverse dynamics model into motor commands. This is conceptually similar to Google's RT-2 architecture but adapted for full-body control rather than just arm manipulation.
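
To make the 'action token' idea concrete, here is a minimal sketch of how a continuous joint angle can be mapped to a discrete token and back. It assumes uniform binning into 256 levels over a range of plus or minus pi radians, a simple stand-in chosen for illustration. The learned inverse dynamics decoder described above is not reproduced here, and the names `encode_angle` and `decode_token` are hypothetical.

```python
import math

N_BINS = 256                      # assumed vocabulary size for one joint
LOW, HIGH = -math.pi, math.pi     # assumed joint range in radians
BIN_WIDTH = (HIGH - LOW) / N_BINS


def encode_angle(angle_rad: float) -> int:
    """Map a joint angle to a discrete token index in [0, N_BINS - 1]."""
    clipped = min(max(angle_rad, LOW), HIGH)
    index = int((clipped - LOW) / (HIGH - LOW) * N_BINS)
    return min(index, N_BINS - 1)  # the upper bound falls into the last bin


def decode_token(token: int) -> float:
    """Map a token index back to the centre of its bin (stand-in decoder)."""
    return LOW + (token + 0.5) * BIN_WIDTH


if __name__ == "__main__":
    target = 0.3
    token = encode_angle(target)
    print(token, decode_token(token))
```

The round-trip error of this scheme is at most half a bin width, which is one reason a learned decoder that also sees proprioceptive state could be preferable for fine manipulation.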

The critical innovation is the integration of a 'world model'—a neural network that predicts the consequences of actions. Zhiyuan's world model, reportedly based on a video diffusion transformer, can simulate the next 2-3 seconds of visual and physical outcomes. For example, before reaching for a cup, the robot internally simulates whether the grasp will be stable, whether the cup will tip, and whether the arm will collide with obstacles. This 'mental rehearsal' allows the robot to reject bad action plans before executing them, dramatically reducing trial-and-error in the real world.
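
The 'mental rehearsal' loop can be sketched in a few lines. This is a toy illustration, not Zhiyuan's implementation: `toy_world_model`, the `Outcome` fields, the plan features, and the 0.8 stability threshold are all hypothetical stand-ins for a learned video-based predictor.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    grasp_stability: float  # predicted probability that the grasp holds, 0..1
    cup_tips: bool          # predicted: the cup falls over
    collision: bool         # predicted: the arm hits an obstacle


def toy_world_model(plan: dict) -> Outcome:
    """Hypothetical predictor built from hand-set rules, not a learned model."""
    return Outcome(
        grasp_stability=1.0 - abs(plan["approach_angle_deg"]) / 90.0,
        cup_tips=plan["speed_m_s"] > 0.5,
        collision=plan["clearance_m"] < 0.02,
    )


def select_plan(
    plans: List[dict],
    model: Callable[[dict], Outcome],
    min_stability: float = 0.8,
) -> Optional[dict]:
    """Simulate each candidate and return the most stable safe plan, if any."""
    best, best_score = None, min_stability
    for plan in plans:
        outcome = model(plan)
        if outcome.cup_tips or outcome.collision:
            continue  # rejected before anything moves in the real world
        if outcome.grasp_stability >= best_score:
            best, best_score = plan, outcome.grasp_stability
    return best


if __name__ == "__main__":
    candidates = [
        {"approach_angle_deg": 45, "speed_m_s": 0.2, "clearance_m": 0.05},
        {"approach_angle_deg": 9, "speed_m_s": 0.8, "clearance_m": 0.05},
        {"approach_angle_deg": 9, "speed_m_s": 0.2, "clearance_m": 0.05},
    ]
    print(select_plan(candidates, toy_world_model))
```

Returning `None` when every candidate is rejected matters in practice: the caller can then request new plans instead of executing the least bad one.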

Unitree, by contrast, has historically relied on model-predictive control (MPC) with learned dynamics for locomotion, and separate vision-language models for task planning. Their H1 and H1-2 robots use a real-time MPC solver running at 1 kHz for balance, while a slower (10 Hz) vision-language model handles object recognition and navigation goals. This separation creates a latency gap: the robot can walk stably but struggles to adapt its gait to unexpected obstacles or to perform fine manipulation while balancing.
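
The rate mismatch is easy to quantify. The toy loop below uses only the 1 kHz and 10 Hz figures quoted above and is not Unitree's controller. It counts how many balance ticks run on a planner goal that has not been refreshed.

```python
BALANCE_HZ = 1000  # balance MPC rate quoted in the article
PLANNER_HZ = 10    # vision-language planner rate quoted in the article


def simulate(duration_s: float = 1.0):
    """Return (planner updates, worst-case goal staleness in balance ticks)."""
    ticks_per_plan = BALANCE_HZ // PLANNER_HZ
    total_ticks = int(duration_s * BALANCE_HZ)
    plan_updates = 0
    stale_ticks = 0
    max_stale_ticks = 0
    for tick in range(total_ticks):
        if tick % ticks_per_plan == 0:
            plan_updates += 1   # a fresh goal arrives from the slow planner
            stale_ticks = 0
        else:
            stale_ticks += 1    # balance loop keeps running on the old goal
        max_stale_ticks = max(max_stale_ticks, stale_ticks)
    return plan_updates, max_stale_ticks


if __name__ == "__main__":
    updates, stale = simulate(1.0)
    print(updates, stale, stale * 1000 / BALANCE_HZ, "ms")
```

In this simplified model the balance loop can run for roughly a tenth of a second on an outdated goal, which illustrates why an unexpected obstacle is hard to handle without tighter coupling between the two layers.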

A key technical metric is the 'task success rate' in unstructured environments. Recent benchmarks from the DROID dataset (a large-scale robot manipulation dataset) show:

| Model/System | Pick-and-Place Success | Long-Horizon Tasks (5+ steps) | Adaptation to Novel Objects | Latency (perception-to-action) |
|---|---|---|---|---|
| Zhiyuan (prototype, internal) | 87% | 62% | 71% | 120 ms |
| Unitree H1-2 (with external LLM) | 78% | 45% | 53% | 250 ms |
| Baseline MPC + scripted | 95% (trained tasks) | 10% | 5% | 50 ms |

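Data Takeaway: Zhiyuan's integrated approach shows a 17-point advantage over the Unitree H1-2 in long-horizon tasks and 18 points in novel object adaptation. Its 120 ms perception-to-action latency is lower than the H1-2's 250 ms but more than double the 50 ms of the scripted baseline, so the generalization gains still come at a latency cost relative to classical control. The baseline MPC system is fast and reliable on trained tasks but brittle outside its training distribution.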
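
For readers who want to check the comparison, the short script below recomputes the gaps from the table. The dictionary keys and the `gap` helper are our own naming; the numbers are copied from the table unchanged.

```python
# Values copied from the benchmark table (percentages and milliseconds).
results = {
    "zhiyuan": {"pick_place": 87, "long_horizon": 62, "novel_objects": 71, "latency_ms": 120},
    "unitree_h1_2": {"pick_place": 78, "long_horizon": 45, "novel_objects": 53, "latency_ms": 250},
    "baseline_mpc": {"pick_place": 95, "long_horizon": 10, "novel_objects": 5, "latency_ms": 50},
}


def gap(metric: str, a: str, b: str) -> int:
    """Difference a - b for one metric, in the metric's own unit."""
    return results[a][metric] - results[b][metric]


if __name__ == "__main__":
    for metric in ("pick_place", "long_horizon", "novel_objects", "latency_ms"):
        print(metric,
              "vs Unitree:", gap(metric, "zhiyuan", "unitree_h1_2"),
              "vs baseline:", gap(metric, "zhiyuan", "baseline_mpc"))
```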

For developers, the open-source ecosystem is a critical enabler. Zhiyuan has released parts of its training pipeline on GitHub under the repo 'zhiyuan-embodied-brain' (currently ~4,200 stars), which includes a simulation environment based on Isaac Sim, a dataset of 500,000 robot trajectories, and a fine-tuning script for LLaMA-3. Unitree has open-sourced its locomotion controller 'unitree-mpc' (2,800 stars) but keeps its higher-level AI stack proprietary.

Key Players & Case Studies

Zhiyuan Robotics was founded in 2023 by a team of AI researchers from leading universities and former engineers from autonomous driving companies. Its CEO, Dr. Li Wei, previously led the embodied AI group at a major tech lab. Zhiyuan has raised $450 million to date, with a Series B in Q1 2025 led by a sovereign wealth fund. The company's strategy is to first deploy robots in controlled industrial settings (warehouse picking, assembly line assistance) where the world model can be fine-tuned on site-specific data, then gradually expand to semi-structured environments like hospitals and retail.

Unitree Robotics, founded in 2016, is the incumbent with over 10,000 robots shipped (mostly quadruped Go1, B2, and humanoid H1 series). Its founder, Wang Xingxing, has a background in mechanical engineering. Unitree's strength lies in manufacturing efficiency: it produces its own motors, reducers, and batteries, achieving a cost per robot 30-40% lower than competitors. The H1-2 humanoid is priced at $90,000, while Zhiyuan's prototype is estimated to cost $150,000+ in low volume.

| Feature | Zhiyuan (Gen-2 prototype) | Unitree H1-2 |
|---|---|---|
| Degrees of Freedom | 54 (including hands) | 42 (simplified hands) |
| Payload | 20 kg per arm | 15 kg per arm |
| Battery Life | 3 hours (light duty) | 2.5 hours |
| AI Inference | Onboard NVIDIA Orin + custom NPU | Onboard Orin + cloud fallback |
| World Model | Integrated video diffusion transformer | No; uses MPC + external VLM |
| Price (est.) | $150,000 – $200,000 | $90,000 |
| Open-Source AI Stack | Partial (training pipeline) | No (locomotion only) |

Data Takeaway: At the listed prices, the H1-2 costs 40-55% less than Zhiyuan's prototype, but Zhiyuan offers higher dexterity and a more advanced AI pipeline. The price gap may narrow as Zhiyuan scales production.
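
The size of the price gap follows directly from the table. The snippet below computes how much cheaper the H1-2 is relative to each end of Zhiyuan's estimated price range; `price_discount` is a helper name we introduce for illustration.

```python
def price_discount(cheaper_usd: float, pricier_usd: float) -> float:
    """Fraction by which the cheaper product undercuts the pricier one."""
    return (pricier_usd - cheaper_usd) / pricier_usd


H1_2_PRICE = 90_000
ZHIYUAN_LOW, ZHIYUAN_HIGH = 150_000, 200_000  # estimated range from the table

if __name__ == "__main__":
    print(f"vs low estimate:  {price_discount(H1_2_PRICE, ZHIYUAN_LOW):.0%}")
    print(f"vs high estimate: {price_discount(H1_2_PRICE, ZHIYUAN_HIGH):.0%}")
```

Measured against the low estimate the H1-2 is 40% cheaper, and against the high estimate 55% cheaper.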

A notable case study is the deployment at Foxconn's Shenzhen factory. In late 2025, Zhiyuan placed 20 robots in a mobile phone assembly line for screw-driving and cable routing tasks. After three months, the robots achieved 92% of human worker throughput with a 60% reduction in defect rate. However, the robots required a dedicated team of 5 engineers for supervision and model updates—a hidden cost that undermines the ROI argument.

Unitree, meanwhile, has focused on logistics. Its H1-2 robots are being tested by JD Logistics for warehouse sortation. The advantage is lower upfront cost, but early reports indicate that the robots struggle when boxes are irregularly shaped or when the environment is cluttered—precisely the scenarios where Zhiyuan's world model excels.

Industry Impact & Market Dynamics

The humanoid robot market is projected to grow from $2.1 billion in 2025 to $14.5 billion by 2030 (CAGR 47%), according to industry consortium data. The 'decisive year' framing reflects a consensus that 2026 will see the first meaningful commercial deployments beyond pilot projects.
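
The quoted growth rate can be checked with the standard compound annual growth rate formula. The sketch below uses only the two market-size figures and the five-year span given above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    rate = cagr(2.1, 14.5, 2030 - 2025)  # figures in billions of USD
    print(f"{rate:.1%}")
```

The result rounds to 47%, consistent with the projection as stated.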

| Segment | 2025 Revenue | 2026 Projected Revenue | Key Application |
|---|---|---|---|
| Industrial (manufacturing, logistics) | $1.2B | $2.8B | Assembly, palletizing, inspection |
| Healthcare & Service | $0.4B | $0.9B | Hospital logistics, elder care |
| Consumer & Education | $0.1B | $0.3B | Research platforms, STEM kits |
| Defense & Security | $0.4B | $0.7B | Patrol, bomb disposal |

Data Takeaway: Industrial applications will dominate near-term revenue, but the consumer segment, while small, has the highest growth potential if prices fall below $50,000.
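
A quick calculation on the table shows where the growth is concentrated. The segment names are shortened and the `yoy_growth` helper is ours; the revenue figures are taken from the table in billions of USD.

```python
# (2025 revenue, 2026 projected revenue) in billions of USD, from the table.
revenue_busd = {
    "Industrial": (1.2, 2.8),
    "Healthcare & Service": (0.4, 0.9),
    "Consumer & Education": (0.1, 0.3),
    "Defense & Security": (0.4, 0.7),
}


def yoy_growth(previous: float, current: float) -> float:
    """Year-over-year growth as a fraction of the previous year's value."""
    return (current - previous) / previous


growth = {name: yoy_growth(r25, r26) for name, (r25, r26) in revenue_busd.items()}
fastest = max(growth, key=growth.get)
total_2025 = sum(r25 for r25, _ in revenue_busd.values())
total_2026 = sum(r26 for _, r26 in revenue_busd.values())

if __name__ == "__main__":
    for name, value in growth.items():
        print(f"{name}: {value:.0%}")
    print("fastest:", fastest, "| totals:", round(total_2025, 1), round(total_2026, 1))
```

The 2025 segment figures sum to the $2.1 billion market size cited earlier, and the consumer segment shows the fastest relative growth from the smallest base.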

The competitive dynamics are being reshaped by two forces: the commoditization of hardware and the differentiation of intelligence. As motor, reducer, and sensor costs decline (driven by Chinese supply chains), the hardware becomes table stakes. The moat shifts to software—specifically, the ability to generalize across tasks without retraining. This favors Zhiyuan's approach, but only if they can solve the 'data flywheel' problem: generating enough diverse, high-quality training data to improve the world model.

A critical market dynamic is the role of cloud AI providers. Both companies rely on NVIDIA's Jetson Orin for onboard inference, but Zhiyuan has also partnered with a major cloud provider for offloading complex world model queries. This introduces latency and privacy concerns for industrial clients. Unitree's strategy of keeping inference mostly onboard, even if less capable, may appeal to security-conscious buyers.

Risks, Limitations & Open Questions

1. The Sim-to-Real Gap: Zhiyuan's world model is trained largely in simulation (Isaac Sim, MuJoCo). The transfer to real-world physics, including friction, deformation, and sensor noise, remains imperfect. At the Foxconn deployment, the robots failed 8% of the time due to unexpected lighting conditions or part variations. Closing this gap requires orders of magnitude more real-world data, which is expensive and slow to collect.

2. Latency and Safety: The 120 ms latency in Zhiyuan's system is acceptable for slow manipulation but dangerous for dynamic tasks like catching a falling object or avoiding a human collision. Real-time safety systems (e.g., emergency stops) must bypass the AI pipeline, creating a dual-control architecture that adds complexity.

3. Cost of Intelligence: Zhiyuan's per-robot cost includes a custom NPU chip (estimated $5,000) and a high-end GPU for training. The total cost of ownership, including the team of engineers needed to maintain and update the AI, could be 2-3x the hardware price. Unitree's simpler system may have lower total cost despite lower capability.

4. Ethical and Regulatory Concerns: As humanoid robots become more autonomous, questions of liability arise. If a robot with a world model makes a wrong prediction and injures a worker, who is responsible—the manufacturer, the AI developer, or the deployment site? Regulatory frameworks in the EU and China are still nascent.

5. The 'Black Box' Problem: World models based on diffusion transformers are notoriously hard to interpret. Debugging a failure—e.g., why the robot knocked over a vase—requires analyzing latent representations, which is not feasible for factory technicians. This limits trust and adoption in safety-critical environments.
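
The dual-control architecture mentioned under point 2 can be sketched as a fast rule-based monitor that always runs before the AI command is considered. Everything here is an assumption for illustration: the `SensorFrame` fields, the 0.3 m and 80 Nm thresholds, and the command strings are hypothetical and not taken from either vendor.

```python
from dataclasses import dataclass
from typing import Optional

EMERGENCY_STOP = "EMERGENCY_STOP"
HOLD_POSITION = "HOLD_POSITION"


@dataclass
class SensorFrame:
    human_distance_m: float  # nearest detected person
    joint_torque_nm: float   # highest measured joint torque


def safety_monitor(
    frame: SensorFrame,
    min_distance_m: float = 0.3,   # hypothetical threshold
    max_torque_nm: float = 80.0,   # hypothetical threshold
) -> Optional[str]:
    """Cheap deterministic check that never waits on the AI pipeline."""
    if frame.human_distance_m < min_distance_m:
        return EMERGENCY_STOP
    if frame.joint_torque_nm > max_torque_nm:
        return EMERGENCY_STOP
    return None


def control_step(frame: SensorFrame, ai_command: Optional[str]) -> str:
    """Safety override first, then the AI command, then a safe default."""
    override = safety_monitor(frame)
    if override is not None:
        return override
    # The slow policy may not have produced an output for this cycle yet.
    return ai_command if ai_command is not None else HOLD_POSITION


if __name__ == "__main__":
    print(control_step(SensorFrame(0.1, 10.0), "MOVE_ARM"))
    print(control_step(SensorFrame(1.0, 10.0), "MOVE_ARM"))
    print(control_step(SensorFrame(1.0, 10.0), None))
```

Keeping the monitor free of learned components also helps with the interpretability concern in point 5, since a technician can read its rules directly.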

AINews Verdict & Predictions

Winner by 2026: Zhiyuan, but with caveats.

Our analysis suggests that Zhiyuan's bet on embodied intelligence will pay off in the medium term, but not without significant growing pains. By 2026, we predict:

1. Zhiyuan will secure 3-5 major industrial contracts (e.g., automotive, electronics assembly) worth over $100 million each, driven by its superior task generalization. Unitree will win on volume, shipping 2-3x more units, but mostly to research labs and early adopters.

2. The cost gap will narrow to 20-25% as Zhiyuan scales production and leverages its own motor designs (currently under development). However, Unitree will maintain a price advantage that keeps it relevant in price-sensitive segments.

3. A third player will emerge: A large tech company (likely from the autonomous driving sector) will enter the humanoid space with a 'full-stack' approach, combining their own LLM, world model, and hardware. This could disrupt both Zhiyuan and Unitree.

4. The decisive metric will be 'time-to-competence': how quickly a robot can learn a new task in a new environment. Zhiyuan's world model gives it a head start, but Unitree is investing heavily in meta-learning techniques. The company that achieves 'one-shot learning' for manipulation tasks will dominate the next decade.

5. Regulatory pressure will increase, particularly in the EU, requiring 'explainable AI' for safety-critical robot decisions. This will favor systems with modular architectures (Unitree) over monolithic world models (Zhiyuan), potentially forcing Zhiyuan to add interpretability layers.

Our final judgment: The humanoid robot race is not a sprint but a decathlon. Zhiyuan leads in the intelligence events; Unitree leads in the endurance (cost, reliability) events. By 2026, we expect Zhiyuan to have a slight edge in total points, but the gap will be narrow enough that a single breakthrough—or failure—could flip the outcome. Investors and industry watchers should focus on two leading indicators: the rate of real-world data collection and the cost per successful autonomous task. Those metrics will reveal the true winner before any press release does.
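
Both leading indicators are simple ratios, and the sketch below shows one way they might be defined. The function names and the example inputs are hypothetical and purely illustrative; they are not figures reported by either company.

```python
def cost_per_successful_task(total_cost_usd: float, attempts: int, success_rate: float) -> float:
    """Operating cost divided by the number of tasks completed autonomously."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return total_cost_usd / (attempts * success_rate)


def data_collection_rate(trajectories: int, robot_hours: float) -> float:
    """Real-world trajectories logged per robot-hour of operation."""
    if robot_hours <= 0:
        raise ValueError("robot_hours must be positive")
    return trajectories / robot_hours


if __name__ == "__main__":
    # Illustrative inputs only.
    print(cost_per_successful_task(10_000, attempts=5_000, success_rate=0.8))
    print(data_collection_rate(trajectories=1_200, robot_hours=400))
```

Tracking cost per success rather than raw success rate penalizes systems that reach high accuracy only with heavy engineering supervision, which is the hidden cost noted in the Foxconn case.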
