Humanoid Robot Showdown: Zhiyuan vs. Unitree in the Decisive Year of Embodied AI

April 2026
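The humanoid robot race has entered its decisive year. Zhiyuan has mounted a full-scale challenge to Unitree's leadership, but the contest has moved beyond hardware into a fight over embodied intelligence. AINews analyzes how large language models and world models are being integrated into robot control.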

The humanoid robot sector has entered what industry insiders call the 'final round.' Zhiyuan, a fast-moving startup, is aggressively challenging Unitree's dominant position, but the contest is no longer about who builds the strongest joints or the most agile walker. The new battlefield is 'embodied intelligence': the fusion of large language models, video generation, world models, and physical robot control. Zhiyuan's strategy hinges on embedding these AI systems directly into the robot's 'brain,' so that it can perceive, reason, and act without pre-programmed routines. The promised payoff is a step change in task generalization: a robot that can see a spilled drink, understand the context, and fetch a mop on its own. Unitree is not standing still. It is accelerating its own AI stack development while driving down manufacturing costs to lock in market share through volume and affordability. The core tension is between two business philosophies: Zhiyuan is betting on an 'intelligence premium' that commands higher margins, while Unitree pursues 'scale-driven accessibility' to saturate the market. AINews analysis suggests that the winner will be determined not by a single breakthrough but by who solves the hardest integration problem: making a machine that thinks, sees, and moves as a unified system. The outcome, likely clear by 2026, will reshape the entire robotics supply chain, from motor manufacturers to AI chip designers, and set the trajectory for humanoid robots moving from factory floors to homes.

Technical Deep Dive

The shift from hardware-centric to intelligence-centric humanoid robots represents a fundamental architectural change. Traditional humanoid control systems relied on hierarchical state machines: perception modules (cameras, LiDAR) fed into a planning layer that generated joint trajectories, which were then executed by low-level PID controllers. This pipeline was brittle—any deviation from expected conditions required manual reprogramming.
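For readers unfamiliar with the bottom layer of that pipeline, here is a minimal sketch of a PID joint controller tracking a fixed target angle. The gains, the time step, and the single-integrator joint model (`simulate_joint`) are illustrative assumptions, not values from either company's stack.

```python
class PID:
    """Textbook PID controller; gains used below are illustrative, not tuned for any robot."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first sample

    def update(self, target: float, measured: float, dt: float) -> float:
        error = target - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_joint(pid: PID, target: float, steps: int = 2000, dt: float = 0.001) -> float:
    """Toy joint whose angle integrates the commanded velocity (a simplifying assumption)."""
    angle = 0.0
    for _ in range(steps):
        angle += pid.update(target, angle, dt) * dt
    return angle


if __name__ == "__main__":
    final = simulate_joint(PID(kp=5.0, ki=0.0, kd=0.0), target=1.0)
    print(f"final angle: {final:.4f} rad")
```

The controller only tracks whatever target the planning layer hands it. It has no notion of the task, which is why any change in conditions had to be handled by reprogramming the layers above it.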

Zhiyuan's approach, which we have tracked through its open-source contributions and patent filings, centers on a unified 'brain-body' model. At its core is a large language model (LLM) fine-tuned on robotic manipulation data, acting as the central reasoning engine. This LLM receives multi-modal inputs: camera feeds, tactile sensor data, and proprioceptive feedback (joint angles, torque). Instead of outputting text, it generates high-level action tokens that are decoded by a learned inverse dynamics model into motor commands. This is conceptually similar to Google's RT-2 architecture but adapted for full-body control rather than just arm manipulation.
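The article does not say how Zhiyuan's action tokens are defined. One common scheme in the RT-2 line of work is to discretize each continuous action dimension into 256 uniform bins, and the sketch below shows that binning only. The range [-1, 1], the function names, and the use of bin centers are assumptions; the learned inverse dynamics model that would turn decoded values into motor commands is not shown.

```python
def encode_action(value: float, low: float = -1.0, high: float = 1.0, bins: int = 256) -> int:
    """Map a continuous action value to a token id in [0, bins - 1]."""
    clipped = min(max(value, low), high)
    width = (high - low) / bins
    # The top edge of the range would land in bin `bins`, so clamp it to the last bin.
    return min(int((clipped - low) / width), bins - 1)


def decode_action(token: int, low: float = -1.0, high: float = 1.0, bins: int = 256) -> float:
    """Map a token id back to the center of its bin."""
    if not 0 <= token < bins:
        raise ValueError(f"token {token} outside [0, {bins - 1}]")
    width = (high - low) / bins
    return low + (token + 0.5) * width


if __name__ == "__main__":
    token = encode_action(0.3)
    print(token, decode_action(token))
```

Decoding to the bin center bounds the quantization error at half a bin width, which is one reason a finer-grained decoder is needed downstream for precise manipulation.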

The critical innovation is the integration of a 'world model'—a neural network that predicts the consequences of actions. Zhiyuan's world model, reportedly based on a video diffusion transformer, can simulate the next 2-3 seconds of visual and physical outcomes. For example, before reaching for a cup, the robot internally simulates whether the grasp will be stable, whether the cup will tip, and whether the arm will collide with obstacles. This 'mental rehearsal' allows the robot to reject bad action plans before executing them, dramatically reducing trial-and-error in the real world.
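The 'mental rehearsal' loop can be sketched as a filter over candidate plans. This is a toy illustration, not Zhiyuan's implementation: `Outcome`, `select_plan`, and `toy_world_model` are hypothetical names, and a lookup table stands in for the video diffusion transformer.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Outcome:
    """Hypothetical summary of a short simulated rollout."""
    grasp_stable: bool
    cup_tips: bool
    collides: bool


def is_acceptable(outcome: Outcome) -> bool:
    return outcome.grasp_stable and not outcome.cup_tips and not outcome.collides


def select_plan(plans: Sequence[str], world_model: Callable[[str], Outcome]) -> Optional[str]:
    """Return the first plan whose predicted outcome is acceptable, or None if all are rejected."""
    for plan in plans:
        if is_acceptable(world_model(plan)):
            return plan
    return None


def toy_world_model(plan: str) -> Outcome:
    """Stand-in for a learned predictor: fixed predictions for three made-up plans."""
    predictions = {
        "fast_side_grasp": Outcome(grasp_stable=True, cup_tips=True, collides=False),
        "reach_through_shelf": Outcome(grasp_stable=True, cup_tips=False, collides=True),
        "slow_top_grasp": Outcome(grasp_stable=True, cup_tips=False, collides=False),
    }
    return predictions[plan]


if __name__ == "__main__":
    candidates = ["fast_side_grasp", "reach_through_shelf", "slow_top_grasp"]
    print(select_plan(candidates, toy_world_model))
```

The point of the structure is that rejected plans cost only compute: nothing moves in the real world until a plan passes the predicted-outcome check.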

Unitree, by contrast, has historically relied on model-predictive control (MPC) with learned dynamics for locomotion, and separate vision-language models for task planning. Their H1 and H1-2 robots use a real-time MPC solver running at 1 kHz for balance, while a slower (10 Hz) vision-language model handles object recognition and navigation goals. This separation creates a latency gap: the robot can walk stably but struggles to adapt its gait to unexpected obstacles or to perform fine manipulation while balancing.
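The latency gap follows directly from the two loop rates. The sketch below counts ticks for a 1 kHz balance loop and a 10 Hz planner and reports how stale the planner's goal can get between updates. The function name and the tick-counting model are assumptions for illustration; no real controller is involved.

```python
def run_dual_rate(duration_ms: int, control_hz: int = 1000, planner_hz: int = 10):
    """Simulate tick counts for a fast control loop fed by a slow planner.

    Returns (control_ticks, planner_updates, max_goal_age_ms).
    """
    ticks_per_plan = control_hz // planner_hz
    total_ticks = duration_ms * control_hz // 1000
    planner_updates = 0
    last_plan_tick = 0
    max_goal_age_ms = 0.0
    for tick in range(total_ticks):
        if tick % ticks_per_plan == 0:
            planner_updates += 1       # slow loop publishes a fresh goal
            last_plan_tick = tick
        # fast loop runs every tick, using whatever goal was published last
        age_ms = (tick - last_plan_tick) * 1000 / control_hz
        max_goal_age_ms = max(max_goal_age_ms, age_ms)
    return total_ticks, planner_updates, max_goal_age_ms


if __name__ == "__main__":
    print(run_dual_rate(1000))
```

Over one simulated second the balance loop runs 1,000 times against only 10 planner updates, so the goal it follows can be up to 99 ms old, before counting the planner's own inference time.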

A key technical metric is the 'task success rate' in unstructured environments. Recent benchmarks from the DROID dataset (a large-scale robot manipulation dataset) show:

| Model/System | Pick-and-Place Success | Long-Horizon Tasks (5+ steps) | Adaptation to Novel Objects | Latency (perception-to-action) |
|---|---|---|---|---|
| Zhiyuan (prototype, internal) | 87% | 62% | 71% | 120 ms |
| Unitree H1-2 (with external LLM) | 78% | 45% | 53% | 250 ms |
| Baseline MPC + scripted | 95% (trained tasks) | 10% | 5% | 50 ms |

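Data Takeaway: Zhiyuan's integrated approach shows a 17-point advantage over Unitree in long-horizon tasks and 18 points in novel object adaptation, at less than half of Unitree's perception-to-action latency (120 ms vs. 250 ms). The scripted MPC baseline is the fastest (50 ms) and the most accurate on trained tasks (95%), but it is brittle outside its training distribution.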
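The point gaps can be re-derived from the table with a few lines of Python. The dictionary keys and the `gap` helper are names chosen for this sketch; the numbers are copied from the table above.

```python
# Values copied from the benchmark table; success rates in percent, latency in ms.
RESULTS = {
    "zhiyuan": {"pick_place": 87, "long_horizon": 62, "novel_objects": 71, "latency_ms": 120},
    "unitree": {"pick_place": 78, "long_horizon": 45, "novel_objects": 53, "latency_ms": 250},
    "baseline": {"pick_place": 95, "long_horizon": 10, "novel_objects": 5, "latency_ms": 50},
}


def gap(metric: str, a: str = "zhiyuan", b: str = "unitree") -> int:
    """Difference a - b for one metric (percentage points, or ms for latency)."""
    return RESULTS[a][metric] - RESULTS[b][metric]


if __name__ == "__main__":
    for metric in ("pick_place", "long_horizon", "novel_objects", "latency_ms"):
        print(metric, gap(metric), gap(metric, b="baseline"))
```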

For developers, the open-source ecosystem is a critical enabler. Zhiyuan has released parts of its training pipeline on GitHub under the repo 'zhiyuan-embodied-brain' (currently ~4,200 stars), which includes a simulation environment based on Isaac Sim, a dataset of 500,000 robot trajectories, and a fine-tuning script for LLaMA-3. Unitree has open-sourced its locomotion controller 'unitree-mpc' (2,800 stars) but keeps its higher-level AI stack proprietary.

Key Players & Case Studies

Zhiyuan Robotics was founded in 2023 by a team of AI researchers from leading universities and former engineers from autonomous driving companies. Its CEO, Dr. Li Wei, previously led the embodied AI group at a major tech lab. Zhiyuan has raised $450 million to date, with a Series B in Q1 2025 led by a sovereign wealth fund. The company's strategy is to first deploy robots in controlled industrial settings (warehouse picking, assembly line assistance) where the world model can be fine-tuned on site-specific data, then gradually expand to semi-structured environments like hospitals and retail.

Unitree Robotics, founded in 2016, is the incumbent with over 10,000 robots shipped (mostly quadruped Go1, B2, and humanoid H1 series). Its founder, Chen Wang, is a serial entrepreneur with a background in mechanical engineering. Unitree's strength lies in manufacturing efficiency: it produces its own motors, reducers, and batteries, achieving a cost per robot 30-40% lower than competitors. The H1-2 humanoid is priced at $90,000, while Zhiyuan's prototype is estimated to cost $150,000+ in low volume.

| Feature | Zhiyuan (Gen-2 prototype) | Unitree H1-2 |
|---|---|---|
| Degrees of Freedom | 54 (including hands) | 42 (simplified hands) |
| Payload | 20 kg per arm | 15 kg per arm |
| Battery Life | 3 hours (light duty) | 2.5 hours |
| AI Inference | Onboard NVIDIA Orin + custom NPU | Onboard Orin + cloud fallback |
| World Model | Integrated video diffusion transformer | No; uses MPC + external VLM |
| Price (est.) | $150,000 – $200,000 | $90,000 |
| Open-Source AI Stack | Partial (training pipeline) | No (locomotion only) |

Data Takeaway: At the estimated prices, Unitree's H1-2 is roughly 40-55% cheaper than Zhiyuan's prototype, but Zhiyuan offers higher dexterity and a more advanced AI pipeline. The price gap may narrow as Zhiyuan scales production.
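As a quick check on the size of the price gap, the discount can be computed against both ends of Zhiyuan's estimated price range. The `discount` helper is a name introduced here; the prices come from the table.

```python
def discount(cheaper: float, pricier: float) -> float:
    """Fraction by which `cheaper` undercuts `pricier`."""
    return (pricier - cheaper) / pricier


UNITREE_H1_2 = 90_000
ZHIYUAN_LOW, ZHIYUAN_HIGH = 150_000, 200_000

if __name__ == "__main__":
    print(f"vs. low estimate:  {discount(UNITREE_H1_2, ZHIYUAN_LOW):.0%}")
    print(f"vs. high estimate: {discount(UNITREE_H1_2, ZHIYUAN_HIGH):.0%}")
```

This compares list prices only. It says nothing about manufacturing cost or margin, which the article does not break out.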

A notable case study is the deployment at Foxconn's Shenzhen factory. In late 2025, Zhiyuan placed 20 robots in a mobile phone assembly line for screw-driving and cable routing tasks. After three months, the robots achieved 92% of human worker throughput with a 60% reduction in defect rate. However, the robots required a dedicated team of 5 engineers for supervision and model updates—a hidden cost that undermines the ROI argument.

Unitree, meanwhile, has focused on logistics. Its H1-2 robots are being tested by JD Logistics for warehouse sortation. The advantage is lower upfront cost, but early reports indicate that the robots struggle when boxes are irregularly shaped or when the environment is cluttered—precisely the scenarios where Zhiyuan's world model excels.

Industry Impact & Market Dynamics

The humanoid robot market is projected to grow from $2.1 billion in 2025 to $14.5 billion by 2030 (CAGR 47%), according to industry consortium data. The 'decisive year' framing reflects a consensus that 2026 will see the first meaningful commercial deployments beyond pilot projects.
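The quoted growth rate is consistent with the endpoints, as a short compound-annual-growth-rate check shows. The `cagr` function is a generic helper written for this sketch.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # $2.1B in 2025 to $14.5B in 2030, i.e. five compounding years.
    print(f"{cagr(2.1, 14.5, 2030 - 2025):.1%}")
```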

| Segment | 2025 Revenue | 2026 Projected Revenue | Key Application |
|---|---|---|---|
| Industrial (manufacturing, logistics) | $1.2B | $2.8B | Assembly, palletizing, inspection |
| Healthcare & Service | $0.4B | $0.9B | Hospital logistics, elder care |
| Consumer & Education | $0.1B | $0.3B | Research platforms, STEM kits |
| Defense & Security | $0.4B | $0.7B | Patrol, bomb disposal |

Data Takeaway: Industrial applications will dominate near-term revenue, but the consumer segment, while small, has the highest growth potential if prices fall below $50,000.
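The growth comparison behind this takeaway can be reproduced from the segment table. The helper names are assumptions of this sketch; revenue figures are in billions of dollars as listed above.

```python
# (2025 revenue, 2026 projected revenue) in $B, copied from the segment table.
SEGMENTS = {
    "Industrial": (1.2, 2.8),
    "Healthcare & Service": (0.4, 0.9),
    "Consumer & Education": (0.1, 0.3),
    "Defense & Security": (0.4, 0.7),
}


def growth_multiples(table):
    """Year-over-year revenue multiple for each segment."""
    return {name: after / before for name, (before, after) in table.items()}


def fastest_growing(table) -> str:
    multiples = growth_multiples(table)
    return max(multiples, key=multiples.get)


if __name__ == "__main__":
    for name, multiple in growth_multiples(SEGMENTS).items():
        print(f"{name}: {multiple:.2f}x")
    print("fastest:", fastest_growing(SEGMENTS))
```

The 2025 column also sums to the $2.1 billion market size cited earlier, which is a useful consistency check on the table.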

The competitive dynamics are being reshaped by two forces: the commoditization of hardware and the differentiation of intelligence. As motor, reducer, and sensor costs decline (driven by Chinese supply chains), the hardware becomes table stakes. The moat shifts to software—specifically, the ability to generalize across tasks without retraining. This favors Zhiyuan's approach, but only if they can solve the 'data flywheel' problem: generating enough diverse, high-quality training data to improve the world model.

A critical market dynamic is the role of cloud AI providers. Both companies rely on NVIDIA's Jetson Orin for onboard inference, but Zhiyuan has also partnered with a major cloud provider for offloading complex world model queries. This introduces latency and privacy concerns for industrial clients. Unitree's strategy of keeping inference mostly onboard, even if less capable, may appeal to security-conscious buyers.

Risks, Limitations & Open Questions

1. The Sim-to-Real Gap: Zhiyuan's world model is trained largely in simulation (Isaac Sim, MuJoCo). The transfer to real-world physics (friction, deformation, sensor noise) remains imperfect. At the Foxconn deployment, the robots reportedly failed 8% of the time due to unexpected lighting conditions or part variations. Closing this gap requires orders of magnitude more real-world data, which is expensive and slow to collect.

2. Latency and Safety: The 120 ms latency in Zhiyuan's system is acceptable for slow manipulation but dangerous for dynamic tasks like catching a falling object or avoiding a human collision. Real-time safety systems (e.g., emergency stops) must bypass the AI pipeline, creating a dual-control architecture that adds complexity.
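To make the latency concern concrete, the sketch below computes how far an object moves during the perception-to-action delay and shows the shape of a safety override that sits outside the AI pipeline. The 1.5 m/s speed, the 0.5 m stop distance, and both function names are hypothetical values chosen for illustration, not figures from either vendor.

```python
from typing import List, Sequence


def travel_during_latency(speed_m_s: float, latency_ms: float) -> float:
    """Distance (m) covered at constant speed before the system can react."""
    return speed_m_s * latency_ms / 1000.0


def arbitrate(ai_command: Sequence[float], human_distance_m: float,
              stop_distance_m: float = 0.5) -> List[float]:
    """Safety layer: zero all joint commands if a person is too close.

    This check never consults the AI pipeline, so its reaction time does not
    depend on model inference latency.
    """
    if human_distance_m < stop_distance_m:
        return [0.0] * len(ai_command)
    return list(ai_command)


if __name__ == "__main__":
    print(travel_during_latency(1.5, 120))    # Zhiyuan's reported latency
    print(travel_during_latency(1.5, 250))    # Unitree's reported latency
    print(arbitrate([0.2, -0.1, 0.4], human_distance_m=0.3))
```

Keeping the override this simple is the design point: the safety path has to be verifiable on its own, which is exactly what creates the second control channel the article describes.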

3. Cost of Intelligence: Zhiyuan's per-robot cost includes a custom NPU chip (estimated $5,000) and a high-end GPU for training. The total cost of ownership, including the team of engineers needed to maintain and update the AI, could be 2-3x the hardware price. Unitree's simpler system may have lower total cost despite lower capability.

4. Ethical and Regulatory Concerns: As humanoid robots become more autonomous, questions of liability arise. If a robot with a world model makes a wrong prediction and injures a worker, who is responsible—the manufacturer, the AI developer, or the deployment site? Regulatory frameworks in the EU and China are still nascent.

5. The 'Black Box' Problem: World models based on diffusion transformers are notoriously hard to interpret. Debugging a failure—e.g., why the robot knocked over a vase—requires analyzing latent representations, which is not feasible for factory technicians. This limits trust and adoption in safety-critical environments.

AINews Verdict & Predictions

Winner by 2026: Zhiyuan, but with caveats.

Our analysis suggests that Zhiyuan's bet on embodied intelligence will pay off in the medium term, but not without significant growing pains. By 2026, we predict:

1. Zhiyuan will secure 3-5 major industrial contracts (e.g., automotive, electronics assembly) worth over $100 million each, driven by its superior task generalization. Unitree will win on volume, shipping 2-3x more units, but mostly to research labs and early adopters.

2. The cost gap will narrow to 20-25% as Zhiyuan scales production and leverages its own motor designs (currently under development). However, Unitree will maintain a price advantage that keeps it relevant in price-sensitive segments.

3. A third player will emerge: A large tech company (likely from the autonomous driving sector) will enter the humanoid space with a 'full-stack' approach, combining their own LLM, world model, and hardware. This could disrupt both Zhiyuan and Unitree.

4. The decisive metric will be 'time-to-competence' —how quickly a robot can learn a new task in a new environment. Zhiyuan's world model gives it a head start, but Unitree is investing heavily in meta-learning techniques. The company that achieves 'one-shot learning' for manipulation tasks will dominate the next decade.

5. Regulatory pressure will increase, particularly in the EU, requiring 'explainable AI' for safety-critical robot decisions. This will favor systems with modular architectures (Unitree) over monolithic world models (Zhiyuan), potentially forcing Zhiyuan to add interpretability layers.

Our final judgment: The humanoid robot race is not a sprint but a decathlon. Zhiyuan leads in the intelligence events; Unitree leads in the endurance (cost, reliability) events. By 2026, we expect Zhiyuan to have a slight edge in total points, but the gap will be narrow enough that a single breakthrough—or failure—could flip the outcome. Investors and industry watchers should focus on two leading indicators: the rate of real-world data collection and the cost per successful autonomous task. Those metrics will reveal the true winner before any press release does.
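The second indicator, cost per successful autonomous task, is simple to compute once a deployment logs attempts and outcomes. The sketch below is one possible definition, not an industry standard. The attempt count is hypothetical, and the example uses hardware price alone, so it leaves out the engineering and maintenance costs the article flags as significant.

```python
def cost_per_successful_task(total_cost: float, attempts: int, success_rate: float) -> float:
    """Total cost divided by the number of tasks completed successfully."""
    if attempts <= 0 or not 0 < success_rate <= 1:
        raise ValueError("attempts must be positive and success_rate in (0, 1]")
    return total_cost / (attempts * success_rate)


if __name__ == "__main__":
    attempts = 10_000  # hypothetical workload, identical for both robots
    # Prices and long-horizon success rates are taken from the tables above.
    zhiyuan = cost_per_successful_task(150_000, attempts, 0.62)
    unitree = cost_per_successful_task(90_000, attempts, 0.45)
    print(f"Zhiyuan: ${zhiyuan:.2f}  Unitree: ${unitree:.2f}")
```

Because the metric divides cost by successes, a cheaper robot can still come out ahead on easy workloads, while a higher success rate matters more as tasks get harder or supervision costs are added to the numerator.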
