Vbot's $70M Pre-A Shatters Records, Signaling Consumer Robotics' AI Brain Race

May 2026
Tags: embodied intelligence, world model
Vbot (維他動力) has closed a 500 million RMB Pre-A funding round, the largest single investment ever recorded in consumer-grade embodied intelligence. It marks a decisive shift of capital from industrial robotics to the home market, with a bet that large language models and world models will power a new generation of consumer robots.

In a landmark deal for the embodied AI sector, Vbot (维他动力) has secured 500 million RMB (approximately $70 million) in a Pre-A funding round, setting a new record for the largest single investment in consumer-grade embodied intelligence. The round was led by a consortium of top-tier venture capital firms, signaling a profound shift in investor confidence: the belief that the next frontier for robotics is not the factory floor but the living room. This funding is not merely a financial milestone; it represents a strategic bet on a fundamental architectural change, away from the rigid, pre-programmed behaviors that constrained past consumer robots.

Technical Deep Dive

Vbot’s core innovation lies not in novel hardware but in its software architecture, specifically the integration of a large language model (LLM) with a learned world model. This is a departure from traditional robotics stacks that rely on explicit state machines and SLAM (Simultaneous Localization and Mapping).
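To make the contrast concrete, here is a toy version of the "traditional" stack the article says Vbot departs from: behavior as an explicit, hand-written state machine, where every situation must be anticipated in advance. All names here are illustrative, not from Vbot.

```python
# A toy explicit state machine for a scripted 'fetch' routine. Every legal
# transition must be enumerated by an engineer; nothing is learned.
TRANSITIONS = {
    ("idle", "command_received"): "navigating",
    ("navigating", "arrived"): "grasping",
    ("grasping", "object_secured"): "returning",
    ("returning", "arrived"): "idle",
}

def step(state: str, event: str) -> str:
    # Any (state, event) pair not listed above is simply ignored: this is
    # the brittleness that a learned planner is meant to remove.
    return TRANSITIONS.get((state, event), state)

s = "idle"
for event in ["command_received", "arrived", "object_secured", "arrived"]:
    s = step(s, event)
print(s)  # back to "idle" after one scripted fetch cycle
```

An unexpected event (a dropped object, a moved chair) leaves this machine stuck in its current state, which is why such stacks only work in environments that match their script.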

The Architecture: The system operates in a three-layer loop:
1. Perception Layer: Multi-modal inputs (RGB-D cameras, microphones, tactile sensors) are processed by a vision-language model (VLM) fine-tuned for household objects. This model, likely based on an open-source backbone like LLaVA or InternVL, outputs a structured semantic map of the environment, identifying objects (e.g., 'red mug on the kitchen counter'), their states ('the mug is empty'), and human poses.
2. Reasoning & Planning Layer: An LLM (possibly a distilled version of GPT-4 or an open-source alternative like Qwen-72B) interprets natural language commands (e.g., 'Bring me a glass of water from the kitchen'). It decomposes this into a sequence of sub-goals: 'navigate to kitchen', 'locate a clean glass', 'grasp the glass', 'navigate to the tap', 'fill the glass', 'navigate to the user'. Crucially, this is not a static plan. The LLM queries the world model to simulate the feasibility of each step in the current environment.
3. World Model & Simulation Layer: This is the most technically ambitious component. Vbot is reportedly developing a learned world model that can predict the outcomes of actions in 3D space. For example, it can simulate 'what happens if I push this cup to the edge of the table?' or 'will my arm collide with the lamp if I rotate 30 degrees?'. This is trained using a combination of real-world data and synthetic data generated by video diffusion models (e.g., Stable Video Diffusion or Sora-like models). The world model allows the robot to 'think before it acts', drastically reducing trial-and-error damage and improving safety.
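The three layers above can be sketched as a single control loop. This is a minimal illustration with stubbed-out models; every class and function name is hypothetical, since Vbot has not published its stack.

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    """Structured semantic map produced by the perception layer."""
    objects: dict = field(default_factory=dict)  # name -> state, e.g. {"glass": "clean"}

def perceive(sensor_frame) -> WorldState:
    """Layer 1 (stub): a VLM would map RGB-D/audio/tactile input to a semantic map."""
    return WorldState(objects={"glass": "clean", "tap": "off"})

def plan(command: str, state: WorldState) -> list[str]:
    """Layer 2 (stub): an LLM would decompose the command into sub-goals."""
    return ["navigate to kitchen", "locate a clean glass", "grasp the glass",
            "navigate to the tap", "fill the glass", "navigate to the user"]

def simulate(action: str, state: WorldState) -> bool:
    """Layer 3 (stub): the learned world model predicts whether an action is feasible."""
    return not (action == "grasp the glass" and state.objects.get("glass") == "missing")

def control_loop(command: str, sensor_frame) -> list[str]:
    state = perceive(sensor_frame)
    executed = []
    for step in plan(command, state):
        if not simulate(step, state):   # 'think before acting': stop and replan on predicted failure
            break
        executed.append(step)           # a real system would dispatch to motor policies here
    return executed

print(control_loop("Bring me a glass of water", sensor_frame=None))
```

The key structural point is that the planner's output is checked against the world model before any motor command runs, which is what distinguishes this loop from a scripted pipeline.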

Relevant Open-Source Repositories:
- Habitat 3.0 (by Meta AI): A simulation platform for training embodied agents in realistic home environments. It supports multi-agent scenarios and human-robot interaction. (Stars: ~8k). Vbot likely uses this for synthetic data generation.
- Isaac Gym (by NVIDIA): A physics simulation environment for reinforcement learning. (Stars: ~4k). Used for training low-level motor skills like grasping and walking.
- Octo (by UC Berkeley, Stanford, CMU): A large transformer-based model for robot manipulation, trained on the Open X-Embodiment dataset. (Stars: ~2k). Could serve as a base for Vbot’s manipulation policies.

Performance Benchmarks (Hypothetical, based on industry standards):
| Metric | Vbot (Target) | Current Best Consumer Robot (e.g., Amazon Astro) | Industrial Baseline (e.g., Boston Dynamics Spot) |
|---|---|---|---|
| Task Completion Rate (10 household tasks) | 85% | 45% | 95% (but in structured environments) |
| Average Planning Time (per task) | 2.5 seconds | N/A (pre-programmed) | 0.1 seconds |
| Object Recognition Accuracy (100 household items) | 92% | 70% | 99% (controlled lighting) |
| Language Command Success Rate (50 diverse commands) | 88% | 60% | N/A |

Data Takeaway: The table illustrates the trade-off. Vbot's architecture sacrifices raw speed and precision (compared to industrial robots) for dramatic improvements in generalization and language understanding. Note that the 85% task completion target applies to a dynamic home environment, whereas the industrial baseline's 95% is achieved only in structured settings.
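A bit of illustrative arithmetic (not from the article) shows why 85% end-to-end completion is an ambitious target: if a household task decomposes into n sequential sub-goals that each succeed independently with probability p, the end-to-end rate is p**n, so every individual step must be far more reliable than the headline number.

```python
# How reliable must each sub-goal be to hit an 85% end-to-end completion rate?
# Assumes independent sequential steps, so end-to-end success = p ** n.
n_steps = 6            # e.g. the six sub-goals in the water-fetching example
target = 0.85          # Vbot's target task completion rate
per_step = target ** (1 / n_steps)
print(f"required per-step reliability: {per_step:.3f}")  # ≈ 0.973
```

Under this simple independence model, each of the six sub-goals must succeed over 97% of the time, which is why the world model's "think before acting" feasibility check matters so much.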


Further Reading

- Chinese Robot Makers Storm Silicon Valley: Three Battles Will Decide Physical AI's Future. Chinese robotics companies are no longer just catching up; they are redefining the rules of Physical AI. By combining …
- Jensen Huang's AI Summit: Charting the Path from LLMs to Embodied World Models. In a historic discussion, NVIDIA's Jensen Huang convened a forum with CEOs from the most …
- AI's Great Crossroads: Embodied Models vs. Language Models, Which Path Wins? In a single night, two major funding rounds exposed a fundamental split in AI. One leader is betting on rob…
- First-Generation Robot IPOs: The Industry Reality Check Begins. A wave of first-generation robot companies is listing on public markets, forcing the embodied intelligence industry to …

FAQ

What is the "Vbot's $70M Pre-A Shatters Records, Signaling Consumer Robotics' AI Brain Race" funding story about?

In a landmark deal for the embodied AI sector, Vbot (维他动力) has secured 500 million RMB (approximately $70 million) in a Pre-A funding round, setting a new record for the highest si…

What is a world model and why does Vbot need it?

Vbot’s core innovation lies not in novel hardware but in its software architecture, specifically the integration of a large language model (LLM) with a learned world model. This is a departure from traditional robotics s…

What industry signal does Vbot's round send compared to peers like Figure AI and 1X?

A round of this size typically means the segment is entering a phase of accelerating resource concentration; going forward, watch for team expansion, product delivery, commercial validation, and follow-on moves by comparable companies.