Vbot's $70M Pre-A Shatters Records, Signaling Consumer Robotics' AI Brain Race

May 2026
Vbot (维他动力) has closed a 500 million RMB Pre-A funding round, the largest single investment in the history of the consumer embodied-intelligence sector. The deal signals a decisive pivot of capital from industrial robotics toward the home market, betting that large language models and world models can finally give household robots a general-purpose brain.

In a landmark deal for the embodied AI sector, Vbot (维他动力) has secured 500 million RMB (approximately $70 million) in a Pre-A funding round, setting a new record for the highest single investment in consumer-grade embodied intelligence. The round was led by a consortium of top-tier venture capital firms, signaling a profound shift in investor confidence: the belief that the next frontier for robotics is not the factory floor, but the living room. This funding is not merely a financial milestone; it represents a strategic bet on a fundamental architectural change. Where past consumer robots relied on hand-scripted behaviors and narrow, task-specific perception, Vbot's generation is built around a general-purpose reasoning core.

Technical Deep Dive

Vbot’s core innovation lies not in novel hardware but in its software architecture, specifically the integration of a large language model (LLM) with a learned world model. This is a departure from traditional robotics stacks that rely on explicit state machines and SLAM (Simultaneous Localization and Mapping).

The Architecture: The system operates in a three-layer loop:
1. Perception Layer: Multi-modal inputs (RGB-D cameras, microphones, tactile sensors) are processed by a vision-language model (VLM) fine-tuned for household objects. This model, likely based on an open-source backbone like LLaVA or InternVL, outputs a structured semantic map of the environment, identifying objects (e.g., 'red mug on the kitchen counter'), their states ('the mug is empty'), and human poses.
2. Reasoning & Planning Layer: An LLM (possibly a distilled version of GPT-4 or an open-source alternative like Qwen-72B) interprets natural language commands (e.g., 'Bring me a glass of water from the kitchen'). It decomposes this into a sequence of sub-goals: 'navigate to kitchen', 'locate a clean glass', 'grasp the glass', 'navigate to the tap', 'fill the glass', 'navigate to the user'. Crucially, this is not a static plan. The LLM queries the world model to simulate the feasibility of each step in the current environment.
3. World Model & Simulation Layer: This is the most technically ambitious component. Vbot is reportedly developing a learned world model that can predict the outcomes of actions in 3D space. For example, it can simulate 'what happens if I push this cup to the edge of the table?' or 'will my arm collide with the lamp if I rotate 30 degrees?'. This is trained using a combination of real-world data and synthetic data generated by video diffusion models (e.g., Stable Video Diffusion or Sora-like models). The world model allows the robot to 'think before it acts', drastically reducing trial-and-error damage and improving safety.
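The three-layer loop above can be sketched in code. This is an illustrative skeleton only: every class and function name here is hypothetical (Vbot's actual stack is not public), and the perception, planning, and simulation calls are placeholders for the VLM, LLM, and world-model components the article describes.

```python
# Illustrative sketch of the perceive -> plan -> simulate -> act loop.
# All names are hypothetical; real VLM/LLM/world-model calls would
# replace the placeholder bodies below.
from dataclasses import dataclass, field


@dataclass
class SemanticMap:
    """Structured output of the perception layer: object name -> state."""
    objects: dict = field(default_factory=dict)


def perceive(rgbd_frame) -> SemanticMap:
    # Placeholder for a VLM call (e.g. a LLaVA-style backbone).
    return SemanticMap(objects={"glass": "clean, on kitchen shelf",
                                "tap": "off"})


def plan(command: str, world: SemanticMap) -> list:
    # Placeholder for LLM task decomposition into sub-goals.
    return ["navigate to kitchen", "locate a clean glass",
            "grasp the glass", "navigate to the tap",
            "fill the glass", "navigate to the user"]


def simulate(step: str, world: SemanticMap) -> bool:
    # Placeholder for the learned world model: predict whether the
    # action is feasible *before* executing it on hardware.
    return True


def execute(step: str) -> None:
    print(f"executing: {step}")


def run(command: str, frame=None) -> bool:
    """One pass through the loop; bail out (to replan) on infeasible steps."""
    world = perceive(frame)
    for step in plan(command, world):
        if not simulate(step, world):
            return False  # think-before-act: abort instead of risking damage
        execute(step)
    return True
```

The key design point the article emphasizes is the `simulate` gate between planning and execution: the plan is not static, and each sub-goal is checked against the world model before the robot moves.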

Relevant Open-Source Repositories:
- Habitat 3.0 (by Meta AI): A simulation platform for training embodied agents in realistic home environments. It supports multi-agent scenarios and human-robot interaction. (Stars: ~8k). Vbot likely uses this for synthetic data generation.
- Isaac Gym (by NVIDIA): A physics simulation environment for reinforcement learning. (Stars: ~4k). Used for training low-level motor skills like grasping and walking.
- Octo (by UC Berkeley, Stanford, CMU): A large transformer-based model for robot manipulation, trained on the Open X-Embodiment dataset. (Stars: ~2k). Could serve as a base for Vbot’s manipulation policies.
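Platforms like Habitat and Isaac Gym are typically driven by an episode-collection loop of the kind sketched below. The `Simulator` class here is a minimal stand-in, not the real API of either library; the point is the shape of the rollout loop that produces synthetic training trajectories.

```python
# Generic episode-collection loop for synthetic training data.
# Simulator is a toy stand-in, NOT Habitat's or Isaac Gym's real API.
import random


class Simulator:
    """Minimal stand-in for a home-environment simulator."""
    ACTIONS = ["move_forward", "turn_left", "turn_right", "grasp"]

    def reset(self):
        return {"rgb": None, "depth": None}  # initial observation

    def step(self, action):
        obs = {"rgb": None, "depth": None}
        reward = 1.0 if action == "grasp" else 0.0  # toy reward signal
        done = action == "grasp"                     # episode ends on grasp
        return obs, reward, done


def collect_episode(sim, policy, max_steps=50):
    """Roll out one episode; return a list of (obs, action, reward) tuples."""
    trajectory, obs = [], sim.reset()
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = sim.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory


random_policy = lambda obs: random.choice(Simulator.ACTIONS)
dataset = [collect_episode(Simulator(), random_policy) for _ in range(10)]
```

In practice the random policy would be replaced by a scripted expert or a pre-trained policy (e.g. an Octo checkpoint), and the trajectories fed into world-model or imitation-learning training.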

Performance Benchmarks (Hypothetical, based on industry standards):
| Metric | Vbot (Target) | Current Best Consumer Robot (e.g., Amazon Astro) | Industrial Baseline (e.g., Boston Dynamics Spot) |
|---|---|---|---|
| Task Completion Rate (10 household tasks) | 85% | 45% | 95% (but in structured environments) |
| Average Planning Time (per task) | 2.5 seconds | N/A (pre-programmed) | 0.1 seconds |
| Object Recognition Accuracy (100 household items) | 92% | 70% | 99% (controlled lighting) |
| Language Command Success Rate (50 diverse commands) | 88% | 60% | N/A |

Data Takeaway: The table illustrates the trade-off. Vbot’s architecture sacrifices raw speed and precision (compared to industrial robots) for dramatic improvements in generalization and language understanding. An 85% task completion rate in a dynamic home environment would be a step change over today's consumer devices, which clear fewer than half of comparable task suites.
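The 85% figure is demanding when a task is a chain of sub-goals. A quick back-of-the-envelope calculation, assuming the six-step decomposition from the water-fetching example earlier and independent sub-goal failures (both simplifying assumptions), shows the per-step reliability the system would need:

```python
# If a task is a chain of n sub-goals, each succeeding independently
# with probability p, end-to-end success is p**n. Solve for p given
# the 85% target and the 6-step water-fetching decomposition above.
n_steps = 6
target_task_rate = 0.85
p_step = target_task_rate ** (1 / n_steps)
print(f"required per-step reliability: {p_step:.3f}")  # ~0.973
```

Each sub-goal (navigation, grasping, pouring) would need roughly 97% reliability, which is why the world-model feasibility check before each action matters: catching a doomed step before execution converts a task failure into a replan.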


Further Reading

- Chinese Robot Makers Storm Silicon Valley: Three Battles Defining the Future of Physical AI
- Jensen Huang's AI Summit: Charting the Path from LLMs to Embodied World Models
- The Great AI Fork: Embodied Models vs. Language Models – Which Path Will Win?
- First-Generation Robotics IPOs: The Industry's Reality Check Begins
