Technical Deep Dive
Li Yitong's move to JiYi Intelligence is a bet on closing the 'Sim-to-Real' gap: the fundamental challenge of transferring AI models trained in simulation to the messy, unpredictable real world. At Huawei, Li worked on large models for the terminal cloud, optimizing inference latency and memory footprint for on-device deployment. This experience transfers directly to embodied AI, where models must run in real time on resource-constrained robot hardware.
JiYi's technical stack likely revolves around a multimodal large model architecture that fuses vision, language, and proprioception (the robot's sensing of its own joint positions and forces). The core challenge is grounding: ensuring that a language model's output corresponds to physically feasible actions. This requires a combination of:
- Vision-Language-Action (VLA) models: End-to-end neural networks that take in camera images and text commands, and output low-level robot actions such as end-effector motions or joint commands. Google's RT-2 and the open-source OpenVLA (a repo on GitHub with over 5,000 stars) are leading examples.
- Reinforcement Learning from Human Feedback (RLHF) for robotics: Adapting the technique that helped make ChatGPT successful to train reward models for safe, efficient physical behavior.
- System-level testing: Li's role explicitly includes system testing, which is critical for reliability. Unlike a typical software bug, a robot failure can cause physical damage or injury.
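To make the grounding problem concrete, here is a minimal sketch of how a VLA-style model can represent continuous robot actions as discrete tokens and map them back. Published VLA models such as RT-2 and OpenVLA are described as discretizing each action dimension into a fixed number of bins; the bin count of 256, the normalized range of [-1, 1], and the function names below are illustrative assumptions, not JiYi's method or any library's API.

```python
from typing import List

NUM_BINS = 256          # assumed number of bins per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalized action range


def action_to_tokens(action: List[float]) -> List[int]:
    """Map each continuous action value to a bin index in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)        # bound out-of-range commands
        fraction = (clipped - LOW) / (HIGH - LOW)   # rescale to [0, 1]
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def tokens_to_action(tokens: List[int]) -> List[float]:
    """Map bin indices back to the center value of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    command = [0.25, -0.8, 0.0, 1.7]   # last value is deliberately out of range
    tokens = action_to_tokens(command)
    print(tokens, tokens_to_action(tokens))
```

Clipping before binning means the model can never emit a command outside the assumed range, which is one small piece of keeping outputs physically feasible. The round trip loses at most half a bin width of precision per dimension.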
A key technical metric is the 'Task Success Rate' across diverse environments. Below is a comparison of current embodied AI models:
| Model | Task Success Rate (Tabletop Manipulation) | Latency (ms per action) | Training Data (episodes) | Open Source? |
|---|---|---|---|---|
| RT-2 (Google) | 78% | 300 | 130,000 | No |
| OpenVLA (Stanford) | 71% | 450 | 60,000 | Yes (GitHub) |
| Octo (UC Berkeley) | 65% | 500 | 800,000 | Yes (GitHub) |
| JiYi Internal (Est.) | ~60% (public demos) | 350 | Unknown | No |
Data Takeaway: The best open-source model, OpenVLA, trails the proprietary RT-2 by 7 percentage points in task success, and Octo trails by 13. JiYi's investment in Li suggests the company aims to close its own gap to the leaders through proprietary optimization, particularly by reducing latency, which is critical for real-time control.
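The latency column can be read as a ceiling on how often a robot can act. The short sketch below converts each model's per-action latency into an approximate control rate and computes its success-rate gap to RT-2. The figures are copied from the comparison table above (the JiYi row is the article's estimate), and the helper names are hypothetical.

```python
# Figures copied from the comparison table; the JiYi row is an estimate.
models = {
    "RT-2": {"success": 78, "latency_ms": 300},
    "OpenVLA": {"success": 71, "latency_ms": 450},
    "Octo": {"success": 65, "latency_ms": 500},
    "JiYi (est.)": {"success": 60, "latency_ms": 350},
}


def control_rate_hz(latency_ms: float) -> float:
    """Actions per second if each action takes latency_ms to compute."""
    return 1000.0 / latency_ms


def gap_to(reference: str, name: str) -> int:
    """Success-rate gap in percentage points between a reference model and another."""
    return models[reference]["success"] - models[name]["success"]


if __name__ == "__main__":
    for name, row in models.items():
        rate = control_rate_hz(row["latency_ms"])
        print(f"{name}: {rate:.1f} Hz, {gap_to('RT-2', name)} points behind RT-2")
```

At 300 ms per action a policy issues roughly 3.3 actions per second, and at 500 ms only 2, which is why latency reduction matters so much for real-time control.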
Key Players & Case Studies
Li Yitong is not the only high-profile talent moving from consumer AI to embodied AI. The trend is industry-wide:
- Fei-Fei Li's World Labs: The Stanford professor and ImageNet creator founded World Labs in 2024, focusing on spatial intelligence. She has hired several researchers from Meta and Google.
- Figure AI: This startup poached researchers from OpenAI and Boston Dynamics, and recently raised $675 million at a $2.6 billion valuation. Their humanoid robot, Figure 01, uses a large language model for natural language interaction.
- Skild AI: Founded by former Carnegie Mellon professors, Skild raised $300 million to build a 'general-purpose brain for robots.' Their model is trained on data from over 100 different robot types.
JiYi Intelligence itself has a track record of attracting top talent. Founded in 2023 by ex-Baidu and Microsoft researchers, the company has raised over $100 million in Series B funding. Its product lineup includes:
- JiYi Arm: A 7-DOF robotic arm for precision assembly.
- JiYi Mobile Manipulator: A wheeled robot with an arm for warehouse logistics.
- JiYi Humanoid (prototype): A bipedal robot for service applications.
Below is a comparison of JiYi's key competitors:
| Company | Funding Raised | Key Product | Focus Area | Talent Source |
|---|---|---|---|---|
| JiYi Intelligence | $100M+ | JiYi Arm, Mobile Manipulator | Industrial & logistics | Huawei, Baidu, Microsoft |
| Figure AI | $675M | Figure 01 Humanoid | General-purpose | OpenAI, Boston Dynamics |
| Skild AI | $300M | Skild Brain (software) | Robot-agnostic AI | CMU, Google |
| 1X Technologies | $100M | NEO Humanoid | Home service | OpenAI, Tesla |
Data Takeaway: JiYi is a 'tier 2' player in terms of funding but has a strong focus on industrial applications, which have clearer revenue paths than general-purpose humanoids. Li's hire could help them leapfrog into the top tier by improving their AI capabilities.
Industry Impact & Market Dynamics
The talent war in embodied AI is a direct consequence of the market's explosive growth. The global robotics market is projected to reach $260 billion by 2030, with embodied AI being the fastest-growing segment. Key drivers include:
- Labor shortages: Aging populations in Japan, Germany, and the US are creating demand for robots in manufacturing, healthcare, and logistics.
- Foundation models: The success of LLMs suggests that scaling laws may also apply to robotics, which strengthens the economic case for large-scale data collection and training.
- Hardware commoditization: Sensors, motors, and batteries are becoming cheaper, shifting the competitive advantage to software.
However, the talent pool is extremely limited. There are fewer than 10,000 researchers worldwide with deep expertise in both large language models and robotics. This has led to bidding wars:
- Salaries: Top embodied AI researchers command $500,000 to $1 million+ in total compensation, often with significant equity.
- Recruiting strategies: Companies are increasingly hiring researchers from adjacent fields (e.g., autonomous driving, computer vision) and retraining them.
Below is a table showing the estimated talent distribution:
| Sector | Number of Researchers (Est.) | Average Salary (USD) | Growth Rate (YoY) |
|---|---|---|---|
| Consumer AI (LLMs) | 50,000+ | $300,000 | 15% |
| Autonomous Driving | 20,000 | $250,000 | 5% |
| Embodied AI | 8,000 | $450,000 | 40% |
| Traditional Robotics | 15,000 | $150,000 | 2% |
Data Takeaway: The embodied AI sector has the smallest talent pool (about 8,000 researchers) but the highest average salary ($450,000) and the fastest growth rate (40% YoY). This indicates a severe supply-demand imbalance that is likely to persist for at least 3-5 years.
Risks, Limitations & Open Questions
Despite the excitement, several risks could derail the embodied AI boom:
1. Data Scarcity: Unlike text and images, robot data is expensive and dangerous to collect. A robot failure can cost thousands of dollars in damage. Simulation data is cheap but suffers from the Sim-to-Real gap.
2. Hardware Reliability: Current robot hardware (motors, grippers, sensors) is not yet robust enough for 24/7 operation in unstructured environments. Mean time between failures (MTBF) for advanced robots is still measured in weeks, not months.
3. Safety and Regulation: As robots become more autonomous, the risk of accidents increases. No clear regulatory framework exists for embodied AI systems, creating liability uncertainty.
4. Energy Efficiency: Running large models on battery-powered robots is a challenge. A typical humanoid robot consumes 500-1000W, limiting runtime to 2-4 hours.
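The energy figures above imply a rough battery size: 500 W for 4 hours and 1000 W for 2 hours both work out to about 2,000 Wh. The sketch below uses that inferred capacity to show how an extra inference load shortens runtime. The 2,000 Wh pack and the 100 W compute draw are assumptions chosen for illustration, not measurements of any specific robot.

```python
def runtime_hours(battery_wh: float, draw_w: float) -> float:
    """Ideal runtime, ignoring conversion losses and battery aging."""
    if draw_w <= 0:
        raise ValueError("draw_w must be positive")
    return battery_wh / draw_w


BATTERY_WH = 2000.0   # assumed pack size, inferred from the figures in the text
COMPUTE_W = 100.0     # hypothetical extra draw from onboard model inference

if __name__ == "__main__":
    for base_w in (500.0, 1000.0):
        base = runtime_hours(BATTERY_WH, base_w)
        loaded = runtime_hours(BATTERY_WH, base_w + COMPUTE_W)
        print(f"{base_w:.0f} W base: {base:.1f} h, {loaded:.2f} h with inference load")
```

Under these assumptions, every additional watt spent on inference comes directly out of operating time, which is why on-device latency and memory optimization also functions as energy optimization.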
Li Yitong's role at JiYi includes system testing, which directly addresses the reliability issue. However, the company must also invest in simulation infrastructure (e.g., NVIDIA Isaac Sim, MuJoCo) to generate synthetic training data.
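One widely used way to make synthetic data more robust to the Sim-to-Real gap is domain randomization: varying physical and visual parameters across simulated episodes so a policy does not overfit to one idealized world. The sketch below only samples parameter sets and does not call any simulator. The parameter names and ranges are hypothetical, and nothing in the source says JiYi uses this approach.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    friction: float      # contact friction coefficient
    payload_kg: float    # mass of the manipulated object
    light_scale: float   # multiplier on scene brightness
    latency_ms: float    # simulated actuation delay


# Hypothetical ranges; real values would come from measuring the target robot.
RANGES = {
    "friction": (0.5, 1.2),
    "payload_kg": (0.1, 2.0),
    "light_scale": (0.6, 1.4),
    "latency_ms": (10.0, 60.0),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one parameter set uniformly from RANGES."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


rng = random.Random(0)  # fixed seed so the episode set is reproducible
episodes = [sample_params(rng) for _ in range(1000)]

if __name__ == "__main__":
    print(episodes[0])
```

Each sampled `SimParams` would then configure one simulated episode in whichever simulator is in use. System testing can reuse the same ranges to check that a trained policy holds up at the extremes as well as at the average.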
AINews Verdict & Predictions
Li Yitong's move is a clear signal that the embodied AI industry is maturing. Here are our predictions:
1. More 'Genius Youth' recruits will leave big tech for startups: The combination of equity upside and technical challenge is irresistible. Expect 3-5 more high-profile departures from Huawei, Google, and Meta in the next 12 months.
2. JiYi will release a production-grade VLA model within 18 months: Li's expertise in system testing and optimization will accelerate their timeline. Their model will likely achieve >80% task success on standardized benchmarks.
3. The talent war will lead to consolidation: Smaller startups without the resources to compete for top talent will either fail or be acquired. JiYi is well-positioned as a buyer, not a seller.
4. Regulation will catch up: By 2027, expect the first national-level regulatory framework for embodied AI, focusing on safety certification and data privacy.
The bottom line: Li Yitong's move is not just a hire; it is a strategic pivot for JiYi Intelligence. If successful, it will demonstrate that the path to general-purpose robotics runs through large language models. If it fails, it will be a cautionary tale about the difficulty of bridging the digital and physical worlds. Either way, the next 24 months will be decisive.