AI's Next Phase: Why Physical Infrastructure Beats Raw Compute Power

After extensive conversations with core teams at three leading US AI companies, AINews has identified a decisive shift: the first phase of AI competition, driven by raw compute scaling, is over. The new phase is defined by 'physical grounding'—the ability of AI systems to interact with and control real-world infrastructure. The release of DeepSeek V4 represents more than a model update; it is a strategic recalibration away from parameter bloat toward inference efficiency and edge intelligence. Simultaneously, Meituan's LongCat model is not a chatbot but a 'physical brain' designed for drone delivery, autonomous kitchens, and urban traffic networks. This confirms a growing consensus: future AI leaders must own both digital intelligence and physical assets. While US giants scramble to catch up, Chinese companies have built deep, hard-to-replicate moats in logistics, warehousing, and instant delivery. The endgame is not a smarter model—it is a smarter world.

Technical Deep Dive

The shift from compute-first to infrastructure-first AI is rooted in a fundamental architectural divergence. DeepSeek V4, for instance, abandons the trend of scaling parameters beyond 1 trillion. Instead, it employs a Mixture-of-Experts (MoE) architecture with a reported 670 billion total parameters but only 37 billion activated per token, about 5.5% of the network. This design, inspired by Google's GLaM and refined by DeepSeek's own research, is reported to achieve GPT-4-class reasoning at roughly 65% lower inference cost per million tokens ($3.50 versus $10.00). The key innovation is a dynamic routing mechanism that learns to assign tokens to the most relevant experts based on task context, not just static rules. Because only the selected experts run for each token, far fewer weights are read per token, which eases the memory bandwidth bottleneck that plagues dense models.
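To make the routing idea concrete, here is a minimal sketch of generic top-k expert gating in plain Python. It is not DeepSeek's actual router: the gate is a simple dot product, the experts are toy scaling functions, and every name (`route_token`, `moe_forward`, `gate_weights`) is a hypothetical chosen for illustration.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token, gate_weights, top_k=2):
    """Return (expert_index, mixing_weight) pairs for the top_k experts.

    gate_weights holds one weight vector per expert; each gate score is
    the dot product of that vector with the token.
    """
    scores = [sum(w * x for w, x in zip(weights, token)) for weights in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalise over the chosen experts
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(token, gate_weights, experts, top_k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    output = [0.0] * len(token)
    for index, weight in route_token(token, gate_weights, top_k):
        expert_out = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output

# Toy setup (assumption): 8 experts, each a scaling function standing in
# for a feed-forward block; gate weights are random rather than learned.
rng = random.Random(0)
dim, num_experts = 4, 8
gate_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
routing = route_token(token, gate_weights, top_k=2)
mixed = moe_forward(token, gate_weights, experts, top_k=2)
print(routing)  # two (expert_index, weight) pairs whose weights sum to 1
print(mixed)
```

Only 2 of the 8 toy experts execute for the token, which is the same property that lets a sparse model keep most of its parameters idle on any given step. In a trained model the gate weights would be learned jointly with the experts.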

On the GitHub front, the open-source community has rallied around efficiency. The repository 'llama.cpp' (over 70k stars) now supports DeepSeek V4's MoE architecture, enabling local inference on consumer hardware like the RTX 4090. Meanwhile, 'vLLM' (over 40k stars) has integrated custom CUDA kernels for sparse MoE layers, achieving a 2.3x throughput improvement over standard implementations. These tools are critical for deploying AI at the edge, in a warehouse robot or a delivery drone, where latency and power constraints are severe.

Meituan's LongCat takes a different approach. It is built on a Transformer-based world model that ingests multimodal data: GPS trajectories, camera feeds, accelerometer readings, and order timestamps. The model does not generate text; it outputs control signals for a fleet of 10,000+ autonomous delivery drones and 5,000+ robotic kitchens. LongCat uses a hierarchical planning architecture: a high-level policy network predicts optimal routes and task sequences, while low-level controllers handle real-time obstacle avoidance and manipulation. This is essentially a digital twin of Meituan's physical logistics network, trained on petabytes of real-world operational data.
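The two-level split described above can be illustrated with a toy grid world. This is a sketch under heavy assumptions and not Meituan's system: a greedy nearest-neighbour ordering stands in for the high-level policy network, a one-step greedy mover stands in for the low-level controller, and all names (`plan_route`, `step_toward`, `fly`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

def manhattan(a, b):
    return abs(a.x - b.x) + abs(a.y - b.y)

def plan_route(start, stops):
    """High-level stand-in: order the stops greedily by nearest neighbour."""
    route, current, remaining = [], start, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: manhattan(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route

def step_toward(position, target, obstacles):
    """Low-level stand-in: take the free neighbouring cell closest to the
    target; stay in place if every neighbour is blocked."""
    candidates = [
        Point(position.x + dx, position.y + dy)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
    ]
    free = [c for c in candidates if c not in obstacles]
    if not free:
        return position
    return min(free, key=lambda c: manhattan(c, target))

def fly(start, stops, obstacles, max_steps=100):
    """Follow the planned route, returning the waypoints actually reached."""
    position, visited = start, []
    for waypoint in plan_route(start, stops):
        steps = 0
        while position != waypoint and steps < max_steps:
            position = step_toward(position, waypoint, obstacles)
            steps += 1
        if position == waypoint:
            visited.append(waypoint)
    return visited

start = Point(0, 0)
stops = [Point(3, 0), Point(0, 2)]
obstacles = {Point(1, 0)}  # one blocked cell next to the start
visited = fly(start, stops, obstacles)
print(visited)
```

The planner decides the order of stops once, while the controller reacts cell by cell to what is blocked. A greedy controller like this one can get stuck behind larger obstacles, which is why `max_steps` bounds each leg; a production system would need real path planning and sensor feedback.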

| Model | Parameters | Activated per Token | Inference Cost (per 1M tokens) | Latency (edge device) |
|---|---|---|---|---|
| GPT-4 | ~1.7T (est.) | ~1.7T | $10.00 | 800ms |
| DeepSeek V4 | 670B | 37B | $3.50 | 120ms |
| LongCat (proprietary) | 120B | 120B | $1.20 (internal) | 45ms |

Data Takeaway: DeepSeek V4 achieves a 65% cost reduction versus GPT-4 while maintaining competitive reasoning. LongCat's edge-optimized latency (45ms) is critical for real-time drone control, a domain where GPT-4's 800ms is unusable.
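The 65% figure follows directly from the table. The short sketch below recomputes it, along with the share of parameters active per token, using only the numbers listed above; the GPT-4 row is an estimate in the source, and the helper names are illustrative.

```python
# Figures copied from the comparison table (USD per 1M tokens, parameters in billions).
# The GPT-4 values are estimates in the source, not confirmed specifications.
models = {
    "GPT-4": {"cost": 10.00, "total_b": 1700, "active_b": 1700, "latency_ms": 800},
    "DeepSeek V4": {"cost": 3.50, "total_b": 670, "active_b": 37, "latency_ms": 120},
    "LongCat": {"cost": 1.20, "total_b": 120, "active_b": 120, "latency_ms": 45},
}

def cost_reduction(baseline_cost, candidate_cost):
    """Fractional saving of the candidate relative to the baseline."""
    return 1 - candidate_cost / baseline_cost

def active_fraction(entry):
    """Share of total parameters used for each token."""
    return entry["active_b"] / entry["total_b"]

baseline = models["GPT-4"]
for name, entry in models.items():
    saving = cost_reduction(baseline["cost"], entry["cost"])
    print(f"{name}: {saving:.0%} cheaper than GPT-4, "
          f"{active_fraction(entry):.1%} of parameters active per token")
```

On these inputs DeepSeek V4 comes out 65% cheaper with about 5.5% of its parameters active per token, while LongCat is dense (100% active) but much smaller overall.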

Key Players & Case Studies

DeepSeek (China): DeepSeek V4 is the fourth iteration of their MoE family. The company has positioned itself as the 'efficiency champion,' releasing models that match or exceed OpenAI's performance at a fraction of the compute cost. Their strategy is to license the model to hardware manufacturers (e.g., Qualcomm, MediaTek) for on-device deployment, bypassing the cloud dependency that plagues US AI firms.

Meituan (China): The food delivery giant operates the world's largest instant delivery network—over 7 million daily orders. LongCat is not a product you can download; it is an internal system that controls their entire physical infrastructure. Meituan has deployed it in 30 cities for drone delivery, reducing average delivery time from 38 minutes to 12 minutes. Their 'AI kitchen' initiative uses LongCat to coordinate robotic arms, ovens, and packaging machines, achieving 40% higher throughput per square meter than human-staffed kitchens.

US 'Big Three' (Anonymous): The three leading US AI companies (widely assumed to be OpenAI, Google DeepMind, and Anthropic) are now investing heavily in physical infrastructure. One has formed a partnership with a major US logistics firm to build a 'digital twin' of their supply chain. Another is developing a proprietary robotic operating system. However, they face a structural disadvantage: they lack the operational data that Meituan and other Chinese firms have accumulated over years of real-world deployment.

| Company | Physical Assets | AI Model for Physical Control | Deployment Scale |
|---|---|---|---|
| Meituan | 10,000 drones, 5,000 robotic kitchens, 1M+ delivery staff | LongCat | 30 cities, 7M daily orders |
| US Company A | 500 warehouse robots (leased) | Internal 'LogiGPT' (unreleased) | 5 pilot warehouses |
| US Company B | 200 autonomous vehicles (prototype) | 'DriveNet' (unreleased) | 2 test cities |

Data Takeaway: Chinese firms have a 10x to 100x advantage in physical deployment scale. This data moat is self-reinforcing: more deployments generate more data, which improves the AI, which enables more deployments.

Industry Impact & Market Dynamics

The market is already pricing in this shift. Venture capital investment in 'AI + physical infrastructure' startups reached $12.8 billion in Q1 2026, up 340% year-over-year, according to PitchBook data. Meanwhile, pure-play AI model companies (those without physical assets) saw their valuations decline by an average of 22% in the same period. The narrative has flipped: investors now ask 'Where does your AI touch the real world?' rather than 'How many GPUs do you have?'

This creates a new competitive dynamic. Cloud providers like AWS, Azure, and Google Cloud are racing to offer 'physical AI' services—robotics-as-a-service, autonomous fleet management, and digital twin tools. But they face an uphill battle: they lack the last-mile physical presence. Meituan, by contrast, owns the entire stack: the AI model, the drones, the kitchens, the delivery network, and the customer relationship. This vertical integration is the ultimate moat.

| Sector | Pre-2025 Investment (AI Models) | 2026 Investment (AI + Physical) | Growth |
|---|---|---|---|
| Logistics | $2.1B | $8.9B | 324% |
| Manufacturing | $1.5B | $5.2B | 247% |
| Autonomous Vehicles | $4.3B | $6.1B | 42% |
| Healthcare Robotics | $0.8B | $2.3B | 188% |

Data Takeaway: Logistics and manufacturing are the fastest-growing segments, confirming that the 'physical AI' wave is concentrated in industries with existing infrastructure to digitize.
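The growth column can be reproduced from the two investment columns. The sketch below assumes growth is defined as (2026 figure minus pre-2025 figure) divided by the pre-2025 figure, and uses `decimal` so that the 187.5% case rounds up rather than falling victim to floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

# Investment figures in billions of USD, copied from the table above.
investment = {
    "Logistics": ("2.1", "8.9"),
    "Manufacturing": ("1.5", "5.2"),
    "Autonomous Vehicles": ("4.3", "6.1"),
    "Healthcare Robotics": ("0.8", "2.3"),
}

def growth_percent(before, after):
    """Percentage growth from before to after, rounded half up to an integer."""
    before, after = Decimal(before), Decimal(after)
    pct = (after - before) / before * 100
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

computed = {sector: growth_percent(b, a) for sector, (b, a) in investment.items()}
for sector, pct in sorted(computed.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sector}: {pct}%")
```

Sorted this way, logistics and manufacturing lead the list, matching the takeaway.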

Risks, Limitations & Open Questions

1. Safety and Reliability: Physical AI systems operate in unstructured, unpredictable environments. A drone delivery model that fails in heavy rain or a robotic kitchen that misidentifies a utensil could cause harm. Meituan's LongCat has a reported failure rate of 0.02% per delivery, but at 7 million daily orders, that still means 1,400 incidents per day. Scaling this safely is an unsolved challenge.

2. Regulatory Fragmentation: Different countries have vastly different rules for autonomous vehicles, drones, and food safety. Meituan's model is optimized for Chinese cities with dedicated drone lanes; adapting it to US or European markets will require significant re-engineering and regulatory approval, which could take years.

3. Data Privacy: Physical AI systems collect massive amounts of sensitive data—location, video, biometrics (from kitchen cameras). The LongCat model is trained on this data, raising concerns about surveillance and misuse. Chinese regulations (e.g., PIPL) apply, but enforcement is opaque.

4. The 'Black Box' Problem: MoE models like DeepSeek V4 are notoriously difficult to interpret. When a drone makes a wrong turn, it is hard to trace the decision back to a specific expert module. This lack of explainability is a liability in safety-critical applications.

5. Compute vs. Energy: While DeepSeek V4 is more efficient per token, the total compute demand for physical AI is exploding. A single autonomous fleet running LongCat consumes as much energy as a small data center. The environmental cost of scaling physical AI is underappreciated.
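The incident arithmetic in the first risk item is worth making explicit, because it shows how small a failure rate must be at this volume. The sketch below uses the article's reported figures; the budget of 100 incidents per day is a hypothetical target chosen only for illustration.

```python
DAILY_ORDERS = 7_000_000   # reported daily order volume
FAILURES_PER_10K = 2       # 0.02% expressed as 2 failures per 10,000 deliveries

def daily_incidents(orders, failures_per_10k):
    """Expected incidents per day at the given failure rate."""
    return orders * failures_per_10k // 10_000

def max_rate_percent(orders, incident_budget):
    """Failure rate (in percent) that keeps daily incidents within the budget."""
    return incident_budget / orders * 100

incidents = daily_incidents(DAILY_ORDERS, FAILURES_PER_10K)
# Hypothetical budget: at most 100 incidents per day.
target_rate = max_rate_percent(DAILY_ORDERS, incident_budget=100)

print(incidents)              # 1400
print(f"{target_rate:.4f}%")  # 0.0014%
```

Under that hypothetical budget, the failure rate would have to fall from 0.02% to about 0.0014%, a 14-fold improvement.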

AINews Verdict & Predictions

The compute arms race is over. The winners of the next AI decade will not be the companies with the largest GPU clusters, but those that own the physical infrastructure to deploy AI at scale. DeepSeek V4 and Meituan LongCat are the opening shots of this new era.

Prediction 1: By 2028, the top three AI companies by revenue will all be 'physical AI' companies. Pure-play model companies will either be acquired or forced to pivot to edge deployment. OpenAI, for instance, will likely acquire a robotics startup within 18 months.

Prediction 2: Chinese firms will maintain a 3-5 year lead in physical AI deployment. Their advantage in logistics, manufacturing, and government support (e.g., dedicated drone airspace) is structural. US companies will struggle to catch up unless they form unprecedented partnerships with industrial conglomerates like Amazon, Walmart, or UPS.

Prediction 3: The next major AI breakthrough will come from a physical-world dataset, not a text or image dataset. A model trained on 10 million hours of drone flight logs or robotic kitchen operations will unlock capabilities that no language model can match. Meituan's LongCat is the early proof.

What to watch: The GitHub repositories to follow are 'ros2-control' (robotics middleware, 15k stars) and 'digital-twin-framework' (open-source simulation, 8k stars). The companies to watch are not in Silicon Valley but in Shenzhen and Beijing. The war for AI is now a war for the physical world.
