Why Alibaba and Tencent Are Racing to Invest in DeepSeek's AI Future

April 2026
Tags: DeepSeek, open-source AI, large language model
Alibaba and Tencent are both investing in AI startup DeepSeek, signaling a strategic race to lock in efficient, open-source large language models. This is not merely a financial bet but an attempt to control the next generation of AI infrastructure and application ecosystems.

In the white-hot crucible of China's AI competition, DeepSeek has emerged as the most sought-after startup, drawing simultaneous investment interest from tech behemoths Alibaba and Tencent. This is far more than a simple portfolio diversification play. At its core, DeepSeek has achieved what many thought impossible: delivering model performance that rivals frontier systems like GPT-4 and Claude 3.5 at a fraction of the compute cost. By pushing the Mixture-of-Experts (MoE) architecture further than its peers and aggressively open-sourcing its models, DeepSeek has created a technical and community-driven moat that threatens to disrupt the centralized, proprietary model paradigm championed by incumbents.

For Alibaba and Tencent, investing in DeepSeek is a dual-motive strategy: a defensive move to prevent a rival from owning this efficiency breakthrough, and an offensive maneuver to integrate DeepSeek's cost-effective models into their cloud services (Alibaba Cloud, Tencent Cloud), e-commerce (Taobao, Tmall), social (WeChat), and enterprise tools. The deeper logic is that whoever controls the most efficient, widely adopted open-source model will define the default infrastructure for the next wave of AI applications. This investment race is not about who writes the biggest check, but about who can successfully internalize DeepSeek's technical DNA into their own ecosystem, thereby dictating the rules of the AI application layer for years to come.

Technical Deep Dive

DeepSeek's technical breakthrough centers on its efficient use of the Mixture-of-Experts (MoE) architecture, a design that activates only a subset of a model's parameters for any given input. While models like GPT-4 are also believed to use MoE, DeepSeek has taken the approach to an extreme with its DeepSeek-V2 and V3 series. The key innovations are Multi-head Latent Attention (MLA), which compresses the key-value cache, and a load-balancing strategy that supports a massive total parameter count (236B in V2, scaling to 671B in V3) while keeping the active parameters per token low (21B and roughly 37B, respectively). This translates directly into drastically lower inference costs: reportedly as low as $0.14 per million input tokens for the V2 API, compared with $2.50 or more for comparable proprietary models.
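The routing idea behind MoE can be sketched in a few lines of NumPy. This is a minimal, illustrative top-k router, not DeepSeek's actual implementation: production MoE layers add shared experts, auxiliary load-balancing objectives, and batched expert dispatch, and every name and dimension below is invented for the demo.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k experts of an MoE layer.

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of callables, experts[i](x) -> (d,)
    Only the k selected experts run, so compute scales with k,
    not with the total number of experts.
    """
    logits = x @ gate_w                      # (n_experts,) router scores
    top_k = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Weighted sum of the k active experts' outputs.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

# Toy demo: 8 experts, only 2 active per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
gate_w = rng.normal(size=(d, n_experts))
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
out = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(out.shape)  # (16,)
```

The efficiency claim falls out of the structure: with 8 experts and k=2, only a quarter of the expert parameters are touched per token, yet the router can specialize experts across the input distribution.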

From an engineering perspective, DeepSeek's training pipeline is also noteworthy. The team has published detailed technical reports on their training infrastructure, including the use of FP8 mixed-precision training and optimized communication protocols that allow them to train on a cluster of 2,048 NVIDIA H800 GPUs with near-linear scaling efficiency. This is a significant achievement given the well-known challenges of distributed training across thousands of accelerators. The open-source community has responded enthusiastically; the `deepseek-ai/DeepSeek-V2` repository on GitHub has garnered over 15,000 stars, and derivative fine-tuned models are appearing rapidly. The architecture's efficiency makes it particularly attractive for on-device and edge deployments, a domain where larger, denser models struggle.
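To make "near-linear scaling" concrete, the standard metric is aggregate throughput divided by the ideal linear throughput. The numbers below are hypothetical, chosen only to illustrate the calculation, not figures from DeepSeek's reports:

```python
def scaling_efficiency(throughput_n, n_gpus, throughput_1):
    """Fraction of ideal linear speedup achieved on n_gpus."""
    return throughput_n / (n_gpus * throughput_1)

# Hypothetical numbers: one GPU sustains 1,000 tokens/s; a 2,048-GPU
# cluster reaches 1.9M tokens/s in aggregate.
eff = scaling_efficiency(1_900_000, 2048, 1000)
print(f"{eff:.1%}")  # 92.8%
```

Anything above ~90% at this cluster size is considered excellent, since communication overhead normally eats into throughput as accelerator counts grow.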

| Model | Total Parameters | Active Parameters | MMLU Score | Inference Cost (per 1M tokens) |
|---|---|---|---|---|
| DeepSeek-V2 | 236B | 21B | 78.5 | $0.14 |
| GPT-4 Turbo | ~1.7T (est.) | ~200B (est.) | 86.4 | $10.00 |
| Claude 3.5 Sonnet | — | — | 88.3 | $3.00 |
| Llama 3.1 405B | 405B | 405B | 87.3 | $2.50 |

Data Takeaway: DeepSeek-V2 achieves an MMLU score within 10 points of the best proprietary models while costing over 70x less per token than GPT-4 Turbo. This cost-performance ratio is the core technical justification for the investment frenzy.
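The takeaway's arithmetic follows directly from the table:

```python
# Figures from the comparison table above.
gpt4_cost, deepseek_cost = 10.00, 0.14   # USD per 1M tokens
gpt4_mmlu, deepseek_mmlu = 86.4, 78.5

cost_ratio = gpt4_cost / deepseek_cost
mmlu_gap = gpt4_mmlu - deepseek_mmlu
print(f"{cost_ratio:.0f}x cheaper, {mmlu_gap:.1f} MMLU points behind")
# 71x cheaper, 7.9 MMLU points behind
```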

Key Players & Case Studies

The investment race involves two distinct strategic approaches from Alibaba and Tencent. Alibaba, through its cloud division (Alibaba Cloud), has been aggressively building its own proprietary model family, Qwen. The Qwen2-72B model is competitive, but Alibaba recognizes that no single model can dominate all verticals. By investing in DeepSeek, Alibaba gains access to a complementary, highly efficient model that can be offered as a lower-cost alternative on its cloud platform, especially for price-sensitive SMEs. This mirrors Amazon's strategy of offering both its own Titan models and hosting third-party models like Anthropic's Claude on AWS.

Tencent's calculus is different. Tencent's core strength is its massive social and gaming ecosystem (WeChat, QQ, Honor of Kings). The company has been slower to release a flagship foundational model, instead focusing on application-layer integrations. Investing in DeepSeek gives Tencent a direct line to a state-of-the-art model that can be fine-tuned for WeChat's mini-program ecosystem, customer service bots, and content recommendation. The efficiency of DeepSeek is critical here: running a 671B-parameter model at scale for billions of WeChat users would be prohibitively expensive with dense models, but DeepSeek's MoE architecture makes it economically viable.
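A back-of-the-envelope FLOPs comparison shows why. Using the common approximation of roughly 2 FLOPs per active parameter per token in a forward pass, and treating a dense model at DeepSeek's total size as the counterfactual:

```python
# Rough per-token inference cost: ~2 FLOPs per active parameter.
dense_params = 671e9   # hypothetical dense model at DeepSeek's total size
moe_active = 37e9      # DeepSeek's active parameters per token

dense_flops = 2 * dense_params
moe_flops = 2 * moe_active
print(f"{dense_flops / moe_flops:.1f}x less compute per token")  # 18.1x
```

An ~18x reduction in per-token compute is the difference between a service that loses money on every WeChat query and one that can run profitably at a billion-user scale.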

| Company | Core Business | Strategic Goal with DeepSeek | Competing Model |
|---|---|---|---|
| Alibaba | Cloud, E-commerce | Offer low-cost cloud AI inference, compete with AWS | Qwen2-72B |
| Tencent | Social, Gaming, Payments | Integrate into WeChat ecosystem, enable real-time AI features | Hunyuan (internal) |
| ByteDance | Social (Douyin), Content | Has not invested; relies on internal 'Doubao' model | Doubao |

Data Takeaway: The table reveals a clear pattern: companies without a dominant foundational model (Tencent) are investing to catch up, while those with strong internal models (Alibaba) are investing to hedge and expand their cloud offerings. ByteDance's absence is notable, suggesting it believes its internal model is sufficient.

Industry Impact & Market Dynamics

The DeepSeek investment wave is reshaping China's AI landscape in three profound ways. First, it is accelerating the commoditization of large language models. DeepSeek's open-source release and low-cost API are forcing all competitors—including Baidu's ERNIE, SenseTime's SenseNova, and even Alibaba's Qwen—to slash prices. In the past six months, API inference costs in China have dropped by over 80%, directly benefiting downstream application developers.

Second, it is shifting the center of gravity from model performance to model efficiency. The narrative is no longer about who has the smartest model, but who can deliver the most intelligence per dollar. This favors startups like DeepSeek that have optimized for inference cost from day one. The market for AI inference is projected to grow from $10 billion in 2024 to over $70 billion by 2028, and DeepSeek is positioning itself as the default engine for this wave.
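That projection implies a compound annual growth rate of roughly 63%, which can be checked in one line:

```python
# Inference market projection cited above: $10B (2024) -> $70B (2028).
start, end, years = 10.0, 70.0, 4
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # 62.7%
```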

Third, the investment is a geopolitical signal. Both Alibaba and Tencent are under intense pressure from the US export controls on advanced GPUs (H100, B200). DeepSeek's ability to achieve frontier-level performance using the less powerful H800 chips (which are still available to Chinese firms) demonstrates a path forward that is less dependent on cutting-edge hardware. This makes DeepSeek not just a commercial asset but a national strategic asset.

| Metric | Pre-DeepSeek (2023) | Post-DeepSeek (2024) | Change |
|---|---|---|---|
| Avg. API cost per 1M tokens (China) | $2.00 | $0.35 | -82.5% |
| Number of open-source LLMs >70B | 3 | 12 | +300% |
| Inference market size (China, $B) | 1.5 | 3.2 | +113% |

Data Takeaway: The market data confirms that DeepSeek's entry has triggered a price war and an explosion in open-source model availability, directly correlating with a doubling of the Chinese inference market size in just one year.
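The headline price drop in the table is straightforward to verify:

```python
# Avg. API cost per 1M tokens in China, from the table above.
pre, post = 2.00, 0.35
drop = (pre - post) / pre
print(f"{drop:.1%}")  # 82.5%
```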

Risks, Limitations & Open Questions

Despite the euphoria, significant risks remain. The most immediate is the sustainability of DeepSeek's cost advantage. The MoE architecture, while efficient for inference, is notoriously difficult to train and requires careful tuning of the routing mechanism. If a competitor like Alibaba or Tencent develops a more efficient architecture (e.g., based on state-space models like Mamba), DeepSeek's lead could evaporate.

There is also the question of model alignment and safety. DeepSeek's open-source nature means that anyone can fine-tune it for malicious purposes, including generating disinformation or creating harmful chatbots. The Chinese government's increasingly stringent AI regulations could impose liability on model developers for downstream misuse, potentially chilling DeepSeek's open-source strategy.

Furthermore, the talent retention risk is real. DeepSeek's core team is small (estimated at less than 200 people), and with Alibaba and Tencent now holding equity stakes, there is a danger of brain drain as these larger companies attempt to poach key researchers. The startup's culture of rapid, independent research may clash with the slower, compliance-heavy processes of its new investors.

Finally, the US-China tech decoupling poses an existential threat. If further export controls cut off access to even H800 GPUs, DeepSeek's training pipeline would be severely disrupted. The company's reliance on NVIDIA hardware, even if optimized, remains a single point of failure.

AINews Verdict & Predictions

DeepSeek represents the most significant inflection point in the Chinese AI landscape since the release of ChatGPT. Our editorial judgment is that the investment from Alibaba and Tencent will not lead to a bidding war, but rather to a coordinated, multi-stakeholder ownership structure that allows DeepSeek to remain independent while providing its technology to both ecosystems. This is the optimal outcome for DeepSeek: it gets the capital and distribution of two giants without being absorbed by either.

Our specific predictions:
1. Within 12 months, DeepSeek will release a multimodal version (DeepSeek-VL) that achieves competitive performance on vision-language benchmarks, further widening its moat.
2. Alibaba Cloud will offer DeepSeek as a first-party service by Q3 2025, undercutting its own Qwen API prices by at least 50%.
3. Tencent will integrate DeepSeek into WeChat's AI assistant before the end of 2025, enabling real-time, cost-effective conversational AI for over a billion users.
4. The open-source community will fork DeepSeek extensively, leading to a proliferation of specialized models for healthcare, finance, and legal domains, but also creating a moderation headache for the original developers.

The key metric to watch is not parameter count or benchmark scores, but inference cost per user interaction. DeepSeek has already won this metric. The question is whether it can maintain its lead as the industry inevitably copies its architecture. Our bet is that DeepSeek's head start in efficient training and its unique culture of research transparency will prove difficult to replicate, making it the foundational infrastructure layer for China's AI future.


Further Reading

- DeepSeek Shifts from Price-War Rebel to AI Infrastructure Backed by China's Tech Giants
- DeepSeek-V4: 1.6 Trillion Parameters, Million-Token Context, and the Dawn of Affordable AI
- DeepSeek's Strategic Pivot: Why AI Leaders Must Return to Fundamentals
- Why Tencent, ByteDance, and Alibaba Are Racing to Build AI Skill Ecosystems
