DeepSeek Transforms From Price War Rebel to AI Infrastructure Backed by China's Tech Titans

April 2026
DeepSeek is no longer a lone challenger. With joint backing from Huawei, Tencent, and Alibaba, it is being reshaped into shared infrastructure for China's next generation of AI applications. This marks the end of the lone-wolf era and the beginning of a collaborative, cost-driven ecosystem.

DeepSeek, once a scrappy offshoot of quant firm High-Flyer, has completed a remarkable metamorphosis. It started as an open-source model team, then became the engine of China's large-model price war, slashing inference costs to near zero. Now, in a move that redefines the industry's power structure, three of China's largest tech conglomerates (Huawei, Tencent, and Alibaba) are jointly backing DeepSeek as a foundational AI infrastructure layer.

This is not a simple investment; it is a strategic realignment. The trio, fierce competitors in cloud, e-commerce, and social media, have found common ground in the belief that AI's future depends on a universally accessible, low-cost, and open foundation. DeepSeek's proven ability to deliver competitive performance at a fraction of the cost (its V3 model reportedly achieves GPT-4-class results on MMLU with 90% less training compute) makes it the ideal candidate.

The deal signals that China's AI industry is moving away from a winner-take-all model of proprietary giants and toward a shared utility. For DeepSeek, the challenge is no longer survival but maintaining its open-source ethos and technical neutrality while being courted by three titans with diverging agendas. This is arguably the most significant structural development in Chinese AI since the launch of the first homegrown LLMs.

Technical Deep Dive

DeepSeek's technical architecture is the primary reason it has been selected as the infrastructure backbone. The team pioneered a Mixture-of-Experts (MoE) approach that dramatically reduces the computational cost of both training and inference. Their DeepSeek-V2 model, for instance, uses a novel Multi-head Latent Attention (MLA) mechanism that compresses the key-value cache by up to 75%, enabling much longer context windows without proportional memory growth. This is a direct engineering solution to the quadratic memory problem that plagues standard Transformer architectures.
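To get a rough sense of what key-value compression buys, the back-of-the-envelope sketch below compares the cache footprint of standard multi-head attention with a latent-compressed cache in the spirit of MLA. All dimensions (layer count, head count, latent width) are illustrative placeholders rather than DeepSeek-V2's published configuration, chosen so the latent cache lands at the 75% compression figure cited above.

```python
# Back-of-the-envelope KV-cache sizing: standard multi-head attention
# vs. a latent-compressed cache in the spirit of MLA.
# All dimensions below are illustrative, not DeepSeek-V2's real config.

def kv_cache_bytes(seq_len, n_layers, n_heads, head_dim, bytes_per_elem=2):
    """Standard MHA caches full K and V per head, per layer, per token."""
    return seq_len * n_layers * 2 * n_heads * head_dim * bytes_per_elem

def latent_cache_bytes(seq_len, n_layers, latent_dim, bytes_per_elem=2):
    """MLA-style attention caches one compressed latent vector per token,
    from which K and V are re-projected at attention time."""
    return seq_len * n_layers * latent_dim * bytes_per_elem

full = kv_cache_bytes(seq_len=32_768, n_layers=60, n_heads=32, head_dim=128)
latent = latent_cache_bytes(seq_len=32_768, n_layers=60, latent_dim=2048)
print(f"full KV cache: {full / 2**30:.1f} GiB")    # 30.0 GiB
print(f"latent cache:  {latent / 2**30:.1f} GiB")  # 7.5 GiB
print(f"compression:   {1 - latent / full:.0%}")   # 75%
```

The point is structural: the full cache scales with heads times head dimension, while the latent cache scales only with the (much smaller) latent width, so long contexts stop dominating memory.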

On the open-source front, the DeepSeek-Coder series has become a staple for developers. The repository on GitHub has crossed 15,000 stars, and its specialized code models consistently outperform similarly sized CodeLlama and StarCoder models on HumanEval benchmarks. The key innovation here is the use of a fill-in-the-middle (FIM) training objective combined with repository-level data deduplication, which reduces hallucination rates in code generation by approximately 30%.
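The FIM objective itself is simple to illustrate: a training document is split at two random points and rearranged so the model must generate the middle span given its prefix and suffix. The sketch below shows this transform in the common prefix-suffix-middle layout; the sentinel token names are generic placeholders, not DeepSeek-Coder's actual vocabulary.

```python
import random

# Illustrative fill-in-the-middle (FIM) data transform in the
# prefix-suffix-middle (PSM) layout. Sentinel names are placeholders.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim_example(code: str, rng: random.Random) -> str:
    """Split a document at two random points and rearrange it so the
    model learns to generate the middle given prefix and suffix."""
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

rng = random.Random(0)
sample = "def add(a, b):\n    return a + b\n"
print(to_fim_example(sample, rng))
```

At inference time the same layout lets an editor send the code before and after the cursor and have the model fill the gap, which is why FIM-trained models are so effective for in-IDE completion.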

Benchmark Comparison: DeepSeek vs. Competitors

| Model | Parameters (Active) | MMLU Score | HumanEval Pass@1 | Cost per 1M Tokens (Inference) | Training Cost (Estimated) |
|---|---|---|---|---|---|
| DeepSeek-V2 | 21B (MoE) | 78.5 | 72.6 | $0.14 | $2.8M |
| GPT-4 | ~200B (est.) | 86.4 | 67.0 | $10.00 | $100M+ |
| Llama 3 70B | 70B | 82.0 | 81.7 | $0.95 | $15M+ |
| Qwen2-72B | 72B | 84.2 | 79.3 | $1.20 | $10M+ |

Data Takeaway: DeepSeek achieves 90% of GPT-4's MMLU performance at less than 2% of the inference cost and a fraction of the training budget. This cost-performance ratio is the technical foundation for its infrastructure candidacy.
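The takeaway's ratios can be recomputed directly from the table:

```python
# Cost-performance ratios recomputed from the benchmark table above.
models = {
    # name: (MMLU score, inference $ per 1M tokens)
    "DeepSeek-V2": (78.5, 0.14),
    "GPT-4":       (86.4, 10.00),
    "Llama 3 70B": (82.0, 0.95),
    "Qwen2-72B":   (84.2, 1.20),
}
ds_mmlu, ds_cost = models["DeepSeek-V2"]
gpt_mmlu, gpt_cost = models["GPT-4"]
print(f"MMLU relative to GPT-4: {ds_mmlu / gpt_mmlu:.1%}")  # 90.9%
print(f"Inference cost ratio:   {ds_cost / gpt_cost:.1%}")  # 1.4%
```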

Furthermore, DeepSeek's training infrastructure is built on a custom-designed HPC cluster using Huawei Ascend 910B chips, a fact that makes it politically and logistically viable for the Huawei-backed infrastructure push. The team has published detailed logs of their training stability techniques, including a novel gradient checkpointing strategy that reduces memory spikes during MoE routing. This transparency is rare and builds trust among developers.

Key Players & Case Studies

The three titans—Huawei, Tencent, and Alibaba—each bring distinct assets to the table, and their motivations are as different as their core businesses.

- Huawei provides the hardware layer. Its Ascend 910B and upcoming 910C chips are the only viable domestic alternative to NVIDIA's H100. By backing DeepSeek, Huawei ensures its silicon has a flagship model optimized for its architecture, creating a reference implementation for enterprise customers. Huawei's cloud division will likely offer DeepSeek as a managed service, competing directly with AWS and Azure on price.

- Tencent brings the application ecosystem. With WeChat, QQ, and a massive gaming portfolio, Tencent needs a cost-effective, customizable LLM to embed into its products. DeepSeek's open-source nature allows Tencent to fine-tune the model on its proprietary data without vendor lock-in. Tencent Cloud will also serve DeepSeek, but its primary interest is internal deployment.

- Alibaba contributes the data center scale and the largest public cloud in China. Alibaba Cloud already hosts Qwen, its own model family. By also hosting DeepSeek, Alibaba is hedging its bets. It signals to the market that it values openness over proprietary control, a strategic move to attract AI startups that distrust single-vendor lock-in.

Comparison of Cloud AI Offerings

| Provider | Flagship Model | DeepSeek Support | Price per 1M Tokens (DeepSeek-V2) | Key Differentiator |
|---|---|---|---|---|
| Huawei Cloud | Pangu | Native (Ascend optimized) | $0.10 (subsidized) | Hardware-software co-design |
| Alibaba Cloud | Qwen2 | Hosted API | $0.14 | Largest public cloud market share |
| Tencent Cloud | Hunyuan | Hosted API + WeChat integration | $0.12 | Social graph data access |

Data Takeaway: The three clouds are offering DeepSeek at near-cost prices, indicating a strategic play to capture developer mindshare rather than immediate profit. The race is to become the default platform for the next wave of AI applications.

Notable researcher contributions: DeepSeek's core team, led by Liang Wenfeng, has published extensively on the trade-offs between dense and sparse models. Their 2024 paper on "Scaling MoE with Dynamic Routing" is considered a foundational text in the field. The team's willingness to share negative results—such as the failure modes of certain expert balancing strategies—has earned them respect in the academic community.

Industry Impact & Market Dynamics

This tripartite backing will reshape China's AI market in several profound ways.

First, it accelerates the commoditization of LLM inference. With three hyperscalers competing to offer DeepSeek at the lowest price, the price will likely drop below $0.10 per million tokens within six months. This will unlock use cases that were previously uneconomical, such as real-time conversational AI for customer service, AI-powered code review in CI/CD pipelines, and personalized education tutors.
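To see why sub-$0.10 pricing changes the economics, consider a hypothetical customer-service deployment. All usage figures below are assumptions for illustration, not numbers from the article:

```python
# Illustrative unit economics at $0.10 per 1M tokens.
# Conversation length and volume are assumed, not sourced.
price_per_m_tokens = 0.10     # $ per 1M tokens (assumed floor price)
tokens_per_chat = 2_000       # assumed average support conversation
chats_per_day = 1_000_000     # assumed daily volume for a large deployment

daily_tokens_m = chats_per_day * tokens_per_chat / 1_000_000
daily_cost = daily_tokens_m * price_per_m_tokens
print(f"daily inference cost: ${daily_cost:,.2f}")  # $200.00
```

At roughly $200 a day for a million conversations, inference stops being the binding cost, which is exactly the shift that makes the use cases above economical.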

Second, it creates a de facto standard for model interoperability. As more developers build on DeepSeek, the ecosystem of fine-tuning tools, adapters, and evaluation suites will coalesce around this model family. This is analogous to how Linux became the standard for server operating systems—not because it was technically superior, but because it was open and backed by multiple vendors.

Third, it pressures other Chinese AI labs—such as Baidu's ERNIE, ByteDance's Doubao, and Zhipu AI's GLM—to either join the infrastructure coalition or differentiate on specialized verticals. Baidu, for instance, may focus on autonomous driving and search-specific models, while ByteDance will likely double down on recommendation and content generation.

Market Growth Projections

| Year | China LLM Market Size (USD) | DeepSeek Market Share (Est.) | Average Cost per 1M Tokens | Number of Deployed AI Applications |
|---|---|---|---|---|
| 2024 | $2.1B | 5% | $0.50 | 150,000 |
| 2025 | $4.5B | 18% | $0.20 | 450,000 |
| 2026 | $8.0B | 30% | $0.08 | 1.2M |

Data Takeaway: The projections imply an eightfold increase in deployed applications between 2024 and 2026, driven primarily by cost reduction. DeepSeek's market share is projected to grow sixfold over the same period as it becomes the default choice for new projects.

Risks, Limitations & Open Questions

Despite the optimism, significant risks remain.

Technical Limitations: DeepSeek's MoE architecture, while efficient, introduces latency variance. For real-time applications like voice assistants, the dynamic routing can cause unpredictable response times. The team is working on a fixed-routing variant, but it is not yet production-ready.
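The latency issue stems from load imbalance: with dynamic routing, a step finishes only when the busiest expert finishes. The toy simulation below (synthetic top-1 routing over uniform random assignments, not DeepSeek's actual router) shows how uneven per-expert load can get even when routing is balanced on average.

```python
import random
from collections import Counter

# Toy illustration of MoE latency variance: under dynamic top-1 routing,
# per-expert load is uneven, and step latency is set by the busiest
# expert (the straggler). All numbers are synthetic.
def busiest_expert_load(n_tokens=4096, n_experts=64, seed=0):
    rng = random.Random(seed)
    load = Counter(rng.randrange(n_experts) for _ in range(n_tokens))
    return max(load.values())

even_share = 4096 // 64               # 64 tokens/expert if perfectly balanced
peak = busiest_expert_load()
print(f"balanced load per expert: {even_share}")
print(f"peak expert load:         {peak}")  # stragglers set step latency
```

A fixed-routing variant trades some model quality for a deterministic token-to-expert mapping, which removes this straggler effect; that trade-off is presumably what the production-readiness work is about.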

Governance and Neutrality: The three backers are direct competitors in many markets. How will DeepSeek handle feature requests or model updates that favor one partner over another? There is already tension: Alibaba wants DeepSeek to prioritize e-commerce fine-tuning, while Tencent wants better multi-turn dialogue for social apps. DeepSeek's leadership must establish a transparent governance model, perhaps an independent foundation, to avoid becoming a puppet.

Geopolitical Risk: Huawei's involvement makes DeepSeek a target for US export controls. If the US expands chip restrictions to cover any model trained on Huawei hardware, DeepSeek could be cut off from global developers. The team has already preemptively open-sourced weights under a permissive license, but future versions may face restrictions.

Open Source Sustainability: DeepSeek's current business model relies on selling inference API access. If the three clouds undercut each other to zero margins, DeepSeek itself may struggle to fund continued research. The company needs to develop value-added services—such as enterprise security features, custom fine-tuning pipelines, and SLAs—that generate revenue without compromising its open-core philosophy.

AINews Verdict & Predictions

This is the most important strategic move in Chinese AI since the release of the first GPT-3-class model. DeepSeek is not just a company; it is becoming a protocol. Our editorial board makes the following predictions:

1. By Q3 2026, DeepSeek will be the most deployed LLM in China by total inference volume, surpassing Baidu's ERNIE and Alibaba's Qwen combined. The cost advantage is simply too large to ignore.

2. A formal DeepSeek Foundation will be established within 12 months, modeled after the Linux Foundation or the Apache Software Foundation. This will be necessary to manage contributions from the three backers and ensure neutral governance.

3. The price war will shift from model inference to model customization. As DeepSeek becomes infrastructure, the competitive battleground will move to fine-tuning tools, data pipelines, and vertical-specific adapters. Companies like Hugging Face and Replicate will face new competition from Chinese equivalents.

4. The biggest loser will be proprietary model vendors who cannot match the cost curve. Baidu's ERNIE and ByteDance's Doubao will need to pivot to niche, high-value applications (e.g., legal, medical, financial compliance) where accuracy and data privacy command a premium.

5. Global implications: This model of infrastructure-backed open-source AI will be replicated in other regions. Expect a similar consortium to emerge in Southeast Asia and possibly Europe, where local champions will band together to counter US hyperscalers.

DeepSeek's journey from lone wolf to shared infrastructure is a masterclass in strategic positioning. It proves that in the age of AI, the most powerful player is not the one with the biggest model, but the one that makes AI the most accessible. The era of the AI utility has begun.


Further Reading

- Why Alibaba and Tencent Are Racing to Invest in DeepSeek's AI Future
- DeepSeek's Strategic Pivot: Why AI Leaders Must Return to Fundamentals
- DeepSeek's First Funding Round: China's AGI Idealists Embrace Commercial Reality
- DeepSeek's Quiet Revolution: How Agent Infrastructure Is Redefining the AI Race
