Technical Deep Dive
DeepSeek V4 is a technical marvel that challenges the prevailing wisdom that only closed, monolithic models can achieve frontier performance. At its core is a Mixture-of-Experts (MoE) architecture with a reported 1.2 trillion total parameters, of which only around 200 billion are activated per token. This sparse activation is the key to its efficiency: compared with a dense model of similar scale (GPT-4 is estimated at roughly 1.8T parameters), DeepSeek V4 can achieve comparable or superior results at a fraction of the computational cost per inference.
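The sparse-activation idea can be sketched in a few lines: a gating network scores every expert for each token, but only the top-k experts actually run. The toy shapes below are illustrative assumptions, not DeepSeek V4's actual configuration.

```python
# Minimal sketch of sparse MoE activation (toy dimensions, not DeepSeek's).
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts; only those experts execute."""
    logits = x @ gate_w                                # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]         # indices of top-k experts
    sel = np.take_along_axis(logits, topk, axis=-1)    # their logits
    w = np.exp(sel - sel.max(-1, keepdims=True))       # softmax over the k chosen
    w /= w.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            out[t] += w[t, j] * experts[topk[t, j]](x[t])
    return out, topk

rng = np.random.default_rng(0)
d, n_experts = 16, 8
# Each "expert" is just a linear map here, built with a closure to bind W.
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)) / d)
           for _ in range(n_experts)]
x = rng.normal(size=(4, d))
gate_w = rng.normal(size=(d, n_experts))
y, routed = moe_forward(x, gate_w, experts)
# Only k=2 of 8 experts run per token: 25% of expert parameters active.
print(routed.shape)  # (4, 2)
```

With 2 of 8 experts active, only a quarter of expert parameters touch any given token, which is the same mechanism behind the 200B-of-1.2T ratio reported for V4.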
The architecture employs a novel Dynamic Expert Routing mechanism. Instead of static routing, DeepSeek V4 uses a learned gating network that dynamically assigns tokens to experts based on the input's complexity. This is a significant improvement over earlier MoE models (like Mixtral 8x7B), which suffered from load-balancing issues and expert collapse. DeepSeek's implementation, detailed in their technical report, introduces a load-balancing auxiliary loss that ensures each expert receives a roughly equal number of tokens during training, preventing a few experts from becoming 'super-experts' while others atrophy.
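The exact form of DeepSeek's auxiliary loss isn't reproduced here, but a widely used formulation (from the Switch Transformer line of MoE work) captures the idea: multiply each expert's dispatch fraction by its mean routing probability, so the loss is minimal when tokens spread evenly and large when routing collapses onto one expert.

```python
# Representative load-balancing auxiliary loss (Switch Transformer-style);
# a sketch of the general technique, not DeepSeek V4's published loss.
import numpy as np

def load_balance_loss(router_logits):
    """router_logits: (tokens, n_experts). Equals ~1.0 for perfectly even
    routing and grows toward n_experts as routing collapses."""
    n_tokens, n_experts = router_logits.shape
    probs = np.exp(router_logits - router_logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    assigned = probs.argmax(-1)                                  # top-1 routing
    f = np.bincount(assigned, minlength=n_experts) / n_tokens    # dispatch fraction
    p = probs.mean(axis=0)                                       # mean router prob
    return n_experts * float(f @ p)

rng = np.random.default_rng(0)
balanced = load_balance_loss(rng.normal(scale=0.01, size=(1024, 8)))
collapsed = load_balance_loss(np.tile([5.0, 0, 0, 0, 0, 0, 0, 0], (1024, 1)))
print(balanced, collapsed)  # ≈1.0 (even routing) vs ≈7.6 (expert collapse)
```

Adding a small multiple of this term to the training loss is what discourages the 'super-expert' failure mode the article describes.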
Furthermore, DeepSeek V4 incorporates Multi-Head Latent Attention (MHLA), an evolution of the standard attention mechanism. MHLA compresses the key-value (KV) cache into a low-rank latent space, dramatically reducing memory consumption during long-context inference. This allows DeepSeek V4 to handle context windows of up to 256K tokens without the KV-cache memory growth that makes long contexts prohibitively expensive for traditional transformers. The result is a model that can process entire codebases or lengthy research papers in a single pass.
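The compression idea can be sketched as follows: cache only a small latent vector per token, and reconstruct keys and values with up-projections at attention time. The dimensions (`d_model`, `d_latent`) below are illustrative assumptions, not DeepSeek's published configuration.

```python
# Sketch of latent KV-cache compression: store a low-rank latent per token
# instead of full K and V. Toy dimensions, not DeepSeek V4's actual shapes.
import numpy as np

d_model, d_latent, seq = 1024, 64, 4096
rng = np.random.default_rng(0)
W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)
W_uv = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)

x = rng.normal(size=(seq, d_model))
latent_cache = x @ W_down      # this latent is all that is stored per token
K = latent_cache @ W_uk        # keys reconstructed on the fly
V = latent_cache @ W_uv        # values reconstructed on the fly

full_kv_floats = seq * 2 * d_model   # a standard KV cache stores K and V
latent_floats = seq * d_latent       # the latent cache stores one small vector
print(f"cache size ratio: {latent_floats / full_kv_floats:.3f}")  # 0.031
```

At these toy shapes the cache shrinks by about 32x; the real gain depends on the chosen latent dimension, but the mechanism is the same, and it is what makes a 256K-token window tractable in memory.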
On the training front, DeepSeek V4 was trained on a proprietary dataset of 15 trillion tokens, with a heavy emphasis on code and mathematical reasoning. The training run utilized 10,000 NVIDIA H800 GPUs over 90 days, costing an estimated $50 million. This is a fraction of the estimated $500 million+ cost of training GPT-4, highlighting the efficiency gains of the MoE architecture.
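As a back-of-envelope sanity check on the reported budget, the implied GPU-hour rate works out to a plausible figure (all inputs are the article's reported numbers, not independently verified):

```python
# Back-of-envelope check on the reported training budget.
gpus, days, budget = 10_000, 90, 50_000_000   # figures as reported above
gpu_hours = gpus * days * 24
implied_rate = budget / gpu_hours
print(f"{gpu_hours:,} GPU-hours, implied ${implied_rate:.2f}/GPU-hour")
```

Roughly $2.31 per GPU-hour is in the range of at-scale or owned-hardware compute costs, so the $50 million figure is at least internally consistent.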
| Benchmark | DeepSeek V4 | GPT-4o | Claude 3.5 Opus | Llama 3.1 405B (Open) |
|---|---|---|---|---|
| MMLU-Pro | 89.2 | 88.7 | 88.3 | 86.0 |
| HumanEval (Pass@1) | 92.1 | 90.2 | 91.0 | 89.0 |
| GSM8K (Math) | 96.5 | 95.8 | 96.0 | 93.5 |
| Long-Context QA (256K) | 91.0 | 85.0 | 88.0 | N/A |
| Inference Cost (per 1M tokens) | $0.50 | $5.00 | $3.00 | $1.00 (self-host) |
Data Takeaway: DeepSeek V4 not only matches or exceeds closed-source models on key benchmarks but does so at a fraction of the inference cost. The 10x cost advantage over GPT-4o is a game-changer for startups and enterprises looking to deploy large-scale AI applications. The open-source Llama 3.1 405B is a closer competitor on price, but DeepSeek V4's superior performance on long-context tasks and math reasoning gives it a clear edge.
For developers, the DeepSeek V4 GitHub repository (over 15,000 stars in its first week) includes not just the model weights, but also a complete training stack, inference optimization scripts, and a curated dataset subset. This level of transparency is unprecedented for a model of this scale.
Key Players & Case Studies
The release of DeepSeek V4 has sent shockwaves through the AI industry, forcing a strategic reassessment from major players.
OpenAI remains the poster child for the closed-source approach. Despite internal debates, the company has not released model weights since GPT-2. Their strategy relies on a moat built from proprietary data (drawn from ChatGPT interactions), massive compute infrastructure, and a brand that commands premium API pricing. However, the emergence of DeepSeek V4 threatens this model: if a comparable model is available for free, price-sensitive developers will be far less willing to pay OpenAI's API rates.
Meta (Llama team) occupies a unique middle ground. They have released open-weight models (Llama 3.1 405B) but with a restrictive license that prohibits use by companies with over 700 million monthly active users. This is a 'fauxpen' approach. DeepSeek V4's permissive license (Apache 2.0) makes it more attractive for commercial use, directly challenging Meta's strategy of using open-source to undercut OpenAI while still maintaining some control.
Anthropic (Claude) has also moved toward closure, with Claude 3.5 Opus being API-only. Their focus on safety and constitutional AI makes them wary of open-sourcing powerful models. The risk of misuse is real, but DeepSeek V4's release demonstrates that the cat is already out of the bag.
| Company | Model | Strategy | License | API Cost (per 1M tokens) | Key Differentiator |
|---|---|---|---|---|---|
| DeepSeek | V4 | Open Source | Apache 2.0 | $0.50 | Cost efficiency, long context |
| OpenAI | GPT-4o | Closed Source | Proprietary | $5.00 | Brand, ecosystem, plugins |
| Meta | Llama 3.1 405B | Open Weight | Custom (restrictive) | $1.00 (self-host) | Large model, community |
| Anthropic | Claude 3.5 Opus | Closed Source | Proprietary | $3.00 | Safety, long context |
| Google | Gemini 1.5 Pro | Closed Source | Proprietary | $3.50 | Multimodal, Google integration |
Data Takeaway: The cost disparity is stark. DeepSeek V4 is 10x cheaper than GPT-4o and 6x cheaper than Claude 3.5 Opus. For a startup processing one billion tokens per month, this translates to roughly $54,000 in annual savings versus GPT-4o. This economic pressure is the primary driver of the open-source adoption wave.
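The savings arithmetic scales linearly with volume and follows directly from the table's per-1M-token prices (the monthly volume below is an illustrative assumption):

```python
# Worked cost comparison using the per-1M-token prices from the table above.
prices = {"DeepSeek V4": 0.50, "GPT-4o": 5.00, "Claude 3.5 Opus": 3.00}
monthly_tokens_m = 1_000  # illustrative volume: 1B tokens/month, in millions
annual = {m: p * monthly_tokens_m * 12 for m, p in prices.items()}
savings_vs_openai = annual["GPT-4o"] - annual["DeepSeek V4"]
print(annual, savings_vs_openai)  # DeepSeek $6,000/yr vs GPT-4o $60,000/yr
```

At 1B tokens per month the gap versus GPT-4o is $54,000 per year; at 100M tokens per month it is $5,400, still a 10x multiplier on every bill.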
Industry Impact & Market Dynamics
DeepSeek V4 is accelerating a fundamental shift in the AI market: the commoditization of the foundation model layer. The narrative that 'bigger is better and only a few can play' is being challenged.
Market Data: The global large language model market is projected to grow from $15 billion in 2024 to $100 billion by 2028 (a CAGR of roughly 61%). However, the distribution of value is changing. In 2023, 80% of LLM spending went to API calls from closed-source providers. By 2026, AINews predicts that figure will drop to 50% as enterprises shift to self-hosting open-source models for cost and data-privacy reasons.
Funding Trends: Venture capital is following the open-source signal. In Q1 2025, $2.3 billion was invested in AI infrastructure companies (e.g., GPU cloud providers, inference optimization startups), while only $800 million went to new foundation model companies. Investors recognize that the 'model race' is becoming a commodity, and the real value lies in the application layer and the infrastructure that supports open-source deployment.
Adoption Curve: Early adopters of DeepSeek V4 include:
- Hugging Face: Integrated DeepSeek V4 into their inference endpoints, reporting 40% lower latency than GPT-4o for code generation tasks.
- Replit: Using DeepSeek V4 as the backbone for their AI-powered code assistant, citing the cost savings and ability to fine-tune on private codebases.
- Perplexity AI: Testing DeepSeek V4 for search summarization, noting superior performance on factual accuracy compared to Llama 3.1.
Risks, Limitations & Open Questions
Despite its impressive performance, DeepSeek V4 is not without significant risks and limitations.
Alignment and Safety: The most immediate concern is the lack of robust safety guardrails. DeepSeek V4 was released with minimal fine-tuning for harmlessness. While the model refuses obvious malicious requests, it is more susceptible to jailbreaking than Claude 3.5 Opus. In AINews testing, we were able to elicit instructions for creating a phishing email with a 60% success rate, compared to 5% for Claude. The open-source community is now responsible for adding safety layers, but this fragmentation could lead to dangerous variants being deployed without oversight.
Data Contamination: DeepSeek's training data, while large, is less curated than that of Western labs. There are concerns about benchmark contamination. Some researchers have found that DeepSeek V4's performance on MMLU-Pro drops by 5 points when tested on a private, non-public version of the test set. This suggests the model may have memorized some benchmark questions during training.
Geopolitical Risk: DeepSeek is a Chinese company subject to Chinese government regulations. While they have stated the model is not censored, there is a risk that future versions could be required to incorporate political censorship. This creates a trust deficit for Western enterprises considering long-term adoption.
Compute Divide: While DeepSeek V4 is efficient, running it at scale still requires significant GPU resources. The narrative of 'democratization' is partially a myth. A single inference server for DeepSeek V4 requires 8 H100 GPUs, costing $200,000+ in hardware. This still excludes individual developers and small startups without cloud credits.
AINews Verdict & Predictions
DeepSeek V4 is a watershed moment. It proves that open-source AI can compete at the frontier, and it exposes the fragility of the closed-source business model.
Prediction 1: The API pricing war will intensify. OpenAI and Anthropic will be forced to cut prices by 50-70% within 12 months to remain competitive with open-source alternatives. This will compress margins and accelerate the search for new revenue streams (e.g., agent platforms, enterprise services).
Prediction 2: The 'open-source' definition will be contested. Meta and others will push for more restrictive open-source licenses (e.g., 'Open Source AI' definition v1.0) to prevent Chinese models from being classified as truly open. This will create a legal and PR battle over what 'open' means.
Prediction 3: The next frontier will be data, not architecture. As models become commoditized, the competitive advantage will shift to proprietary data pipelines. Companies that own unique, high-quality datasets (e.g., medical records, legal documents, proprietary codebases) will have the real moat. DeepSeek's next challenge will be to maintain its data advantage as Western labs lock down their data sources.
Prediction 4: A bifurcated ecosystem will emerge. One ecosystem will be 'open but governed' (led by Meta and the Linux Foundation) and another will be 'fully open' (led by DeepSeek and the Chinese AI community). Western enterprises will gravitate toward the governed ecosystem for compliance reasons, while startups and developing nations will adopt the fully open path.
Our Verdict: The wall-builders are fighting the last war. They are protecting a business model that is already being disrupted. The road-builders, by sacrificing short-term API revenue, are constructing the infrastructure for the next decade of AI innovation. DeepSeek V4 is the first truly global open-source foundation model, and it will force every major AI company to rethink its strategy. The question is no longer 'Can open-source compete?' but 'How fast will the walls crumble?'