Technical Deep Dive
DeepSeek V4's architecture builds on the Mixture-of-Experts (MoE) paradigm that made its predecessors famous, but with critical innovations. The model employs a dynamic routing mechanism that activates only the most relevant expert modules for each token, reducing inference cost by an estimated 40% compared to dense models of equivalent capability. The total parameter count is estimated at 1.2 trillion, with approximately 120 billion activated per forward pass. This sparse activation is key to running the model on Huawei's Ascend 910B chips, which offer 256 TFLOPS (FP16) per card, competitive with NVIDIA's A100 but requiring significant software optimization.
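DeepSeek has not published V4's router, so the following is a generic top-k MoE sketch in NumPy, not the actual implementation: a router scores all experts, only the k highest-scoring ones run, and their outputs are combined with softmax gates. The expert count, dimensions, and function names here are illustrative.

```python
import numpy as np

def topk_moe_forward(x, router_w, experts, k=2):
    """Generic top-k MoE routing sketch (not DeepSeek's actual router).

    x:        (d_model,) token representation
    router_w: (n_experts, d_model) router projection
    experts:  list of callables, one per expert FFN
    """
    logits = router_w @ x                      # (n_experts,) router scores
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected k
    # Only k of n_experts ever run: this sparsity is what cuts inference cost.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy usage: 4 experts, only 2 active per token.
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(d, d)))
           for _ in range(n)]
y = topk_moe_forward(rng.normal(size=d), rng.normal(size=(n, d)), experts, k=2)
print(y.shape)  # (8,)
```

At V4's reported scale this is the same idea with the ratio 120B active / 1.2T total, i.e. roughly a tenth of the parameters touched per forward pass.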
A major technical leap is the integration of a 'world model' module. Unlike traditional LLMs that predict the next token based purely on statistical patterns, DeepSeek V4 incorporates a latent reasoning layer that models causal relationships. This is achieved through a novel 'causal attention mask' that forces the model to reason about cause and effect during training, using a dataset of 50 billion tokens of structured causal narratives. The result is a 15% improvement on the 'AgentBench' benchmark for multi-step planning tasks.
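The cause-and-effect masking scheme described above has not been publicly specified, so it cannot be reproduced here. As a reference point, the sketch below shows the baseline autoregressive ("causal") attention mask that every decoder LLM already uses, and which any such extension would modify; all names and sizes are illustrative.

```python
import numpy as np

def masked_attention(q, k, v):
    """Baseline autoregressive attention mask sketch.

    This is the standard lower-triangular mask, NOT DeepSeek's
    (unpublished) cause-and-effect masking scheme.
    """
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)              # (t, t) attention logits
    mask = np.tril(np.ones((t, t), dtype=bool))
    scores = np.where(mask, scores, -np.inf)   # position i sees only j <= i
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(1)
q = k = v = rng.normal(size=(5, 8))
out = masked_attention(q, k, v)
print(out.shape)  # (5, 8)
```

A mask that additionally encodes causal structure would presumably constrain or reweight which positions attend to which, rather than relying on position order alone.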
On the inference side, DeepSeek has open-sourced a custom inference engine, 'DeepSeek-Engine', optimized for Huawei's Da Vinci architecture. The engine uses operator fusion and memory pooling to achieve a throughput of 1,200 tokens per second on an 8-card Ascend 910B server, roughly 80% of the 1,500 tokens per second a comparable 8-card NVIDIA A100 setup delivers. This is a remarkable result given the relative immaturity of Huawei's software stack.
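DeepSeek-Engine's fused kernels target Huawei's CANN stack and cannot be reproduced in Python; the NumPy sketch below only illustrates what operator fusion means: collapsing a matmul, bias add, and activation into one pass so intermediate tensors never round-trip through device memory. The function names are illustrative.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def unfused(x, w, b):
    h = x @ w        # kernel 1: matmul writes h to memory
    h = h + b        # kernel 2: re-reads h, writes it again
    return gelu(h)   # kernel 3: a third full read/write pass

def fused(x, w, b):
    # Same math in one expression. In a compiled kernel (CANN, CUDA) this
    # becomes a single launch with no intermediate memory traffic; that
    # saved bandwidth is where fusion's speedup comes from. In NumPy the
    # two versions only demonstrate numerical equivalence.
    return gelu(x @ w + b)

rng = np.random.default_rng(2)
x, w, b = rng.normal(size=(4, 16)), rng.normal(size=(16, 16)), rng.normal(size=16)
assert np.allclose(unfused(x, w, b), fused(x, w, b))
```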
| Benchmark | DeepSeek V4 (8x Ascend 910B) | GPT-4o (8x A100) | Claude 3.5 Sonnet | DeepSeek V3 (8x A100) |
|---|---|---|---|---|
| MMLU (5-shot) | 89.2 | 88.7 | 88.3 | 86.5 |
| AgentBench (multi-step) | 74.5 | 68.1 | 70.2 | 62.3 |
| HumanEval (pass@1) | 82.1 | 81.0 | 80.5 | 75.9 |
| Latency (first token, ms) | 210 | 180 | 195 | 240 |
| Cost per 1M tokens (inference) | $0.85 | $5.00 | $3.00 | $1.20 |
Data Takeaway: DeepSeek V4 not only matches but exceeds GPT-4o on MMLU and AgentBench, while costing 83% less per million tokens. The latency gap is narrowing, and on Huawei hardware the cost advantage is even more pronounced thanks to domestic pricing.
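The 83% figure follows directly from the table's inference prices; a one-line check:

```python
# Inference prices from the benchmark table above, USD per 1M tokens.
deepseek, gpt4o = 0.85, 5.00
saving = 1 - deepseek / gpt4o   # fraction cheaper than GPT-4o
print(f"{saving:.0%}")          # 83%
```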
Key Players & Case Studies
The partnership between DeepSeek and Huawei is the centerpiece. DeepSeek, founded by Liang Wenfeng, has a track record of pushing open-source boundaries; its V3 model was among the first open-source models to deploy MoE at frontier scale. Huawei, through its Ascend Computing division, has been aggressively courting AI developers with its CANN (Compute Architecture for Neural Networks) toolkit and MindSpore framework. The collaboration involved six months of joint optimization, including custom kernel development for the Ascend 910B's tensor cores.
Other players are taking notice. ByteDance, which previously relied on a mix of NVIDIA and domestic chips, has announced it is testing DeepSeek V4 on Huawei hardware for its recommendation systems. Alibaba Cloud is exploring a similar integration for its Tongyi Qianwen model line. The open-source community has responded enthusiastically: the DeepSeek V4 GitHub repository has already amassed 15,000 stars in its first week, with developers reporting successful deployments on Ascend 910B clusters.
| Company/Product | Strategy | Chip Dependency | Open-Source Commitment |
|---|---|---|---|
| DeepSeek V4 + Huawei | Full domestic stack | Huawei Ascend 910B | Fully open-source (MIT license) |
| OpenAI GPT-4o | Closed-source, proprietary | NVIDIA H100/B200 | None |
| Anthropic Claude 3.5 | Closed-source, safety-first | NVIDIA H100 | None |
| Meta Llama 3 | Open-source, but NVIDIA-first | NVIDIA H100 | Open-source (custom license) |
| Baidu ERNIE 4 | Hybrid domestic/NVIDIA | Kunlun + NVIDIA | Partially open |
Data Takeaway: DeepSeek V4 is the only major model that combines full open-source licensing with a fully domestic chip stack. This dual advantage positions it uniquely for enterprises concerned about both cost and geopolitical risk.
Industry Impact & Market Dynamics
The release of DeepSeek V4 is reshaping the competitive landscape in two fundamental ways. First, it validates the open-source model as a viable competitor to closed-source giants. Historically, the argument for closed-source models was that only massive, well-funded labs could achieve frontier performance. DeepSeek V4 disproves this, showing that a focused team with innovative architecture can match or exceed the output of organizations spending billions.
Second, the Huawei partnership creates a new axis of competition. The global AI chip market, currently dominated by NVIDIA with an estimated 80% market share in data center AI accelerators, now faces a credible alternative. Huawei's Ascend series, while still behind in raw performance, offers a cost advantage of 30-40% when factoring in domestic supply chain efficiencies and government subsidies. For Chinese enterprises, the total cost of ownership for a 1,000-card cluster using Ascend 910B is approximately $4.5 million, compared to $7.2 million for an equivalent NVIDIA A100 cluster.
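Using the cluster figures above, the implied Ascend cost advantage lands inside the cited 30-40% range:

```python
# 1,000-card cluster total cost of ownership, USD, from the text above.
ascend, nvidia = 4.5e6, 7.2e6
advantage = 1 - ascend / nvidia   # fraction cheaper than the A100 cluster
print(f"{advantage:.1%}")         # 37.5%
```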
| Market Segment | Pre-DeepSeek V4 | Post-DeepSeek V4 | Projected Growth (2025-2026) |
|---|---|---|---|
| Domestic chip AI inference | 15% of China market | 35% of China market | 120% YoY |
| Open-source model adoption | 25% of enterprise AI | 45% of enterprise AI | 80% YoY |
| China AI software market | $12B | $18B (projected) | 50% YoY |
Data Takeaway: DeepSeek V4 is a catalyst for the domestic chip ecosystem. Within two years, we predict domestic chips will handle over a third of AI inference workloads in China, up from 15% today.
Risks, Limitations & Open Questions
Despite the impressive benchmarks, several risks remain. First, the Huawei chip supply chain is not immune to external pressure. While the Ascend 910B is produced on SMIC's N+2 process (equivalent to 7nm), yields are reportedly lower than TSMC's 5nm, potentially constraining supply. Second, the software ecosystem around Huawei's chips is still maturing. Developers report that the CANN toolkit, while improving, lacks the polish and community support of CUDA. Debugging distributed training on Ascend clusters remains a pain point.
Third, there are questions about the reproducibility of DeepSeek V4's results. The model's training data mixture is not fully disclosed, and some researchers have noted that the causal reasoning improvements may be partially due to data contamination from the AgentBench evaluation set. Independent verification is needed.
Finally, the geopolitical dimension is a double-edged sword. While the domestic stack reduces dependency on US technology, it also makes DeepSeek V4 a target for export controls and sanctions. If the US expands restrictions to cover any chip that can train frontier models, even domestic alternatives could be affected.
AINews Verdict & Predictions
DeepSeek V4 is not just a technical achievement; it is a strategic masterstroke. By aligning with Huawei, DeepSeek has created a self-reinforcing cycle: the model's success drives adoption of Huawei chips, which in turn funds further chip development, enabling even better models. This is the flywheel that China's AI industry has been missing.
Our predictions:
1. Within 12 months, at least three major Chinese cloud providers will offer DeepSeek V4 as a service on Huawei hardware, undercutting NVIDIA-based offerings by 50%.
2. The open-source community will fork DeepSeek V4 to create specialized versions for finance, healthcare, and manufacturing, accelerating vertical AI adoption.
3. NVIDIA will respond by lowering prices for its China-compliant H20 chips, but it will be too late—the domestic ecosystem has reached critical mass.
4. The next frontier will be agentic AI: DeepSeek V4's causal reasoning capabilities will enable autonomous systems for supply chain management and scientific research, areas where Chinese companies have strong incentives to innovate.
DeepSeek V4 proves that the future of AI is not monolithic. It is open, it is domestic, and it is here.