China's AI Chip Dilemma: Why 2030 Is the Real Breakthrough Year

June 2026
Domestic AI accelerators are trapped between developer frustration and supply scarcity. Yet a coordinated push toward native PyTorch compatibility, CUDA-like libraries, and advanced packaging could flip the script by the end of the decade.

China's domestic AI chip industry is experiencing acute growing pains. On the software side, developers complain that chips from Huawei (Ascend), Cambricon, and Biren require extensive manual porting of models, with fragmented toolchains that lack the polish of NVIDIA's CUDA ecosystem. On the supply side, even willing buyers face allocation limits due to constrained advanced-node capacity at SMIC and other foundries. The result is a market where domestic chips are simultaneously 'hard to use' and 'hard to buy.'

But the trajectory is shifting. Next-generation architectures under development, including Huawei's Ascend 910C, Cambricon's MLU370-series successors, and chips from newer entrants like Enflame, are being designed from the ground up to natively support PyTorch 2.x and TensorFlow, bypassing the need for clunky translation layers. Meanwhile, domestic advances in chiplet-based advanced packaging and immersion lithography for mature nodes are easing the capacity bottleneck. Industry projections suggest that by 2030 these chips could match the performance of NVIDIA's then-current offerings, undercut them on price by 30-50%, and ship with software stacks that developers actually prefer.

The real story is not a single heroic breakthrough but the systemic maturation of three pillars: architecture, ecosystem, and manufacturing. When those converge, the domestic AI chip market will move from backup option to default choice for China's hyperscale data centers.

Technical Deep Dive

The core complaint about current Chinese AI chips is not raw compute: on paper, several post competitive peak figures in TOPS (trillion operations per second). The problem is software. NVIDIA's CUDA ecosystem has accumulated over 15 years of optimization, with libraries like cuDNN, cuBLAS, and TensorRT that abstract away hardware complexity. Chinese chips lack this depth.

The Software Gap: Huawei's Ascend series uses the CANN (Compute Architecture for Neural Networks) toolkit, which requires developers to rewrite models using custom operators. Cambricon offers the Neuware SDK, but it lags behind CUDA in operator coverage and debugging tools. Biren's BR100 uses the BIREN software stack, which is even less mature. The result: a PyTorch model that runs out-of-the-box on NVIDIA GPUs may require weeks of manual tuning on domestic hardware.
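To make "operator coverage" concrete, here is a minimal sketch of how a team might estimate porting effort before committing to a backend. It is an illustration, not any vendor's tooling: the helper `operator_coverage` and both operator lists are hypothetical, chosen only to show the set arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageReport:
    covered: frozenset
    missing: frozenset

    @property
    def ratio(self) -> float:
        """Share of the model's distinct operators that the backend supports."""
        total = len(self.covered) + len(self.missing)
        return len(self.covered) / total if total else 1.0


def operator_coverage(model_ops, backend_ops) -> CoverageReport:
    """Split a model's operators into those a backend supports and those it lacks."""
    needed = frozenset(model_ops)
    supported = frozenset(backend_ops)
    return CoverageReport(covered=needed & supported, missing=needed - supported)


# Hypothetical operator lists, for illustration only.
MODEL_OPS = [
    "aten::matmul",
    "aten::softmax",
    "aten::layer_norm",
    "aten::gelu",
    "custom::flash_attention",
]
BACKEND_OPS = {"aten::matmul", "aten::softmax", "aten::layer_norm", "aten::add"}

report = operator_coverage(MODEL_OPS, BACKEND_OPS)
print(f"coverage: {report.ratio:.0%}")
print("needs porting:", sorted(report.missing))
```

Every operator in the `missing` set is one that someone has to write, tune, and debug by hand, which is where the weeks of porting time tend to go.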

Next-Gen Architecture Shifts: The next wave of chips is addressing this at the ISA level. For example, Huawei's upcoming Ascend 910C reportedly includes native support for PyTorch's ATen operators in hardware, eliminating the need for software emulation. Similarly, Cambricon's next-generation architecture (codenamed 'Siyuan') is said to implement TensorFlow's XLA compiler optimizations directly in the instruction set. These moves mirror what AMD did with ROCm—but with a crucial difference: Chinese vendors are targeting compatibility from day one, not retrofitting.

Advanced Packaging as a Capacity Workaround: Since cutting-edge 3nm or 5nm nodes remain inaccessible due to export controls, Chinese chip designers are turning to chiplet-based architectures using mature 14nm/28nm nodes, connected via advanced packaging (2.5D/3D stacking). SMIC's N+2 process (roughly 7nm-class) combined with Huawei's self-developed chiplet interconnect (similar to UCIe) allows multiple smaller dies to act as one large GPU. This reduces reliance on extreme ultraviolet (EUV) lithography while boosting yields.
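Why smaller dies help yield can be sketched with the textbook Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. This is a deliberate simplification: the defect density and die sizes below are hypothetical rather than foundry figures, and the comparison assumes chiplets are tested before assembly and ignores packaging losses.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_unit(die_area_mm2: float, dies_per_unit: int, defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working product, assuming dies are tested
    before assembly so only good dies are packaged (packaging losses ignored)."""
    return dies_per_unit * die_area_mm2 / poisson_yield(die_area_mm2, defects_per_cm2)


D0 = 0.1  # defects per cm^2: hypothetical value, not a foundry figure

mono_yield = poisson_yield(800, D0)      # one large monolithic die
chiplet_yield = poisson_yield(200, D0)   # one of four smaller chiplets
mono_cost = silicon_per_good_unit(800, 1, D0)
chiplet_cost = silicon_per_good_unit(200, 4, D0)

print(f"monolithic 800 mm^2 die yield: {mono_yield:.1%}")
print(f"single 200 mm^2 chiplet yield: {chiplet_yield:.1%}")
print(f"silicon per good unit: {mono_cost:.0f} vs {chiplet_cost:.0f} mm^2")
```

Under these assumed inputs the large die yields roughly 45% and each chiplet roughly 82%, so the same compute area costs far less wafer silicon per working unit. Real designs give some of that back through interposer cost and assembly yield, which this sketch leaves out.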

Benchmark Reality Check: Below is a comparison of current and next-gen Chinese chips against NVIDIA's mainstream offerings.

| Chip | Process Node | FP16 TFLOPS | Memory Bandwidth | Software Maturity (1-10) | PyTorch Native Support |
|---|---|---|---|---|---|
| NVIDIA H100 | 4nm | 1979 (with sparsity) | 3.35 TB/s | 10 | Full |
| NVIDIA B200 | 4nm | 4500 (est.) | 8 TB/s (est.) | 10 | Full |
| Huawei Ascend 910B | 7nm (N+2) | 640 | 1.5 TB/s | 6 | Partial (CANN) |
| Cambricon MLU370-S4 | 7nm | 256 | 1.2 TB/s | 5 | Partial (Neuware) |
| Biren BR100 | 7nm | 1024 | 2.0 TB/s | 4 | Limited |
| Huawei Ascend 910C (2025 est.) | 7nm + Chiplet | 1200 (est.) | 2.5 TB/s (est.) | 8 | Native |
| Cambricon Siyuan (2026 est.) | 7nm + Chiplet | 800 (est.) | 1.8 TB/s (est.) | 7 | Native |

Data Takeaway: By the table's peak FP16 figures, current domestic chips trail NVIDIA's H100 by roughly 2x (BR100) to 8x (MLU370-S4), and by more in software maturity. Next-gen designs using chiplet aggregation could narrow the shortfall against the H100 to roughly 40-60%, though the B200 moves the target further out, while achieving near-native software compatibility. The software maturity jump from 5-6 to 7-8 is the critical enabler.
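The gaps can be checked directly from the table. The short sketch below uses only the FP16 figures listed there, with the H100 as the reference. The helper name `gap_vs_reference` is ours, and peak TFLOPS is a rough proxy at best for delivered training performance.

```python
# FP16 TFLOPS figures copied from the comparison table above.
H100_TFLOPS = 1979
CHIPS = {
    "Ascend 910B": 640,
    "MLU370-S4": 256,
    "BR100": 1024,
    "Ascend 910C (est.)": 1200,
    "Siyuan (est.)": 800,
}


def gap_vs_reference(tflops: float, reference: float) -> tuple:
    """Return (reference / chip ratio, percentage shortfall relative to the reference)."""
    ratio = reference / tflops
    shortfall_pct = (1 - tflops / reference) * 100
    return ratio, shortfall_pct


gaps = {name: gap_vs_reference(t, H100_TFLOPS) for name, t in CHIPS.items()}
for name, (ratio, shortfall) in gaps.items():
    print(f"{name:<20} {ratio:4.1f}x behind H100, {shortfall:4.1f}% shortfall")
```

Swapping `H100_TFLOPS` for the B200 estimate in the table shows how quickly the picture changes when the reference point moves.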

Open-Source Ecosystem: Several open-source projects are accelerating this transition. PyTorch (the `pytorch/pytorch` repo, over 80k stars) can target Huawei Ascend through the separately distributed `torch_npu` plugin. The `CANN-community` repo (5k+ stars) provides operator libraries. The `chiplet-design` repo (3k stars) offers reference implementations for die-to-die interconnects. These open efforts are reducing the proprietary lock-in that plagued earlier generations.
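For readers who have not used the plugin, a dual-stack codebase typically starts with device selection. The sketch below is a cautious illustration: it assumes that importing `torch_npu` registers an `npu` device type and exposes `torch.npu.is_available()`, as the plugin's documentation describes, and it falls back gracefully when neither PyTorch nor the plugin is installed. The helper name `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return 'npu', 'cuda', or 'cpu' depending on what this machine can actually use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch itself is not installed

    try:
        # Assumption: importing the plugin registers the 'npu' device type.
        import torch_npu  # noqa: F401
        if torch.npu.is_available():
            return "npu"
    except Exception:
        pass  # plugin absent, or present without a working driver/runtime

    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print("selected device:", device)
```

Model code can then move tensors with `tensor.to(device)` and stay otherwise unchanged, provided every operator the model uses is implemented on the selected backend.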

Key Players & Case Studies

Huawei (Ascend Series): The dominant player, with the Ascend 910B deployed in tens of thousands of units across China's major cloud providers (Huawei Cloud, Tencent Cloud, Alibaba Cloud). Huawei's strategy is vertical integration: they design the chip, the server, the software stack, and the cloud service. This gives them end-to-end control but also creates vendor lock-in concerns. Their next chip, the 910C, is expected to support PCIe 5.0 and HBM3 memory, addressing bandwidth bottlenecks.

Cambricon (MLU Series): A pure-play AI chip designer that went public on the STAR Market in 2020. Their MLU370 series targets both training and inference. Cambricon has struggled with commercial adoption due to weaker software support, but their next-gen 'Siyuan' architecture aims to fix this with native PyTorch compatibility. They recently opened a 'Cambricon Developer Center' offering free cloud access to their chips for model porting.

Biren Technology (BR100): A startup that achieved a 7nm GPU-like chip with 1024 TFLOPS (FP16), but faced export control issues that delayed volume production. Biren's software stack is the least mature, but they have partnered with Baidu's PaddlePaddle framework to improve compatibility.

Enflame Technology: A newer entrant focused on inference. Their 'Tianshu' chip uses a unique dataflow architecture optimized for transformer models. They have gained traction in edge AI and smart city applications, where software demands are simpler.

Comparative Strategy Table:

| Company | Strengths | Weaknesses | Key Customer | Next-Gen Focus |
|---|---|---|---|---|
| Huawei | Vertical integration, deep pockets, government ties | Vendor lock-in, export control risk | Huawei Cloud, state-owned enterprises | Native PyTorch, chiplet packaging |
| Cambricon | Public funding, academic partnerships | Slow commercial adoption, software lag | Smart city projects, research labs | Native framework support, developer ecosystem |
| Biren | High peak performance | Supply chain disruption, software immaturity | Baidu (limited), startups | Supply chain resilience, software overhaul |
| Enflame | Inference efficiency, low power | Limited training capability | Edge AI, surveillance | Transformer-specific optimizations |

Data Takeaway: Huawei commands ~60% of the domestic AI chip market by revenue, but its closed ecosystem risks alienating developers. Cambricon and Biren are more open but lack scale. The winner will be the company that achieves the best balance of performance, software usability, and supply reliability.

Industry Impact & Market Dynamics

The 'hard to use, hard to buy' dilemma is reshaping China's AI infrastructure strategy. Hyperscalers like Alibaba and Tencent are now running dual-stack data centers (NVIDIA for training, domestic chips for inference) to hedge against supply disruptions. This bifurcation is creating a parallel software ecosystem.

Market Size Projections:

| Year | China AI Chip Market ($B) | Domestic Share (%) | NVIDIA Share (%) | Key Driver |
|---|---|---|---|---|
| 2023 | 12.5 | 12% | 80% | Export controls begin |
| 2025 | 18.0 | 25% | 60% | Ascend 910C ramp |
| 2027 | 25.0 | 40% | 45% | Chiplet maturity |
| 2030 | 35.0 | 60% | 30% | Full software parity |

*Source: AINews market analysis based on industry interviews and public financial filings.*

Data Takeaway: Domestic share is projected to grow from 12% to 60% by 2030, driven not by government mandate alone but by genuine improvements in software and supply. NVIDIA's share will decline but remain significant in high-end training clusters where absolute performance matters most.
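The projection table implies growth rates that are worth spelling out. The sketch below derives domestic revenue and compound annual growth rates (CAGR) from the table's own figures. Nothing here is new data, only arithmetic on the numbers above.

```python
# (year, total market in $B, domestic share) copied from the projection table.
PROJECTIONS = [
    (2023, 12.5, 0.12),
    (2025, 18.0, 0.25),
    (2027, 25.0, 0.40),
    (2030, 35.0, 0.60),
]


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


domestic_revenue = {year: total * share for year, total, share in PROJECTIONS}

first_year, first_total, _ = PROJECTIONS[0]
last_year, last_total, _ = PROJECTIONS[-1]
span = last_year - first_year

market_cagr = cagr(first_total, last_total, span)
domestic_cagr = cagr(domestic_revenue[first_year], domestic_revenue[last_year], span)

for year, revenue in domestic_revenue.items():
    print(f"{year}: domestic revenue ~ ${revenue:.1f}B")
print(f"total market CAGR: {market_cagr:.1%}")
print(f"domestic revenue CAGR: {domestic_cagr:.1%}")
```

On the table's figures, domestic revenue rises from about $1.5B in 2023 to about $21B in 2030, a far steeper climb than the overall market's growth.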

Pricing Dynamics: Currently, domestic chips are priced 20-30% below comparable NVIDIA offerings, but total cost of ownership (TCO) is often higher due to developer time spent on porting. By 2030, if software friction disappears, domestic chips could achieve a 40-50% TCO advantage, making them economically irresistible even without government subsidies.
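The interaction between sticker price and porting effort can be sketched with a deliberately simple TCO model: hardware spend plus engineering time. All inputs below are hypothetical placeholders rather than market data, and the model ignores power, networking, and utilization.

```python
def tco(unit_price: float, units: int, porting_weeks: float, weekly_eng_cost: float) -> float:
    """Hardware spend plus the engineering time needed to port models."""
    return unit_price * units + porting_weeks * weekly_eng_cost


def break_even_porting_weeks(ref_price: float, discount: float, units: int,
                             weekly_eng_cost: float) -> float:
    """Porting effort at which the discounted chip's TCO equals the reference chip's,
    assuming the reference platform needs no porting at all."""
    return ref_price * discount * units / weekly_eng_cost


# All inputs are hypothetical placeholders, not market data.
REF_PRICE = 30_000.0        # reference accelerator price per unit
DISCOUNT = 0.25             # midpoint of the 20-30% price gap cited above
UNITS = 64                  # size of the deployment
WEEKLY_ENG_COST = 20_000.0  # loaded cost of a porting team per week

reference_tco = tco(REF_PRICE, UNITS, 0, WEEKLY_ENG_COST)
domestic_tco = tco(REF_PRICE * (1 - DISCOUNT), UNITS, 30, WEEKLY_ENG_COST)
break_even = break_even_porting_weeks(REF_PRICE, DISCOUNT, UNITS, WEEKLY_ENG_COST)

print(f"reference TCO: ${reference_tco:,.0f}")
print(f"domestic TCO with 30 weeks of porting: ${domestic_tco:,.0f}")
print(f"break-even porting effort: {break_even:.0f} weeks")
```

With these assumed inputs the 25% hardware discount is wiped out once porting exceeds 24 team-weeks. The same arithmetic shows why removing software friction matters more to buyers than deepening the discount.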

Export Control Impact: The US export controls on advanced chips and lithography equipment have inadvertently accelerated domestic innovation. Companies that once relied on NVIDIA now have no choice but to invest in domestic alternatives. This forced adoption is creating real-world feedback loops that improve software quality faster than organic market growth would have.

Risks, Limitations & Open Questions

1. The Software Catch-22: Even with native PyTorch support, domestic chips must support the long tail of custom operators used in cutting-edge research (e.g., FlashAttention variants, Mixture-of-Experts kernels). NVIDIA's CUDA ecosystem has thousands of community-contributed kernels. Chinese ecosystems lack this depth. Will developers contribute to domestic ecosystems when NVIDIA remains the global standard?

2. Manufacturing Bottleneck Persists: While chiplet packaging reduces reliance on advanced nodes, it increases complexity and cost. SMIC's yield on 7nm-class N+2 process remains below 50% for large dies. Chiplet designs require high-yield interposers and TSV (through-silicon via) technology, which domestic suppliers are still perfecting. A single yield issue could delay the 2030 timeline.

3. Geopolitical Escalation: The US could tighten export controls further, targeting chiplet interconnect IP or advanced packaging tools from companies like Applied Materials and ASML. If domestic packaging equipment is not ready, the entire chiplet strategy collapses.

4. Performance Ceiling: Even with chiplet aggregation, domestic chips may never match the single-die performance of NVIDIA's monolithic designs due to inter-die latency and power overhead. For the most demanding training workloads (e.g., training a 1-trillion-parameter model), NVIDIA may retain an insurmountable lead.

5. Talent Drain: The best AI chip architects in China are being recruited by US companies or poached by well-funded startups. Huawei's chip design team has seen significant turnover. Sustaining innovation requires retaining top talent.

AINews Verdict & Predictions

The 'hard to use, hard to buy' era for Chinese AI chips is real, but it is a temporary phase, not a permanent condition. The industry is making the right structural bets: native framework compatibility, chiplet-based manufacturing, and open-source software contributions. These are not quick fixes but foundational changes that will compound over the next five years.

Our Predictions:

1. By 2026, at least one domestic chip (likely Huawei's Ascend 910C) will achieve 'good enough' software parity, meaning 80% of popular PyTorch models run without modification. This will trigger a wave of adoption in cost-sensitive inference workloads.

2. By 2028, chiplet-based designs will enable domestic chips to match NVIDIA's then-current generation (likely the 'Rubin' architecture) in inference performance, though training will still lag by 30-40%.

3. By 2030, the domestic AI chip market will reach a tipping point where TCO advantages outweigh performance gaps for 70% of use cases. NVIDIA will remain dominant only in frontier research and the largest training clusters.

4. The real wildcard is software community adoption. If Chinese developers start contributing optimizations to domestic ecosystems as enthusiastically as they do to CUDA, the timeline accelerates by 2-3 years. If not, the 2030 inflection point slips to 2033.

What to Watch: The next 12 months are critical. Watch for the release of the Ascend 910C's PyTorch compatibility benchmarks, SMIC's yield improvements on N+2, and whether major open-source models (Llama 4, DeepSeek-V3) add native support for domestic chips. Those signals will tell us whether 2030 is a realistic target or an aspirational goal.

