China's AI Chip Fragmentation: How Proprietary Interconnect Protocols Undermine Collective Ambition

The fundamental battleground in AI computing has decisively shifted from raw transistor performance to the efficiency of connecting thousands of processors into a coherent, unified system. In this arena, NVIDIA's dominance is built not just on GPU architecture but on a vertically integrated, closed ecosystem anchored by its NVLink and NVSwitch technologies. China's semiconductor industry, in its drive for technological sovereignty, has produced several competitive AI accelerators, including Huawei's Ascend series and other emerging designs. However, a critical strategic divergence has emerged: instead of coalescing around a common, open interconnect standard, major players are developing proprietary protocols like Huawei's Lingqu, creating isolated technology stacks. This fragmentation imposes severe costs on end-users—primarily hyperscalers and national computing centers—who face vendor lock-in, incompatible software ecosystems, and the inability to mix hardware from different domestic suppliers. The immediate consequence is reduced competition and innovation within China's own market; the long-term risk is that no single domestic player can assemble the scale of computing resources needed to train frontier AI models, leaving the industry collectively unable to challenge NVIDIA at the system level. While global initiatives like the UALink Consortium, backed by AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, and Microsoft, aim to create an open alternative to NVLink, China's internal divisions may prevent it from leveraging similar collective action, turning technological independence into technological isolation at the worst possible moment.

Technical Deep Dive

The core technical challenge in modern AI training is memory bandwidth and latency, not just FLOPs. Training a model like GPT-4 requires thousands of GPUs to act as a single, massive computer with a unified memory space. This is achieved through high-bandwidth, low-latency interconnects that link chips within a server (node-level) and servers across a data center (rack- and cluster-level).
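A back-of-envelope estimate shows why interconnect bandwidth, not FLOPs, bounds data-parallel scaling. The sketch below models gradient synchronization as a ring all-reduce; the bandwidth figures are illustrative round numbers, not vendor specifications:

```python
def ring_allreduce_time_s(grad_bytes: float, n_gpus: int, link_gb_per_s: float) -> float:
    """Estimate ring all-reduce time: each GPU sends and receives
    roughly 2*(N-1)/N times the gradient buffer over its link."""
    traffic_bytes = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    return traffic_bytes / (link_gb_per_s * 1e9)

# A 70B-parameter model with FP16 gradients (~140 GB), synchronized across 8 GPUs:
grads = 70e9 * 2
fast = ring_allreduce_time_s(grads, 8, 900)  # ~900 GB/s NVLink-class link
slow = ring_allreduce_time_s(grads, 8, 50)   # ~400 Gb/s (50 GB/s) NIC-class link
print(f"NVLink-class: {fast:.2f} s, NIC-class: {slow:.2f} s per sync")
```

At roughly 0.27 s versus 4.9 s per synchronization under these assumptions, the fabric, not the chip, sets the ceiling on how often gradients can be exchanged.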

NVIDIA's stack is hierarchical: NVLink (the 4th generation offers 900 GB/s of bidirectional bandwidth per GPU) links up to 8 GPUs within a node, NVSwitch extends that fabric across a rack, and InfiniBand (the Quantum-2 platform, with 400 Gb/s per port and in-network computing via SHARP) handles cluster-scale communication. InfiniBand is nominally an open standard, but NVIDIA's acquisition of Mellanox left it the dominant supplier. The result is a seamless, software-defined fabric in which the system appears as one logical GPU to the programmer.
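The hierarchy matters because per-GPU bandwidth falls off a cliff at each tier, which is why parallelism strategies (tensor-parallel within a node, data-parallel across racks) must mirror the fabric. The figures below are illustrative approximations, not vendor specifications:

```python
# Per-GPU communication bandwidth at each tier of a hierarchical AI fabric.
# All numbers are illustrative approximations, not vendor specifications.
tiers_gb_per_s = {
    "intra-node (NVLink-class)": 900.0,
    "intra-rack (switched fabric)": 450.0,
    "cross-rack (400 Gb/s NIC)": 400 / 8,  # 400 Gb/s = 50 GB/s per port
}

cliff = tiers_gb_per_s["intra-node (NVLink-class)"] / tiers_gb_per_s["cross-rack (400 Gb/s NIC)"]
print(f"Crossing racks costs ~{cliff:.0f}x in per-GPU bandwidth")
```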

In contrast, the Chinese landscape is a patchwork. Huawei's Ascend processors use its proprietary HCCS (Huawei Cache Coherence System) interconnect and the newer Lingqu protocol within the Atlas 900 cluster. While technically capable (Huawei claims its AI fabric offers 1.6 Tbps per port), it is a closed ecosystem. Other domestic players, such as those developing GPGPU-like accelerators (e.g., Biren Technology, Iluvatar CoreX, or MetaX), are believed to be pursuing their own interconnect strategies, often adapting or extending open standards like PCIe/CXL or Ethernet in non-standard ways.

The fundamental engineering trade-off is between optimization and interoperability. A proprietary protocol like Lingqu can be tightly co-designed with Huawei's Da Vinci architecture and MindSpore framework, potentially offering superior performance for that specific stack. However, it creates an "island of efficiency" that cannot communicate with other islands. Open standards like UALink or enhanced Ethernet (with RoCE and in-network compute extensions) prioritize interoperability, accepting some performance overhead for the flexibility to build heterogeneous systems.
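The trade-off can be quantified with a standard alpha-beta (latency plus bandwidth) cost model. The parameter values below are invented for illustration, not measured; they assume an open link with comparable bandwidth but higher protocol latency than a co-designed proprietary link:

```python
def transfer_time_us(msg_bytes: float, latency_us: float, bw_gb_per_s: float) -> float:
    """Alpha-beta cost model: fixed per-message latency plus serialization time."""
    return latency_us + msg_bytes / (bw_gb_per_s * 1e9) * 1e6

# Illustrative, assumed parameters for the two philosophies:
proprietary = dict(latency_us=2.0, bw_gb_per_s=450.0)  # co-designed link
open_link = dict(latency_us=8.0, bw_gb_per_s=400.0)    # standards-based link

for size, label in [(8 * 1024, "8 KB expert-parallel shard"), (1e9, "1 GB tensor slab")]:
    tp = transfer_time_us(size, **proprietary)
    to = transfer_time_us(size, **open_link)
    print(f"{label}: proprietary {tp:.1f} us vs open {to:.1f} us ({to / tp:.2f}x)")
```

Under these assumptions the open link pays roughly 4x on small, latency-bound messages but only about 13% on large, bandwidth-bound transfers, which is why open fabrics first displace proprietary links at the cluster tier, where transfers are large.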

| Interconnect Type | Example Protocols | Typical Bandwidth | Primary Use Case | Key Limitation |
|---|---|---|---|---|
| Proprietary On-Package | NVLink (NVIDIA), Xe Link (Intel), HCCS (Huawei) | 600-900 GB/s | Intra-node, GPU-to-GPU | Requires identical chips, closed ecosystem |
| Proprietary Rack-Scale | NVSwitch, Huawei's AI Fabric | 1.6-3.6 Tbps switch capacity | Scaling within a rack/box | Vendor lock-in at the system level |
| Open Standard Cluster | InfiniBand (NVIDIA Mellanox), Enhanced Ethernet (RoCEv2) | 400-800 Gb/s | Cross-rack, data center scale | Higher latency, less tight software integration |
| Emerging Open Standard | UALink (Consortium), CXL over Ethernet | Target: ~1.6 Tbps | Unified intra- & inter-rack | Still in specification/early deployment phase |

Data Takeaway: The table reveals a clear stratification. Proprietary links dominate the highest-performance, lowest-latency tiers (on-package, within-rack), while open standards manage the larger-scale cluster networking. China's dilemma is that its multiple proprietary solutions compete at the *same* tier, creating parallel, incompatible "high-performance islands" rather than a cohesive hierarchy.

Relevant open-source efforts that highlight the global move toward standardization include UCX (Unified Communication X), a framework of APIs for high-performance networking. While not a hardware interconnect standard, UCX is critical for abstracting the underlying hardware, allowing applications to run across different transports. Its adoption and optimization by the global HPC and AI communities underscore the software effort that must accompany hardware interoperability.
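UCX itself is a C library; the Python sketch below only illustrates the abstraction idea it embodies. All class and method names here are hypothetical: application code written against a narrow transport interface runs unchanged over fabrics with very different wire behavior.

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Hypothetical narrow interface, in the spirit of UCX's transport abstraction."""
    @abstractmethod
    def send(self, buf: bytes) -> int: ...

class LoopbackTransport(Transport):
    """Trivial in-process transport: delivers buffers whole."""
    def __init__(self):
        self.delivered = []
    def send(self, buf: bytes) -> int:
        self.delivered.append(buf)
        return len(buf)

class ChunkedTransport(Transport):
    """Stands in for a fabric with a small MTU: same interface, different framing."""
    def __init__(self, mtu: int = 4):
        self.mtu, self.frames = mtu, []
    def send(self, buf: bytes) -> int:
        self.frames += [buf[i:i + self.mtu] for i in range(0, len(buf), self.mtu)]
        return len(buf)

def broadcast(transport: Transport, payload: bytes) -> int:
    # Application logic depends only on the interface, never on the fabric.
    return transport.send(payload)
```

Swapping `LoopbackTransport` for `ChunkedTransport` changes what happens on the wire but not a line of application code, which is exactly the decoupling that fragmented proprietary fabrics make impossible at the hardware layer.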

Key Players & Case Studies

The strategic postures of China's leading AI chip developers reveal a pattern of deep vertical integration, with interconnect as a key moat.

Huawei: The most advanced and integrated player. Its Ascend 910B processor and Atlas 900 cluster represent the closest domestic alternative to NVIDIA's DGX/HGX systems. Huawei's strategy is a full-stack replica of NVIDIA's: proprietary silicon (Ascend), interconnect (Lingqu/HCCS), system (Atlas servers), and software (MindSpore, CANN). This creates a powerful, performant, but completely closed ecosystem. For a large cloud provider or national lab, choosing Huawei means committing entirely to the Huawei stack. The company's strength in networking (through HiSilicon) gives it a unique advantage in fabric design, but it uses this to build walls, not bridges.

Biren Technology: Known for its BR100 series GPGPU, Biren has discussed its own high-speed interconnect technology. While details are scarce, its partnerships and positioning suggest it aims to be a merchant chip provider rather than a full-stack system vendor. This could, in theory, make Biren more amenable to an open standard, but without a dominant market position, it lacks the leverage to define one.

Iluvatar CoreX & MetaX: These companies focus on AI training and inference chips, respectively. Their public material emphasizes software compatibility (e.g., PyTorch/CUDA translation layers) but is silent on interconnect strategy for large-scale clustering. This silence is telling—it likely means they rely on standard Ethernet or are developing niche solutions, leaving them unable to lead a consolidation effort.

The Hyperscaler Customers (Alibaba Cloud, Tencent Cloud, Baidu Cloud): These are the primary consumers of AI chips and the entities that ultimately need to build 10,000-100,000 card clusters. Their ideal scenario is a competitive market of interchangeable accelerators. Currently, they are forced into difficult choices: adopt Huawei's powerful but monopolistic stack, work with multiple smaller vendors and manage extreme heterogeneity, or continue relying heavily on NVIDIA. Their collective buying power could, in theory, force standardization, but competitive tensions between the cloud providers themselves prevent unified action.

| Company | Chip Series | Interconnect Approach | Stack Openness | Scale Demonstrated | Strategic Posture |
|---|---|---|---|---|---|
| Huawei | Ascend 910B | Proprietary (Lingqu/HCCS) | Closed | 4,096+ cards (Atlas 900) | Full-stack vertical integration, ecosystem control |
| Biren Tech | BR100, BR104 | Likely Proprietary (BLink?) | Semi-open (Merchant chip) | Rack-scale (estimated) | GPU/IP provider, seeks partnerships |
| Iluvatar | CoreX T30 | Undisclosed / Standard Ethernet | Software-focused | Node-level emphasis | Compatibility via software translation |
| Global Reference: NVIDIA | H100, Blackwell | Proprietary (NVLink/NVSwitch) | Closed but de facto standard | 10,000+ GPU clusters | Dominant ecosystem, sets the benchmark |
| Global Reference: UALink Consortium | AMD MI300X, etc. | Open Standard (UALink) | Designed for openness | In development | Coalition to break NVIDIA lock-in |

Data Takeaway: Huawei is the only domestic player with a demonstrated, NVIDIA-like capability to scale at the cluster level, but it achieves this through the same closed model it seeks to overthrow. Other players lack either the scale ambition or the technical capability to define a viable alternative ecosystem, resulting in a market with one closed giant and several fragmented challengers.

Industry Impact & Market Dynamics

The fragmentation of interconnect protocols has profound second-order effects on China's entire AI industry.

1. Stifled Domestic Competition: A healthy market requires multiple suppliers of compatible components. When each chip requires its own unique network fabric, servers, and system software, the barrier to entry for a new chip designer becomes impossibly high. They must develop not just a chip, but an entire system architecture. This protects incumbents like Huawei but cripples the long-term innovative capacity of the broader industry.

2. Skyrocketing Customer Costs and Complexity: For an AI cloud provider, managing multiple incompatible clusters is an operational nightmare. It requires separate driver stacks, separate cluster scheduling systems, separate diagnostic tools, and separate optimization teams. This duplication of effort consumes resources that could be spent on application-level innovation. The Total Cost of Ownership (TCO) for a fragmented domestic AI infrastructure could easily exceed that of a unified NVIDIA-based infrastructure, despite lower upfront chip costs.
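The duplication cost can be made concrete with a toy TCO model. All dollar figures below are invented for illustration; the point is structural: each incompatible stack carries its own fixed engineering cost, so a fragmented fleet of the same total size is strictly more expensive to operate.

```python
def annual_ops_cost(stacks: int, cards_per_stack: int,
                    fixed_per_stack: float = 5e6, per_card: float = 2e3) -> float:
    """Toy TCO model: each incompatible stack carries its own fixed cost
    (drivers, schedulers, tooling, teams) plus a per-card operating cost.
    All dollar figures are purely illustrative assumptions."""
    return stacks * fixed_per_stack + stacks * cards_per_stack * per_card

unified = annual_ops_cost(stacks=1, cards_per_stack=9000)
fragmented = annual_ops_cost(stacks=3, cards_per_stack=3000)
print(f"Unified: ${unified / 1e6:.0f}M, fragmented: ${fragmented / 1e6:.0f}M")
```

Under these assumptions, running the same 9,000 cards through three incompatible stacks costs about $33M against $23M unified, a roughly 43% operations premium before any performance loss is counted.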

3. Hindered Scale-Up of AI Research: The training of frontier models is a race of scale. The lack of a standardized, composable fabric means no single entity in China can aggregate computing power as efficiently as NVIDIA's customers can. A research institute may have 2,000 Huawei cards and 1,000 Biren cards, but they cannot combine them into a single 3,000-card training run. This effectively caps the maximum model size that can be trained within any single vendor's ecosystem.
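The cost of not being able to pool cards can be estimated with a compute-optimal (Chinchilla-style) scaling rule, C ≈ 6ND with D ≈ 20N tokens, so model size grows only with the square root of available compute. The per-card FLOPs, utilization, and time budget below are illustrative assumptions:

```python
def max_trainable_params(n_cards: int, flops_per_card: float = 300e12,
                         mfu: float = 0.4, days: float = 30,
                         tokens_per_param: float = 20) -> float:
    """Largest compute-optimal model for a cluster, from C = 6*N*D with D = 20*N.
    flops_per_card, mfu, and days are illustrative assumptions."""
    compute = n_cards * flops_per_card * mfu * days * 86400
    return (compute / (6 * tokens_per_param)) ** 0.5

island = max_trainable_params(2_000)  # largest single-vendor pool
pooled = max_trainable_params(3_000)  # if the incompatible 1,000 cards could join
print(f"Island: {island / 1e9:.0f}B params, pooled: {pooled / 1e9:.0f}B ({pooled / island:.2f}x)")
```

Because of the square-root scaling, stranding a third of the cards costs roughly a fifth of achievable model scale in the same training window, and the gap compounds as incompatible pools multiply.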

4. Missed Window for Global Influence: The period from 2020-2025 was a unique window where dissatisfaction with NVIDIA's lock-in was high, and no alternative standard had solidified. The formation of the UALink Consortium represents the global industry's attempt to seize this window. China's internal fragmentation has prevented it from either joining this consortium with a unified voice or proposing a compelling alternative standard of its own. The industry is now reacting to, rather than shaping, the global interconnect landscape.

| Impact Dimension | Short-Term Effect (1-2 years) | Long-Term Risk (5+ years) |
|---|---|---|
| Customer Choice | False choice between vendors; high switching costs | Market consolidation around 1-2 full-stack vendors, reduced competition |
| Innovation Pace | Rapid point solutions in isolated stacks | Slowed systemic innovation due to lack of composable components |
| Cluster Scale | Multiple medium-scale clusters (1k-4k cards) | Inability to build efficient 10k+ card clusters, falling behind in model scale |
| Global Relevance | Domestic solutions meet local needs | Chinese chips become irrelevant in global cloud and research infrastructure |
| Software Ecosystem | Multiple fragile translation layers to CUDA | Failure to birth a native, thriving alternative AI software stack (like ROCm) |

Data Takeaway: The impacts are systemic and self-reinforcing. Fragmentation begets higher costs, which reduces adoption, which starves vendors of revenue for R&D, which perpetuates their inability to build competitive full-stack solutions, locking in the fragmentation. It is a vicious cycle that benefits only the current dominant global player.

Risks, Limitations & Open Questions

1. The Sovereignty vs. Standardization Trap: The drive for technological sovereignty is legitimate, but it is being misinterpreted as requiring full-stack, from-the-ground-up reinvention. True sovereignty in computing could be achieved by mastering and contributing to open standards (like RISC-V, UALink, or CXL), not by creating weaker domestic copies of closed standards. The risk is that in fleeing from NVIDIA's walled garden, the industry is building several smaller, less capable walled gardens.

2. The "Good Enough" Fallacy: Proponents of proprietary solutions argue that for many national and enterprise AI tasks, a 4,096-card Huawei cluster is "good enough." This misses the point. The frontier of AI capability is defined by the largest models trained on the largest clusters. Ceding that frontier means forever following architectural trends set elsewhere, relying on open-source model releases, and lacking the capability to generate fundamental breakthroughs.

3. The Software Chasm is the Real Barrier: Even if a magical, unified interconnect standard emerged tomorrow, the deeper problem is the software stack. NVIDIA's CUDA represents a head start of more than fifteen years. China's fragmented hardware landscape directly undermines the possibility of a unified software effort: no company will invest billions in developing a competitor to CUDA if it only works on that company's own chips. This software gap is the ultimate limiter, and hardware fragmentation all but guarantees it will not close.

4. Open Questions:
* Can a neutral industry body (e.g., under China's MIIT) successfully mandate a common interconnect standard? History suggests such mandates often fail against the commercial interests of powerful incumbents.
* Will the Chinese hyperscalers (Alibaba, Tencent, ByteDance) finally use their collective demand to force standardization? Their rivalry may be too deep-seated.
* Is there a technical path to a "bridge" or gateway that translates between different proprietary interconnects? The performance overhead would likely be prohibitive for training workloads.

AINews Verdict & Predictions

AINews Verdict: China's pursuit of proprietary AI interconnects is a critical strategic misstep that confuses vertical integration with technological leadership. In the quest for independence from NVIDIA, the industry is architecting its own dependence on a handful of domestic monopolies and, more damningly, its own inability to compete at the highest level of global AI infrastructure. The failure to prioritize an open, industry-wide interconnect standard will be remembered as the factor that limited China's AI ambitions to the regional level.

Predictions:

1. Prediction 1 (High Confidence): By 2026, the economic and operational pain of fragmentation will become unbearable for large customers. This will lead not to a unified open standard, but to a *de facto* consolidation around Huawei's stack for large-scale training, with other vendors relegated to inference and niche training markets. Huawei will become the "NVIDIA of China," replicating the very monopoly dynamic the industry sought to escape.

2. Prediction 2 (Medium Confidence): A government-led initiative will attempt to create a national interconnect standard around 2025-2026. However, it will arrive too late. It will either be a lowest-common-denominator specification that no leading vendor fully adopts, or it will be heavily based on Huawei's technology, formalizing its dominance and stifling the original goal of a competitive marketplace.

3. Prediction 3 (High Confidence): China's maximum scale for single-model training will lag the global frontier by a factor of 3-5x by 2027. While the U.S./global industry will demonstrate reliable 50,000-100,000 GPU clusters, the largest efficient cluster in China will be limited to the scale of a single vendor's ecosystem, likely in the 10,000-20,000 card range. This gap in *system scale* will be more decisive than any gap in single-chip performance.

4. Prediction 4 (Speculative): The most likely path to a unified fabric will come from an unexpected direction: optical interconnect. If a Chinese company (e.g., Hengtong, Innolight) makes a breakthrough in co-packaged optics or optical circuit switching that drastically reduces the cost and complexity of high-bandwidth networking, it could provide a physical-layer reset. This new layer could be standardized more easily, as it would be a disruptive technology for all incumbents. Watch for R&D announcements in this space.

What to Watch Next: Monitor the deployment patterns of China's national "computing power networks." If these mega-projects are built as heterogeneous clusters mixing hardware from multiple vendors, it signals a serious push for interoperability. If they are built as monolithic blocks from single vendors, it confirms the fragmentation is locked in. Secondly, watch for any Chinese company joining or aligning with the UALink Consortium; such a move would be the first, belated sign of strategic clarity.
