Light Over Copper: Why Optical Interconnects Are the Only Future for AI Supercomputing

The era of copper-dominated data center interconnects is ending. Once single-lane signaling rates exceed 112 Gbps, copper links suffer severe signal degradation within about two meters, forcing data center architects into painful trade-offs between thermal density and cabling complexity. Meanwhile, AI training clusters have passed the hundred-thousand-GPU scale, where every microsecond of latency compounds into days of extra training time for trillion-parameter models. Optical interconnects, long confined to long-haul and campus networks, are now being pulled into the rack and even onto the package itself. Co-packaged optics (CPO) and silicon photonics (SiPh) are the leading technologies, promising bandwidth densities 10x higher than copper while reducing per-bit energy by over 60%.

This shift is not incremental; it is a fundamental re-architecture of how data flows inside AI machines. The business implications are equally profound: optical modules, previously commoditized and margin-thin, are becoming the most strategically valuable component in the AI supply chain, commanding long-term design wins and premium pricing. With video generation and world models demanding near-real-time tensor parallelism across thousands of nodes, optical interconnects have moved from a nice-to-have to a non-negotiable requirement. The physics of computation and the economics of scale have converged on a single answer: light must replace copper.

Technical Deep Dive

The physics of copper is the root cause of the bottleneck. At 112 Gbps PAM4 signaling, the per-lane rate used in current 800G modules, the skin effect and dielectric losses cause the signal amplitude to decay by over 20 dB per meter in typical twinax copper cables. Beyond 2-3 meters, the signal-to-noise ratio therefore drops below the threshold for reliable detection, forcing the use of expensive retimers or active copper cables that add latency and power. For 224 Gbps SerDes (expected for 1.6T modules), the reach shrinks to under 1 meter.
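To make the decibel figures concrete, here is a minimal sketch of the arithmetic behind a loss-limited reach. The 20 dB per meter value comes from the paragraph above; the 50 dB loss budget and the function names are assumptions chosen purely for illustration, not figures from any standard or datasheet.

```python
def amplitude_ratio(loss_db: float) -> float:
    """Fraction of the original voltage amplitude left after loss_db of attenuation."""
    return 10 ** (-loss_db / 20)


def max_reach_m(loss_db_per_m: float, budget_db: float) -> float:
    """Longest passive cable whose total loss still fits inside the budget."""
    return budget_db / loss_db_per_m


LOSS_DB_PER_M = 20.0  # per-meter loss quoted in the text
BUDGET_DB = 50.0      # hypothetical end-to-end loss budget (assumption)

for meters in (1, 2, 3):
    remaining = amplitude_ratio(LOSS_DB_PER_M * meters)
    print(f"{meters} m: {remaining:.4%} of the launched amplitude remains")

print(f"Reach at a {BUDGET_DB:.0f} dB budget: {max_reach_m(LOSS_DB_PER_M, BUDGET_DB):.1f} m")
```

Because decibels are logarithmic, every additional meter at this loss rate divides the amplitude by another factor of ten, which is why reach collapses so quickly once the per-meter loss rises.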

Optical interconnects bypass these limitations entirely. Single-mode fiber can carry 400 Gbps per lane over kilometers with negligible loss. But the real revolution is in the packaging. Traditional pluggable optical modules (QSFP, OSFP) sit at the faceplate of the switch or GPU, requiring electrical traces from the ASIC to the module—traces that themselves become lossy at high speeds. Co-packaged optics (CPO) eliminates this by placing the optical engine directly on the same substrate as the switch ASIC or GPU, reducing the electrical trace length to millimeters. The result: lower power, higher bandwidth density, and better signal integrity.

Silicon photonics (SiPh) is the manufacturing enabler. By fabricating optical modulators, detectors, and waveguides using standard CMOS processes, SiPh allows dense integration of optical components at scale. Companies like Intel, Cisco, and Marvell have demonstrated SiPh transceivers with 8-16 channels per fiber, each running at 100-200 Gbps. The open-source community is also active: the OpenLight platform (GitHub repo: openlightplatform/photonics) provides PDK libraries for designing SiPh circuits, and the SiEPIC initiative (GitHub repo: lucas-santos/SiEPIC_EBeam_PDK) offers a free PDK for electron-beam lithography, enabling rapid prototyping of photonic integrated circuits. Both repositories have drawn growing interest (500+ and 200+ stars, respectively) as the photonics design community expands.

Performance comparison: Copper vs. Optical at scale

| Metric | Copper (112G PAM4, 3m) | Optical (SiPh, 2km) | Improvement |
|---|---|---|---|
| Bandwidth density (Gbps/mm²) | 0.5 | 5.0 | 10x |
| Per-bit energy (pJ/bit) | 5-8 | 1.5-2.5 | 60-70% reduction |
| Reach (meters) | 2-3 | 2000+ | 1000x |
| Latency (ns per meter) | 5 | 5 | Equivalent |
| Thermal density (W/cm²) | 1.5 | 0.3 | 5x reduction |

Data Takeaway: Optical interconnects offer a 10x improvement in bandwidth density and a 60%+ reduction in energy per bit, making them the only viable path for scaling clusters beyond 100,000 GPUs. The latency parity is critical: optical does not add propagation delay per meter, it simply removes the distance constraint.
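The per-bit energy figures translate directly into watts once a bandwidth is fixed, since Tbps multiplied by pJ/bit gives watts. The sketch below uses the midpoints of the table's ranges (6.5 and 2.0 pJ/bit); the 51.2 Tbps aggregate bandwidth and the function names are assumptions picked for illustration.

```python
def link_power_watts(bandwidth_tbps: float, energy_pj_per_bit: float) -> float:
    """Interconnect power: (1e12 bit/s) * (1e-12 J/bit) cancels to plain watts."""
    return bandwidth_tbps * energy_pj_per_bit


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


BANDWIDTH_TBPS = 51.2   # illustrative aggregate bandwidth (assumption)
COPPER_PJ = 6.5         # midpoint of the table's 5-8 pJ/bit range
OPTICAL_PJ = 2.0        # midpoint of the table's 1.5-2.5 pJ/bit range

copper_w = link_power_watts(BANDWIDTH_TBPS, COPPER_PJ)
optical_w = link_power_watts(BANDWIDTH_TBPS, OPTICAL_PJ)

print(f"Copper:  {copper_w:.1f} W")
print(f"Optical: {optical_w:.1f} W")
print(f"Reduction: {reduction_pct(copper_w, optical_w):.1f}%")
```

With these midpoints the reduction comes out near 69%, inside the table's 60-70% band; comparing the low end of the copper range against the high end of the optical range would give only 50%, so the headline figure depends on which ends of the ranges are paired.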

Key Players & Case Studies

NVIDIA is the most aggressive adopter. Its DGX SuperPOD architecture already uses optical transceivers for NVLink and InfiniBand interconnects at scale. For the GB200 NVL72 rack, NVIDIA moved to a fully optical backplane using co-packaged optics from Lumentum and Coherent Corp., achieving 1.8 TB/s per GPU pair. The company’s roadmap includes integrating SiPh engines directly onto the GPU substrate by 2027, a move that would eliminate the electrical bottleneck entirely.

Broadcom has taken a different approach, focusing on CPO for switch ASICs. Its Tomahawk 5 switch (51.2 Tbps) uses an optical engine co-packaged with the switch die, reducing faceplate power by 40% compared to pluggable optics. Broadcom’s Sian platform (GitHub: broadcom/sian-cpo) provides open-source firmware for controlling CPO modules, with over 300 stars and active contributions from hyperscalers.

Intel has been developing its Silicon Photonics 100G platform for years, now shipping 400G and 800G modules to cloud providers. Intel’s advantage is its integrated laser technology, which reduces the number of discrete components. However, Intel has struggled with yield and cost, keeping its market share below 10% in the pluggable module space.

Cisco and Marvell are competing in the 1.6T module race. Cisco’s Acacia division offers coherent pluggables for long-haul, while Marvell’s Nova platform targets intra-datacenter links. Both are betting on 224 Gbps per lane SiPh to reach 1.6T in a single OSFP module by late 2025.

Comparison of leading optical interconnect solutions

| Company | Technology | Quoted bandwidth | Power efficiency (pJ/bit) | Key customer |
|---|---|---|---|---|
| NVIDIA (Lumentum/Coherent) | CPO + SiPh | 1.8 TB/s per GPU pair | 1.8 | Self (DGX) |
| Broadcom | CPO (Tomahawk 5) | 51.2 Tbps per switch | 2.0 | Hyperscalers |
| Intel | SiPh 100G | 800 Gbps | 2.5 | Cloud providers |
| Marvell | SiPh Nova | 1.6 Tbps | 2.2 | Meta, Microsoft |

Data Takeaway: NVIDIA and Broadcom lead in integration (CPO), while Intel and Marvell lead in discrete modules. The market is bifurcating: hyperscalers are pushing for CPO to reduce power, while traditional cloud providers still prefer pluggable modules for flexibility.

Industry Impact & Market Dynamics

The optical interconnect market is projected to grow from $15 billion in 2024 to $45 billion by 2028, according to industry estimates, driven largely by AI cluster demand. This growth is not uniform: the fastest segment is intra-rack optical links, which are expected to grow at a 60% CAGR as CPO becomes standard.

The business model shift is dramatic. Historically, optical modules were commodity items with 10-15% margins, sold through distributors. Now, hyperscalers like Google, Microsoft, and Meta are signing multi-year, multi-billion-dollar design contracts with module makers, locking in supply and demanding custom specifications. This has given pricing power back to manufacturers: gross margins for 800G modules are now 25-30%, and for CPO solutions, they exceed 40%.

Market size and growth by segment (2024-2028)

| Segment | 2024 ($B) | 2028 ($B) | CAGR |
|---|---|---|---|
| Pluggable modules (400G/800G) | 10 | 18 | 16% |
| Co-packaged optics (CPO) | 2 | 15 | 65% |
| Silicon photonics (SiPh) | 3 | 12 | 41% |
| Total | 15 | 45 | 32% |

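Data Takeaway: CPO is the fastest-growing segment, driven by hyperscaler demand for power efficiency. On these figures, CPO revenue in 2028 ($15B) approaches but does not yet exceed pluggable module revenue ($18B); the gap in growth rates (65% vs. 16% CAGR) nonetheless signals a permanent shift in architecture.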
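The CAGR column can be reproduced from the 2024 and 2028 revenue figures with the standard compound-growth formula over four years. The snippet below is a small check of that arithmetic; the dictionary layout and function name are illustrative choices.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# (2024 revenue, 2028 revenue) in billions of dollars, taken from the table
segments = {
    "Pluggable modules (400G/800G)": (10, 18),
    "Co-packaged optics (CPO)": (2, 15),
    "Silicon photonics (SiPh)": (3, 12),
    "Total": (15, 45),
}

YEARS = 2028 - 2024

for name, (rev_2024, rev_2028) in segments.items():
    print(f"{name}: {cagr(rev_2024, rev_2028, YEARS):.0%}")
```

The rounded results match the table's 16%, 65%, 41%, and 32%, so the growth rates are internally consistent with the revenue figures.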

Risks, Limitations & Open Questions

Despite the promise, optical interconnects face four critical challenges:

1. Cost and yield: SiPh manufacturing is still maturing. Defect rates for integrated lasers and modulators are 2-3x higher than CMOS electronics, leading to lower yields and higher costs. A 1.6T CPO module currently costs $2,500-3,000, compared to $1,200 for a copper-based active cable of equivalent bandwidth. The cost crossover is not expected until 2027.

2. Thermal management: While optical interconnects reduce overall power, the laser sources themselves generate heat that must be managed. In CPO designs, the laser is co-located with the ASIC, increasing the thermal density of the package. Advanced liquid cooling solutions are required, adding complexity.

3. Standardization: The industry lacks a unified standard for CPO form factors. The COBO (Consortium for On-Board Optics) has proposed a standard, but NVIDIA, Broadcom, and Intel each have proprietary implementations. This fragmentation risks slowing adoption as customers fear vendor lock-in.

4. Reliability: Lasers have finite lifetimes (typically 50,000-100,000 hours), which is acceptable for most applications but problematic for AI clusters running 24/7 for years. Failure of a single laser in a CPO module could require replacing the entire switch or GPU board.
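The reliability concern in item 4 can be put in rough numbers. The sketch below converts the quoted lifetimes into years of continuous operation and then estimates the chance that at least one laser in a module fails during a service period. It assumes independent lasers with a constant failure rate and treats the quoted lifetime as a mean time to failure, which is a deliberate simplification; the laser count of 8 and the three-year service period are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # continuous 24/7 operation


def lifetime_years(lifetime_hours: float) -> float:
    """Quoted laser lifetime expressed in years of nonstop operation."""
    return lifetime_hours / HOURS_PER_YEAR


def prob_any_failure(n_lasers: int, mttf_hours: float, service_hours: float) -> float:
    """P(at least one of n independent lasers fails), exponential failure model."""
    return 1.0 - math.exp(-n_lasers * service_hours / mttf_hours)


for hours in (50_000, 100_000):
    print(f"{hours} h is about {lifetime_years(hours):.1f} years of 24/7 use")

# Hypothetical module: 8 lasers, 100,000 h lifetime treated as MTTF, 3 years in service
p = prob_any_failure(n_lasers=8, mttf_hours=100_000, service_hours=3 * HOURS_PER_YEAR)
print(f"Chance of at least one laser failure: {p:.0%}")
```

Under these simplified assumptions the per-module risk grows quickly with the number of lasers, which is why a single non-replaceable laser inside a co-packaged module is a larger operational concern than the same laser in a pluggable transceiver.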

AINews Verdict & Predictions

Optical interconnects are not a future trend—they are the present necessity. The physics of copper at 112 Gbps and beyond is immutable; every major AI cluster built after 2025 will use optical links for all inter-node communication. The debate is not whether, but how quickly and at what cost.

Three predictions:

1. By 2027, 80% of new AI clusters will use CPO for intra-rack connectivity. The power savings alone (60% reduction) will justify the premium, especially as AI training costs become dominated by electricity.

2. Silicon photonics will become a standard CMOS process node. Just as RF CMOS enabled the smartphone revolution, SiPh will become a standard offering at TSMC and Intel Foundry by 2028, driving costs down and yields up.

3. The optical module supply chain will become as strategic as GPU supply. Expect hyperscalers to make direct investments in module fabs, similar to how they now invest in chip fabs. The first such deal—a $2 billion investment by a major cloud provider in a SiPh foundry—will be announced within 18 months.

The copper era is over. Light wins.
