Breaking The Networking Wall In AI Infrastructure
Microsoft, Tuesday, September 9th, 2025
Memory and network bottlenecks are increasingly limiting AI system performance by reducing GPU utilization and overall efficiency, ultimately preventing infrastructure from reaching its full potential despite enormous investments. At the core of this challenge is a fundamental trade-off in the communication technologies used for memory and network interconnects.
Datacenters typically deploy two types of physical cables for communication between GPUs. Traditional copper links are power-efficient and reliable, but limited to very short distances (< 2 meters) that restrict their use to within a single GPU rack. Optical fiber links can reach tens of meters, but they consume far more power and fail up to 100 times as often as copper. A team working across Microsoft aims to resolve this trade-off by developing MOSAIC, a novel optical link technology that can provide low power and cost, high reliability, and long reach (up to 50 meters) simultaneously. This approach leverages a hardware-system co-design and adopts a wide-and-slow design with hundreds of parallel low-speed channels using microLEDs.