Computex Nvidia has opened the NVLink interconnect tech used to stitch its rack-scale compute platforms together to the broader ecosystem with the introduction of NVLink Fusion at Computex this week.
If you’re not familiar, Nvidia’s NVLink is a high-speed interconnect that allows multiple GPUs in a system or rack to behave like a single accelerator with shared compute and memory resources.
In its current, fifth generation, NVLink supports up to 1.8 TB/s of bandwidth per GPU (900 GB/s in each direction) across as many as 72 GPUs per rack. Until now, that fabric has been limited to Nvidia's own GPUs and CPUs.
NVLink Fusion changes that: the GPU giant will now let semi-custom accelerator designs take advantage of the high-speed interconnect, even when the accelerators themselves aren't designed by Nvidia.
According to Dion Harris, senior director of HPC, Cloud, and AI at Nvidia, the technology will be offered in two configurations. The first will be for connecting custom CPUs to Nvidia GPUs.
As we mentioned earlier, the advantage of using NVLink for CPU-to-GPU communications is bandwidth: 1.8 TB/s per GPU is roughly 14x what a PCIe 5.0 x16 link (about 128 GB/s) can manage.
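If you want to check the arithmetic yourself, here's a quick back-of-the-envelope sketch using the nominal figures quoted above; these are headline link rates, not measured throughput.

```python
# Back-of-the-envelope check of the "14x PCIe 5.0" claim, using nominal link rates.
nvlink5_per_gpu_gb_s = 1800  # GB/s aggregate per GPU (900 GB/s in each direction)
pcie5_x16_gb_s = 128         # GB/s aggregate for a PCIe 5.0 x16 link (~64 GB/s each way)

ratio = nvlink5_per_gpu_gb_s / pcie5_x16_gb_s
print(f"NVLink 5 vs PCIe 5.0 x16: {ratio:.1f}x")  # -> 14.1x
```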
The second, and perhaps more surprising, configuration involves using NVLink to connect Nvidia's Grace CPUs, and eventually its Vera CPUs, to non-Nvidia accelerators.
This can be achieved either by integrating the NVLink IP into your design or by packaging an interconnect chiplet alongside a supported XPU.
In theory, this should open the door to superchip-style compute assemblies combining CPUs and accelerators from the likes of Nvidia, AMD, Intel, and others, but only so long as Nvidia silicon is part of the pairing. You couldn't, for example, connect an Intel CPU to an AMD GPU using NVLink Fusion. Nvidia isn't opening the interconnect entirely: if you want to use it with your ASIC, you'll be pairing that ASIC with an Nvidia CPU, or vice versa.
Of course, all of this depends on chipmakers extending support for NVLink Fusion in the first place. From a design standpoint, MediaTek, Marvell, Alchip, Astera Labs, Synopsys, and Cadence have committed to supporting the interconnect. Fujitsu and Qualcomm, meanwhile, plan to build custom CPUs using the tech.
Neither Intel nor AMD is on the list just yet, and they may never be. Both companies have thrown their weight behind the Ultra Accelerator Link standard, an open alternative to NVLink for scale-up networks.
The Ultra Accelerator Link Consortium published the first UALink 200G specification for the interconnect fabric last month. It currently caps out at 200 Gbps per link in each direction (about 50 GB/s of bidirectional bandwidth) and can scale to as many as 1,024 accelerators.
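For comparison's sake, the same back-of-the-envelope arithmetic turns that headline figure into the GB/s numbers above; this assumes the 200 Gbps rate describes each direction of a link, which is what the 50 GB/s bidirectional figure implies.

```python
# Convert UALink 200G's nominal line rate into GB/s.
# Assumption: the 200 Gbps figure applies to each direction of a link.
ualink_gbit_s = 200                          # Gb/s per direction
per_direction_gb_s = ualink_gbit_s / 8       # 25 GB/s each way
bidirectional_gb_s = 2 * per_direction_gb_s  # ~50 GB/s in both directions combined

print(f"{per_direction_gb_s:.0f} GB/s per direction, {bidirectional_gb_s:.0f} GB/s bidirectional")
```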
Nvidia stitches together GPU bit-barns with DGX Cloud Lepton
Also at this year's Computex mega-event in Taipei, Nvidia lifted the veil on its DGX Cloud Lepton offering.
In a nutshell, the platform is a marketplace for deploying workloads across any number of GPU bit-barns that have agreed to rent out their compute on Lepton.
Alexis Bjorlin, VP of DGX Cloud at Nvidia, likens Lepton to a ridesharing app, but rather than connecting riders to drivers, it connects developers to GPUs.
At launch (the platform is currently in early access), Nvidia says CoreWeave, Crusoe, Firmus, Foxconn, GMI Cloud, Lambda, Nscale, SoftBank, and Yotta have agreed to make “tens of thousands of GPUs” available for customers to deploy their workloads on.
Naturally, this being “DGX Cloud,” the GPU giant is taking the opportunity to push its suite of Nvidia Inference Microservices (NIMs), blueprints, and cloud functions.
If any of this sounds familiar, that's because Nvidia isn't the first to try something like this. Akash Network, for example, launched its decentralized compute marketplace back in 2020, and by 2023, 90 percent of the company's business was driven by GPU rentals. ®