The NVIDIA Vera Rubin platform is a full-stack AI supercomputing platform designed for complex, multi-step agentic AI workflows. It integrates a suite of advanced hardware components into a unified system for large-scale AI workloads, from training to inference, and represents a shift toward tightly integrated, rack-scale infrastructure for hyperscale data centers and AI "factories".
The core hardware of the Vera Rubin platform comprises seven key chips: the NVIDIA Vera CPU, the NVIDIA Rubin GPU, the NVIDIA NVLink 6 Switch, the NVIDIA ConnectX-9 SuperNIC, the NVIDIA BlueField-4 DPU, the NVIDIA Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU (Language Processing Unit). These are not merely individual chips; they are co-designed to function as a single, coherent supercomputer. The Vera CPU, NVIDIA's first purpose-built data center CPU, features 88 custom Olympus cores with Spatial Multithreading and LPDDR5X memory, and is optimized for orchestration, reinforcement learning, and the management of agentic AI workflows. The Rubin GPU, built on TSMC's 3nm process, is the workhorse for training and inference, delivering significant performance gains over previous generations and carrying 288 GB of HBM4 memory. The Groq 3 LPU is designed specifically for inference acceleration, targeting the low-latency, large-context demands of agentic systems, with 256 LPUs per rack and 128 GB of on-chip SRAM.
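As a quick reference, the sketch below models the chip roster described above as plain Python dataclasses. The field values simply mirror the figures quoted in this section; the class and field names are illustrative and not part of any NVIDIA API or official data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str    # product name as quoted in this section
    role: str    # primary function within the platform
    memory: str  # headline memory figure, where one is quoted

# Roster of the seven chips; figures mirror the text above and are
# illustrative only -- this is not an official NVIDIA data model.
VERA_RUBIN_CHIPS = [
    Chip("Vera CPU", "orchestration, RL, agentic workflow management",
         "LPDDR5X (88 custom Olympus cores)"),
    Chip("Rubin GPU", "training and inference", "288 GB HBM4"),
    Chip("NVLink 6 Switch", "GPU-to-GPU scale-up fabric", "n/a"),
    Chip("ConnectX-9 SuperNIC", "scale-out networking", "n/a"),
    Chip("BlueField-4 DPU", "data processing, KV cache storage", "n/a"),
    Chip("Spectrum-6", "Ethernet switching", "n/a"),
    Chip("Groq 3 LPU", "low-latency, large-context decode",
         "128 GB on-chip SRAM (256 LPUs per rack)"),
]

for chip in VERA_RUBIN_CHIPS:
    print(f"{chip.name:22s} {chip.role}")
```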
These seven chips are deployed across five interlocking rack-scale systems that together form the Vera Rubin POD (Platform on Demand): the NVL72 for core training and inference, which integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6; the Groq 3 LPX rack for decode acceleration; the Vera CPU rack for reinforcement learning and orchestration; the BlueField-4 STX rack for KV cache storage; and the Spectrum-6 SPX rack for Ethernet networking. A full Vera Rubin POD can span 40 racks, incorporating 1,152 Rubin GPUs and nearly 20,000 NVIDIA dies and delivering 60 exaflops of compute. The entire architecture builds on the third-generation NVIDIA MGX rack architecture, emphasizing extreme co-design across compute, networking, and storage to achieve high throughput, low latency, and energy efficiency for agentic AI workloads. The platform is designed to eliminate bottlenecks in communication and memory movement, enabling faster and more cost-effective inference for large language models and other complex AI tasks.
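To put the POD-level numbers in perspective, the short calculation below derives per-rack and per-GPU figures from the values quoted above (1,152 GPUs, 60 exaflops, 288 GB of HBM4 per Rubin GPU). This is back-of-the-envelope arithmetic under the assumption that the 60-exaflop figure is an aggregate across all GPUs, not an NVIDIA specification.

```python
# Back-of-the-envelope figures derived from the numbers quoted in this
# section; assumes the 60 EF figure is aggregate across all 1,152 GPUs.
TOTAL_GPUS = 1_152       # Rubin GPUs per full Vera Rubin POD
TOTAL_EXAFLOPS = 60      # aggregate compute quoted for the POD
HBM4_PER_GPU_GB = 288    # HBM4 per Rubin GPU
GPUS_PER_NVL72 = 72      # Rubin GPUs per NVL72 rack

# Per-GPU compute: 60 EF / 1,152 GPUs ~= 52 PF per GPU
petaflops_per_gpu = TOTAL_EXAFLOPS * 1_000 / TOTAL_GPUS

# Aggregate HBM4: 1,152 x 288 GB ~= 331.8 TB across the POD,
# or 72 x 288 GB ~= 20.7 TB per NVL72 rack.
pod_hbm4_tb = TOTAL_GPUS * HBM4_PER_GPU_GB / 1_000
rack_hbm4_tb = GPUS_PER_NVL72 * HBM4_PER_GPU_GB / 1_000

print(f"~{petaflops_per_gpu:.0f} PF per GPU")    # ~52 PF
print(f"~{pod_hbm4_tb:.1f} TB HBM4 per POD")     # ~331.8 TB
print(f"~{rack_hbm4_tb:.1f} TB HBM4 per NVL72")  # ~20.7 TB
```

Note that 1,152 GPUs corresponds to 16 NVL72 racks, which is consistent with the 40-rack POD also housing the Groq 3 LPX, Vera CPU, BlueField-4 STX, and Spectrum-6 SPX racks described above.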