
Does Vera Rubin utilize specific GPU architectures?

NVIDIA’s Vera Rubin platform is built around the NVIDIA Rubin GPU architecture, a significant advance designed for next-generation AI workloads. Officially announced at GTC 2026, this new GPU microarchitecture is the central component of the seven-chip platform aimed at scaling large AI factories and agentic AI. The Rubin GPUs are manufactured by TSMC on a 3 nm process and integrate HBM4 memory, providing the bandwidth and capacity needed to process massive AI models. They are engineered to deliver substantial performance gains, with reported figures of 50 petaflops of FP4 compute for inference and 35 petaflops in the NVFP4 format for training, a considerable improvement over the preceding Blackwell architecture.
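To put the 50-petaflop figure in perspective, a quick back-of-envelope calculation shows what it implies for per-token generation speed. The model size, the 2-FLOPs-per-parameter rule of thumb, and the utilization fraction below are illustrative assumptions, not NVIDIA specifications; only the 50 PFLOPS number comes from the announcement.

```python
# Back-of-envelope: ideal per-token throughput on a single Rubin GPU,
# using the 50 PFLOPS FP4 inference figure quoted above. Model size,
# FLOPs-per-token, and utilization are assumptions for illustration.

RUBIN_FP4_FLOPS = 50e15        # 50 petaflops (FP4, inference), from the text
PARAMS = 70e9                  # hypothetical 70B-parameter model
FLOPS_PER_TOKEN = 2 * PARAMS   # ~2 FLOPs per parameter per generated token
UTILIZATION = 0.4              # assumed fraction of peak actually achieved

seconds_per_token = FLOPS_PER_TOKEN / (RUBIN_FP4_FLOPS * UTILIZATION)
tokens_per_second = 1 / seconds_per_token
print(f"~{tokens_per_second:,.0f} tokens/s (compute-bound ideal)")
```

In practice decode is usually memory-bandwidth-bound rather than compute-bound, which is exactly why the HBM4 capacity and bandwidth matter as much as the raw petaflops.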

The Vera Rubin platform is not merely a collection of GPUs but a holistic supercomputing system that tightly integrates several specialized chips. Alongside the Rubin GPUs, it features the NVIDIA Vera CPU, NVIDIA NVLink 6 Switch, NVIDIA ConnectX-9 SuperNIC, NVIDIA BlueField-4 DPU, NVIDIA Spectrum-6 Ethernet switch, and the newly integrated NVIDIA Groq 3 LPU. This integrated design lets each component be optimized for a specific phase of an AI workload: the Rubin GPUs handle the compute-intensive prefill phase of inference, while the Groq 3 LPUs are purpose-built for the decode phase, generating output tokens at low latency to maximize efficiency and throughput. Such specialized hardware, together with the high-speed interconnect provided by NVLink 6, is essential for orchestrating the complex data flows of large-scale AI applications and vector database operations. It enables efficient retrieval and processing of high-dimensional vectors, similar to what a system like Milvus handles.
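The prefill/decode split described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of disaggregated serving, not NVIDIA's actual scheduler: the device names, `Request` shape, and dummy token logic are all invented for the example.

```python
# Minimal sketch of disaggregated inference: a compute-heavy "prefill"
# stage and a latency-sensitive "decode" stage run on different
# accelerators, mirroring the GPU/LPU split described above. Device names
# and data shapes are hypothetical illustrations.

from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: list
    kv_cache: dict = field(default_factory=dict)    # filled by prefill
    output_tokens: list = field(default_factory=list)

def prefill(req: Request, device: str = "rubin-gpu") -> Request:
    # Process the whole prompt in parallel and build the KV cache that
    # decode will consume. (Stand-in for the real attention computation.)
    req.kv_cache = {"device": device, "len": len(req.prompt_tokens)}
    return req

def decode(req: Request, max_new: int, device: str = "groq3-lpu") -> Request:
    # Generate output tokens one at a time at low latency, reading the
    # KV cache built during prefill. (Stand-in token sampler.)
    for i in range(max_new):
        req.output_tokens.append(req.kv_cache["len"] + i)  # dummy tokens
    return req

req = decode(prefill(Request(prompt_tokens=[1, 2, 3])), max_new=4)
print(req.output_tokens)
```

The key design point is that prefill hands off only the KV cache, so the decode device never re-reads the prompt; real systems move that cache over a fast interconnect such as NVLink.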

Further enhancements to the Rubin architecture are already planned, with a “Rubin Ultra” version expected in 2027. This iteration aims to double the Rubin GPU’s performance by effectively connecting two Rubin cores, addressing the ever-increasing demand for AI compute. The Vera Rubin design emphasizes high efficiency and cost-effectiveness for AI inference, claiming up to 10 times higher inference throughput per watt at one-tenth the cost per token compared to the Blackwell platform. This integrated, forward-looking approach to GPU architecture underscores NVIDIA’s strategy of providing a comprehensive, full-stack solution for developing and deploying agentic AI.
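The scaling claims in this section reduce to simple ratios, worked out below. Only the 2x (Rubin Ultra), 10x throughput-per-watt, and one-tenth cost-per-token factors come from the text; the Blackwell baseline numbers are hypothetical placeholders chosen purely to make the arithmetic concrete.

```python
# Worked numbers for the claims above. The 2x, 10x, and 1/10 factors are
# from the text; the Blackwell baselines are hypothetical placeholders.

rubin_fp4_pflops = 50                            # per-GPU FP4 figure quoted earlier
rubin_ultra_fp4_pflops = 2 * rubin_fp4_pflops    # two Rubin cores connected

blackwell_tokens_per_joule = 50.0                # hypothetical baseline
blackwell_cost_per_mtok = 1.00                   # hypothetical $/million tokens

rubin_tokens_per_joule = 10 * blackwell_tokens_per_joule   # claimed 10x per watt
rubin_cost_per_mtok = blackwell_cost_per_mtok / 10         # claimed 1/10 cost

print(rubin_ultra_fp4_pflops, rubin_tokens_per_joule, rubin_cost_per_mtok)
```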
