The NVIDIA Vera Rubin platform, designed for agentic AI, offers a comprehensive suite of hardware and software for model training. At its core are the NVIDIA Vera CPU and Rubin GPU, integrated into rack-scale systems such as the NVL72 to provide a powerful foundation for AI workloads spanning massive-scale pretraining, post-training, and inference. The hardware is interconnected by advanced technologies such as NVLink 6, ConnectX-9 SuperNICs, and BlueField-4 DPUs, turning each rack into a unified supercomputer. The Vera CPU, engineered specifically for agentic AI and reinforcement learning, handles orchestration and data movement, helping to eliminate CPU-side bottlenecks that can arise during training and inference. The platform also incorporates the Groq 3 LPU for low-latency, large-context inference, further optimizing performance for complex agentic workflows. This integrated hardware ecosystem is built for significant efficiency gains, with claims of training large mixture-of-experts models on one-quarter the GPUs required by the Blackwell platform and of delivering up to 10x higher inference throughput per watt.
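The mixture-of-experts models mentioned above hinge on a routing step that sends each token to a small subset of experts, which is what makes their training and serving so communication-heavy at rack scale. As a rough illustration only, here is a minimal top-k gating sketch in plain Python; the gate logits, expert count, and function names are hypothetical and not taken from any NVIDIA API:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of gate logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts for one token and
    renormalize their gate probabilities so they sum to 1."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

# One token's gate logits over 4 experts (illustrative values):
# experts 0 and 2 score highest, so the token is routed to them.
routes = top_k_route([2.0, 0.5, 1.0, -1.0], k=2)
```

In a real MoE layer this routing decision determines which GPUs receive each token's activations, which is why interconnects like NVLink 6 matter so much for these models.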
Beyond the physical hardware, the Vera Rubin platform leverages a rich software ecosystem for model training. A key component is NVIDIA Dynamo 1.0, an open-source AI inference platform positioned as an operating system for AI factories: it manages and orchestrates AI infrastructure across data centers, supports generative and agentic inference at scale, and integrates with popular inference and orchestration frameworks. The performance foundation comes from the CUDA-X libraries, which supply the programming model, core libraries, and communication stacks needed to accelerate applications and expose the full distributed capabilities of the rack-scale system. These libraries, including NVIDIA cuDNN, NVIDIA CUTLASS, FlashInfer, and a new Transformer Engine, offer optimized building blocks for demanding AI workloads, letting developers focus on model behavior rather than hardware-specific tuning.
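To make concrete what libraries like FlashInfer and cuDNN accelerate, the sketch below is a reference implementation of scaled dot-product attention, the operation at the center of those kernels, written in plain Python for readability. It is purely illustrative and orders of magnitude slower than any fused CUDA-X kernel:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Reference attention: softmax(Q K^T / sqrt(d)) V, with
    Q, K, V given as lists of d-dimensional row vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output row is the attention-weighted sum of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs (toy numbers).
attn = scaled_dot_product_attention(
    Q=[[1.0, 0.0]], K=[[1.0, 0.0], [0.0, 1.0]], V=[[1.0, 2.0], [3.0, 4.0]])
```

Production kernels fuse the score, softmax, and weighted-sum steps into a single pass over GPU memory, which is exactly the kind of hardware-specific tuning the CUDA-X stack takes off developers' plates.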
Higher-level frameworks such as PyTorch and JAX are supported with native NVIDIA acceleration, so training, post-training, and inference workflows can scale across racks with minimal code changes. At the heart of NVIDIA’s training and customization stack is the NVIDIA NeMo Framework, which provides an end-to-end workflow for building, adapting, and deploying AI models. For developers working with vector embeddings and similarity search, integrating a vector database such as Milvus is a natural extension for managing and querying the high-dimensional data these models produce and consume, particularly in agentic scenarios where efficient information retrieval is critical. This comprehensive integration of hardware and software, from low-level libraries to high-level frameworks, positions the Vera Rubin platform as a powerful environment for developing and deploying complex agentic AI models.
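The retrieval step a vector database like Milvus performs can be illustrated with a brute-force cosine-similarity search in plain Python. This is a conceptual sketch only: a real deployment would use Milvus's indexed approximate-nearest-neighbor search through the pymilvus client, and the toy corpus and embeddings below are made up:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query, corpus, top_k=2):
    """Brute-force nearest neighbors by cosine similarity;
    corpus maps document IDs to embedding vectors."""
    scored = sorted(corpus.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

# Toy 3-dimensional embeddings standing in for real model output.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
hits = search([1.0, 0.05, 0.0], corpus, top_k=2)
```

An agentic pipeline would embed a model's query, retrieve the closest documents this way, and feed them back into the model's context; the vector database's job is to make that lookup fast over billions of embeddings rather than three.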