How do I deploy an UltraRag system?

Deploying an UltraRag system involves four steps: setting up the environment, installing dependencies, configuring your RAG pipeline in YAML files, and running the system, optionally with a user interface. UltraRag is a low-code framework for developing and experimenting with complex Retrieval-Augmented Generation (RAG) systems. It abstracts core components as independent servers and orchestrates workflows through declarative YAML configurations, so developers can focus on the logical design of their pipelines rather than on engineering details. You can deploy it either by installing from source or by using Docker containers for a more streamlined setup. The framework supports components such as retrievers, generation models (LLMs), and knowledge base management.
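To make the idea of declarative orchestration concrete, here is a minimal Python sketch of a pipeline described as plain data and executed by a small runner. It is not UltraRag's actual schema or API: the tool names (retrieve, generate, fallback), the "branch" and "loop" keys, and the state dictionary are all hypothetical, and the tools are local functions standing in for independently served components.

```python
from typing import Any, Callable, Dict, List, Union

State = Dict[str, Any]
Step = Union[str, Dict[str, Any]]


# Hypothetical stand-ins for components that would run as separate servers.
def retrieve(state: State) -> State:
    # Naive keyword match: keep documents that contain any query word.
    words = state["query"].lower().split()
    state["docs"] = [d for d in state["corpus"] if any(w in d.lower() for w in words)]
    return state


def generate(state: State) -> State:
    state["answer"] = f"Answer based on {len(state['docs'])} document(s)"
    return state


def fallback(state: State) -> State:
    state["answer"] = "No supporting documents found"
    return state


TOOLS: Dict[str, Callable[[State], State]] = {
    "retrieve": retrieve,
    "generate": generate,
    "fallback": fallback,
}

# The pipeline is data, not code. In a YAML-driven setup this structure
# would be loaded from a file; the keys used here are illustrative only.
PIPELINE: List[Step] = [
    "retrieve",
    {"branch": {"when": "docs", "then": ["generate"], "else": ["fallback"]}},
]


def run(steps: List[Step], state: State) -> State:
    """Execute sequential steps, conditional branches, and fixed-count loops."""
    for step in steps:
        if isinstance(step, str):
            state = TOOLS[step](state)
        elif "branch" in step:
            spec = step["branch"]
            chosen = spec["then"] if state.get(spec["when"]) else spec["else"]
            state = run(chosen, state)
        elif "loop" in step:
            spec = step["loop"]
            for _ in range(spec["times"]):
                state = run(spec["steps"], state)
        else:
            raise ValueError(f"Unknown step: {step!r}")
    return state
```

Calling run(PIPELINE, {"query": "milvus index", "corpus": [...]}) returns the final state, including an "answer" key. Because the flow lives in PIPELINE rather than in the runner, changing the workflow means editing data, which is the property a YAML-configured framework relies on.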

To deploy UltraRag from source, first prepare your Python environment, ideally using conda or uv for dependency management. After cloning the UltraRag repository, install the dependencies from the repository root: uv pip install -e . covers the core functions, while uv pip install -e ".[all]" performs a full installation that includes retrieval, generation, corpus processing, and evaluation capabilities. GPU-accelerated models require an NVIDIA GPU driver and a CUDA version that are compatible with each other and with the installed libraries. API keys for external services such as LLMs or web search tools also need to be configured, preferably through environment variables so that secrets stay out of configuration files. Once the environment and dependencies are in place, define your RAG pipeline logic in a YAML configuration file, which orchestrates the flow, including sequential steps, loops, and conditional branches. Finally, you can launch the UltraRag WebUI, typically with a command like streamlit run ultrarag/webui/webui.py, to interact with and manage your deployed RAG system.
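Before launching anything, it can help to verify that the environment is actually ready. The following is a small preflight sketch using only the Python standard library. The variable names LLM_API_KEY and WEB_SEARCH_API_KEY are assumptions chosen for illustration, not names UltraRag is known to read, and checking for nvidia-smi on the PATH is only a rough signal that an NVIDIA driver is installed.

```python
import os
import shutil
from typing import Any, Dict, List, Mapping, Optional

# Hypothetical variable names; substitute whatever your services expect.
REQUIRED_ENV_VARS = ["LLM_API_KEY"]
OPTIONAL_ENV_VARS = ["WEB_SEARCH_API_KEY"]


def missing_env_vars(names: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or contain only whitespace."""
    return [name for name in names if not env.get(name, "").strip()]


def mask(secret: str) -> str:
    """Show only the last four characters so keys never appear in logs."""
    if len(secret) <= 4:
        return "****"
    return "****" + secret[-4:]


def preflight(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect basic readiness facts without printing any secret values."""
    env = os.environ if env is None else env
    return {
        "missing_required": missing_env_vars(REQUIRED_ENV_VARS, env),
        "missing_optional": missing_env_vars(OPTIONAL_ENV_VARS, env),
        # nvidia-smi ships with the NVIDIA driver; absence suggests CPU-only.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    report = preflight()
    if report["missing_required"]:
        raise SystemExit(f"Missing required variables: {report['missing_required']}")
    print(report)
```

Failing fast on a missing key is usually cheaper than discovering it halfway through a pipeline run, and masking keeps the check safe to include in shared logs.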

For more complex or production-grade deployments, especially those that need efficient retrieval over a large knowledge base, integrating a vector database is essential. UltraRag is designed to work with vector databases, and Milvus is a robust, open-source option thanks to its scalability, efficient indexing, and straightforward integration into an UltraRag pipeline. Typically, you deploy the Milvus instance separately and then configure UltraRag to use it for storing and querying the vector embeddings of your knowledge base. After ingesting your data into Milvus and setting the necessary parameters in UltraRag’s YAML files, commands like ultrarag build and ultrarag run can be used to build indexes and execute RAG workflows. Because UltraRag's modular architecture is based on the Model Context Protocol (MCP), new modules and tools can be “hot-plugged” into workflows without extensive changes to the core code, which makes it easier to adapt and scale your RAG system.
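The ingest-then-query flow against a separately deployed Milvus instance can be sketched with the pymilvus MilvusClient as shown below. This illustrates the Milvus side only and does not show how UltraRag itself connects to Milvus. The collection name, the 8-dimensional toy_embed function (a deterministic placeholder, not a real embedding model), and the default URI are assumptions; a real deployment would use an actual embedding model and a matching dimension.

```python
from typing import Any, Dict, List

COLLECTION = "ultrarag_demo"  # hypothetical collection name
DIM = 8                       # toy dimension; real models use far more


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Deterministic placeholder embedding. NOT a real embedding model."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def build_rows(chunks: List[str]) -> List[Dict[str, Any]]:
    """Shape text chunks into rows with an id, a vector, and the raw text."""
    return [{"id": i, "vector": toy_embed(c), "text": c} for i, c in enumerate(chunks)]


def ingest_and_search(chunks: List[str], query: str,
                      uri: str = "http://localhost:19530",
                      top_k: int = 3) -> List[str]:
    """Insert chunks into Milvus and return the text of the nearest hits."""
    # Imported lazily so the helpers above work without pymilvus installed.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri)
    if not client.has_collection(collection_name=COLLECTION):
        # Quick-setup collection: "id" primary key plus a "vector" field.
        client.create_collection(collection_name=COLLECTION, dimension=DIM)
    client.insert(collection_name=COLLECTION, data=build_rows(chunks))
    # Depending on the consistency level, freshly inserted rows may take a
    # moment to become visible to searches.
    hits = client.search(
        collection_name=COLLECTION,
        data=[toy_embed(query)],
        limit=top_k,
        output_fields=["text"],
    )
    return [hit["entity"]["text"] for hit in hits[0]]
```

With a Milvus server reachable at the given URI, ingest_and_search(["chunk one", "chunk two"], "a question") returns the stored text of the closest chunks. Rerunning it inserts the same ids again, so a longer-lived script would want to drop or upsert into the collection instead.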
