Hands-on Tutorial: Build a RAG-Powered Document Assistant in 10 Minutes with Dify and Milvus
What if you could turn your entire documentation library (thousands of pages of technical specs, internal wikis, and code documentation) into an intelligent AI assistant that instantly answers specific questions?
Even better, what if you could build it in less time than it takes to fix a merge conflict?
That's the promise of Retrieval Augmented Generation (RAG) when implemented the right way.
While ChatGPT and other LLMs are impressive, they quickly hit their limits when asked about your company's specific documentation, codebase, or knowledge base. RAG bridges this gap by integrating your proprietary data into the conversation, providing you with AI capabilities that are directly relevant to your work.
The problem? Traditional RAG implementation looks like this:
- Write custom embedding generation pipelines
- Configure and deploy a vector database
- Engineer complex prompt templates
- Build retrieval logic and similarity thresholds
- Create a usable interface
But what if you could skip straight to the results?
In this tutorial, we'll build a simple RAG application using two developer-focused tools:
- Dify: An open-source platform that handles RAG orchestration with minimal configuration
- Milvus: A blazing-fast open-source vector database purpose-built for similarity search and AI applications
By the end of this 10-minute guide, you'll have a working AI assistant that can answer detailed questions about any document collection you throw at it, no machine learning degree required.
What You'll Build
In just a few minutes of active work, you'll create:
- A document processing pipeline that converts any PDF into queryable knowledge
- A vector search system that finds exactly the right information
- A chatbot interface that answers technical questions with pinpoint accuracy
- A deployable solution you can integrate with your existing tools
The best part? Most of it is configured through a simple user interface (UI) instead of custom code.
What You'll Need
- Basic Docker knowledge (just `docker compose up -d` level)
- An OpenAI API key
- A PDF document to experiment with (we'll use a research paper)
Ready to build something actually useful in record time? Let's get started!
Building Your RAG Application with Milvus and Dify
In this section, we will build a simple RAG app with Dify, where we can ask questions about the information contained in a research paper. You can use any paper you like; in this case, we will use the famous paper that introduced the Transformer architecture, "Attention is All You Need."
We will use Milvus as our vector store, where we will keep all the necessary contexts. For the embedding model and the LLM, we'll use models from OpenAI, so we need to set up an OpenAI API key first. You can learn more about setting it up here.
Step 1: Starting Dify and Milvus Containers
In this example, we'll self-host Dify with Docker Compose, so before we begin, make sure Docker is installed on your local machine. If it isn't, install Docker by referring to its installation page.
Once we have Docker installed, we need to clone the Dify source code to our local machine with the following command:
git clone https://github.com/langgenius/dify.git
Next, go to the `docker` directory inside the source code you've just cloned, and copy the `.env.example` file to `.env` with the following commands:
cd dify/docker
cp .env.example .env
In a nutshell, the `.env` file contains the configuration needed to get your Dify app up and running, such as the choice of vector database, the credentials needed to access it, and the address of your Dify app.
Since we're going to use Milvus as our vector database, we need to change the value of the `VECTOR_STORE` variable in the `.env` file to `milvus`. We also need to set the `MILVUS_URI` variable to `http://host.docker.internal:19530` to ensure that there are no communication issues between the Docker containers after deployment.
VECTOR_STORE=milvus
MILVUS_URI=http://host.docker.internal:19530
Now we are ready to start the Docker containers. To do so, all we need to do is run the following command:
docker compose up -d
Once it finishes, we can check the status of all the containers and see whether they're up and running healthily with the `docker compose ps` command:
docker compose ps
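As an optional sanity check (not part of the Dify setup itself), you can confirm from your host machine that Milvus is reachable. The minimal sketch below uses the `pymilvus` package (`pip install pymilvus`) and assumes the Milvus container exposes its default port 19530 on localhost:

```python
from pymilvus import MilvusClient

# Connect to the Milvus instance started alongside Dify.
# Assumes Milvus's default port 19530 is exposed on localhost.
client = MilvusClient(uri="http://localhost:19530")

# If the connection works, this prints the collections Milvus currently
# holds (likely an empty list on a fresh deployment).
print(client.list_collections())
```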
And finally, if you head to http://localhost/install, you'll see the Dify landing page, where you can sign up and start building your RAG application in no time.
Once you've signed up, you can log into Dify with your credentials.
Step 2: Setting Up OpenAI API Key
The first thing we need to do after signing up for Dify is to set up the API keys that we'll use to call the embedding model as well as the LLM. Since we're going to use models from OpenAI, we need to insert our OpenAI API key into our profile. To do so, go to "Settings" by hovering your cursor over your profile in the top right of the UI, as you can see in the screenshot below:
Next, go to "Model Provider," hover your cursor over OpenAI, and then click "Setup." You'll then see a pop-up screen where you're prompted to enter your OpenAI API key. Once we're done, we're ready to use models from OpenAI as our embedding model and LLM.
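If you'd like to verify the key outside of Dify first, a quick call to the embedding model we'll use later works as a smoke test. This is an optional sketch using the official `openai` Python package (`pip install openai`); it assumes your key is stored in the `OPENAI_API_KEY` environment variable:

```python
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# Request an embedding from the same model we'll select in Dify later.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, RAG!",
)

# text-embedding-3-small returns 1536-dimensional vectors by default.
print(len(response.data[0].embedding))
```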
Step 3: Inserting Documents into Knowledge Base
Now let's set up the knowledge base for our RAG app. The knowledge base consists of a collection of internal documents or texts that serve as relevant context to help the LLM generate more accurate responses.
In our use case, our knowledge base is essentially the "Attention is All You Need" paper. However, we can't store the paper as-is for a couple of reasons. First, the paper is too long, and giving an overly long context to the LLM wouldn't help, as the context would be too broad. Second, we can't perform similarity searches to fetch the most relevant context if our input is raw text.
Therefore, we need to take two steps before storing our paper in the knowledge base: first, divide the paper into text chunks and transform each chunk into an embedding via an embedding model; second, store these embeddings in Milvus, our vector database. A sketch of this pipeline follows below.
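To make the chunk-embed-store pipeline concrete, here is a rough Python sketch of the manual version using `openai` and `pymilvus`. The naive word-based chunking and the `paper_chunks` collection name are illustrative assumptions, not what Dify does internally:

```python
from openai import OpenAI
from pymilvus import MilvusClient

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
milvus_client = MilvusClient(uri="http://localhost:19530")

# Naive fixed-size chunking for illustration; real pipelines usually split
# on sentence or paragraph boundaries, often with some overlap.
def chunk_text(text: str, chunk_size: int = 100) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

paper_text = "..."  # the full text extracted from the PDF
chunks = chunk_text(paper_text)

# Embed every chunk with the same model we'll select in Dify.
embeddings = openai_client.embeddings.create(
    model="text-embedding-3-small", input=chunks
)

# "paper_chunks" is a hypothetical collection name; text-embedding-3-small
# produces 1536-dimensional vectors.
milvus_client.create_collection(collection_name="paper_chunks", dimension=1536)
milvus_client.insert(
    collection_name="paper_chunks",
    data=[
        {"id": i, "vector": e.embedding, "text": chunks[i]}
        for i, e in enumerate(embeddings.data)
    ],
)
```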
Dify makes it easy to split the paper's text into chunks and turn them into embeddings. All we need to do is upload the PDF file of the paper, set the chunk length, and choose the embedding model, all through the UI. To do all these steps, go to "Knowledge" and then click "Create Knowledge." Next, you'll be prompted to upload the PDF file from your local computer, so it's best to download the paper from arXiv and save it on your computer first.
Once we've uploaded the file, we can set the chunk length, the indexing method, the embedding model we want to use, and the retrieval settings.
In the "Chunk Setting" area, you can choose any number as the maximum chunk length (in our use case, we'll set it to 100). Next, for "Index Method," we need to choose the "High Quality" option, as it enables us to perform similarity searches to find relevant contexts. For "Embedding Model," you can choose any embedding model from OpenAI, but in this example, we're going to use the text-embedding-3-small model. Lastly, for "Retrieval Setting," we need to choose "Vector Search," as we want to perform similarity searches to find the most relevant contexts.
Now, if you click "Save & Process" and everything goes well, you'll see a green tick appear, as shown in the following screenshot:
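Behind that "Vector Search" retrieval setting, the user's question is embedded and compared against the stored chunks. Continuing the illustrative sketch from above (same hypothetical `paper_chunks` collection), the search step looks roughly like this:

```python
from openai import OpenAI
from pymilvus import MilvusClient

openai_client = OpenAI()
milvus_client = MilvusClient(uri="http://localhost:19530")

# Embed the user's question with the same model used for the chunks.
question = "What is multi-head attention?"
query_vector = openai_client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

# Retrieve the top-3 most similar chunks; these become the context
# that gets passed to the LLM along with the question.
results = milvus_client.search(
    collection_name="paper_chunks",  # hypothetical collection from the sketch above
    data=[query_vector],
    limit=3,
    output_fields=["text"],
)

for hit in results[0]:
    print(hit["entity"]["text"])
```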
Step 4: Creating the RAG App
Up until this point, we have successfully created a knowledge base and stored it in our Milvus database. Now we're ready to create the RAG app.
Creating the RAG app with Dify is very straightforward. This time, we go to "Studio" instead of "Knowledge," and then click "Create from Blank." Next, choose "Chatbot" as the app type and give your app a name in the provided field. Once you're done, click "Create." You'll now see the following page:
Under the "Instruction" field, we can write a system prompt such as "Answer the query from the user concisely." Next, as "Context," we need to click the "Add" symbol and then add the knowledge base that we've just created. This way, our RAG app will fetch possible contexts from this knowledge base to answer the user's query.
Now that we've added the knowledge base to our RAG app, the last thing we need to do is choose the LLM from OpenAI. To do so, click on the model list in the upper right corner, as you can see in the screenshot below:
And now we're ready to publish our RAG application! In the upper right-hand corner, click "Publish," and there you'll find many ways to publish our RAG app: we can simply run it in a browser, embed it on our website, or access the app via API. In this example, we'll just run our app in a browser, so we can click "Run App."
And that's it! Now you can ask the LLM anything related to the "Attention is All You Need" paper or any other documents included in your knowledge base.
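If you choose the API option instead of "Run App," the published app is exposed over HTTP. The sketch below assumes Dify's chat endpoint at `/v1/chat-messages` and an app API key copied from the Publish screen; check the API page Dify generates for your app for the exact details:

```python
import requests

DIFY_API_KEY = "app-..."  # the API key Dify shows on your app's API page

response = requests.post(
    "http://localhost/v1/chat-messages",
    headers={
        "Authorization": f"Bearer {DIFY_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "inputs": {},
        "query": "What problem does multi-head attention solve?",
        "response_mode": "blocking",  # wait for the full answer instead of streaming
        "user": "tutorial-user",      # any stable identifier for the end user
    },
)

print(response.json()["answer"])
```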
Conclusion
You've now built a working RAG application using Dify and Milvus, with minimal code and configuration. This approach makes the complex RAG architecture accessible to developers without requiring deep expertise in vector databases or LLM integration. Key takeaways:
- Low setup overhead: Using Docker Compose simplifies deployment
- No-code/low-code orchestration: Dify handles most of the RAG pipeline
- Production-ready vector database: Milvus provides efficient embedding storage and retrieval
- Extensible architecture: Easy to add documents or adjust parameters
For production deployment, consider:
- Setting up authentication for your application
- Configuring proper scaling for Milvus (especially for larger document collections)
- Implementing monitoring for your Dify and Milvus instances
- Fine-tuning retrieval parameters for optimal performance
The combination of Dify and Milvus enables the rapid development of RAG applications that can effectively leverage your organization's internal knowledge with modern large language models (LLMs). Happy building!