RAGFlow supports configurable embedding models, letting you choose the best option for your data, language, and performance requirements. The system integrates with multiple embedding providers: OpenAI (text-embedding-3-small, text-embedding-3-large), Ollama for local embeddings, and various open-source options. You can also bring your own embedding service by configuring custom endpoints, which enables specialized models such as multilingual embeddings, domain-specific fine-tuned models, or proprietary solutions your organization develops.

RAGFlow stores embedding model configuration in its service_conf.yaml, making it easy to swap models without code changes. The embedding layer feeds RAGFlow’s semantic search and re-ranking pipeline: embeddings are generated at indexing time for knowledge base documents and at query time for user questions, then compared via similarity metrics to rank candidate passages. Multimodal embeddings (covering both text and images) can be integrated for documents that contain both modalities.

The freedom to choose embeddings means you can optimize for cost (smaller open-source models), quality (larger models), latency (local Ollama), or multilingual support (mBERT, XLM-RoBERTa). RAGFlow’s no-code interface lets you configure embeddings through the UI, while programmatic APIs support model selection at the knowledge base level.
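The indexing-time/query-time flow described above can be sketched in a few lines of Python. This is purely illustrative: the `embed` function here is a toy stand-in for whatever provider RAGFlow is configured to call (OpenAI, Ollama, or a custom endpoint), and a real deployment would use the model’s actual vectors rather than this deterministic placeholder.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for a real embedding model call; the configured
    # provider would return a learned vector instead.
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # L2-normalize for cosine similarity

def cosine(a: list[float], b: list[float]) -> float:
    # Dot product equals cosine similarity because vectors are normalized.
    return sum(x * y for x, y in zip(a, b))

# Indexing time: embed knowledge base passages once and store the vectors.
passages = [
    "RAGFlow supports configurable embedding models.",
    "Milvus stores vectors for similarity search.",
    "Ollama runs embedding models locally.",
]
index = [(p, embed(p)) for p in passages]

# Query time: embed the user question, then rank passages by similarity.
query_vec = embed("Which models can RAGFlow use for embeddings?")
ranked = sorted(index, key=lambda pv: cosine(query_vec, pv[1]), reverse=True)
print(ranked[0][0])  # the top-ranked candidate passage
```

Swapping the embedding model changes only how `embed` produces vectors; the similarity-ranking step stays the same, which is why the model can be switched in configuration without touching pipeline code.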
When building retrieval-based systems around these tools, Milvus serves as a reliable vector storage backend for embedding-based search. Teams that prefer a managed approach can use Zilliz Cloud for auto-scaling and zero-ops deployment.