

HNSW_PQ

HNSW_PQ leverages Hierarchical Navigable Small World (HNSW) graphs with Product Quantization (PQ), creating an advanced vector indexing method that offers a controllable size-versus-accuracy trade-off. Compared to HNSW_SQ, this index type delivers a higher recall rate at the same compression level, albeit with lower query processing speed and longer index construction time.

Overview

HNSW_PQ combines two indexing techniques: HNSW for fast graph-based navigation and PQ for efficient vector compression.

HNSW

HNSW constructs a multi-layer graph where each node corresponds to a vector in the dataset. In this graph, nodes are connected based on their similarity, enabling rapid traversal through the data space. The hierarchical structure allows the search algorithm to narrow down the candidate neighbors, significantly accelerating the search process in high-dimensional spaces.

For more information, refer to HNSW.
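To build intuition for how the graph is traversed, here is a heavily simplified, single-layer sketch of an ef-bounded best-first search in Python. This is a toy illustration only, not Milvus internals; the graph, vectors, and ef value are made up:

```python
import heapq

def greedy_search(adjacency, vectors, query, entry, ef):
    # Toy single-layer best-first search; illustration only, not Milvus internals
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    visited = {entry}
    candidates = [(dist(vectors[entry], query), entry)]  # min-heap: closest first
    results = [(-candidates[0][0], entry)]               # max-heap via negation
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded candidate is worse than the worst result
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in adjacency[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(vectors[nb], query)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # keep only the ef best
    return sorted((-d, n) for d, n in results)

vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search(adjacency, vectors, query=(3.0, 0.0), entry=0, ef=2))
# -> [(0.0, 3), (1.0, 2)]
```

A larger ef widens the pool of results kept during traversal, which is why it trades query speed for recall.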

PQ

PQ is a vector compression technique that breaks down high-dimensional vectors into smaller sub-vectors, which are then quantized and compressed. The compression dramatically reduces memory requirements and accelerates distance computations.

For more information, refer to IVF_PQ.
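To make the savings concrete, here is back-of-the-envelope arithmetic for a 384-dimensional FP32 vector compressed with hypothetical settings m=128 and nbits=8 (codebook storage excluded):

```python
dim = 384     # Vector dimensionality (example value)
m = 128       # Number of PQ sub-vectors; must evenly divide dim
nbits = 8     # Bits per sub-vector code

raw_bytes = dim * 4            # FP32 storage: 4 bytes per dimension
pq_bytes = m * nbits // 8      # PQ storage: one nbits-wide code per sub-vector

print(raw_bytes, pq_bytes)     # 1536 vs 128: a 12x reduction per vector
```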

HNSW + PQ

HNSW_PQ combines the strengths of HNSW and PQ to enable efficient approximate nearest neighbor search. It uses PQ to compress the data (thus reducing memory usage), and then builds an HNSW graph on these compressed vectors to enable rapid candidate retrieval. During the search, the algorithm can optionally refine the candidate results using higher-precision data for improved accuracy. Here’s how the process works:

  1. Data Compression: PQ splits each vector into multiple sub-vectors and quantizes them using a codebook of centroids, controlled by parameters like m (sub-vector count) and nbits (bits per sub-vector).

  2. Graph Construction: The compressed vectors are then used to build an HNSW graph. Because the vectors are stored in a compressed form, the resulting graph is typically smaller, requires less memory, and can be traversed more quickly—significantly accelerating the candidate retrieval step.

  3. Candidate Retrieval: When a query is executed, the algorithm uses the compressed data in the HNSW graph to efficiently identify a pool of candidate neighbors. This graph-based lookup drastically reduces the number of vectors that must be considered, improving query latency compared to brute-force searches.

  4. (Optional) Result Refinement: The initial candidate results can be refined for better accuracy, based on the following parameters:

    • refine: Controls whether this refinement step is activated. When set to true, the system recalculates distances using higher-precision or uncompressed representations.

    • refine_type: Specifies the precision level of data used during refinement (e.g., SQ6, SQ8, BF16). A higher-precision choice such as FP32 can yield more accurate results but requires more memory. This precision must exceed that of the compressed vectors produced by PQ (as determined by m and nbits).

    • refine_k: Acts as a magnification factor. For instance, if your top k is 100 and refine_k is 2, the system re-ranks the top 200 candidates and returns the best 100, enhancing overall accuracy.
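Conceptually, the refinement step amounts to re-scoring a magnified candidate pool with exact distances. A minimal Python sketch of the refine_k logic (toy data; not how Milvus implements it internally):

```python
def refine_rerank(query, candidate_ids, full_vectors, k, refine_k):
    # Rerank the top k * refine_k candidates using exact (uncompressed) distances
    pool = candidate_ids[: int(k * refine_k)]
    exact = lambda v: sum((a - b) ** 2 for a, b in zip(query, v))
    return sorted(pool, key=lambda i: exact(full_vectors[i]))[:k]

# Toy data: the compressed-domain ranking put id 3 first by mistake
full_vectors = {7: (0.1, 0.1), 3: (0.9, 0.9), 5: (0.2, 0.0)}
approx_ranking = [3, 7, 5]  # order returned by the PQ-based graph search
print(refine_rerank((0.0, 0.0), approx_ranking, full_vectors, k=2, refine_k=1.5))
# -> [7, 5]
```

With k=2 and refine_k=1.5, three candidates are re-scored and the two exact-distance winners are returned, correcting the quantization error in the approximate ranking.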

For a full list of parameters and valid values, refer to Index params.

Build index

To build an HNSW_PQ index on a vector field in Milvus, use the add_index() method, specifying the index_type, metric_type, and additional parameters for the index.

from pymilvus import MilvusClient

# Prepare index building params
index_params = MilvusClient.prepare_index_params()

index_params.add_index(
    field_name="your_vector_field_name", # Name of the vector field to be indexed
    index_type="HNSW_PQ", # Type of the index to create
    index_name="vector_index", # Name of the index to create
    metric_type="L2", # Metric type used to measure similarity
    params={
        "M": 30, # Maximum number of neighbors each node can connect to in the graph
        "efConstruction": 360, # Number of candidate neighbors considered for connection during index construction
        "m": 384, 
        "nbits": 8,
        "refine": true, # Whether to enable the refinement step
        "refine_type": "SQ8" # Precision level of data used for refinement
    } # Index building params
)

In this configuration:

  • index_type: The type of index to be built. In this example, set the value to HNSW_PQ.

  • metric_type: The method used to calculate the distance between vectors. Supported values include COSINE, L2, and IP. For details, refer to Metric Types.

  • params: Additional configuration options for building the index. For details, refer to Index building params.

Once the index parameters are configured, you can create the index by calling the create_index() method directly or by passing the index parameters to the create_collection() method. For details, refer to Create Collection.

Search on index

Once the index is built and entities are inserted, you can perform similarity searches on the index.

search_params = {
    "params": {
        "ef": 10, # Parameter controlling query time/accuracy trade-off
        "refine_k": 1 # The magnification factor
    }
}

client = MilvusClient(uri="http://localhost:19530") # Replace with your Milvus server address

res = client.search(
    collection_name="your_collection_name", # Collection name
    anns_field="your_vector_field_name",  # Vector field name
    data=[[0.1, 0.2, 0.3, 0.4, 0.5]],  # Query vector
    limit=3,  # TopK results to return
    search_params=search_params
)

In this configuration:

  • ef: Controls the breadth of the graph search. Larger values improve recall at the cost of higher query latency.

  • refine_k: The magnification factor for the refinement stage. A value of 1 means only the initial top K candidates are reranked.

Index params

This section provides an overview of the parameters used for building an index and performing searches on the index.

Index building params

The following table lists the parameters that can be configured in params when building an index.

HNSW

M

  • Description: Maximum number of connections (or edges) each node can have in the graph, including both outgoing and incoming edges. This parameter directly affects both index construction and search.

  • Value Range: Type: Integer. Range: [2, 2048]. Default value: 30 (up to 30 outgoing and 30 incoming edges per node).

  • Tuning Suggestion: A larger M generally leads to higher accuracy but increases memory overhead and slows down both index building and search. Consider increasing M for datasets with high dimensionality or when high recall is crucial. Consider decreasing M when memory usage and search speed are primary concerns. In most cases, we recommend a value within the range [5, 100].

efConstruction

  • Description: Number of candidate neighbors considered for connection during index construction. A larger pool of candidates is evaluated for each new element, but the maximum number of connections actually established is still limited by M.

  • Value Range: Type: Integer. Range: [1, int_max]. Default value: 360.

  • Tuning Suggestion: A higher efConstruction typically results in a more accurate index, as more potential connections are explored. However, this also leads to longer indexing time and increased memory usage during construction. Consider increasing efConstruction for improved accuracy, especially in scenarios where indexing time is less critical. Consider decreasing efConstruction to speed up index construction when resource constraints are a concern. In most cases, we recommend a value within the range [50, 500].

PQ

m

  • Description: The number of sub-vectors into which each high-dimensional vector is divided during quantization.

  • Value Range: Type: Integer. Range: [1, 65536]. Default value: None.

  • Tuning Suggestion: A higher m value can improve accuracy, but it also increases computational complexity and memory usage. m must be a divisor of the vector dimension (D) to ensure proper decomposition. A commonly recommended value is m = D/2. In most cases, we recommend a value within the range [D/8, D].

nbits

  • Description: The number of bits used to represent each sub-vector's centroid index in compressed form. It directly determines the size of each codebook: each codebook contains 2^nbits centroids. For example, if nbits is set to 8, each sub-vector is represented by an 8-bit centroid index, allowing 2^8 (256) possible centroids in the codebook for that sub-vector.

  • Value Range: Type: Integer. Range: [1, 64]. Default value: 8.

  • Tuning Suggestion: A higher nbits value allows for larger codebooks, potentially leading to more accurate representations of the original vectors. However, it also means more bits per index and therefore less compression. In most cases, we recommend a value within the range [1, 16].

refine

  • Description: A boolean flag that controls whether a refinement step is applied during search. Refinement involves reranking the initial results by computing exact distances between the query vector and candidates.

  • Value Range: Type: Boolean. Range: [true, false]. Default value: false.

  • Tuning Suggestion: Set to true if high accuracy is essential and you can tolerate slightly slower search times. Use false if speed is a priority and a minor compromise in accuracy is acceptable.

refine_type

  • Description: Determines the precision of the data used during the refinement process. This precision must be higher than that of the compressed vectors (as set by the m and nbits parameters).

  • Value Range: Type: String. Range: [SQ6, SQ8, BF16, FP16, FP32]. Default value: None.

  • Tuning Suggestion: Use FP32 for maximum precision at a higher memory cost, or SQ6/SQ8 for better compression. BF16 and FP16 offer a balanced alternative.
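Because m must evenly divide the vector dimension D, it can be handy to enumerate the valid choices before configuring the index. The following helper is a hypothetical convenience for illustration, not part of pymilvus:

```python
def valid_m_values(dim):
    # Divisors of dim within the recommended [D/8, D] range
    return [m for m in range(1, dim + 1) if dim % m == 0 and m >= dim // 8]

print(valid_m_values(384))
# -> [48, 64, 96, 128, 192, 384]
```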

Index-specific search params

The following table lists the parameters that can be configured in search_params.params when searching on the index.

HNSW

ef

  • Description: Controls the breadth of search during nearest neighbor retrieval. It determines how many nodes are visited and evaluated as potential nearest neighbors. This parameter affects only the search process and applies exclusively to the bottom layer of the graph.

  • Value Range: Type: Integer. Range: [1, int_max]. Default value: limit (the number of TopK nearest neighbors to return).

  • Tuning Suggestion: A larger ef generally leads to higher search accuracy as more potential neighbors are considered. However, this also increases search time. Consider increasing ef when achieving high recall is critical and search speed is less of a concern. Consider decreasing ef to prioritize faster searches, especially in scenarios where a slight reduction in accuracy is acceptable. In most cases, we recommend a value within the range [K, 10K].

PQ

refine_k

  • Description: The magnification factor that controls how many extra candidates are examined during the refinement (reranking) stage, relative to the requested top K results.

  • Value Range: Type: Float. Range: [1, float_max). Default value: 1.

  • Tuning Suggestion: Higher values of refine_k can improve recall and accuracy but also increase search time and resource usage. A value of 1 means the refinement step considers only the initial top K results.
