English

The english analyzer in Milvus is designed to process English text, applying language-specific rules for tokenization and filtering.

Definition

The english analyzer uses the following components:

  • Tokenizer: Uses the standard tokenizer to split text into discrete word units.

  • Filters: Includes multiple filters for comprehensive text processing:

    • lowercase: Converts all tokens to lowercase, enabling case-insensitive searches.

    • stemmer: Reduces words to their root form to support broader matching (e.g., “running” becomes “run”).

    • stop: Removes common English stop words (the built-in _english_ list, set through the filter's stop_words option) so that matching focuses on key terms in the text.

The functionality of the english analyzer is equivalent to the following custom analyzer configuration:

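analyzer_params = {
    "tokenizer": "standard",  # split text into word tokens
    "filter": [
        "lowercase",  # normalize case first
        {
            "type": "stemmer",  # reduce tokens to their root form
            "language": "english"
        },
        {
            "type": "stop",  # drop stop words
            "stop_words": "_english_"  # built-in English stop word list
        }
    ]
}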

Configuration

To apply the english analyzer to a field, simply set type to english in analyzer_params, and include optional parameters as needed.

analyzer_params = {
    "type": "english",
}

The english analyzer accepts the following optional parameters:

  • stop_words: A list of stop words to remove from the tokenization results. Defaults to _english_, a built-in set of common English stop words.

Example configuration with custom stop words:

analyzer_params = {
    "type": "english",
    "stop_words": ["a", "an", "the"]
}
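
To make the effect of a custom stop word list concrete, here is a minimal pure-Python sketch. It is not how Milvus is implemented: the analyze helper is hypothetical, the regular expression is only a crude stand-in for the standard tokenizer, and stemming is left out entirely. It also assumes that a custom list replaces the default _english_ set rather than extending it, which is why words such as "is" and "for" remain in the output.

```python
import re


def analyze(text, stop_words):
    """Hypothetical helper: tokenize, lowercase, then drop stop words.

    Stemming is intentionally omitted, so this only approximates the
    tokenizer, lowercase, and stop word stages of the english analyzer.
    """
    # Crude stand-in for the standard tokenizer: runs of ASCII letters/digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    lowered = [token.lower() for token in tokens]
    stops = set(stop_words)
    return [token for token in lowered if token not in stops]


text = "The Milvus vector database is built for scale!"
custom = analyze(text, ["a", "an", "the"])
print(custom)
# ['milvus', 'vector', 'database', 'is', 'built', 'for', 'scale']
```

Only "the" is removed here, because it is the only token that appears in the custom list.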

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. This allows Milvus to process the text in that field using the specified analyzer for efficient tokenization and filtering. For details, refer to Example use.
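
As a rough sketch of that step, the snippet below attaches analyzer_params to a VARCHAR field with the pymilvus client. The field names, max_length value, and the build_schema helper are illustrative assumptions, and the enable_analyzer and analyzer_params arguments follow the pymilvus 2.5-style add_field API; check them against the pymilvus version you have installed.

```python
analyzer_params = {
    "type": "english",
    "stop_words": ["a", "an", "the"],
}


def build_schema():
    """Hypothetical helper: build a schema whose text field uses the analyzer."""
    # Imported here so the module still loads where pymilvus is not installed.
    from pymilvus import DataType, MilvusClient

    schema = MilvusClient.create_schema(auto_id=True, enable_dynamic_field=False)
    # Primary key field (name chosen for illustration).
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    # Text field processed by the english analyzer defined above.
    schema.add_field(
        field_name="text",
        datatype=DataType.VARCHAR,
        max_length=1000,
        enable_analyzer=True,
        analyzer_params=analyzer_params,
    )
    return schema
```

Calling build_schema() returns a schema object that can then be passed to collection creation; no server connection is needed to build it.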

Example output

Here’s how the english analyzer processes text.

Original text:

"The Milvus vector database is built for scale!"

Expected output:

["milvus", "vector", "databas", "built", "scale"]