Standard Analyzer

The standard analyzer is the default analyzer in Milvus; it is automatically applied to text fields when no analyzer is specified. It uses grammar-based tokenization, which makes it effective for most languages.

The standard analyzer is suitable for languages that rely on separators (such as spaces and punctuation) to mark word boundaries. However, languages such as Chinese, Japanese, and Korean require dictionary-based tokenization. For these languages, we strongly recommend using either a language-specific analyzer such as chinese or a custom analyzer with specialized tokenizers (such as lindera or icu) and filters. This ensures accurate tokenization and better search results.

Definition

The standard analyzer consists of:

Tokenizer: Uses the standard tokenizer to split text into discrete word units based on grammar rules. For more information, refer to Standard Tokenizer.
Filter: Uses the lowercase filter to convert all tokens to lowercase, enabling case-insensitive searches. For more information, refer to Lowercase.
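To make the two stages concrete, here is a minimal pure-Python sketch of the tokenize-then-lowercase pipeline. It is an approximation, not Milvus's implementation. The regex `\w+` stands in for the grammar-based standard tokenizer, and the function name `standard_analyze` is hypothetical.

```python
import re
from typing import List

# Rough stand-in for the standard tokenizer: keep runs of Unicode word
# characters and discard spaces and punctuation.
WORD_PATTERN = re.compile(r"\w+")


def standard_analyze(text: str) -> List[str]:
    """Hypothetical sketch: tokenize, then apply a lowercase filter."""
    tokens = WORD_PATTERN.findall(text)         # tokenizer stage
    return [token.lower() for token in tokens]  # lowercase filter stage


print(standard_analyze("Milvus: Vector Search, at SCALE."))
# ['milvus', 'vector', 'search', 'at', 'scale']
```

Because every token is lowercased before indexing, a query for `Scale` and a document containing `SCALE` produce the same token, which is what makes searches case-insensitive.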

The functionality of the standard analyzer is equivalent to the following custom analyzer configuration:


analyzer_params = {
    "tokenizer": "standard",
    "filter": ["lowercase"]
}

Map<String, Object> analyzerParams = new HashMap<>();
analyzerParams.put("tokenizer", "standard");
analyzerParams.put("filter", Collections.singletonList("lowercase"));

const analyzer_params = {
    "tokenizer": "standard",
    "filter": ["lowercase"]
};

analyzerParams := map[string]any{"tokenizer": "standard", "filter": []any{"lowercase"}}

# restful
analyzerParams='{
  "tokenizer": "standard",
  "filter": [
    "lowercase"
  ]
}'
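The equivalence above can also be written as a small conversion function. This is a minimal sketch: `expand_standard_params` is a hypothetical helper, not part of pymilvus. Mapping the optional `stop_words` parameter to a `{"type": "stop", ...}` filter placed after `lowercase` is an assumption, and the sketch does not model the built-in `_english_` stop-word list.

```python
from typing import Any, Dict, List


def expand_standard_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite {"type": "standard", ...} as tokenizer + filters."""
    if params.get("type") != "standard":
        raise ValueError("expected a configuration with type 'standard'")

    # Documented equivalence: standard tokenizer followed by the lowercase filter.
    filters: List[Any] = ["lowercase"]

    # Assumption: custom stop words behave like a stop filter appended
    # after the lowercase filter.
    stop_words = params.get("stop_words")
    if stop_words:
        filters.append({"type": "stop", "stop_words": list(stop_words)})

    return {"tokenizer": "standard", "filter": filters}


print(expand_standard_params({"type": "standard"}))
# {'tokenizer': 'standard', 'filter': ['lowercase']}
```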

Configuration

To apply the standard analyzer to a field, set type to standard in analyzer_params and include any optional parameters as needed.


analyzer_params = {
    "type": "standard", # Specifies the standard analyzer type
}

Map<String, Object> analyzerParams = new HashMap<>();
analyzerParams.put("type", "standard");

const analyzer_params = {
    "type": "standard", // Specifies the standard analyzer type
}

analyzerParams := map[string]any{"type": "standard"}

# restful
analyzerParams='{
  "type": "standard"
}'

The standard analyzer accepts the following optional parameters:

Parameter	Description
`stop_words`	An array containing a list of stop words, which will be removed from tokenization. Defaults to `_english_`, a built-in set of common English stop words.
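The effect of `stop_words` can be sketched offline by extending the same rough pipeline with a removal step. This is an approximation, not Milvus's implementation, and `standard_analyze` is a hypothetical name. It assumes the stop words are matched exactly against tokens that have already been lowercased, following the tokenizer-then-lowercase order described under Definition. It does not model the built-in `_english_` list.

```python
import re
from typing import Iterable, List, Optional

WORD_PATTERN = re.compile(r"\w+")  # rough stand-in for the standard tokenizer


def standard_analyze(text: str, stop_words: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical sketch: tokenize, lowercase, then drop stop words."""
    tokens = [token.lower() for token in WORD_PATTERN.findall(text)]
    if not stop_words:
        return tokens
    # Assumption: stop words are compared exactly against the lowercased tokens.
    blocked = set(stop_words)
    return [token for token in tokens if token not in blocked]


print(standard_analyze("The Lord of the Rings", stop_words=["of"]))
# ['the', 'lord', 'the', 'rings']
```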

Example configuration of custom stop words:


analyzer_params = {
    "type": "standard", # Specifies the standard analyzer type
    "stop_words": ["of"] # Optional: List of words to exclude from tokenization
}

Map<String, Object> analyzerParams = new HashMap<>();
analyzerParams.put("type", "standard");
analyzerParams.put("stop_words", Collections.singletonList("of"));

const analyzer_params = {
    "type": "standard", // Specifies the standard analyzer type
    "stop_words": ["of"] // Optional: List of words to exclude from tokenization
};

analyzerParams := map[string]any{"type": "standard", "stop_words": []string{"of"}}

# restful
analyzerParams='{
  "type": "standard",
  "stop_words": [
    "of"
  ]
}'

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. This allows Milvus to process the text in that field using the specified analyzer for efficient tokenization and filtering. For more information, refer to Example use.

Examples

Before applying the analyzer configuration to your collection schema, verify its behavior using the run_analyzer method.

Analyzer configuration


analyzer_params = {
    "type": "standard",  # Standard analyzer configuration
    "stop_words": ["for"] # Optional: Custom stop words parameter
}

Map<String, Object> analyzerParams = new HashMap<>();
analyzerParams.put("type", "standard");
analyzerParams.put("stop_words", Collections.singletonList("for"));

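const analyzer_params = {
    "type": "standard", // Standard analyzer configuration
    "stop_words": ["for"] // Optional: Custom stop words parameter
};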

analyzerParams := map[string]any{"type": "standard", "stop_words": []string{"for"}}

# restful
analyzerParams='{
  "type": "standard",
  "stop_words": [
    "for"
  ]
}'

Verification using `run_analyzer`


from pymilvus import (
    MilvusClient,
)

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

# Sample text to analyze
sample_text = "The Milvus vector database is built for scale!"

# Run the standard analyzer with the defined configuration
result = client.run_analyzer(sample_text, analyzer_params)
print("Standard analyzer output:", result)

import java.util.ArrayList;
import java.util.List;

import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.service.vector.request.RunAnalyzerReq;
import io.milvus.v2.service.vector.response.RunAnalyzerResp;

ConnectConfig config = ConnectConfig.builder()
        .uri("http://localhost:19530")
        .token("root:Milvus")
        .build();
MilvusClientV2 client = new MilvusClientV2(config);

List<String> texts = new ArrayList<>();
texts.add("The Milvus vector database is built for scale!");

RunAnalyzerResp resp = client.runAnalyzer(RunAnalyzerReq.builder()
        .texts(texts)
        .analyzerParams(analyzerParams)
        .build());
List<RunAnalyzerResp.AnalyzerResult> results = resp.getResults();

// javascript

import (
    "context"
    "encoding/json"
    "fmt"

    "github.com/milvus-io/milvus/client/v2/milvusclient"
)

ctx := context.Background()

client, err := milvusclient.New(ctx, &milvusclient.ClientConfig{
    Address: "localhost:19530",
    APIKey:  "root:Milvus",
})
if err != nil {
    fmt.Println(err.Error())
    // handle error
}

bs, _ := json.Marshal(analyzerParams)
texts := []string{"The Milvus vector database is built for scale!"}
option := milvusclient.NewRunAnalyzerOption(texts).
    WithAnalyzerParams(string(bs))

result, err := client.RunAnalyzer(ctx, option)
if err != nil {
    fmt.Println(err.Error())
    // handle error
}

# restful

Expected output

Standard analyzer output: ['the', 'milvus', 'vector', 'database', 'is', 'built', 'scale']
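If no Milvus instance is available, you can sanity-check the expected output offline with a rough approximation of the pipeline. The approximation replaces the standard tokenizer with a `\w+` regex, lowercases the tokens, and then removes exact matches of the custom stop words. `approximate_standard` is a hypothetical name, not a Milvus API. The real tokenizer may split some inputs, such as hyphenated words or apostrophes, differently.

```python
import re
from typing import Iterable, List

WORD_PATTERN = re.compile(r"\w+")  # rough stand-in for the standard tokenizer


def approximate_standard(text: str, stop_words: Iterable[str]) -> List[str]:
    """Hypothetical offline approximation of the configured standard analyzer."""
    lowered = [token.lower() for token in WORD_PATTERN.findall(text)]
    blocked = set(stop_words)
    return [token for token in lowered if token not in blocked]


sample_text = "The Milvus vector database is built for scale!"
tokens = approximate_standard(sample_text, stop_words=["for"])
print(tokens)
# ['the', 'milvus', 'vector', 'database', 'is', 'built', 'scale']
```

Only `for` is listed in `stop_words` here, so `the` and `is` remain in the output, and the trailing `!` is discarded by the tokenizer.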
