Nullable Fields

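Milvus supports nullable fields, which allow a field value to be missing or explicitly set to NULL. Nullability is defined at the schema level and applies consistently across data ingestion, indexing, search, and query operations.

Use nullable fields when:

You ingest data from external systems that allow missing values.
Some metadata is optional, or applies only to part of the dataset.
Vector embeddings are generated asynchronously and inserted later.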

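Limits

Vector fields that allow NULL values do not support IS NULL or IS NOT NULL filter expressions. You cannot explicitly filter entities by whether a vector field value is NULL.
Array of Structs fields do not support NULL values. You cannot mark an Array of Structs field, or any field nested inside it, as nullable.
The nullable attribute is defined when a field is created and cannot be modified afterward. You cannot enable or disable it on an existing field.
A field marked as nullable cannot be used as a partition key. A partition key field must always contain a valid, non-null value. For details, see Use Partition Key.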
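
Two of the limits above (no nullable partition key, no nullable Array of Structs field) can be checked on the client before a collection is created. The sketch below is a hypothetical pre-flight check over plain dictionaries. The function name `check_nullable_limits`, the dictionary keys, and the type label `ARRAY_OF_STRUCT` are illustrative assumptions and are not part of pymilvus.

```python
from typing import Dict, List


def check_nullable_limits(fields: List[Dict]) -> List[str]:
    """Return violations of the nullable-field limits listed above.

    Each field is a plain dict with a hypothetical shape (not a pymilvus object):
    {"name": str, "type": str, "nullable": bool, "is_partition_key": bool}
    """
    problems = []
    for field in fields:
        if not field.get("nullable", False):
            continue  # the limits only concern fields marked as nullable
        name = field["name"]
        if field.get("is_partition_key", False):
            problems.append(f"{name}: a partition key cannot be nullable")
        if field.get("type") == "ARRAY_OF_STRUCT":  # hypothetical type label
            problems.append(f"{name}: an Array of Structs field cannot be nullable")
    return problems


fields = [
    {"name": "id", "type": "INT64"},
    {"name": "embedding", "type": "FLOAT_VECTOR", "nullable": True},
    {"name": "tenant", "type": "VARCHAR", "nullable": True, "is_partition_key": True},
]
violations = check_nullable_limits(fields)
print(violations)
```

Only the `tenant` field is reported here, because it is both nullable and a partition key. Milvus itself enforces these rules on the server; a check like this merely surfaces the problem earlier.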

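What is a nullable field?

In Milvus, whether a field can store NULL values is controlled by a schema-level field attribute named nullable.

When a field is defined with nullable=True, Milvus allows its value to be missing during data ingestion. In practice, Milvus treats the following two inputs as equivalent and stores the field value as NULL:

The field is omitted from the input entity.
The field is explicitly set to NULL (for example, None in Python).

If a field is not defined as nullable (the default), every entity must provide a valid value for it. Omitting the field or explicitly assigning NULL causes the insert or import operation to fail.

In a collection schema, both scalar fields and vector fields support the nullable attribute. Array of Structs fields do not.

The nullable attribute determines whether a field value may be missing, but it does not define which value is used when it is missing.

If a nullable field has no default value configured, omitting the field stores NULL.
If a default value is configured, Milvus may store the default value instead. For details, see Default Values.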

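Define a nullable field in the collection schema

To use a nullable field, you must enable the nullable attribute when you define the collection schema.

In this example, the collection schema defines a vector field named embedding with nullable=True. Entities in the collection can then omit the vector value or explicitly set it to NULL during ingestion.

The same schema is shown below in Python, Java, NodeJS, Go, and cURL, in that order.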

from pymilvus import MilvusClient, DataType

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

# Define schema fields
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True)  # Primary field
schema.add_field(
    field_name="embedding",
    datatype=DataType.FLOAT_VECTOR,
    dim=4,
    nullable=True,  # Enable the nullable attribute; defaults to False
)

client.create_collection(
    collection_name="my_collection",
    schema=schema,
)

import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.common.DataType;
import io.milvus.v2.service.collection.request.AddFieldReq;
import io.milvus.v2.service.collection.request.CreateCollectionReq;

MilvusClientV2 client = new MilvusClientV2(ConnectConfig.builder()
        .uri("http://localhost:19530")
        .token("root:Milvus")
        .build());

CreateCollectionReq.CollectionSchema schema = CreateCollectionReq.CollectionSchema.builder()
        .build();

schema.addField(AddFieldReq.builder()
        .fieldName("id")
        .dataType(DataType.Int64)
        .isPrimaryKey(true)
        .build());
schema.addField(AddFieldReq.builder()
        .fieldName("embedding")
        .dataType(DataType.FloatVector)
        .dimension(4)
        .isNullable(true)
        .build());

client.createCollection(CreateCollectionReq.builder()
        .collectionName("my_collection")
        .collectionSchema(schema)
        .build());

import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";

const client = new MilvusClient({
  address: "http://localhost:19530",
  token: "root:Milvus",
});

await client.createCollection({
  collection_name: "my_collection",
  fields: [
    {
      name: "id",
      data_type: DataType.Int64,
      is_primary_key: true,
      autoID: false,
    },
    {
      name: "embedding",
      data_type: DataType.FloatVector,
      dim: 4,
      nullable: true,
    },
  ],
});

import (
    "context"
    "fmt"

    "github.com/milvus-io/milvus/client/v2/entity"
    "github.com/milvus-io/milvus/client/v2/milvusclient"
)

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

client, err := milvusclient.New(ctx, &milvusclient.ClientConfig{
    Address: "localhost:19530",
})
if err != nil {
    fmt.Println(err.Error())
    // handle error
}
defer client.Close(ctx)

schema := entity.NewSchema()
schema.WithField(entity.NewField().
    WithName("id").
    WithDataType(entity.FieldTypeInt64).
    WithIsPrimaryKey(true),
).WithField(entity.NewField().
    WithName("embedding").
    WithDataType(entity.FieldTypeFloatVector).
    WithDim(4).
    WithNullable(true),
)

err = client.CreateCollection(ctx,
    milvusclient.NewCreateCollectionOption("my_collection", schema))
if err != nil {
    fmt.Println(err.Error())
    // handle error
}

export TOKEN="root:Milvus"
export CLUSTER_ENDPOINT="http://localhost:19530"

export pkField='{
  "fieldName": "id",
  "dataType": "Int64",
  "isPrimary": true
}'

export embeddingField='{
  "fieldName": "embedding",
  "dataType": "FloatVector",
  "typeParams": {"dim": "4"},
  "nullable": true
}'

curl --request POST \
  --url "${CLUSTER_ENDPOINT}/v2/vectordb/collections/create" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --header "Request-Timeout: 10" \
  -d "{
    \"collectionName\": \"my_collection\",
    \"schema\": {
      \"fields\": [
        $pkField,
        $embeddingField
      ]
    }
  }"

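In this schema:

The embedding field is explicitly marked as nullable.
Entities can omit the embedding field or assign NULL to it at insert time.
Whether NULL values are allowed is decided when the collection is created.

For clarity, the examples below focus on the nullable vector field (embedding). Defining a nullable scalar field is optional and is not required by the rest of this guide.

Optional: Define a nullable scalar field

Scalar fields can also be defined as nullable with the same nullable attribute, and they follow the same rules at ingestion. For example (Python, Java, NodeJS, Go, and cURL, in that order):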

schema.add_field(
    field_name="age",
    datatype=DataType.INT64,
    nullable=True,
)

schema.addField(AddFieldReq.builder()
        .fieldName("age")
        .dataType(DataType.Int64)
        .isNullable(true)
        .build());

// Add to the fields array when calling createCollection:
// { name: "age", data_type: DataType.Int64, nullable: true },

schema.WithField(entity.NewField().
    WithName("age").
    WithDataType(entity.FieldTypeInt64).
    WithNullable(true),
)

# Add another field object to the schema "fields" array, for example:
# { "fieldName": "age", "dataType": "Int64", "nullable": true }

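Insert behavior for missing or NULL values

Once a field is defined as nullable in the collection schema, Milvus allows its value to be missing or explicitly set to NULL during data ingestion.

The following example inserts three entities into my_collection, the collection created in the schema example above, and covers each of these cases.

The snippets below are in Python, Java, NodeJS, Go, and cURL, in that order.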

data = [
    {
        "id": 1,
        "embedding": [0.1, 0.2, 0.3, 0.4],
    },
    {
        "id": 2,
        "embedding": None,  # Explicitly set to NULL
    },
    {
        "id": 3,  # Field omitted → stored as NULL
    },
]

client.insert(
    collection_name="my_collection",
    data=data,
)

import com.google.gson.Gson;
import com.google.gson.JsonNull;
import com.google.gson.JsonObject;
import io.milvus.v2.service.vector.request.InsertReq;

import java.util.Arrays;
import java.util.List;

Gson gson = new Gson();

JsonObject row1 = new JsonObject();
row1.addProperty("id", 1);
row1.add("embedding", gson.toJsonTree(Arrays.asList(0.1f, 0.2f, 0.3f, 0.4f)));

JsonObject row2 = new JsonObject();
row2.addProperty("id", 2);
row2.add("embedding", JsonNull.INSTANCE); // Explicitly set to NULL

JsonObject row3 = new JsonObject();
row3.addProperty("id", 3); // Field omitted; stored as NULL

List<JsonObject> data = Arrays.asList(row1, row2, row3);

client.insert(InsertReq.builder()
        .collectionName("my_collection")
        .data(data)
        .build());

const data = [
  { id: 1, embedding: [0.1, 0.2, 0.3, 0.4] },
  { id: 2, embedding: null },
  { id: 3 },
];

await client.insert({
  collection_name: "my_collection",
  data: data,
});

import (
    "context"
    "fmt"

    "github.com/milvus-io/milvus/client/v2/milvusclient"
)

// Assumes `client` is the Milvus client from the Go schema example above.
ctx := context.Background()

rows := []any{
    map[string]any{"id": int64(1), "embedding": []float32{0.1, 0.2, 0.3, 0.4}},
    map[string]any{"id": int64(2), "embedding": nil},
    map[string]any{"id": int64(3)},
}

_, err := client.Insert(ctx, milvusclient.NewRowBasedInsertOption("my_collection", rows...))
if err != nil {
    fmt.Println(err.Error())
}

curl --request POST \
  --url "${CLUSTER_ENDPOINT}/v2/vectordb/entities/insert" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --header "Request-Timeout: 10" \
  -d '{
    "collectionName": "my_collection",
    "data": [
      {"id": 1, "embedding": [0.1, 0.2, 0.3, 0.4]},
      {"id": 2, "embedding": null},
      {"id": 3}
    ]
  }'

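In this example:

The entity with id = 1 provides a valid vector value.
The entity with id = 2 explicitly assigns NULL to the embedding field.
The entity with id = 3 omits the embedding field entirely; Milvus stores it as NULL.

Index behavior for nullable fields

After inserting data, you can build an index on a nullable field as usual. The key difference is how Milvus handles NULL values during index building:

Only entities with non-null values are added to the index.
Entities with NULL values are skipped and do not take part in index building.

For a nullable vector field, this means that only entities with a valid vector can be found by vector similarity search.

The snippets below are in Python, Java, NodeJS, Go, and cURL, in that order.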

# Set index parameters
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",
    metric_type="COSINE",
)

# Create index
client.create_index(
    collection_name="my_collection",
    index_params=index_params,
)

# Load collection for future search operations
client.load_collection(collection_name="my_collection")

import io.milvus.v2.common.IndexParam;
import io.milvus.v2.service.collection.request.LoadCollectionReq;
import io.milvus.v2.service.index.request.CreateIndexReq;

import java.util.Collections;

IndexParam indexParam = IndexParam.builder()
        .fieldName("embedding")
        .indexName("embedding_index")
        .indexType(IndexParam.IndexType.AUTOINDEX)
        .metricType(IndexParam.MetricType.COSINE)
        .build();

client.createIndex(CreateIndexReq.builder()
        .collectionName("my_collection")
        .indexParams(Collections.singletonList(indexParam))
        .build());

client.loadCollection(LoadCollectionReq.builder()
        .collectionName("my_collection")
        .build());

await client.createIndex({
  collection_name: "my_collection",
  field_name: "embedding",
  index_name: "embedding_idx",
  index_type: "AUTOINDEX",
  metric_type: "COSINE",
});

await client.loadCollection({
  collection_name: "my_collection",
});

import (
    "context"
    "fmt"

    "github.com/milvus-io/milvus/client/v2/entity"
    "github.com/milvus-io/milvus/client/v2/index"
    "github.com/milvus-io/milvus/client/v2/milvusclient"
)

// Assumes `client` is the Milvus client from the Go schema example above.
ctx := context.Background()

indexOption := milvusclient.NewCreateIndexOption("my_collection", "embedding",
    index.NewAutoIndex(entity.COSINE))

_, err := client.CreateIndex(ctx, indexOption)
if err != nil {
    fmt.Println(err.Error())
}

_, err = client.LoadCollection(ctx, milvusclient.NewLoadCollectionOption("my_collection"))
if err != nil {
    fmt.Println(err.Error())
}

curl --request POST \
  --url "${CLUSTER_ENDPOINT}/v2/vectordb/indexes/create" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --header "Request-Timeout: 10" \
  -d '{
    "collectionName": "my_collection",
    "indexParams": [
      {
        "fieldName": "embedding",
        "metricType": "COSINE",
        "indexType": "AUTOINDEX"
      }
    ]
  }'

curl --request POST \
  --url "${CLUSTER_ENDPOINT}/v2/vectordb/collections/load" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --header "Request-Timeout: 10" \
  -d '{"collectionName": "my_collection"}'

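At this point:

Entities with a valid embedding value are indexed and searchable.
Entities whose embedding is NULL remain in the collection, but they are not included in the vector index.

Search behavior for nullable fields

When you search on a nullable field, Milvus evaluates only the entities that have a non-null value in the field used for the search. Entities whose vector field is NULL are skipped automatically.

For a nullable vector field, such as embedding in this example:

Only entities with a valid vector value are evaluated and ranked.
Entities with a NULL vector do not cause errors.
If the number of valid vectors is smaller than the requested topK (limit), Milvus may return fewer than limit results.

The following example runs a vector search on the nullable vector field embedding (Python, Java, NodeJS, Go, and cURL, in that order):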

res = client.search(
    collection_name="my_collection",
    data=[[0.1, 0.2, 0.3, 0.4]],
    anns_field="embedding",
    limit=3,
    search_params={"metric_type": "COSINE"},
    output_fields=["embedding"],
)

print(res)

import io.milvus.v2.service.vector.request.SearchReq;
import io.milvus.v2.service.vector.request.data.FloatVec;
import io.milvus.v2.service.vector.response.SearchResp;

import java.util.Arrays;
import java.util.Collections;

SearchResp res = client.search(SearchReq.builder()
        .collectionName("my_collection")
        .data(Collections.singletonList(new FloatVec(Arrays.asList(0.1f, 0.2f, 0.3f, 0.4f))))
        .annsField("embedding")
        .limit(3)
        .outputFields(Collections.singletonList("embedding"))
        .build());

System.out.println(res);

const res = await client.search({
  collection_name: "my_collection",
  data: [[0.1, 0.2, 0.3, 0.4]],
  anns_field: "embedding",
  limit: 3,
  search_params: { metric_type: "COSINE" },
  output_fields: ["embedding"],
});

console.log(res);

import (
    "context"
    "fmt"

    "github.com/milvus-io/milvus/client/v2/entity"
    "github.com/milvus-io/milvus/client/v2/milvusclient"
)

// Assumes `client` is the Milvus client from the Go schema example above.
ctx := context.Background()

query := []float32{0.1, 0.2, 0.3, 0.4}
resultSets, err := client.Search(ctx, milvusclient.NewSearchOption(
    "my_collection",
    3,
    []entity.Vector{entity.FloatVector(query)},
).WithANNSField("embedding").
    WithOutputFields("embedding"))
if err != nil {
    fmt.Println(err.Error())
}
fmt.Println(resultSets)

curl --request POST \
  --url "${CLUSTER_ENDPOINT}/v2/vectordb/entities/search" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  --header "Request-Timeout: 10" \
  -d '{
    "collectionName": "my_collection",
    "data": [[0.1, 0.2, 0.3, 0.4]],
    "annsField": "embedding",
    "limit": 3,
    "searchParams": {"metricType": "COSINE"},
    "outputFields": ["embedding"]
  }'

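In this search:

Only entities with a non-null embedding value are considered as candidates.
Entities whose embedding is NULL are excluded from evaluation.
The number of results returned depends on how many valid vectors exist in the collection.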
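
To make the "fewer results than limit" case concrete, the following pure-Python sketch models the behavior described in this section on the three entities inserted earlier. It is an illustrative model only, not how Milvus implements search: the function names `cosine` and `search_non_null` are hypothetical, and the brute-force ranking stands in for a real vector index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search_non_null(rows: List[Dict], query: Sequence[float], limit: int) -> List[Tuple[int, float]]:
    """Rank only the rows whose embedding is present, then cut to `limit`."""
    candidates = [
        (row["id"], cosine(row["embedding"], query))
        for row in rows
        if row.get("embedding") is not None  # omitted and None are both skipped
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:limit]


rows = [
    {"id": 1, "embedding": [0.1, 0.2, 0.3, 0.4]},
    {"id": 2, "embedding": None},  # explicitly NULL
    {"id": 3},                     # field omitted
]
hits = search_non_null(rows, [0.1, 0.2, 0.3, 0.4], limit=3)
print(hits)
```

Although limit is 3, the model returns a single hit, because only the entity with id 1 has a vector to compare against.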

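Impact on queries and filtering

The preceding examples focus on vector fields. This section describes how NULL values behave in scalar filter expressions.

Scalar fields can be defined with nullable=True and follow the same ingestion rules as vector fields. However, a NULL scalar value always evaluates to false in a filter expression.

For example, given a nullable scalar field age, the following filter selects entities whose age is greater than 18:

The snippets below are in Python, Java, NodeJS, Go, and cURL, in that order.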

expr = "age > 18"

String filter = "age > 18";

const expr = "age > 18";

filter := "age > 18"

# Use in query/search filter parameter, for example:
# "filter": "age > 18"

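Entities whose age is NULL are excluded from the results, because a NULL value does not satisfy the filter condition.

Likewise, equality checks do not match NULL values. For example (Python, Java, NodeJS, Go, and cURL, in that order):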

expr = 'status == "active"'

String filter = "status == \"active\"";

const expr = 'status == "active"';

filter := `status == "active"`

# "filter": "status == \"active\""

Entities whose status is NULL do not appear in the results.
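
One consequence of the rule above is that a comparison and its opposite both exclude NULL rows. The sketch below models this in plain Python. The helper `compare` is hypothetical and only mirrors the stated rule that a NULL scalar value evaluates to false; it is not Milvus's expression engine.

```python
import operator
from typing import Any, Callable, Optional


def compare(value: Optional[Any], op: Callable[[Any, Any], bool], operand: Any) -> bool:
    """Model of the rule above: a NULL (None) value never satisfies a comparison."""
    if value is None:
        return False
    return op(value, operand)


rows = [
    {"id": 1, "age": 25},
    {"id": 2, "age": None},  # explicitly NULL
    {"id": 3},               # field omitted, stored as NULL
    {"id": 4, "age": 12},
]

# Model of the filter "age > 18"
older = [r["id"] for r in rows if compare(r.get("age"), operator.gt, 18)]
# Model of the filter "age <= 18"
younger = [r["id"] for r in rows if compare(r.get("age"), operator.le, 18)]

print(older, younger)
```

Entities 2 and 3 appear in neither list, so the two filters together do not cover the whole collection when the field contains NULL values.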

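Nullable fields and default values

When both nullable and default_value are configured for a field, the following rules determine how Milvus handles NULL input or missing field values during insertion.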

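Nullable enabled	Default value	User input	Result
Yes	Yes (non-null)	NULL or omitted	Uses the default value
Yes	No	NULL or omitted	Stored as NULL
No	Yes (non-null)	NULL or omitted	Uses the default value
No	No	NULL or omitted	Throws an error
No	Yes (NULL default)	NULL or omitted	Throws an error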

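Key takeaways

When a field has a non-null default value, that value is used regardless of whether nullable is enabled.
When nullable=True and no default value is set, the field stores NULL.
When nullable=False and no default value is set, the insert fails with an error.
Setting a NULL default value on a non-nullable field is invalid and results in an error.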
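
The rules above can be condensed into a small decision function. This is a minimal sketch of the documented outcomes, not Milvus code: the name `resolve_missing_input` is hypothetical, and `default=None` stands for both "no default" and "NULL default", which is a simplification made for this example.

```python
from typing import Any, Optional


def resolve_missing_input(nullable: bool, default: Optional[Any] = None) -> Any:
    """Return the value stored when the input is NULL or omitted.

    Raises ValueError in the cases the rules above describe as errors.
    """
    if default is not None:
        return default  # a non-null default is used, nullable or not
    if nullable:
        return None  # nullable without a non-null default: store NULL
    raise ValueError("non-nullable field has no non-null default; the insert fails")


print(resolve_missing_input(nullable=True, default=18))
print(resolve_missing_input(nullable=True))
print(resolve_missing_input(nullable=False, default=18))
try:
    resolve_missing_input(nullable=False)
except ValueError as exc:
    print("error:", exc)
```

The first and third calls return the default, the second returns None (NULL), and the last one raises, matching the error rows for non-nullable fields.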

For complete examples and API usage of default values, see Default Values.
