How Bitset Enables the Versatility of Vector Similarity Search
Bitset Cover Image
By Yudong Cai and Angela Ni.
Various new essential features of a vector database are provided together with the release of Milvus 2.0. Among the new features, Time Travel, attribute filtering, and delete operations are correlated as these three features are achieved by one common mechanism - bitset.
Therefore, this article aims to clarify the concept of bitset in Milvus and explain how it works to support delete operations, Time Travel, and attribute filtering with three examples.
What is bitset?
A bitset is an array of bit numbers (“0” and “1”) that can be used to represent certain data information. With bitsets, you can store certain types of data compactly and efficiently as opposed to store them in Ints, floats, or chars. Bitsets work on boolean logic, according to which the value of an output is either valid or invalid, usually denoted by “1” and “0” respectively. “1” stands for valid, and “0” for invalid. Since bitsets are highly efficient and can save storage, they can also be used to achieve many features such as attribute filtering, delete operations, Time Travel, and more.
Starting from version 0.7.0, the concept of bitset has been introduced in Milvus to enable the delete function. More specifically, bitset is used to mark if each row in the segment is deleted. Deleted entities are marked with “1” in the corresponding bitset, and as a result, the deleted entities will not be computed during a search or query.
In the Milvus 2.0 version, the application of bitset is extended to enable more features, like attribute filtering and Time Travel. The general principle in a bitset remains the same. That is, if an entity is marked with “1” in the corresponding bitset, the entity will be ignored during a search or query. Bitsets are used to enable 3 features in Milvus:
- Attribute filtering
- Data deletion
- Query with Time Travel
How does bitset work in Milvus?
The examples below are used to illustrate how bitset works in Milvus.
Prerequisites
Suppose there is a segment with eight entities and a series of data manipulation language (DML) events happens in the order shown in the figure below.
- Four of the entities, whose
primary_keys
are [1, 2, 3, 4] respectively, are inserted when the timestampts
equals 100. - The remaining four entities, whose
primary_keys
are [5, 6, 7, 8], are inserted when the timestampts
equals 200. - Entities whose
primary_keys
are [7, 8] are deleted when the timestampts
equals 300. - Only entities, whose
primary_keys
are [1, 3, 5, 7], satisfy the conditions of attribute filtering.
DML events
Case one
Suppose the value a user sets for time_travel
is 150. In other words, the user conducts a query on the data stored in Milvus when ts
= 150. The bitset generation process is illustrated by Figure 1.
During the initial filtering stage, the result of the filter_bitset
should be [1, 0, 1, 0, 1, 0, 1, 0] as entities [1, 3, 5, 7] are valid filtering results and marked as “1” in the bitset. However, entities [4, 5, 6, 7] were not even inserted to the vector database when ts
equals 150. Therefore, these four entities should be marked as “0” regardless of the filtering condition. Now the bitset result should be [1, 0, 1, 0, 0, 0, 0, 0]. Since in Milvus, the general principle of bitset computing is that entities marked with “1” in the bitset are ignored during a search or query, the bitset result after Time Travel and attribute filtering needs to be flipped in order to be combined with the deletion bitmap. The flipped result of filter_bitset
should be [0, 1, 0, 1, 1, 1, 1, 1].
As for the deletion bitset del_bitset
, the initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 are not deleted until ts
is 300. Therefore, when ts
is 150, entities 7 and 8 are still valid. As a result, the del_bitset
value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].
Now we have two bitsets after Time Travel and attribute filtering: filter_bitset
[0, 1, 0, 1, 1, 1, 1, 1] and del_bitset
[0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the “OR” binary logic operator. The ultimate value of result_bitset
is [0, 1, 0, 1, 1, 1, 1, 1]. That is to say, only entities 1 and 3 will be computed in the following search or query stage.
Figure 1
Case two
Suppose the value the user sets for time_travel
is 250. In other words, the user conducts a query on the data stored in Milvus when ts
= 250. The bitset generation process is illustrated by Figure 2.
Like in case one, the resultant filter_bitset
of the initial attribute filtering stage should be [1, 0, 1, 0, 1, 0, 1, 0].
All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted to the vector database when ts
= 250. Therefore, the previous result of filter_bitset
remains the same. Again, we need to flip the result of the filter_bitset
, and we will get [0, 1, 0, 1, 0, 1, 0, 1].
As for the deletion bitset del_bitset
, the initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 were not deleted until ts
is 300. Therefore, when ts
is 250, entities 7 and 8 are still valid. As a result, the del_bitset
value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].
Now we have two bitsets after Time Travel and attribute filtering: filter_bitset
[0, 1, 0, 1, 0, 1, 0, 1] and del_bitset
[0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the “OR” binary logic operator. The ultimate value of result_bitset
is [0, 1, 0, 1, 0, 1, 0, 1]. That is to say, only entities [1, 3, 5, 7] will be computed in the following search or query stage.
Figure 2
Case three
Suppose the value the user sets for time_travel
is 350. In other words, the user conducts a query on the data stored in Milvus when ts
= 350. The bitset generation process is illustrated by Figure 3.
Same as case one and two, the resultant filter_bitset
of the initial attribute filtering stage is [0, 1, 0, 1, 0, 1, 0, 1].
All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted to the vector database when ts
= 350. Therefore, the final flipped result of the filter_bitset
is [0, 1, 0, 1, 0, 1, 0, 1], the same as in case two.
As for the deletion bitset del_bitset
, since entities 7 and 8 are already deleted when ts
=350, therefore, the result of del_bitset
should be [0, 0, 0, 0, 0, 0, 1, 1].
Now we have two bitsets after Time Travel and attribute filtering: filter_bitset
[0, 1, 0, 1, 0, 1, 0, 1] and del_bitset
[0, 0, 0, 0, 0, 0, 1, 1]. Combine these two bitsets with the “OR” binary logic operator. The ultimate value of result_bitset
is [0, 1, 0, 1, 0, 1, 1, 1]. That is to say, only entities [1, 3, 5] will be computed in the following search or query stage.
Figure 3
What’s next?
In the 2.0 new feature series blog, we aim to explain the design of the new features. Read more in this blog series!
Try Managed Milvus for Free
Zilliz Cloud is hassle-free, powered by Milvus and 10x faster.
Get StartedLike the article? Spread the word