Why are embeddings called "dense representations"?

Embeddings are called “dense representations” because they encode information into compact, continuous-valued vectors in which most dimensions carry meaningful information. This contrasts with “sparse” representations, such as one-hot encoding, where vectors are high-dimensional and mostly filled with zeros. For example, in natural language processing (NLP), the word “cat” represented as a one-hot vector might occupy a 10,000-dimensional space with a single “1” and the rest “0s.” An embedding compresses this into a dense vector of, say, 300 dimensions, where each value is a learned floating-point number. These values are not arbitrary: they capture semantic or contextual relationships, so similar items (e.g., “cat” and “dog”) end up with vectors that are close together in the embedding space.
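To make the contrast concrete, here is a minimal sketch in Python with NumPy. The vocabulary size, the index chosen for “cat,” and the random values standing in for learned weights are all illustrative assumptions, not output from a real model:

```python
import numpy as np

# Sparse one-hot representation: in a 10,000-word vocabulary, "cat" is
# identified only by the position of a single 1; every other entry is 0.
vocab_size = 10_000
cat_index = 42  # hypothetical position of "cat" in the vocabulary
one_hot_cat = np.zeros(vocab_size, dtype=np.float32)
one_hot_cat[cat_index] = 1.0

# Dense embedding: 300 continuous values, every dimension carrying signal.
# Random numbers stand in here for what a trained model would actually learn.
embedding_dim = 300
dense_cat = np.random.default_rng(0).normal(size=embedding_dim).astype(np.float32)

print(one_hot_cat.shape, np.count_nonzero(one_hot_cat))  # (10000,) 1
print(dense_cat.shape, np.count_nonzero(dense_cat))      # (300,) 300
```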

A key reason for using dense vectors is their ability to generalize. Sparse representations treat each item as independent, making it hard for models to recognize patterns or similarities. Dense embeddings, on the other hand, are trained to place related items near each other. For instance, in Word2Vec or GloVe embeddings, words with similar meanings or usage contexts (like “king” and “queen”) end up with similar vector values. This density also enables mathematical operations: subtracting the “man” vector from “king” and adding “woman” might yield a vector close to “queen.” Such operations aren’t feasible with sparse vectors because they lack the continuous, structured relationships dense embeddings provide.
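The “king − man + woman ≈ queen” arithmetic can be reproduced with any pretrained word-embedding set. The sketch below uses gensim’s downloader with the publicly available “glove-wiki-gigaword-100” vectors; the specific model and example words are assumptions, and exact results depend on the vectors you load:

```python
import gensim.downloader as api

# Fetch pretrained 100-dimensional GloVe word vectors (a one-time download).
glove = api.load("glove-wiki-gigaword-100")

# king - man + woman: most_similar performs the vector arithmetic and
# returns the nearest vocabulary words by cosine similarity.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" typically appears at or near the top of this list.

# Semantically related words score higher than unrelated ones.
print(glove.similarity("cat", "dog"))        # relatively high
print(glove.similarity("cat", "economics"))  # much lower
```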

From a computational perspective, dense embeddings are efficient. Sparse vectors with thousands of dimensions require significant memory and processing power, while dense vectors reduce dimensionality without losing critical information. For example, in recommendation systems, representing users or items as 100-dimensional embeddings instead of sparse one-hot vectors drastically reduces the model’s input size, speeding up training and inference. Dense embeddings also help models generalize better by forcing them to learn compressed, shared representations. This is why modern architectures like transformers (e.g., BERT) rely on dense embeddings—they enable handling complex relationships in text while keeping computational costs manageable. The “density” here refers to both the compactness of the vector and the richness of the information each dimension holds.
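A back-of-the-envelope comparison shows why this matters for memory. The catalog size, batch size, and embedding width below are hypothetical, chosen only to illustrate the gap between one-hot and dense inputs in a recommendation setting:

```python
import numpy as np

num_items = 1_000_000   # hypothetical catalog size for a recommender
batch_size = 512        # items fed to the model per training step
bytes_per_float = np.dtype(np.float32).itemsize  # 4 bytes

# Sparse: each item in the batch is a one-hot row as wide as the catalog.
one_hot_batch_bytes = batch_size * num_items * bytes_per_float

# Dense: each item is a learned 100-dimensional embedding.
embedding_dim = 100
dense_batch_bytes = batch_size * embedding_dim * bytes_per_float

print(f"one-hot batch: {one_hot_batch_bytes / 1e9:.1f} GB")  # ~2.0 GB
print(f"dense batch:   {dense_batch_bytes / 1e6:.2f} MB")    # ~0.20 MB
```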
