What role do special tokens (such as [CLS] or [SEP]) play in Sentence Transformer models?

Special tokens like [CLS] and [SEP] play specific structural and functional roles in Sentence Transformer models, which are built on transformer architectures like BERT. These tokens help the model process and organize input text effectively, enabling tasks like sentence embedding generation or pairwise sentence comparison. While their roles are inherited from the original transformer designs, they adapt to the unique needs of sentence-level tasks in Sentence Transformers.

The [CLS] token (short for “classification”) is typically placed at the beginning of the input sequence. In models like BERT, the final hidden state of this token is often used as a summary representation for classification tasks. In Sentence Transformers, however, the [CLS] token’s role varies by model. Some models build sentence embeddings by pooling (e.g., mean or max pooling) over all token outputs rather than relying on [CLS] alone; the bert-base-nli-mean-tokens model, for example, averages every token output, including [CLS]. Other models, fine-tuned for that purpose, use the [CLS] token’s output directly as the sentence representation.

The [SEP] token (short for “separator”) marks boundaries between sentences, especially when processing sentence pairs. In a semantic similarity task, for instance, two sentences might be concatenated with a [SEP] token between them, allowing the model to distinguish where one sentence ends and the next begins. This separation is critical for cross-encoder architectures that compare sentence pairs directly.
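To make the formatting concrete, here is a minimal sketch using the Hugging Face transformers tokenizer to show where these tokens land. The bert-base-uncased checkpoint and the example sentences are assumptions for illustration; other checkpoints (e.g., RoBERTa-based ones) use different special tokens such as <s> and </s>.

```python
from transformers import AutoTokenizer

# Assumption: a BERT-style checkpoint; RoBERTa-style models use <s>/</s> instead.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Single sentence: [CLS] is prepended and [SEP] is appended automatically.
single = tokenizer("A man is playing guitar.")
print(tokenizer.convert_ids_to_tokens(single["input_ids"]))
# ['[CLS]', 'a', 'man', 'is', 'playing', 'guitar', '.', '[SEP]']

# Sentence pair: a second [SEP] separates the two sentences, which is the
# boundary a cross-encoder relies on when comparing them jointly.
pair = tokenizer("A man is playing guitar.", "Someone plays an instrument.")
print(tokenizer.convert_ids_to_tokens(pair["input_ids"]))
# ['[CLS]', 'a', 'man', 'is', 'playing', 'guitar', '.', '[SEP]',
#  'someone', 'plays', 'an', 'instrument', '.', '[SEP]']
```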

The presence of these tokens ensures compatibility with pre-trained transformer models and maintains consistency in input formatting. For developers, this means following tokenization conventions when preparing inputs. For example, a single sentence is formatted as [CLS] sentence [SEP], while a sentence pair becomes [CLS] sentence1 [SEP] sentence2 [SEP]. Neglecting these tokens can lead to suboptimal performance, as the model expects them to structure its understanding of the input. In practice, libraries like sentence-transformers abstract this handling away by adding the tokens automatically during preprocessing. Still, understanding their role helps developers debug issues, customize tokenization for specific tasks, or adapt models to new use cases where input structures differ from standard setups.
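As a rough illustration of that abstraction, the sketch below assumes the sentence-transformers library. The all-MiniLM-L6-v2 and bert-base-uncased checkpoints are only examples, and the second snippet shows one way to configure a model that uses the [CLS] output directly instead of mean pooling.

```python
from sentence_transformers import SentenceTransformer, models

# encode() handles tokenization, including [CLS]/[SEP], so plain strings suffice.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode([
    "Milvus is a vector database.",
    "Special tokens structure the transformer input.",
])
print(embeddings.shape)  # (2, 384) for this checkpoint

# Customizing pooling: build a model whose sentence embedding is the [CLS]
# token's output rather than the mean over all token outputs.
word_embedding = models.Transformer("bert-base-uncased")
cls_pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode="cls",
)
cls_model = SentenceTransformer(modules=[word_embedding, cls_pooling])
cls_embedding = cls_model.encode("A single sentence, embedded via [CLS].")
```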
