Zero-shot learning (ZSL) in deep learning refers to a scenario where a model must recognize classes for which it received no labeled examples during training. Unlike traditional supervised learning, which requires labeled examples for every class the model will encounter, ZSL enables generalization to unseen categories by leveraging auxiliary information. For example, a model trained to recognize animals like cats and dogs could later identify a “zebra” without ever seeing a zebra image, provided it understands semantic relationships (e.g., “zebras have stripes, are horse-like, and live in savannas”). This approach is useful when acquiring labeled data for every possible class is impractical, such as in large-scale image recognition or niche domains.
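Concretely, the auxiliary information can be as simple as a shared attribute vocabulary in which both seen and unseen classes are described. The sketch below is purely illustrative: the attribute names and values are made up, not drawn from any real dataset.

```python
# Every class, seen or unseen, is described in the same attribute vocabulary.
# The model only ever sees labeled images for the "seen" classes; the unseen
# class exists for it solely as this attribute description.
class_attributes = {
    # seen during training (labeled images available)
    "cat":   {"striped": 0, "horse_like": 0, "feline": 1, "lives_in_savanna": 0},
    "dog":   {"striped": 0, "horse_like": 0, "feline": 0, "lives_in_savanna": 0},
    "horse": {"striped": 0, "horse_like": 1, "feline": 0, "lives_in_savanna": 0},
    # unseen at training time (no images, only this description)
    "zebra": {"striped": 1, "horse_like": 1, "feline": 0, "lives_in_savanna": 1},
}
```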
ZSL typically relies on embedding spaces or attribute-based frameworks to bridge seen and unseen classes. A common method involves mapping input features (e.g., features extracted from image pixels) to semantic representations like word embeddings (e.g., from Word2Vec or GloVe) or manually defined attributes (e.g., “has wings,” “is metallic”). During training, the model learns a function to align visual features with these semantic descriptors. For instance, if a bird is described as “feathered” and “capable of flight,” the model associates those attributes with bird images. At inference time, when presented with an unseen class (e.g., “penguin”), the model classifies the input by projecting its features into the semantic space and comparing them to the semantic descriptions of all classes, including those absent from the training data. This requires the semantic space to encode meaningful relationships between classes, such as hierarchical taxonomies or linguistic similarities.
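A minimal sketch of this pipeline, assuming visual features have already been extracted (e.g., by a CNN) and every class, seen or unseen, has a semantic vector. All tensors below are random placeholders, and the simple linear projection with a cross-entropy compatibility loss is just one of many possible alignment objectives:

```python
import torch
import torch.nn.functional as F

feat_dim, sem_dim = 512, 85            # visual feature size, attribute vector size
n_train, n_seen, n_all = 1000, 40, 50  # 40 seen classes, 10 unseen classes

visual_feats = torch.randn(n_train, feat_dim)           # training images' features
labels = torch.randint(0, n_seen, (n_train,))           # labels come from seen classes only
class_semantics = F.normalize(torch.randn(n_all, sem_dim), dim=1)  # all classes described

# Learn a projection from visual space into the semantic space by pulling each
# image's projection toward its own class vector (a simple compatibility loss).
W = torch.nn.Linear(feat_dim, sem_dim)
opt = torch.optim.Adam(W.parameters(), lr=1e-3)
for _ in range(100):
    proj = F.normalize(W(visual_feats), dim=1)
    scores = proj @ class_semantics[:n_seen].T          # similarity to seen classes
    loss = F.cross_entropy(scores, labels)
    opt.zero_grad(); loss.backward(); opt.step()

# Inference on an image whose class was never seen in training: project it and
# pick the nearest class vector, searching over *all* classes, unseen included.
test_feat = torch.randn(1, feat_dim)
pred = (F.normalize(W(test_feat), dim=1) @ class_semantics.T).argmax(dim=1)
print("predicted class index:", pred.item())            # may index an unseen class
```

The key design choice is that the classifier's "weights" for each class are its semantic vector rather than learned parameters, which is what lets new classes be added at inference time simply by describing them.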
ZSL has practical applications in areas like image classification, natural language processing, and multilingual translation. For example, in NLP, a translation model could handle a low-resource language by leveraging shared linguistic features with related languages. However, challenges remain. Performance depends heavily on the quality of the semantic representations: poorly defined attributes or noisy word embeddings degrade results. Another issue is bias toward seen classes: models tend to assign unseen examples to familiar categories. Techniques like generative adversarial networks (GANs) can mitigate this by synthesizing visual features for unseen classes, so the classifier is exposed to them during training. Benchmark datasets like Animals with Attributes, which annotates each animal class with traits such as “striped” or “aquatic,” are often used to evaluate ZSL methods. While not a replacement for supervised learning, ZSL offers a flexible alternative when labeled data for all classes is unavailable.
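As a rough sketch of the feature-synthesis idea: a conditional generator takes a class's attribute vector plus noise and produces a synthetic visual feature, and once trained on seen classes it can synthesize features for unseen classes from their attributes alone. The two-layer networks, plain GAN loss, and dimensions below are illustrative simplifications, not a specific published method.

```python
import torch
import torch.nn as nn

feat_dim, sem_dim, noise_dim = 512, 85, 32

# Conditional generator: (attributes, noise) -> synthetic visual feature.
G = nn.Sequential(nn.Linear(sem_dim + noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, feat_dim))
# Conditional discriminator: (feature, attributes) -> real/fake logit.
D = nn.Sequential(nn.Linear(feat_dim + sem_dim, 256), nn.ReLU(),
                  nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real_feats, attrs):
    """One adversarial update using real seen-class features and their attributes."""
    noise = torch.randn(real_feats.size(0), noise_dim)
    fake_feats = G(torch.cat([attrs, noise], dim=1))

    # Discriminator: real (feature, attribute) pairs vs. generated ones.
    d_loss = (bce(D(torch.cat([real_feats, attrs], 1)), torch.ones(len(attrs), 1)) +
              bce(D(torch.cat([fake_feats.detach(), attrs], 1)), torch.zeros(len(attrs), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool the discriminator.
    g_loss = bce(D(torch.cat([fake_feats, attrs], 1)), torch.ones(len(attrs), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training on seen classes, synthesize features for an unseen class from
# its attribute vector and add them to an ordinary classifier's training set.
unseen_attr = torch.rand(1, sem_dim).repeat(200, 1)     # placeholder attribute vector
synthetic_unseen_feats = G(torch.cat([unseen_attr, torch.randn(200, noise_dim)], 1))
```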