Generative Multi-Label Zero-Shot Learning
arXiv 2021

Papers with Code Badges

overview

Our proposed method, CLF, is the current state of the art for ZSL and GZSL on the NUS-WIDE dataset. Please consider adding recent ZSL or GZSL results to the same leaderboard.

Abstract

Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features with GANs is still unexplored in the context of the zero-shot setting. When multiple objects occur jointly in a single image, a critical question is how to effectively fuse multi-class information. In this work, we introduce different fusion approaches at the attribute-level, feature-level and cross-level (across attribute and feature-levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Our cross-level fusion-based generative approach outperforms the state-of-the-art on three zero-shot benchmarks: NUS-WIDE, Open Images and MS COCO. Furthermore, we show the generalization capabilities of our fusion approach in the zero-shot detection task on MS COCO, achieving favorable performance against existing methods.
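To make the difference between the fusion levels concrete, here is a minimal NumPy sketch. It is not the released implementation: `toy_generator` is a hypothetical stand-in for a trained conditional generator, the dimensions are illustrative, averaging is assumed as the fusion operator, and the `alpha` blend in `cross_level_fusion` is only a placeholder for whatever learned combination the cross-level model uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not taken from the paper).
EMB_DIM, NOISE_DIM, FEAT_DIM = 300, 16, 512

# Hypothetical stand-in for a trained generator: one fixed linear layer + ReLU.
W = rng.standard_normal((EMB_DIM + NOISE_DIM, FEAT_DIM)) * 0.05


def toy_generator(embedding, noise):
    """Map one class embedding plus noise to one synthetic visual feature."""
    return np.maximum(np.concatenate([embedding, noise]) @ W, 0.0)


def attribute_level_fusion(label_embeddings, noise):
    """Fuse first: average the class embeddings, then synthesize one feature."""
    return toy_generator(label_embeddings.mean(axis=0), noise)


def feature_level_fusion(label_embeddings, noise):
    """Synthesize first: one feature per class, then average the features."""
    feats = np.stack([toy_generator(e, noise) for e in label_embeddings])
    return feats.mean(axis=0)


def cross_level_fusion(label_embeddings, noise, alpha=0.5):
    """Placeholder blend of both levels (assumed; not the learned fusion)."""
    alf = attribute_level_fusion(label_embeddings, noise)
    flf = feature_level_fusion(label_embeddings, noise)
    return alpha * alf + (1.0 - alpha) * flf


# Usage: an image annotated with three labels, each with its own embedding.
labels = rng.standard_normal((3, EMB_DIM))
z = rng.standard_normal(NOISE_DIM)
multi_label_feature = cross_level_fusion(labels, z)
```

With a single label the two basic strategies coincide, since there is nothing to fuse. They differ only when several class embeddings have to be combined.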

overview

Classification Results

Below you will find quantitative results for ZSL and GZSL classification in comparison with previous methods.


overview

Below you will find qualitative results for the ZSL and GZSL tasks on examples from NUS-WIDE, in comparison with previous methods. Alongside each example is a superset of the top-5 predictions from our ALF, FLF and CLF. The true- and false-positive classes are enclosed in green and red boxes, respectively. For each fusion approach, a green tick (✅) or a red cross (❌) is shown for each true- or false-positive label in its top-5 predictions, respectively. Labels absent from the top-5 predictions of a fusion approach have no ✅ or ❌.


overview
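The tick/cross marking used in the qualitative figures can be sketched as follows. This is a minimal illustration, not the repository's evaluation code: the function name `mark_top_k`, the class names, and the scores are made up for the example.

```python
import numpy as np


def mark_top_k(scores, class_names, true_labels, k=5):
    """Return the k highest-scoring classes, each flagged True (true positive,
    shown as a tick) or False (false positive, shown as a cross)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]
    return [(class_names[i], class_names[i] in true_labels) for i in order]


# Hypothetical example: seven candidate classes, four ground-truth labels.
classes = ["sky", "clouds", "water", "person", "dog", "car", "tree"]
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6, 0.5]
ground_truth = {"sky", "clouds", "tree", "water"}

marks = mark_top_k(scores, classes, ground_truth, k=5)
# marks == [("sky", True), ("clouds", True), ("person", False),
#           ("car", False), ("tree", True)]

true_positives = sum(hit for _, hit in marks)
precision_at_k = true_positives / len(marks)        # 3 / 5
recall_at_k = true_positives / len(ground_truth)    # 3 / 4
```

Here "water" is a ground-truth label that falls outside the top 5, so it receives neither mark, matching the convention described above.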

Detection Results

Below you will find detection results for GZSD on example images from MS COCO using our multi-label CLF-based detection approach. The seen and unseen class detections are shown in red and blue, respectively.


overview

Citation

Acknowledgements

I thank Dat Huynh for discussions and feedback regarding the evaluation protocol and sharing details for the baseline zero-shot methods. I thank Aditya Arora for suggestions on the figure aesthetics.