Leveraging the Invariant Side of Generative Zero-Shot Learning (CVPR 2019)


License: CC 4.0 BY-SA


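This paper proposes LisGAN, a new method that uses a generative adversarial network to generate unseen-class features directly from random noise and constrains them with soul samples. A soul sample is a meta-representation of a class, used to keep generated samples close to their own class. At the zero-shot recognition stage, cascaded classifiers refine the result, and the method outperforms existing approaches.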

PDF: Leveraging the Invariant Side of Generative Zero-Shot Learning
Code: implemented in PyTorch

Abstract

Conventional zero-shot learning (ZSL) methods generally learn an embedding, e.g., visual-semantic mapping, to handle the unseen visual samples via an indirect manner. In this paper, we take the advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate the unseen features from random noises which are conditioned by the semantic descriptions. Specifically, we train a conditional Wasserstein GANs in which the generator synthesizes fake unseen features from noises and the discriminator distinguishes the fake from real via a minimax game. Considering that one semantic description can correspond to various synthesized visual samples, and the semantic description, figuratively, is the soul of the generated features, we introduce soul samples as the invariant side of generative zero-shot learning in this paper. A soul sample is the meta-representation of one class. It visualizes the most semantically-meaningful aspects of each sample in the same category. We regularize that each generated sample (the varying side of generative ZSL) should be close to at least one soul sample (the invariant side) which has the same class label with it. At the zero-shot recognition stage, we propose to use two classifiers, which are deployed in a cascade way, to achieve a coarse-to-fine result. Experiments on five popular benchmarks verify that our proposed approach can outperform state-of-the-art methods with significant improvements.
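The soul-sample regularizer described in the abstract can be sketched in plain Python. This is a minimal sketch under our own assumptions, not the authors' PyTorch code: features are lists of floats, each class's soul samples are the centroids of a tiny k-means over that class's real features (with `k=1` this is just the class mean), and the penalty is the mean, over generated samples, of the squared Euclidean distance to the nearest same-class soul sample, which is one way to express "close to at least one soul sample". The helper names `soul_samples` and `soul_regularizer` are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means; returns k centroids (assumes len(points) >= k)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids


def soul_samples(features, labels, k=1):
    """Hypothetical helper: per-class soul samples as k cluster centroids
    of that class's real features."""
    by_class = defaultdict(list)
    for f, y in zip(features, labels):
        by_class[y].append(f)
    return {y: kmeans(fs, k) for y, fs in by_class.items()}


def soul_regularizer(fake_features, fake_labels, souls):
    """Hypothetical helper: mean squared distance from each generated
    feature to the closest soul sample of its own class."""
    total = 0.0
    for f, y in zip(fake_features, fake_labels):
        total += min(sq_dist(f, s) for s in souls[y])
    return total / len(fake_features)


# Toy usage with two classes and one soul sample per class.
real = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
labels = ["cat", "cat", "dog", "dog"]
souls = soul_samples(real, labels, k=1)
fake = [[0.1, 1.0], [4.1, 4.0]]
print(soul_regularizer(fake, ["cat", "dog"], souls))  # approximately 0.5
```

The fake "cat" feature sits at squared distance 1.0 from its soul sample `[0.1, 0.0]` and the fake "dog" feature sits on its soul sample, so the penalty is about 0.5. Taking the minimum over several centroids (`k > 1`) lets a class keep multiple modes instead of pulling every generated sample toward a single mean.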
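The paper uses a conditional WGAN to generate features for the unseen classes, then trains a classifier on the real seen-class training features together with the generated features; this classifier then makes the zero-shot predictions.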
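The two-stage pipeline just described (synthesize unseen-class features from semantic descriptions, then train an ordinary classifier on real seen features plus synthetic unseen ones) can be sketched as a toy. Loud assumptions: `toy_generator` is a hypothetical stand-in for a trained conditional WGAN generator (a fixed linear map of the class attribute vector plus Gaussian noise), and a nearest-centroid classifier replaces the softmax classifier a real implementation would train. None of these names come from the paper's code.

```python
import random


def toy_generator(attribute, noise):
    """Hypothetical stand-in for a trained conditional generator G(z, a):
    here the 'visual feature' is simply 2 * attribute + noise."""
    return [2.0 * a + z for a, z in zip(attribute, noise)]


def synthesize(attributes, n_per_class, noise_std=0.1, seed=0):
    """Generate n_per_class fake features for every (unseen) class."""
    rng = random.Random(seed)
    feats, labels = [], []
    for cls, attr in attributes.items():
        for _ in range(n_per_class):
            z = [rng.gauss(0.0, noise_std) for _ in attr]
            feats.append(toy_generator(attr, z))
            labels.append(cls)
    return feats, labels


def fit_centroids(features, labels):
    """Nearest-centroid 'classifier': one mean feature vector per class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids, feature):
    """Label of the closest centroid (squared Euclidean distance)."""
    def dist(y):
        return sum((a - b) ** 2 for a, b in zip(feature, centroids[y]))
    return min(centroids, key=dist)


# Toy usage: one seen class with real features, two unseen classes
# known only through (made-up) attribute vectors.
seen_feats = [[0.0, 0.0], [0.1, -0.1]]
seen_labels = ["horse", "horse"]
unseen_attrs = {"zebra": [1.0, 1.0], "tiger": [-1.0, 2.0]}
fake_feats, fake_labels = synthesize(unseen_attrs, n_per_class=50)
clf = fit_centroids(seen_feats + fake_feats, seen_labels + fake_labels)
print(predict(clf, [2.1, 1.9]))  # zebra
```

The point of the design is that, once unseen features exist, zero-shot recognition reduces to ordinary supervised classification; the quality of the generator is what the rest of the method (soul samples, cascade) tries to improve.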
The paper makes two main contributions:
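1. It proposes soul samples to address the quality problem caused by the multi-view nature of visual objects (details below); the soul samples also act as a constraint on the fake features produced by the GAN generator.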
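2. For classification, it proposes two cascaded classifiers to obtain a coarse-to-fine result. Concretely, features that the first classifier predicts with high confidence are added to the first classifier's input (training) data, and a second classifier is then trained on this enlarged set. The added data may include features of unseen classes. (This gives a gain of about 0.5% to 1%.)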
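The cascade in point 2 can be sketched as a simple self-training loop. Every detail here is our assumption rather than the paper's: both stages use a nearest-centroid classifier, confidence is a softmax over negative squared distances, and the default threshold `0.9` is an arbitrary illustration. Test features that the first classifier predicts with confidence at or above the threshold are pseudo-labeled and appended to its training data, and the second classifier is fit on the enlarged set.

```python
import math


def fit_centroids(features, labels):
    """One mean feature vector per class (nearest-centroid classifier)."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    return {y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()}


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def cascade(train_feats, train_labels, test_feats, threshold=0.9):
    """Stage 1: fit on the training set and keep confident test predictions
    as pseudo-labeled samples. Stage 2: refit on the enlarged set, predict."""
    first = fit_centroids(train_feats, train_labels)
    extra_feats, extra_labels = [], []
    for x in test_feats:
        probs = predict_proba(first, x)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            extra_feats.append(x)
            extra_labels.append(label)
    second = fit_centroids(train_feats + extra_feats, train_labels + extra_labels)
    preds = []
    for x in test_feats:
        probs = predict_proba(second, x)
        preds.append(max(probs, key=probs.get))
    return preds, len(extra_feats)


# Toy usage: the test distribution of class A is shifted to the right.
train_f = [[0.0, 0.0], [4.0, 0.0]]
train_y = ["A", "B"]
test_f = [[1.0, 0.0], [1.2, 0.0], [2.1, 0.0]]
print(cascade(train_f, train_y, test_f))  # (['A', 'A', 'A'], 2)
```

The first classifier puts `[2.1, 0.0]` in class B with only about 0.69 confidence, so it is not pseudo-labeled; the two confident A points pull A's centroid toward the test data, and the second classifier flips the ambiguous point to A. Keeping only high-confidence predictions limits how many wrong pseudo-labels leak into the second stage.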