Leveraging the Invariant Side of Generative Zero-Shot Learning (CVPR 2019)

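The paper proposes a new method named LisGAN, which uses a generative adversarial network to generate unseen-class features directly from random noise, constrained by soul samples. A soul sample is the meta-representation of a class and keeps generated samples close to their own class. At the zero-shot recognition stage, cascaded classifiers refine the result, and the method outperforms existing approaches.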


PDF: Leveraging the Invariant Side of Generative Zero-Shot Learning
Code: implemented in PyTorch

Abstract

Conventional zero-shot learning (ZSL) methods generally learn an embedding, e.g., visual-semantic mapping, to handle the unseen visual samples via an indirect manner. In this paper, we take the advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate the unseen features from random noises which are conditioned by the semantic descriptions. Specifically, we train a conditional Wasserstein GANs in which the generator synthesizes fake unseen features from noises and the discriminator distinguishes the fake from real via a minimax game. Considering that one semantic description can correspond to various synthesized visual samples, and the semantic description, figuratively, is the soul of the generated features, we introduce soul samples as the invariant side of generative zero-shot learning in this paper. A soul sample is the meta-representation of one class. It visualizes the most semantically-meaningful aspects of each sample in the same category. We regularize that each generated sample (the varying side of generative ZSL) should be close to at least one soul sample (the invariant side) which has the same class label with it. At the zero-shot recognition stage, we propose to use two classifiers, which are deployed in a cascade way, to achieve a coarse-to-fine result. Experiments on five popular benchmarks verify that our proposed approach can outperform state-of-the-art methods with significant improvements.
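The paper uses a conditional WGAN to generate features for the unseen classes, then trains a classifier on the seen-class training features together with the generated features. This classifier performs the zero-shot prediction.
The paper makes two contributions:
1. It proposes soul samples to address the quality problem caused by the multi-view nature of visual objects (see below). Soul samples also constrain the fake features produced by the GAN generator.
2. When training the classifier, it uses cascaded classifiers to obtain a coarse-to-fine result. The features that the first classifier predicts with high confidence are added to the first classifier's input data, and a second classifier is then trained on the enlarged set. The added data may include unseen-class features. (This improves results by 0.5% to 1%.)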
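The cascade in point 2 can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: a nearest-centroid classifier stands in for the real classifiers, and the names `centroid_fit`, `predict_proba`, `cascade_predict` and the `threshold` value are all hypothetical.

```python
import math


def centroid_fit(features, labels):
    """Stand-in classifier (assumption): one mean vector per class."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    return {y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()}


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def cascade_predict(train_x, train_y, test_x, threshold=0.9):
    """Coarse-to-fine prediction with two classifiers in cascade."""
    # Stage 1: train on seen-class features plus generated unseen-class features.
    first = centroid_fit(train_x, train_y)

    # Keep the test features that stage 1 labels with high confidence.
    extra_x, extra_y = [], []
    for x in test_x:
        probs = predict_proba(first, x)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            extra_x.append(x)
            extra_y.append(label)

    # Stage 2: retrain on the enlarged set and predict every test feature.
    second = centroid_fit(train_x + extra_x, train_y + extra_y)
    preds = []
    for x in test_x:
        probs = predict_proba(second, x)
        preds.append(max(probs, key=probs.get))
    return preds


train_x = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
train_y = ["a", "a", "b", "b"]
test_x = [[0.2, 0.5], [9.8, 10.4], [6.0, 6.0]]
predictions = cascade_predict(train_x, train_y, test_x)
```

The confident test features carry pseudo-labels, so any mistakes the first classifier makes with high confidence are passed on to the second stage. The threshold controls that trade-off.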

Network architecture

[Figure: network architecture diagram]

Notation:
soul samples: obtained by clustering the seen-class features and taking the cluster centers (K=3).
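The soul-sample definition above, together with the abstract's requirement that each generated sample stay close to at least one soul sample of its own class, can be illustrated with a small sketch. Several things here are assumptions: clustering is run separately per class, plain k-means with a fixed seed is used, closeness is measured by squared Euclidean distance, and all function names are hypothetical.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans_centers(features, k, iters=20, seed=0):
    """Plain k-means; returns k cluster centers (needs len(features) >= k)."""
    rng = random.Random(seed)
    centers = rng.sample(features, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda j: squared_distance(f, centers[j]))
            clusters[nearest].append(f)
        # An empty cluster keeps its previous center.
        centers = [mean_vector(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers


def soul_samples(features_by_class, k=3):
    """Assumption: K soul samples per class, taken as that class's cluster centers."""
    return {label: kmeans_centers(feats, k) for label, feats in features_by_class.items()}


def soul_regularizer(generated, labels, souls):
    """Mean distance from each generated feature to its nearest same-class soul sample."""
    total = 0.0
    for g, y in zip(generated, labels):
        total += min(squared_distance(g, s) for s in souls[y])
    return total / len(generated)


features_by_class = {
    "cat": [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]],
}
souls = soul_samples(features_by_class, k=3)
penalty = soul_regularizer([[1.0, 1.0]], ["cat"], souls)
```

Taking the minimum over the K soul samples means a generated feature only has to match one view of its class, not the average of all views.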

Why use soul samples?

[Figure: illustration of the motivation for soul samples]

Training procedure (the code differs slightly from the paper)

The WGAN generator and discriminator update their weights alternately:
$$L_D = \mathbb{E}[D(G(z, a))] - \mathbb{E}\,\dots$$
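The visible start of the discriminator objective matches the standard Wasserstein critic loss. As a hedged illustration only, the sketch below computes the standard WGAN critic and generator losses from critic scores. It is not the paper's full objective: any further terms in the truncated formula are not reproduced here, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def critic_loss(fake_scores, real_scores):
    """Standard WGAN critic loss (assumption): E[D(G(z, a))] - E[D(x)].

    Minimizing it pushes scores of real features up and scores of fake ones down.
    """
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    """Standard WGAN generator loss (assumption): -E[D(G(z, a))]."""
    return -mean(fake_scores)


# Alternating updates: the critic takes a step on critic_loss, then the
# generator takes a step on generator_loss, using fresh critic scores each time.
fake_scores = [1.0, 3.0]   # hypothetical critic outputs on generated features
real_scores = [4.0, 6.0]   # hypothetical critic outputs on real features
d_loss = critic_loss(fake_scores, real_scores)
g_loss = generator_loss(fake_scores)
```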
