Paper Reading (9): Bag Encoding Strategies in Multiple Instance Learning Problems (2018)

Article link: https://blog.youkuaiyun.com/qq_39443703/article/details/120953643

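This post covers three encoding strategies for multiple instance learning: k-means encoding, path encoding, and terminal node encoding. Each one converts a bag into a single vector that can then be classified. The strategies capture information by partitioning the instance feature space and by using tree structures. The tree-based encodings perform well in efficiency, accuracy, and robustness, and they are highly scalable on large-scale MIL problems.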


Contents

Abstract
Encoding methods
Classification

Abstract

Abstract of the paper:
Multiple instance learning (MIL) deals with supervised learning tasks, where the aim is to learn from a set of labeled bags containing certain number of instances. In MIL setting, instance label information is unavailable, which makes it difficult to apply regular supervised learning. To resolve this problem, researchers devise methods focusing on certain assumptions regarding the instance labels. However, it is not a trivial task to determine which assumption holds for a new type of MIL problem. A bag-level representation based on instance characteristics does not require assumptions about the instance labels and is shown to be successful in MIL tasks. These approaches mainly encode bag vectors using bag-of-features representations. In this paper, we propose tree-based encoding strategies that partition the instance feature space and represent the bags using the frequency of instances residing at each partition. Our encoding implicitly learns generalized Gaussian Mixture Model (GMM) on the instance feature space and transforms this information into a bag-level summary. We show that bag representation using tree ensembles provides fast, accurate and robust representations. Our experiments on a large database of MIL problems show that tree-based encoding is highly scalable, and its performance is competitive with the state-of-the-art algorithms.
Summary: in short, each bag is first encoded into a single vector by some strategy, and that vector is then classified.

Encoding methods

The paper presents three encoding methods: k-means encoding, path encoding, and terminal node encoding. The figure below summarizes all three:
[Figure: overview of the three encoding methods]

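k-means encoding

The idea behind this encoding is simple: first cluster all training instances with k-means, then represent each bag by the number of its instances that fall in each cluster. For example, suppose the instances are partitioned into $k$ clusters and bag $X_i$ has $Node_1, Node_2, \cdots, Node_k$ instances in the respective clusters. The bag is then encoded as $[Node_1, Node_2, \cdots, Node_k]$.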
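The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the paper's implementation: the function names (`fit_kmeans`, `kmeans_encode`), the use of Lloyd's algorithm with random initial centroids, and the fixed iteration count are all assumptions made for the example.

```python
import math
import random


def nearest_centroid(x, centroids):
    """Index of the centroid closest to instance x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))


def fit_kmeans(instances, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm on the pooled training instances (assumed setup)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(instances, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in instances:
            groups[nearest_centroid(x, centroids)].append(x)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def kmeans_encode(bag, centroids):
    """Encode a bag as the number of its instances assigned to each cluster."""
    counts = [0] * len(centroids)
    for x in bag:
        counts[nearest_centroid(x, centroids)] += 1
    return counts


# Hypothetical toy data: instances pooled from all training bags.
train_instances = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (4.9, 5.0)]
centroids = fit_kmeans(train_instances, k=2)
bag = [(0.0, 0.1), (0.2, 0.0), (5.1, 5.0)]
print(kmeans_encode(bag, centroids))  # one count per cluster, summing to len(bag)
```

The encoded vector always has length $k$ regardless of how many instances the bag holds, which is what lets an ordinary classifier consume it.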

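Terminal node encoding

[Figure: (a) terminal node encoding and (b) path encoding of a bag on a single tree]
First, a tree learner partitions the feature space at random: given the tree depth $h$, a random feature and a random split point are chosen at each step of tree construction. To capture information from different regions of the feature space, $\tau$ trees are trained. Take a bag $X_i$ with 7 instances, as in figure (a). After the feature splits, the numbers of instances landing in the terminal nodes are $[1, 1, 1, 0, 1, 2, 1, 0]$, so this tree represents the bag as the vector $[1, 1, 1, 0, 1, 2, 1, 0]$. In the same way, each of the $\tau$ trees encodes every bag into such a vector, and these vectors are concatenated to form the new bag representation.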
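A minimal sketch of this procedure follows. It rests on several simplifying assumptions that are not taken from the paper: every tree is a complete binary tree of depth $h$ stored in heap order, each split point is drawn uniformly from the global range of the chosen feature, and the function names (`build_random_trees`, `leaf_index`, `terminal_node_encode`) are made up for illustration.

```python
import random


def build_random_trees(instances, n_trees, depth, seed=0):
    """Build n_trees complete random trees of the given depth.

    Each tree is a list of (feature, threshold) pairs for its internal
    nodes in heap order: node n has children 2n + 1 and 2n + 2.
    """
    rng = random.Random(seed)
    n_features = len(instances[0])
    lows = [min(col) for col in zip(*instances)]
    highs = [max(col) for col in zip(*instances)]
    trees = []
    for _ in range(n_trees):
        tree = []
        for _ in range(2 ** depth - 1):
            f = rng.randrange(n_features)  # random feature
            tree.append((f, rng.uniform(lows[f], highs[f])))  # random split point
        trees.append(tree)
    return trees


def leaf_index(x, tree, depth):
    """Route instance x from the root to a terminal node; return its 0-based index."""
    node = 0
    for _ in range(depth):
        feature, threshold = tree[node]
        node = 2 * node + 1 if x[feature] <= threshold else 2 * node + 2
    return node - (2 ** depth - 1)


def terminal_node_encode(bag, trees, depth):
    """Concatenate, over all trees, the per-terminal-node instance counts of a bag."""
    encoding = []
    for tree in trees:
        counts = [0] * (2 ** depth)
        for x in bag:
            counts[leaf_index(x, tree, depth)] += 1
        encoding.extend(counts)
    return encoding


# Hypothetical toy data: 10 two-dimensional instances; the bag holds the first 7.
train_instances = [(i * 0.1, (i * 7 % 10) * 0.1) for i in range(10)]
trees = build_random_trees(train_instances, n_trees=3, depth=3)
print(terminal_node_encode(train_instances[:7], trees, depth=3))
```

With depth 3 each tree contributes 8 counts that sum to the bag size, so three trees yield a 24-dimensional bag vector.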

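Path encoding

A terminal node of a tree corresponds to a region of the feature space and can be viewed as a cluster. However, because each terminal node is treated as a separate cluster, the relationships between these clusters are lost. To avoid this, the paper proposes a path-based representation, shown in figure (b). Every non-root node of the tree records how many instances of the bag pass through it, so the bag is finally encoded as $[3, 4, 2, 1, 3, 1, 1, 1, 1, 0, 1, 2, 1, 0]$. Read level by level, the first two entries are the counts at the two depth-1 nodes, the next four are the counts at depth 2, and the last eight are the terminal-node counts. In the same way, each of the $\tau$ trees encodes every bag into such a vector, and these vectors are concatenated to form the new bag representation.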
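Because every instance that reaches a terminal node also passes through all of that node's ancestors, the path encoding for one tree can be recovered from the terminal-node counts by summing sibling pairs level by level. The sketch below assumes a complete binary tree and the level-by-level node order seen in the example vector; the function name `path_encode_from_leaf_counts` is hypothetical.

```python
def path_encode_from_leaf_counts(leaf_counts):
    """Path encoding of one tree, given the bag's terminal-node counts.

    Assumes a complete binary tree, so len(leaf_counts) is a power of two.
    Returns the visit counts of all non-root nodes, ordered level by level
    from the root's children down to the terminal nodes.
    """
    levels = [list(leaf_counts)]
    while len(levels[0]) > 2:
        below = levels[0]
        # A parent's visit count is the sum of its two children's counts.
        parents = [below[i] + below[i + 1] for i in range(0, len(below), 2)]
        levels.insert(0, parents)
    return [count for level in levels for count in level]


leaf_counts = [1, 1, 1, 0, 1, 2, 1, 0]  # terminal node encoding from the example
print(path_encode_from_leaf_counts(leaf_counts))
# → [3, 4, 2, 1, 3, 1, 1, 1, 1, 0, 1, 2, 1, 0]
```

For a tree of depth $h$ this yields $2^{h+1} - 2$ entries per tree (14 for $h = 3$), compared with $2^h$ for terminal node encoding.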