self-training and co-training


Widely used semi-supervised learning methods include:

1. EM with generative mixture models
2. self-training
3. co-training
4. transductive support vector machines
5. graph-based methods

self-training:

A classifier is first trained with the small amount of labeled data. The classifier is then used to classify the unlabeled data. Typically, the most confident unlabeled data points, together with their predicted labels, are added to the training set. The classifier is re-trained and the procedure repeated.

When the existing supervised classifier is complicated and hard to modify, self-training is a practical wrapper method. It has been applied to several natural language processing tasks, including word sense disambiguation, parsing and machine translation, as well as to object detection in images.
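The loop above is easy to express as a thin wrapper around any base classifier. Below is a minimal sketch, assuming a scikit-learn-style classifier with `predict_proba`; the function name `self_train` and the parameters `n_iter` and `n_add` are illustrative, not from the reference:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def self_train(clf, X_labeled, y_labeled, X_unlabeled, n_iter=10, n_add=10):
    """Illustrative sketch of wrapper-style self-training around a base classifier."""
    X, y, X_u = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    for _ in range(n_iter):
        if len(X_u) == 0:
            break
        clf.fit(X, y)                                    # train on the current labeled pool
        proba = clf.predict_proba(X_u)                   # classify the unlabeled data
        idx = np.argsort(-proba.max(axis=1))[:n_add]     # most confident points
        y_new = clf.classes_[proba[idx].argmax(axis=1)]  # their predicted labels
        X = np.vstack((X, X_u[idx]))                     # add them to the training set
        y = np.concatenate((y, y_new))
        X_u = np.delete(X_u, idx, axis=0)                # drop them from the unlabeled pool
    clf.fit(X, y)                                        # final re-training
    return clf


# Example usage with a logistic regression base learner:
# clf = self_train(LogisticRegression(max_iter=1000), X_l, y_l, X_u)
```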

 

co-training:

Co-training assumes that the features can be split into two sets, and that each sub-feature set is sufficient to train a good classifier. The two sets are conditionally independent given the class. Initially, two separate classifiers are trained on the labeled data, one on each of the two sub-feature sets. Each classifier then classifies the unlabeled data and 'teaches' the other classifier with the few unlabeled examples (and their predicted labels) it is most confident about. Each classifier is re-trained with the additional training examples provided by the other classifier, and the process repeats.

When the features naturally split into two sets, co-training may be appropriate. (A Python sketch of the procedure is given at the end of this post.)

 

Reference:

Xiaojin Zhu. Semi-Supervised Learning with Graphs.

Co-training is a semi-supervised learning method that uses unlabeled data to improve model performance. Below is a Python implementation sketch of co-training:

```python
import numpy as np


class CoTrainer:
    """Co-training over two feature views of the same examples."""

    def __init__(self, clf1, clf2, n_iter=10, n_add=1):
        self.clf1 = clf1      # classifier for view 1
        self.clf2 = clf2      # classifier for view 2
        self.n_iter = n_iter  # number of co-training rounds
        self.n_add = n_add    # confident examples each classifier adds per round

    def fit(self, X1, X2, y, X1_u, X2_u):
        # X1, X2: the two views of the labeled data; y: their labels.
        # X1_u, X2_u: the two views of the unlabeled data.
        X1, X2, y = X1.copy(), X2.copy(), y.copy()
        X1_u, X2_u = X1_u.copy(), X2_u.copy()
        for _ in range(self.n_iter):
            if len(X1_u) == 0:
                break
            # Train each classifier on its own view of the current labeled pool.
            self.clf1.fit(X1, y)
            self.clf2.fit(X2, y)

            # clf1 labels the unlabeled data and 'teaches' the other classifier:
            # its most confident examples join the pool with their predicted labels.
            proba1 = self.clf1.predict_proba(X1_u)
            idx1 = np.argsort(-proba1.max(axis=1))[: self.n_add]
            y_new = self.clf1.predict(X1_u[idx1])
            X1 = np.vstack((X1, X1_u[idx1]))
            X2 = np.vstack((X2, X2_u[idx1]))
            y = np.concatenate((y, y_new))
            X1_u = np.delete(X1_u, idx1, axis=0)
            X2_u = np.delete(X2_u, idx1, axis=0)

            if len(X1_u) == 0:
                break
            # clf2 does the same in the other direction.
            proba2 = self.clf2.predict_proba(X2_u)
            idx2 = np.argsort(-proba2.max(axis=1))[: self.n_add]
            y_new = self.clf2.predict(X2_u[idx2])
            X1 = np.vstack((X1, X1_u[idx2]))
            X2 = np.vstack((X2, X2_u[idx2]))
            y = np.concatenate((y, y_new))
            X1_u = np.delete(X1_u, idx2, axis=0)
            X2_u = np.delete(X2_u, idx2, axis=0)

        # Final re-training on the enlarged labeled pool.
        self.clf1.fit(X1, y)
        self.clf2.fit(X2, y)
        return self

    def predict(self, X1, X2):
        # Combine the two views by averaging the predicted class probabilities.
        proba = (self.clf1.predict_proba(X1) + self.clf2.predict_proba(X2)) / 2
        return self.clf1.classes_[proba.argmax(axis=1)]
```

This implementation works with any pair of base classifiers that expose `predict_proba` (for example scikit-learn's `MultinomialNB`). In `fit`, the two classifiers are first trained on their own feature views of the labeled pool; each then labels the unlabeled data, and the examples it is most confident about are added to the shared labeled pool with their predicted labels and removed from the unlabeled pool, so the other classifier sees them in the next round. The process repeats for `n_iter` rounds or until the unlabeled pool is empty, after which both classifiers are re-fit on the enlarged pool; at prediction time the two classifiers' probability estimates are averaged.
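As a usage sketch (the toy data and the split into two views below are made up purely for illustration), two naive Bayes models could be co-trained like this:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy data for illustration only: 20 count features, split into two views
# (first ten / last ten columns).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 20))
y = (X[:, :10].sum(axis=1) > X[:, 10:].sum(axis=1)).astype(int)

X1, X2 = X[:, :10], X[:, 10:]               # the two feature views
X1_l, X2_l, y_l = X1[:20], X2[:20], y[:20]  # small labeled set
X1_u, X2_u = X1[20:], X2[20:]               # unlabeled pool

ct = CoTrainer(MultinomialNB(), MultinomialNB(), n_iter=20, n_add=5)
ct.fit(X1_l, X2_l, y_l, X1_u, X2_u)
print(ct.predict(X1[:5], X2[:5]))
```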