Supervised Learning, Unsupervised Learning, Semi-supervised Learning

License: CC 4.0 BY-SA

Tags: #machine-learning #data-mining #supervised-learning #semi-supervised-learning

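This article introduces the basic categories of machine learning: supervised learning, unsupervised learning, and semi-supervised learning. It explains the application scenarios of each with examples and highlights the differences among them.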

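Machine learning is a fairly broad subject. This article briefly introduces the concept of machine learning and its categories. Machine learning overlaps with data mining in many places; for example, both use the same algorithms. The difference is that machine learning focuses on predicting properties learned from training data, whereas data mining focuses on discovering previously unknown properties in the data.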

1. Supervised Learning

Supervised learning (SL) is commonly used to classify data with the help of labeled examples. Suppose we have a labeled training set T = {(l, c)}, a subset of I x C, where I is the set of instances and C = {c1, …, ck} is the set of predefined classes. The task is to find a mapping m from instances to classes such that m(l) = c for every training pair (l, c) in T. After the model has been trained, we can present new instances to m and compute their classes.
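As a minimal sketch of learning such a mapping m from a training set T, the snippet below uses a 1-nearest-neighbor rule. The choice of nearest neighbor, the function name `train_nearest_neighbor`, and the toy data are assumptions made for illustration; the article itself does not prescribe a particular algorithm.

```python
import math


def train_nearest_neighbor(training_set):
    """Build a mapping m from labeled pairs (l, c).

    training_set: list of (instance, label) pairs, where each instance
    is a tuple of numbers. The 1-NN rule is an illustrative choice.
    """
    def m(instance):
        # Return the label of the training instance closest to the query.
        nearest = min(training_set, key=lambda pair: math.dist(pair[0], instance))
        return nearest[1]

    return m


# Hypothetical toy data: two classes in a 2-D feature space.
T = [
    ((1.0, 1.0), "c1"),
    ((1.5, 2.0), "c1"),
    ((8.0, 8.0), "c2"),
    ((9.0, 7.5), "c2"),
]
m = train_nearest_neighbor(T)

print(m((1.2, 1.4)))  # query close to the "c1" examples
print(m((8.5, 8.1)))  # query close to the "c2" examples
```

With this rule, every training instance maps back to its own label, which is the condition m(l) = c stated above.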

For instance, suppose you are an oncologist who sees many patients every day, and you want to use a model to predict whether a tumor is benign or malignant. You can first use past cancer records to build a model based on patient characteristics, and then use that model to pre-screen each new patient from the description of their condition. For example, the model might classify patients by their age and the size of the tumor: if the patient is older than 55 and the tumor is larger than 10 mm, the tumor is classified as malignant.
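The rule in this example can be written down directly. The sketch below hard-codes the two thresholds from the text; in practice they would be learned from past records. The function name `classify_tumor` is hypothetical, and returning "benign" whenever the rule does not fire is an assumption, since the text only states the malignant case.

```python
AGE_THRESHOLD_YEARS = 55   # threshold taken from the example in the text
SIZE_THRESHOLD_MM = 10     # threshold taken from the example in the text


def classify_tumor(age_years, tumor_size_mm):
    """Classify a tumor using the illustrative age/size rule."""
    if age_years > AGE_THRESHOLD_YEARS and tumor_size_mm > SIZE_THRESHOLD_MM:
        return "malignant"
    # Assumption: everything not matched by the rule is treated as benign.
    return "benign"


# Hypothetical patients as (age in years, tumor size in mm).
patients = [(60, 12), (60, 8), (40, 12)]
for age, size in patients:
    print(age, size, classify_tumor(age, size))
```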

2. Unsupervised Learning

Unsupervised learning (USL) is used for clustering; for instance, we can group documents into clusters by topic. In contrast to SL, there are no predefined groups any more: all that is given is the descriptions of the instances. The problem we face is how to build the clusters, in other words, what the criterion for clustering should be, such as similarity or distance between instances. Formally, given a dataset I, the task is to derive a set of groups C from it, g(I) = C, and to find a mapping m between I and C so that every instance l from I is assigned to a group, m(l) = c.
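To make the distance-based criterion concrete, here is a small sketch of k-means, one common clustering method. The article does not name a specific algorithm, so k-means, the function name `k_means`, the fixed initial centroids, and the toy dataset are all assumptions for illustration.

```python
import math


def k_means(instances, initial_centroids, iterations=10):
    """Group instances around centroids using Euclidean distance.

    Returns (assignment, centroids), where assignment[i] is the index
    of the cluster that instances[i] was mapped to, i.e. m(l) = c.
    """
    centroids = list(initial_centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: map each instance to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))
            for point in instances
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(instances, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


# Hypothetical unlabeled dataset with two visually separate groups.
dataset = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.3)]
assignment, centroids = k_means(dataset, initial_centroids=[dataset[0], dataset[3]])

print(assignment)
print(centroids)
```

Note that no labels are supplied: the groups come only from the distances between instances, and a different distance or similarity measure could produce a different grouping.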

3. Semi-supervised Learning

Semi-supervised learning lies between SL and USL: the training data consists of some labeled data and some unlabeled data, and the task is either classification or clustering. This approach exists because unlabeled data is cheap, while labeled data is hard to obtain. For example, human experts sometimes assign the wrong label to an instance, or special methods or devices are needed to label the data. In this setting, the amount of unlabeled data is usually much larger than the amount of labeled data.
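One simple way to use both kinds of data is self-training: start from the few labeled instances and repeatedly label the unlabeled instance the current model is most confident about. The sketch below uses the distance to the nearest labeled instance as the confidence measure. Self-training, the function name `self_train`, and the toy data are assumptions for illustration and are not taken from the article.

```python
import math


def self_train(labeled, unlabeled):
    """Propagate labels from a few labeled instances to unlabeled ones.

    labeled: list of (instance, label) pairs.
    unlabeled: list of instances (tuples of numbers).
    Returns a list of (instance, label) pairs covering all instances.
    """
    if not labeled:
        raise ValueError("self-training needs at least one labeled instance")
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Find the unlabeled instance closest to any labeled instance.
        best_point, best_label, best_dist = None, None, math.inf
        for point in remaining:
            for known, label in labeled:
                d = math.dist(point, known)
                if d < best_dist:
                    best_point, best_label, best_dist = point, label, d
        # Adopt the neighbor's label and treat the point as labeled from now on.
        labeled.append((best_point, best_label))
        remaining.remove(best_point)
    return labeled


# Hypothetical data: two labeled instances, six unlabeled ones.
labeled = [((1.0, 1.0), "c1"), ((8.0, 8.0), "c2")]
unlabeled = [(1.5, 1.2), (2.5, 2.0), (3.4, 3.0), (7.0, 7.5), (6.0, 6.5), (5.5, 6.0)]

result = dict(self_train(labeled, unlabeled))
for point in unlabeled:
    print(point, result[point])
```

Because each newly labeled instance is trusted in later steps, an early mistake can spread, which is one reason label quality matters in this setting.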