Supervised Learning, Unsupervised Learning, Semi-supervised Learning

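This article introduces the basic categories of machine learning: supervised learning, unsupervised learning, and semi-supervised learning. It explains the application scenario of each through examples and highlights the differences between them.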

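Machine learning is a fairly broad subject. This article briefly introduces the concept of machine learning and its categories. Machine learning and data mining overlap in many places; for example, they often use the same algorithms. The difference is that machine learning focuses on predicting properties learned from training data, while data mining focuses on discovering previously unknown properties in the data.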


1. Supervised Learning

Supervised learning (SL) is typically used to classify data on the basis of labeled examples. Suppose we have a training set T = {(l, c) ∈ I x C} of instances l paired with predefined classes C = {c1, …, ck}. The task is to find a mapping m from instances to classes such that m(l) = c for every labeled pair (l, c) in T. Once the model is trained, we can present new instances to m and compute their classes.

For instance, suppose you are an oncologist who sees many patients each day, and you want to use a model to predict whether a tumor is benign or malignant. You can first use past cancer records to build a model from their characteristics, and then use it to pre-screen each patient based on the description of their condition. E.g. the model might classify a patient according to age and tumor size: if the patient is older than 55 and the tumor is larger than 10 mm, the tumor is classified as malignant.
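The mapping m described above can be sketched in a few lines of Python. This is only an illustration: the patient records are invented, the nearest-neighbor rule is one possible way of learning m (the article does not prescribe a method), and `rule_from_text` simply encodes the age/size rule mentioned in the example.

```python
import math

# Hypothetical labeled records: ((age in years, tumor size in mm), class).
# These values are invented for illustration only.
TRAINING_SET = [
    ((35, 4.0), "benign"),
    ((42, 6.5), "benign"),
    ((50, 8.0), "benign"),
    ((58, 12.0), "malignant"),
    ((63, 15.5), "malignant"),
    ((70, 18.0), "malignant"),
]

def m(instance, training_set=TRAINING_SET):
    """Map an instance to the class of its nearest labeled neighbor (1-NN)."""
    _, label = min(training_set, key=lambda pair: math.dist(pair[0], instance))
    return label

def rule_from_text(instance):
    """The hand-written rule from the example: older than 55 and larger than 10 mm."""
    age, size_mm = instance
    return "malignant" if age > 55 and size_mm > 10 else "benign"

# Classify new, unseen patients with both approaches.
print(m((60, 14.0)), rule_from_text((60, 14.0)))  # malignant malignant
print(m((38, 5.0)), rule_from_text((38, 5.0)))    # benign benign
```

On this toy data the learned mapping and the hand-written rule agree, but in general a model trained on real records would not reduce to such a simple threshold.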


2. Unsupervised Learning

Unsupervised learning (USL) is used for clustering; for instance, we can build different clusters for different topics. In contrast to SL, there are no longer any predefined groups; all that is given are the instances themselves. The problem we face is how to build the clusters, in other words, what the criterion for clustering should be: similarity or distance between instances? Formally, let a dataset I be given. Our task is to group I into a set of clusters C, g(I) = C, and to find a mapping m between I and C so that every instance l from I is assigned to a cluster, m(l) = c.
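As a rough illustration of clustering by distance, here is a minimal k-means sketch in plain Python. k-means is just one common choice and is not named in the article; the points, the value of k, and the naive "first k points" initialisation are all assumptions made for the example.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters by Euclidean distance to cluster centroids."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids

# Hypothetical unlabeled instances; no classes are given in advance.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
assignment, centroids = k_means(points, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
```

The returned `assignment` list plays the role of the mapping m: the i-th entry is the cluster index assigned to the i-th instance.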


3. Semi-supervised Learning

Semi-supervised learning lies between SL and USL: the training data consists of some labeled data and some unlabeled data, and the task is either classification or clustering. This approach exists because unlabeled data is cheap while labeled data is hard to obtain. E.g. human experts sometimes assign the wrong label to an instance, or special methods or devices are needed to label the data. In this setting, the amount of unlabeled data is usually much larger than the amount of labeled data.
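One simple way to make use of both kinds of data is self-training: start from the few labeled instances and repeatedly give the nearest unlabeled instance the label of its closest labeled neighbor. The sketch below assumes this scheme purely for illustration; the article does not name a specific algorithm, and the data points and the labels "A" and "B" are invented.

```python
import math

def self_train(labeled, unlabeled):
    """Spread labels from a few labeled points to many unlabeled ones (nearest-neighbor self-training)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Pick the unlabeled point that lies closest to any already-labeled point.
        point, (_, label) = min(
            ((u, l) for u in pool for l in labeled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        labeled.append((point, label))  # treat the new label as if it were known
        pool.remove(point)
    return dict(labeled)

# Hypothetical data: two labeled instances and six unlabeled ones.
labeled = [((1.0, 1.0), "A"), ((9.0, 9.0), "B")]
unlabeled = [(2.0, 1.5), (3.0, 2.5), (4.0, 3.5), (8.0, 8.5), (7.0, 7.5), (6.0, 6.5)]

result = self_train(labeled, unlabeled)
print(result[(4.0, 3.5)], result[(6.0, 6.5)])  # A B
```

Because every newly labeled point is trusted in later steps, an early mistake can propagate, which is one reason practical methods usually add a confidence threshold.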


These are the basic categories of machine learning.

