A multi-class classification task can in fact be solved with a collection of binary classifiers. This post introduces three ways of doing so. (A softmax layer followed by cross-entropy loss also handles multi-class problems directly, but that approach is not covered here.) All three methods below work by splitting the training set into binary subproblems.
(1) One vs One (OvO)
Let the dataset be $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$ with labels $y_i \in \{C_1, C_2, \dots, C_N\}$. The idea of OvO is to train one binary classifier for every pair of classes and then predict by majority vote. Suppose there are only four classes $C_1, C_2, C_3, C_4$; the classifiers are then constructed as in the following table:
| Positive | Negative | Classifier |
| --- | --- | --- |
| $C_1$ | $C_2$ | $f_1$ |
| $C_1$ | $C_3$ | $f_2$ |
| $C_1$ | $C_4$ | $f_3$ |
| $C_2$ | $C_3$ | $f_4$ |
| $C_2$ | $C_4$ | $f_5$ |
| $C_3$ | $C_4$ | $f_6$ |
This gives $N(N-1)/2$ binary classifiers in total ($4 \times 3 / 2 = 6$ here). A new test sample $x$ is fed to every one of these classifiers, and the class that collects the most votes becomes the final prediction. For example, suppose the six classifiers vote as in the table below: $C_2$ receives the most votes (3), so the sample is predicted as $C_2$:
| Classifier | Predicted class | Final prediction |
| --- | --- | --- |
| $f_1$ | $C_2$ | |
| $f_2$ | $C_3$ | |
| $f_3$ | $C_1$ | $C_2$ |
| $f_4$ | $C_2$ | |
| $f_5$ | $C_2$ | |
| $f_6$ | $C_3$ | |
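Incidentally, scikit-learn already ships this scheme as a ready-made wrapper. Here is a minimal sketch, assuming the same digits data that the full code at the end of this post uses (max_iter=1000 is only there to silence convergence warnings):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier

x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42, test_size=0.3)

# Wraps LogisticRegression in N*(N-1)/2 pairwise classifiers and votes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(x_train, y_train)
print("sklearn OvO accuracy: %.2f%%" % (ovo.score(x_test, y_test) * 100))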
(2) One vs Rest (OvR)
Using the same dataset as in (1), OvR trains one binary classifier per class, each time taking that class as the positive examples and all remaining classes together as the negative examples, which yields $N$ classifiers. When a new test sample is predicted, ideally exactly one of these $N$ classifiers outputs a positive result, and the class that classifier was trained for becomes the prediction (if several classifiers claim the sample, the usual tie-breaker is classifier confidence). Again assuming 4 classes, OvR looks like this:
| Positive (+) | Negative (-) | Classifier |
| --- | --- | --- |
| $C_1$ | $C_2, C_3, C_4$ | $f_1$ |
| $C_2$ | $C_1, C_3, C_4$ | $f_2$ |
| $C_3$ | $C_1, C_2, C_4$ | $f_3$ |
| $C_4$ | $C_1, C_2, C_3$ | $f_4$ |
When predicting a test sample:
| Classifier | Prediction | Final prediction |
| --- | --- | --- |
| $f_1$ | - | |
| $f_2$ | + | $C_2$ |
| $f_3$ | - | |
| $f_4$ | - | |
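scikit-learn likewise provides an off-the-shelf OvR wrapper; a minimal sketch under the same assumptions as the OvO snippet above (sklearn's OneVsRestClassifier breaks ties between several positive classifiers by classifier confidence):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42, test_size=0.3)

# One classifier per class (that class vs. everything else).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(x_train, y_train)
print("sklearn OvR accuracy: %.2f%%" % (ovr.score(x_test, y_test) * 100))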
(3) Many vs Many (MvM)
MvM partitions the training set M times. Each partition designates some classes as positive and the remaining classes as negative, and a binary classifier is trained on it, giving M classifiers in total. For each of the N original classes, running all M classifiers on it produces a prediction string such as [+1, -1, -1, +1, ...], called its codeword; collecting the codewords of all N classes gives a coding matrix of shape (N, M). Assume again that the data has only 4 classes and that 5 partitions were made, yielding 5 binary classifiers $f_1, \dots, f_5$. The coding matrix is then:
| | $f_1$ | $f_2$ | $f_3$ | $f_4$ | $f_5$ |
| --- | --- | --- | --- | --- | --- |
| $C_1$ | -1 | +1 | -1 | -1 | +1 |
| $C_2$ | +1 | +1 | -1 | +1 | -1 |
| $C_3$ | -1 | -1 | -1 | +1 | +1 |
| $C_4$ | +1 | +1 | +1 | -1 | -1 |
A test sample fed to the 5 classifiers likewise yields a codeword, say [+1, +1, -1, +1, +1]. This codeword is compared against each class's codeword using some distance (Euclidean and Hamming both work), and the class at the smallest distance becomes the final prediction. Using Euclidean distance for [+1, +1, -1, +1, +1], the distances are:
| Class | Distance |
| --- | --- |
| $C_1$ | 2.83 |
| $C_2$ | 2.00 |
| $C_3$ | 2.83 |
| $C_4$ | 3.46 |
The distance to the codeword of class $C_2$ is clearly the smallest, so the test sample is predicted as $C_2$.
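This calculation is easy to verify with a few lines of numpy; the matrix and codeword below are copied straight from the tables above:

import numpy as np

# Coding matrix from the table above: rows C1..C4, columns f1..f5.
coding_matrix = np.array([
    [-1, +1, -1, -1, +1],   # C1
    [+1, +1, -1, +1, -1],   # C2
    [-1, -1, -1, +1, +1],   # C3
    [+1, +1, +1, -1, -1],   # C4
])
codeword = np.array([+1, +1, -1, +1, +1])   # the test sample's codeword

# Euclidean distance from the test codeword to every class codeword.
dists = np.sqrt(np.sum((coding_matrix - codeword) ** 2, axis=1))
print(dists.round(2))                        # [2.83 2.   2.83 3.46]
print("closest class: C%d" % (np.argmin(dists) + 1))  # closest class: C2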
The code below implements all three methods on the handwritten-digits dataset, with LogisticRegression as the base binary classifier:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import numpy as np
from numpy import random as rd
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Load the handwritten-digits dataset and split it into train / test sets.
digits = load_digits()
x, y = digits["data"], digits["target"]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42, test_size=0.3)

class Multi_class(object):
    def __init__(self, x_train, x_test, y_train, y_test):
        self.x_train = x_train
        self.x_test = x_test
        self.y_train = y_train
        self.y_test = y_test
        # Sorted array of the distinct class labels in the training set.
        self.class_unique = np.unique(self.y_train)
    def OvO(self):
        """Train one binary classifier for every pair of classes."""
        model_lst = []
        for i in range(len(self.class_unique) - 1):
            # class i supplies the positive examples of the pair
            for j in range(i + 1, len(self.class_unique)):
                # class j supplies the negative examples of the pair
                select_index_positive = self.y_train == self.class_unique[i]
                select_index_negative = self.y_train == self.class_unique[j]
                y_train_ = np.concatenate([self.y_train[select_index_positive], self.y_train[select_index_negative]])
                x_train_ = np.concatenate([self.x_train[select_index_positive], self.x_train[select_index_negative]], axis=0)
                cls = LogisticRegression()
                cls.fit(x_train_, y_train_)
                model_lst.append(cls)
        return model_lst
    def OvR(self):
        """Train one classifier per class: that class vs. all the rest."""
        model_lst = []
        for c in self.class_unique:
            select_index_positive = self.y_train == c
            select_index_negative = np.logical_not(select_index_positive)
            # Positive samples keep their original label; all others become -1.
            y_train_ = list(self.y_train[select_index_positive]) + [-1] * np.sum(select_index_negative.astype(np.int8))
            x_train_ = np.concatenate([self.x_train[select_index_positive], self.x_train[select_index_negative]], axis=0)
            cls = LogisticRegression()
            cls.fit(x_train_, y_train_)
            model_lst.append(cls)
        return model_lst
    def MvM(self):
        """Make 30 random half-vs-half splits of the classes, train one
        classifier per split, then derive each class's codeword."""
        model_lst = []
        coding_matrix = []
        half_class_counts = int(len(self.class_unique) / 2)
        for i in range(30):
            # Randomly send half of the classes to the positive side (+1)
            # and the other half to the negative side (-1).
            class_unique_after_shuffle = rd.permutation(self.class_unique)
            positive_labels = class_unique_after_shuffle[:half_class_counts]
            negative_labels = class_unique_after_shuffle[half_class_counts:]
            y_train_ = []
            x_train_ = []
            for pl, nl in zip(positive_labels, negative_labels):
                y_train_.extend(np.sum((self.y_train == pl).astype(np.int8)) * [1])
                y_train_.extend(np.sum((self.y_train == nl).astype(np.int8)) * [-1])
                x_train_.append(self.x_train[self.y_train == pl])
                x_train_.append(self.x_train[self.y_train == nl])
            x_train_ = np.concatenate(x_train_, axis=0)
            cls = LogisticRegression()
            cls.fit(x_train_, y_train_)
            model_lst.append(cls)
        # Each class's codeword is the majority prediction of every
        # classifier over that class's training samples.
        for c in self.class_unique:
            label = []
            select_index = self.y_train == c
            x_train_ = self.x_train[select_index]
            for model in model_lst:
                predict_result = model.predict(x_train_)
                label.append(1 if np.sum(predict_result) > 0 else -1)
            coding_matrix.append(label)
        coding_matrix = np.array(coding_matrix)
        return coding_matrix, model_lst
    def test_OvO(self, model_lst):
        predict_label_of_every_model = []
        predict_label = []
        for model in model_lst:
            predict_label_of_every_model.append(model.predict(self.x_test))
        # DataFrame with one row per pairwise model, one column per test sample.
        predict_label_of_every_model = pd.DataFrame(predict_label_of_every_model)
        for i in range(self.x_test.shape[0]):
            # Majority vote over all pairwise classifiers for sample i.
            counts = predict_label_of_every_model[i].value_counts()
            predict_label.append(counts.idxmax())
        accur = np.mean((np.array(predict_label) == self.y_test).astype(int))
        print("OvO test-set accuracy: %.2f%%" % (accur * 100))
    def test_OvR(self, model_lst):
        predict_label_of_every_model = []
        for model in model_lst:
            predict_label_of_every_model.append(model.predict(self.x_test))
        predict_label_of_every_model = np.array(predict_label_of_every_model)
        # Each model outputs -1 except (ideally) the one whose positive class
        # matches the sample; the column-wise max therefore recovers the label
        # (if several models fire, the largest class label wins here).
        predict_label = np.max(predict_label_of_every_model, axis=0)
        accur = np.mean((predict_label == self.y_test).astype(np.int8))
        print("OvR test-set accuracy: %.2f%%" % (accur * 100))
    def test_MvM(self, coding_matrix, model_lst):
        predict_label = []
        predict_result = []
        for model in model_lst:
            predict_result.append(model.predict(self.x_test))
        # After transposing: one row (codeword of length 30) per test sample.
        predict_result = np.array(predict_result).T
        for result in predict_result:
            # Predict the class whose codeword is nearest in Euclidean distance.
            dists = np.sqrt(np.sum(np.square(result - coding_matrix), axis=1))
            predict_label.append(self.class_unique[np.argmin(dists)])
        accur = np.mean((predict_label == self.y_test).astype(np.int8))
        print("MvM test-set accuracy: %.2f%%" % (accur * 100))
def test(self):
print("#############OvO################")
OvO_model_lst = self.OvO()
self.test_OvO(OvO_model_lst)
print("#############OvR################")
OvR_model_lst = self.OvR()
self.test_OvR(OvR_model_lst)
print("#############MvM################")
coding_mat, MvM_model_lst = self.MvM()
self.test_MvM(coding_mat, MvM_model_lst)

# Run all three methods and report their test-set accuracies.
mc = Multi_class(x_train, x_test, y_train, y_test)
mc.test()
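As a sanity check, the hand-rolled implementations can be compared against scikit-learn's built-in wrappers, reusing the x_train/x_test split defined above. OutputCodeClassifier is sklearn's random-coding flavor of MvM; code_size=3.0 (30 classifiers for 10 classes) is an arbitrary choice that mirrors the 30 random splits used in MvM() above:

from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier, OutputCodeClassifier

for name, wrapper in [
        ("OvO", OneVsOneClassifier(LogisticRegression())),
        ("OvR", OneVsRestClassifier(LogisticRegression())),
        ("MvM", OutputCodeClassifier(LogisticRegression(), code_size=3.0, random_state=42))]:
    wrapper.fit(x_train, y_train)
    print("sklearn %s test-set accuracy: %.2f%%" % (name, wrapper.score(x_test, y_test) * 100))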