Perceptron

BlueCitizen

License: CC 4.0 BY-SA

Category: Machine Learning

Article link: https://blog.youkuaiyun.com/BlueCitizen/article/details/56488064

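This article gives a detailed introduction to the perceptron model, a linear binary classifier suited to linearly separable datasets. It covers the definition of the perceptron and its learning strategy, including the linear separability of a dataset and the perceptron learning algorithm, with theory and code for both the primal and the dual form. It also discusses convergence: for a linearly separable dataset, the perceptron finds a hyperplane that separates the data correctly.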

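The perceptron is a linear model for binary classification. Its input is the feature vector of an instance and its output is the class of the instance, taking the value +1 or -1. A perceptron corresponds to a separating hyperplane that divides the instances in the input space into a positive and a negative class, and it is a discriminative model.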

The Perceptron Model

Definition

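Let the input space (feature space) be $\chi \subseteq R^n$ and the output space be $\gamma = \{+1,-1\}$. An input $x \in \chi$ is the feature vector of an instance, and an output $y \in \gamma$ is the class of that instance. The following function from the input space to the output space

$$f(x) = \operatorname{sign}(w\cdot x + b)$$

is called a perceptron. Here $w$ and $b$ are the parameters of the perceptron model: $w \in R^n$ is called the weight or weight vector, $b \in R$ is called the bias, $w \cdot x$ denotes the inner product of $w$ and $x$, and $\operatorname{sign}$ is the sign function, that is,

$$\operatorname{sign}(x)= \begin{cases} +1, & x>0\\ -1, & x\leq 0 \end{cases}$$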
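The perceptron is a linear classification model and a discriminative model. Its hypothesis space is the set of all linear classification models (linear classifiers) defined on the feature space, that is, the set of functions $\{f\,|\,f(x)=w\cdot x+b\}$.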
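The perceptron has the following geometric interpretation. The linear equation

$$w\cdot x+b=0$$

corresponds to a hyperplane $S$ in the feature space, where $w$ is the normal vector of the hyperplane and $b$ is its intercept. This hyperplane divides the feature space into two parts.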
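As a minimal sketch of the definition above, the decision function can be written in a few lines of Python. The function name `perceptron_predict` and the parameter values are illustrative assumptions, not taken from the article. The hyperplane $x^{(1)} + x^{(2)} - 3 = 0$ is chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Return +1 if w.x + b > 0, otherwise -1 (the sign convention used in this article)."""
    return 1 if np.dot(w, x) + b > 0 else -1

# Hypothetical parameters: the hyperplane x1 + x2 - 3 = 0 in the plane.
w = np.array([1.0, 1.0])
b = -3.0

print(perceptron_predict(w, b, np.array([3.0, 3.0])))  # 3 + 3 - 3 = 3 > 0   -> 1
print(perceptron_predict(w, b, np.array([1.0, 1.0])))  # 1 + 1 - 3 = -1 <= 0 -> -1
```

Points on the positive side of the normal vector `w` are labelled +1, and all other points, including those exactly on the hyperplane, are labelled -1.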


Learning Strategy of the Perceptron

Linear Separability of a Dataset

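Given a dataset

$$T = \{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$$

where $x_i\in \chi = R^{n}$, $y_i \in\gamma = \{-1,+1\}$, $i = 1,2,\dots,N$, suppose there exists a hyperplane $S$

$$w\cdot x+b=0$$

that places all positive instances and all negative instances of the dataset on opposite sides, that is, $w\cdot x_i+b>0$ for every instance with $y_i=+1$ and $w\cdot x_i+b<0$ for every instance with $y_i=-1$. Then $T$ is called a linearly separable dataset. Otherwise $T$ is called linearly inseparable.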
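Whether a particular hyperplane separates a dataset can be checked directly from this definition. The sketch below is only an illustration: the helper name `separates` is hypothetical, and the three points are the ones used in the code later in this article.

```python
import numpy as np

def separates(w, b, X, y):
    """True if every point lies strictly on the side of w.x + b = 0 given by its label."""
    return bool(np.all(y * (X @ w + b) > 0))

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)   # one instance per row
y = np.array([1, 1, -1], dtype=float)                 # labels in {-1, +1}

print(separates(np.array([1.0, 1.0]), -3.0, X, y))  # True
print(separates(np.array([1.0, 0.0]), 0.0, X, y))   # False: (1, 1) falls on the positive side
```

A dataset is linearly separable when at least one pair `(w, b)` passes this check. The function tests a single candidate and does not search for one.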

Perceptron Learning Strategy

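Assume the training dataset is linearly separable. To find a hyperplane that separates it, we need a learning strategy, that is, we define a loss function and minimize it.

The loss function is chosen to be the total distance from the misclassified points to the hyperplane. First, the distance from any point $x_0$ in the input space to the hyperplane $S$ is

$$\frac{1}{||w||}|\,w\cdot x_0+b\,|$$

where $||w||$ is the $L_2$ norm of $w$.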
Next, for a misclassified point $(x_i, y_i)$,

$$-y_i(w\cdot x_i + b)>0$$

holds, because $w\cdot x_i + b$ and $y_i$ have opposite signs. The distance from a misclassified point to the hyperplane $S$ is therefore

$$-\frac{1}{||w||}y_i(w\cdot x_i + b)$$
Thus, if $M$ is the set of points misclassified by the hyperplane, the total distance from all misclassified points to the hyperplane is

$$-\frac{1}{||w||}\sum_{x_i\in M}y_i(w\cdot x_i + b)$$

Dropping the factor $\frac{1}{||w||}$ gives the loss function of perceptron learning:

$$L(w,b) = -\sum_{x_i\in M}y_i(w\cdot x_i + b)$$

where $M$ is the set of misclassified points. This loss function is the empirical risk function of perceptron learning.

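The learning strategy of the perceptron is to choose, within the hypothesis space, the model parameters $w,b$ that minimize the loss function. These parameters give the perceptron model.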
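A small numerical sketch of the loss may help. The helper `perceptron_loss` is a hypothetical name. Like the code later in this article, it treats points with $y_i(w\cdot x_i+b)\leq 0$ as misclassified, so points lying exactly on the hyperplane are counted too.

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """Return (L(w, b), number of misclassified points)."""
    margins = y * (X @ w + b)            # y_i * (w . x_i + b) for every point
    in_M = margins <= 0                  # membership in the misclassified set M
    return float(np.sum(-margins[in_M])), int(np.sum(in_M))

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)

print(perceptron_loss(np.array([1.0, 0.0]), 0.0, X, y))   # (1.0, 1): only (1, 1) is misclassified
print(perceptron_loss(np.array([1.0, 1.0]), -3.0, X, y))  # (0.0, 0): no misclassified points
```

The loss is zero exactly when the set $M$ is empty, which is why driving it to zero yields a separating hyperplane.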

Perceptron Learning Algorithm

Primal Form

Theory

The perceptron learning algorithm solves the following optimization problem:

$$\min_{w,b}L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i + b)$$

where $M$ is the set of misclassified points.

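Perceptron learning is driven by misclassification and uses stochastic gradient descent. First, an initial hyperplane $w_0,b_0$ is chosen arbitrarily, and the objective function is then minimized step by step by gradient descent. Each step does not descend along the gradient of all misclassified points in $M$ at once. Instead, one misclassified point is selected at random and the descent step uses the gradient of that point alone.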

Assume the set of misclassified points $M$ is fixed. The gradient of the loss function $L(w,b)$ is then given by

$$\begin{aligned} \nabla_wL(w,b)&=-\sum_{x_i\in M}y_i x_i\\ \nabla_bL(w,b)&=-\sum_{x_i\in M}y_i \end{aligned}$$
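Select a misclassified point $(x_i,y_i)$ at random and update $w,b$:

$$\begin{aligned} w &\leftarrow w + \eta y_i x_i\\ b &\leftarrow b + \eta y_i \end{aligned}$$

where $\eta$ ($0 < \eta \leq 1$) is the step size, also called the learning rate in statistical learning. Through iteration, the loss function $L(w,b)$ can be expected to decrease until it reaches 0. In summary, the primal form of the perceptron learning algorithm is as follows.

Input: a training set $T = \{(x_1,y_1),\dots,(x_N,y_N)\}$ and a learning rate $\eta$. Output: $w,b$ and the perceptron model $f(x)=\operatorname{sign}(w\cdot x+b)$.

(1) Choose initial values $w_0,b_0$.
(2) Select a data point $(x_i,y_i)$ from the training set.
(3) If $y_i(w\cdot x_i+b)\leq 0$, update $w \leftarrow w + \eta y_i x_i$ and $b \leftarrow b + \eta y_i$.
(4) Go to (2) until the training set contains no misclassified points.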

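Intuitively, the algorithm works as follows: when an instance is misclassified, that is, it lies on the wrong side of the separating hyperplane, $w,b$ are adjusted so that the hyperplane moves toward the misclassified point. This reduces the distance between the point and the hyperplane until the hyperplane passes the point and classifies it correctly.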
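To see numerically why an update helps the misclassified point, note that after one update the quantity $y_i(w\cdot x_i+b)$ grows by exactly $\eta(||x_i||^2+1)$, since $y_i^2=1$. The sketch below checks this for one point. The starting values are assumptions chosen for illustration.

```python
import numpy as np

eta = 1.0
w, b = np.zeros(2), 0.0                    # assumed starting point w_0 = 0, b_0 = 0
x_i, y_i = np.array([3.0, 3.0]), 1.0       # a positive instance

before = y_i * (np.dot(w, x_i) + b)        # 0.0, so the point counts as misclassified
w = w + eta * y_i * x_i                    # w <- w + eta * y_i * x_i
b = b + eta * y_i                          # b <- b + eta * y_i
after = y_i * (np.dot(w, x_i) + b)         # 3*3 + 3*3 + 1 = 19.0

print(before, after)                                     # 0.0 19.0
print(after - before == eta * (np.dot(x_i, x_i) + 1))    # True
```

Repeated updates on the same point therefore eventually make $y_i(w\cdot x_i+b)$ positive, although an update for one point can push other points back to the wrong side.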

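The primal-form algorithm above is implemented in Python on the following example: the training set consists of the positive instances $x_1=(3,3)^T$ and $x_2=(4,3)^T$ and the negative instance $x_3=(1,1)^T$, with $\eta=1$ and initial values $w_0=0$, $b_0=0$.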

Code Implementation

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import os

traning_set = [[3,3,1],
                [4,3,1],
                [1,1,-1]]
init_w = np.array([0,0])
init_b = 0
step = 1
history = []#record w,b

# the path to save gif
dirname = os.path.dirname(__file__)

'''
Perceptron Algorithm Class
: Method train, run
: Attr w,b,step
'''
class Perceptron(object):
    def __init__(self,step=1,init_w=np.array([]),init_b=0):
        self.w = init_w
        self.b = init_b
        self.step = step
    '''
    calculate the function distance
    : @param   item: a test data(x, y), 1d-array
    : @return  function distance, float
    '''
    def __calc(self, item):
        # yi*(w*xi + b)
        return item[-1] * (np.dot(self.w, item[:-1]) + self.b)

    '''
    update the w,b
    : @param   item: a test data(x, y), 1d-array
    : @param   i: iterate count
    : @param   count: the index of test data in traning_set
    : @return  None
    '''
    def __update(self,item, i, count):
        self.w = self.w + self.step * item[-1] * item[:-1]#w = w + eta*yi*xi
        self.b = self.b + self.step * item[-1]# b = b + eta*yi
        print("[%d][%d]: w: " % (i,count), self.w, " b: ", self.b)
        global history
        history.append([self.w.copy(),self.b])

    '''
    perceptron learning algorithm. Train the model to classify the traning_set data
    : @param   dataset: training data(X,Y), 2d-array
    : @return  None
    '''
    def train(self, dataset):
        N, M = dataset.shape; M = M - 1
        if self.w.size < M:
            self.w = np.hstack((self.w, np.zeros(M - self.w.size)))
        else:
            self.w = self.w.reshape(M,)

        for i in range(1000):
            flag = False
            count = 1
            for item in dataset:
                if self.__calc(item) <= 0:
                    flag = True
                    self.__update(item,i,count)
                    count = count + 1
            if not flag:
                print("\n\nResult: w: ", self.w, " b: ", self.b)
                break
    '''
    Use the trained perceptron to classify new data
    : @param    data: new data x, 1d-array
    : @return   classLabel: {-1,+1}
    '''
    def run(self, data):
        val = np.dot(self.w, data) + self.b
        if val > 0: return 1
        else: return -1


if __name__ == "__main__":
    perceptron = Perceptron(step,init_w,init_b)
    perceptron.train(np.array(traning_set))

    # make a gif to animate the training process
    fig = plt.figure()

    ax = plt.axes(xlim = (0,2), ylim = (0,2))
    line, = ax.plot([],[],'g', lw=2)
    label = ax.text(0, 0, '')

    def init():
        line.set_data([],[])
        x,y,x_,y_ = [],[],[],[]
        for item in traning_set:
            if item[-1] > 0:
                x.append(item[0])
                y.append(item[1])
            else:
                x_.append(item[0])
                y_.append(item[1])

        plt.plot(x,y,'bo',x_,y_,'rx')
        plt.axis([-6,6,-6,6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Perceptron Algorithm Exercise')
        return line, label

    def animate(i):
        global history

        w = history[i][0]
        b = history[i][1]
        if w[1] == 0: return line, label
        x1 = -7; y1 = -(b + w[0] * x1) / w[1]
        x2 = 7; y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1,x2],[y1,y2])
        x1 = 0; y1 = -(b + w[0] * x1) / w[1]
        label.set_text("(%s, %s)" % (history[i][0][:],history[i][1]))
        label.set_position([x1,y1])
        return line, label

    print(history)
    anim = animation.FuncAnimation(fig,animate,init_func=init,
        frames=len(history), interval=1000,blit=True)

    # save first: after plt.show() returns, the figure window has been closed
    anim.save(os.path.join(dirname, 'perceptron.gif'), fps=2,
        writer='imagemagick')
    plt.show()

    #use the perceptron to classify new data
    print(perceptron.run(np.array([2,2])))

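[Animation: perceptron.gif, the separating line after each update]

The GIF shows that the separating hyperplane is attracted by misclassified points and keeps moving toward them. Note that the code does not pick misclassified points at random but visits them in a fixed order. If the misclassified points were picked at random, the algorithm could end with a different result. In other words, the solution of the perceptron learning algorithm depends on the initial values and on the order in which misclassified points are selected.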
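The dependence on the selection order can be made concrete with a small experiment. This is a sketch, not the article's code: the function `train` and its `pick` argument are hypothetical, and `pick` decides which of the currently misclassified points is used for the next update.

```python
import numpy as np

def train(X, y, pick, eta=1.0, max_updates=1000):
    """Primal perceptron; pick(indices) chooses one misclassified index per update."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_updates):
        bad = np.flatnonzero(y * (X @ w + b) <= 0)   # indices of misclassified points
        if bad.size == 0:
            break
        i = pick(bad)
        w, b = w + eta * y[i] * X[i], b + eta * y[i]
    return w, b

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)
rng = np.random.default_rng(0)

w_first, b_first = train(X, y, pick=lambda bad: bad[0])    # always the first misclassified point
w_last, b_last = train(X, y, pick=lambda bad: bad[-1])     # always the last misclassified point
w_rand, b_rand = train(X, y, pick=rng.choice)              # a random misclassified point

print(w_first, b_first)   # [1. 1.] -3.0
print(w_last, b_last)     # [1. 0.] -2.0
print(w_rand, b_rand)     # depends on the random draws, but always separates the data
```

Both deterministic rules reach a separating hyperplane, yet they end at different parameters: $x^{(1)}+x^{(2)}-3=0$ after seven updates versus $x^{(1)}-2=0$ after four.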

Convergence of the Algorithm

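For a linearly separable dataset, the primal form of the perceptron learning algorithm converges: after a finite number of iterations it yields a separating hyperplane, and hence a perceptron model, that classifies the training set completely correctly. The number of misclassifications $k$ made by the algorithm on the training set satisfies

$$k \leq \left(\frac{R}{\gamma}\right)^2$$

where $R=\max_{1\leq i \leq N}||\hat x_i||$ with $\hat x_i = (x_i^T, 1)^T$, and $\gamma$ is the smallest functional distance of any point to a separating hyperplane $\hat w_{opt} = (w_{opt}^T, b_{opt})^T$ normalized so that $||\hat w_{opt}|| = 1$:

$$y_i(w_{opt}\cdot x_i + b_{opt}) \geq \gamma > 0$$

For a detailed proof, see Chapter 2 of 《统计学习方法》 (Statistical Learning Methods). When the training set is not linearly separable, the perceptron learning algorithm does not converge and the iterates oscillate.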
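The bound can be checked numerically on the three-point example used in this article. This is only a sketch under stated assumptions: the reference hyperplane $w_{opt}=(1,1)$, $b_{opt}=-3$ is one separating hyperplane picked by hand and then normalized, and another separating hyperplane would give a different but equally valid bound. The training loop sweeps the data in fixed order, like the article's code.

```python
import numpy as np

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)
X_hat = np.hstack([X, np.ones((len(X), 1))])       # x_hat_i = (x_i, 1)

# Assumed reference hyperplane, normalized so that ||w_hat_opt|| = 1.
w_hat_opt = np.array([1.0, 1.0, -3.0])
w_hat_opt /= np.linalg.norm(w_hat_opt)

R = np.max(np.linalg.norm(X_hat, axis=1))           # sqrt(26)
gamma = np.min(y * (X_hat @ w_hat_opt))             # 1 / sqrt(11)
bound = (R / gamma) ** 2                            # 26 * 11 = 286

# Count the updates made by the primal algorithm (eta = 1, w_0 = 0, b_0 = 0).
w_hat, k = np.zeros(3), 0
changed = True
while changed:
    changed = False
    for x_i, y_i in zip(X_hat, y):
        if y_i * np.dot(w_hat, x_i) <= 0:
            w_hat += y_i * x_i                      # updates w and b together
            k += 1
            changed = True

print(k, int(round(bound)))    # 7 286
```

The seven updates actually needed are far below the bound of 286, which is typical: the inequality guarantees termination but is usually loose.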

Dual Form

Theory

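The basic idea of the dual form is to express $w$ and $b$ as linear combinations of the instances $x_i$ and labels $y_i$, and to obtain $w$ and $b$ by solving for the coefficients. Without loss of generality, the initial values $w_0,\,b_0$ can both be taken as 0. For a misclassified point $(x_i,\,y_i)$, the updates

$$\begin{aligned} w &\leftarrow w + \eta y_ix_i\\ b &\leftarrow b + \eta y_i \end{aligned}$$

modify $w,\,b$ step by step. Suppose the parameters are modified $n$ times in total, $n_i$ of them because of $(x_i,\,y_i)$. The increments of $w,\,b$ due to $(x_i,\,y_i)$ are then $\alpha_i y_ix_i$ and $\alpha_i y_i$, where $\alpha_i = n_i\eta$. The learned $w,\,b$ can therefore be written as

$$\begin{aligned} w &= \sum_{i=1}^{N}\alpha_iy_ix_i\\ b &= \sum_{i=1}^{N}\alpha_iy_i \end{aligned}$$

Here $\alpha_i \geq 0,\, i=1,2,\dots,N$, and when $\eta = 1$, $\alpha_i$ is the number of times the $i$-th instance was updated because it was misclassified. The more often an instance is updated, the closer it lies to the separating hyperplane and the harder it is to classify correctly. In other words, such instances have the greatest influence on the learning result. The dual form of the perceptron learning algorithm is as follows.

Input: a linearly separable training set $T = \{(x_1,y_1),\dots,(x_N,y_N)\}$ and a learning rate $\eta$. Output: $\alpha,b$ and the perceptron model $f(x)=\operatorname{sign}\left(\sum_{j=1}^{N}\alpha_j y_j x_j\cdot x+b\right)$, where $\alpha=(\alpha_1,\dots,\alpha_N)^T$.

(1) Set $\alpha \leftarrow 0$, $b \leftarrow 0$.
(2) Select a data point $(x_i,y_i)$ from the training set.
(3) If $y_i\left(\sum_{j=1}^{N}\alpha_j y_j x_j\cdot x_i+b\right)\leq 0$, update $\alpha_i \leftarrow \alpha_i + \eta$ and $b \leftarrow b + \eta y_i$.
(4) Go to (2) until there are no misclassified points.

In the dual form the training instances appear only through inner products, so the inner products can be computed in advance and stored in the Gram matrix $G=[x_i\cdot x_j]_{N\times N}$.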
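The relation between the update counts and the learned parameters can be verified directly. The sketch below (with hypothetical variable names) runs the primal updates on the article's three-point example with $\eta = 1$, records how often each point triggers an update, and then rebuilds $w$ and $b$ from those counts.

```python
import numpy as np

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)
eta = 1.0

w, b = np.zeros(2), 0.0
n_updates = np.zeros(len(X))             # n_i: number of updates caused by point i
changed = True
while changed:                           # sweep the data in fixed order until no errors remain
    changed = False
    for i in range(len(X)):
        if y[i] * (np.dot(w, X[i]) + b) <= 0:
            w, b = w + eta * y[i] * X[i], b + eta * y[i]
            n_updates[i] += 1
            changed = True

alpha = n_updates * eta                  # alpha_i = n_i * eta
w_dual = (alpha * y) @ X                 # sum_i alpha_i * y_i * x_i
b_dual = np.sum(alpha * y)               # sum_i alpha_i * y_i

print(alpha)            # [2. 0. 5.]
print(w_dual, b_dual)   # [1. 1.] -3.0
print(np.allclose(w, w_dual), np.isclose(b, b_dual))   # True True
```

The point $(4,3)$ never triggers an update, so its coefficient stays at 0 and it does not contribute to the final hyperplane.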

Code Implementation

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import os

# the path to save gif
dirname = os.path.dirname(__file__)
history = []

'''
the dual perceptron learning algorithm
'''
class Perceptron(object):
    def __init__(self,step=1):
        self.alpha = []
        self.b = 0
        self.step = step
        self.gram = np.array([])
        self.w = []
    '''
    calculate the gram matrix
    : @param    X: training data point    2d-array
    : @return   None
    '''
    def __calcGram(self, X):
        N = X.shape[0]
        self.gram = np.zeros((N, N))
        for i in range(N):
            for j in range(N):
                self.gram[i][j] = np.dot(X[i], X[j])
    '''
    calculate the function distance
    : @param    ind: the index of the selected point in training_set
    : @param    classLabels: the classLabels of training data    1d-array
    : @return   function distance    float
    '''
    def __calc(self,ind,classLabels):
        # yi*(sum(alpha_j * y_j * x_j * x_i) + b)(1 <= j <= N)
        return classLabels[ind] * (np.dot(self.alpha * classLabels, self.gram[:,ind]) + self.b)

    '''
    update the alpha and b
    : @param    ind: the index of the selected point in training_set
    : @param    classLabels: the classLabels of training data     1d-array
    : @param    X: training data point    2d-array
    : @return   None
    '''
    def __update(self,ind,classLabels,X):
        self.alpha[ind] = self.alpha[ind] + self.step#alpha = alpha + eta
        self.b = self.b + self.step * classLabels[ind]#b = b + y_i*eta

        ##calculate the new w to make animation
        new_w = np.zeros(self.w.shape[0])
        for k in range(classLabels.shape[0]):
            new_w += self.alpha[k] * classLabels[k] * X[k]
        self.w = new_w
        global history
        history.append((self.w.copy(), self.b, X[ind]))

    '''
    dual perceptron learning algorithm
    : @param    dataset: the training_set(X,Y)    2d-array
    : @return   None
    '''
    def train(self,dataset):
        X = dataset[:,:-1];classLabels = dataset[:,-1]
        self.__calcGram(X)
        self.alpha = np.zeros(X.shape[0])
        self.w = np.zeros(dataset.shape[1]-1)

        for i in range(1000):
            flag = False
            count = 1
            for k in range(classLabels.shape[0]):
                if self.__calc(k,classLabels) <= 0:
                    flag = True
                    self.__update(k,classLabels,X)
                    print("alpha: ", self.alpha, "b: ", self.b)
            if not flag:
                print("\n\nResult: alpha: ", self.alpha, "b: ", self.b)
                break



if __name__ == "__main__":
    perceptron = Perceptron(1)
    dataset = np.array([[3,3,1],[4,3,1],[1,1,-1]])
    perceptron.train(dataset)

    print("history:")
    for item in history:
        print("w: %s, b: %s, x: %s" % item)

    #make animation to animate the training process
    fig = plt.figure()
    ax = plt.axes(xlim=(-6,6), ylim=(-6,6))
    line, = ax.plot([],[],'g-',lw=2,)
    label = ax.text(0, 0, '')

    def init():
        line.set_data([],[])
        x1,x2,_x1,_x2 = [],[],[],[]
        for item in dataset:
            if item[-1] > 0:
                x1.append(item[0])
                x2.append(item[1])
            else:
                _x1.append(item[0])
                _x2.append(item[1])
        plt.plot(x1,x2,'bo',_x1,_x2,'rx')
        for item in dataset:
            plt.text(item[0],item[1]+0.5,"(%s, %s)"%(item[0],item[1]), ha='center',va='bottom')
        plt.xlabel('x1');plt.ylabel('x2')
        plt.title('Perceptron Algorithm')
        plt.grid(True)

        return line,label

    def animate(i):
        global history

        w = history[i][0]
        b = history[i][1]
        if w[1] == 0: return line, label
        x1 = -7; y1 = -(b + w[0] * x1) / w[1]
        x2 = 7; y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1,x2],[y1,y2])
        x1 = 0; y1 = -(b + w[0] * x1) / w[1]

        label.set_position([x1,y1])

        return line,label

    anim = animation.FuncAnimation(fig,animate,init_func=init,frames=len(history), interval=1000,blit=True)

    anim.save(os.path.join(dirname, "perceptron_2.gif"),fps=2,
        writer="imagemagick")

    plt.show()
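As a side note, the double loop in `__calcGram` can be replaced by a single matrix product, since entry $(i,j)$ of $XX^T$ is exactly $x_i\cdot x_j$. A minimal sketch on the same three points:

```python
import numpy as np

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)

gram = X @ X.T            # gram[i, j] = x_i . x_j
print(gram.tolist())      # [[18.0, 21.0, 6.0], [21.0, 25.0, 7.0], [6.0, 7.0, 2.0]]
```

The result is symmetric, so the decision value for point `ind` can be read from either `gram[:, ind]` or `gram[ind, :]`.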