Machine Learning Foundations: Week 2, PLA

维格堂406小队

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/wendaomudong_l2d4/article/details/78843169

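This article walks through how the perceptron learning algorithm (PLA) works and how to implement it, covering the notation, the iterative update rule, and the mathematical proof behind it. An example then shows how the algorithm finds a separating hyperplane step by step.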

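Notes for week 2 of National Taiwan University's Machine Learning Foundations course. Only the key points are covered; the aim is a concise record of what was taught in class.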

Notation

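There are two classes of data, labeled by $y \in \{-1,+1\}$. The feature vector is $x=(x_0,x_1,x_2,\dots,x_d)$, where $x_0$ is the constant 1 and the remaining entries are the actual feature values. Assume there exists a hyperplane $w^Tx=0$, with $w=(w_0,w_1,\dots,w_d)$, that separates the two classes correctly. There are $N$ samples in total.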
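As a small illustration of this notation, the R sketch below augments a feature vector with $x_0 = 1$ and evaluates $sign(w^Tx)$. The weight values, the two sample points, and the helper name `predict_label` are made up for the example and are not part of the course material.

```r
## Hypothetical helper: predict a label from the sign of w^T x.
## A score of exactly 0 yields 0 here; how to treat ties is an assumption.
predict_label <- function(w, x) {
  sign(sum(w * x))
}

## Illustrative weights (w0, w1, w2); w0 plays the role of the threshold.
w <- c(-1.5, -1, 1)

## Raw features (x1, x2) augmented with the constant x0 = 1.
x_pos <- c(1, 2, 5.5)   # x2 well above x1
x_neg <- c(1, 2, 1.5)   # x2 below x1

predict_label(w, x_pos)
predict_label(w, x_neg)
```

With these values the two scores are 2 and -2, so the points fall on opposite sides of the hyperplane.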

Iteration

PLA corrects mistakes as it finds them. It scans the samples and, whenever one is misclassified, updates $w$ as follows:

$\begin{array}{l}
\text{For } t = 0, 1, \dots \\
\quad 1.\ \text{find a mistake of } w_t \text{ called } (x_{n(t)}, y_{n(t)}): \\
\qquad\qquad \mathrm{sign}(w_t^T x_{n(t)}) \ne y_{n(t)} \\
\quad 2.\ \text{(try to) correct the mistake by} \\
\qquad\qquad w_{t+1} \leftarrow w_t + y_{n(t)} x_{n(t)} \\
\dots \text{ until no more mistakes} \\
\text{return last } w \text{ (called } w_{PLA}) \text{ as } g
\end{array}$
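The loop above can be sketched in a few lines of R. This is a minimal sketch under some assumptions: the helper name `pla_step` and the four-point dataset are invented for illustration, the first misclassified sample is always the one corrected, and a score of exactly 0 is treated as a mistake.

```r
## One PLA step: find the first misclassified sample and correct it.
## Returns the updated weights, or NULL when no mistakes are left.
pla_step <- function(w, X, y) {
  mistakes <- which(as.vector(X %*% w) * y <= 0)  # score 0 counts as a mistake
  if (length(mistakes) == 0) return(NULL)
  n <- mistakes[1]
  w + y[n] * X[n, ]
}

## Tiny made-up dataset; the first column is the constant x0 = 1.
X <- rbind(c(1, 2, 3), c(1, 1, 3), c(1, 3, 1), c(1, 2, 0))
y <- c(1, 1, -1, -1)

w <- c(0, 0, 0)   # start from w_0 = 0
updates <- 0
repeat {
  w_next <- pla_step(w, X, y)
  if (is.null(w_next)) break
  w <- w_next
  updates <- updates + 1
}
w
updates
```

On this toy data the loop stops after two corrections: the first adds sample 1 to $w$, the second subtracts sample 3.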

Why the update works

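The classification rule can be written as:

$\mathrm{sign}(w_t^T x_{n(t)}) = \mathrm{sign}(\left\| w_t \right\| \left\| x_{n(t)} \right\| \cos(\theta))$

where $\theta$ is the angle between $w_t$ and $x_{n(t)}$. If a positive example is misclassified, then $\cos(\theta) \le 0$, i.e. $\theta \in [\frac{\pi}{2}, \pi]$, so the angle between the normal vector and the feature vector has to shrink. Adding $y_{n(t)}x_{n(t)} = x_{n(t)}$ to $w_t$ rotates it toward $x_{n(t)}$. Symmetrically, a misclassified negative example has $\cos(\theta) \ge 0$, and adding $y_{n(t)}x_{n(t)} = -x_{n(t)}$ rotates $w_t$ away from it. This is why $w$ is updated with the rule above.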
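The same argument can be checked algebraically. The short derivation below is a sketch that uses only the update rule and $y_{n(t)}^2 = 1$; it shows that the score of the corrected sample moves in the right direction by exactly $\left\| x_{n(t)} \right\|^2$.

```latex
\begin{aligned}
y_{n(t)}\, w_{t+1}^T x_{n(t)}
  &= y_{n(t)} \left( w_t + y_{n(t)} x_{n(t)} \right)^T x_{n(t)} \\
  &= y_{n(t)}\, w_t^T x_{n(t)} + y_{n(t)}^2 \left\| x_{n(t)} \right\|^2 \\
  &= y_{n(t)}\, w_t^T x_{n(t)} + \left\| x_{n(t)} \right\|^2
\end{aligned}
```

Since $x_0 = 1$, the increase is at least 1. A single update need not fix the sample, though, which is why the pseudocode says "(try to) correct".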

Proof

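We prove that on a linearly separable dataset, PLA is guaranteed to find a perfect separating hyperplane after a finite number of iterations.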

Each iteration brings $w_t$ closer to $w_f$
1. $w_f$ is a perfect classifier
2. $(x_n,y_n)$ is any sample in the dataset
3. $(x_{n(t)},y_{n(t)})$ is the sample misclassified by $w_t$ at iteration $t$
Because $w_f$ is a perfect classifier, we must have:

$y_{n(t)}w_f^Tx_{n(t)}\geq \min\limits_n y_nw_f^Tx_n> 0$
After using any misclassified sample $(x_{n(t)},y_{n(t)})$ to update $w_t$ to $w_{t+1}$, compute:

$\begin{array}{l}
\frac{w_f^T w_{t+1}}{\left\| w_f \right\| \left\| w_{t+1} \right\|} = \frac{w_f^T (w_t + y_{n(t)} x_{n(t)})}{\left\| w_f \right\| \left\| w_{t+1} \right\|} \\
\qquad = \frac{w_f^T w_t + y_{n(t)} w_f^T x_{n(t)}}{\left\| w_f \right\| \left\| w_{t+1} \right\|} \\
\qquad \ge \frac{w_f^T w_t + \min\limits_n y_n w_f^T x_n}{\left\| w_f \right\| \left\| w_{t+1} \right\|} \\
\qquad > \frac{w_f^T w_t + 0}{\left\| w_f \right\| \left\| w_{t+1} \right\|} = \frac{w_f^T w_t}{\left\| w_f \right\| \left\| w_{t+1} \right\|}
\end{array}$
All expressions share the same denominator, so this says that the inner product $w_f^Tw_t$ grows with every correction. In terms of cosine similarity, a larger inner product suggests that the corrected $w$ is better aligned with the perfect separating hyperplane. The inner product could also grow simply because $w$ gets longer, however, so the growth of $\left\| w_t \right\|$ is bounded next.

The norm of $w_t$ grows slowly in each iteration
$\begin{array}{l}
\left\| w_{t+1} \right\|^2 = \left\| w_t + y_{n(t)} x_{n(t)} \right\|^2 \\
\qquad = \left\| w_t \right\|^2 + 2 y_{n(t)} w_t^T x_{n(t)} + \left\| y_{n(t)} x_{n(t)} \right\|^2 \\
\qquad \le \left\| w_t \right\|^2 + 0 + \left\| y_{n(t)} x_{n(t)} \right\|^2 \\
\qquad \le \left\| w_t \right\|^2 + \max\limits_n \left\| y_n x_n \right\|^2
\end{array}$
The cross term is at most 0 because $(x_{n(t)},y_{n(t)})$ is misclassified by $w_t$, i.e. $y_{n(t)}w_t^Tx_{n(t)} \le 0$. Each correction therefore increases $\left\| w_t \right\|^2$ by at most $\max\limits_n \left\| x_n \right\|^2$.

The number of iterations is finite

Assume $w_0=0$. After $T$ corrections:

$\begin{array}{l}
\frac{w_f^T w_T}{\left\| w_f \right\| \left\| w_T \right\|} = \frac{w_f^T (w_{T-1} + y_{n(T-1)} x_{n(T-1)})}{\left\| w_f \right\| \left\| w_T \right\|} \\
\qquad = \frac{w_f^T w_{T-1} + y_{n(T-1)} w_f^T x_{n(T-1)}}{\left\| w_f \right\| \left\| w_T \right\|} \\
\qquad \ge \frac{w_f^T w_{T-1} + \min\limits_n y_n w_f^T x_n}{\left\| w_f \right\| \left\| w_T \right\|} \\
\qquad = \frac{w_f^T w_{T-2} + y_{n(T-2)} w_f^T x_{n(T-2)} + \min\limits_n y_n w_f^T x_n}{\left\| w_f \right\| \left\| w_T \right\|} \\
\qquad \ge \frac{w_f^T w_{T-2} + 2 \min\limits_n y_n w_f^T x_n}{\left\| w_f \right\| \left\| w_T \right\|} \\
\qquad \cdots \\
\qquad \ge \frac{w_f^T w_0 + T \min\limits_n y_n w_f^T x_n}{\left\| w_f \right\| \left\| w_T \right\|} = \frac{T \min\limits_n y_n w_f^T x_n}{\left\| w_f \right\| \left\| w_T \right\|} \\
\text{Further:} \\
w_f^T w_T \ge T \min\limits_n y_n w_f^T x_n \\
T \le \frac{w_f^T w_T}{\min\limits_n y_n w_f^T x_n} \\
T^2 \le \frac{(w_f^T w_T)^2}{(\min\limits_n y_n w_f^T x_n)^2} = \frac{\left\| w_f \right\|^2 \left\| w_T \right\|^2 \cos^2(\theta)}{(\min\limits_n y_n w_f^T x_n)^2} \\
\qquad \le \frac{\left\| w_f \right\|^2 \left\| w_T \right\|^2}{(\min\limits_n y_n w_f^T x_n)^2} \\
\qquad \le \frac{\left\| w_f \right\|^2 \cdot T \max\limits_n \left\| y_n x_n \right\|^2}{(\min\limits_n y_n w_f^T x_n)^2} = \frac{\left\| w_f \right\|^2 \cdot T \max\limits_n \left\| x_n \right\|^2}{(\min\limits_n y_n w_f^T x_n)^2}
\end{array}$
Here $\theta$ is the angle between $w_f$ and $w_T$. The last step applies the norm bound $T$ times starting from $w_0=0$, which gives $\left\| w_T \right\|^2 \le T \max\limits_n \left\| x_n \right\|^2$. Dividing both sides by $T$ shows that the number of iterations $T$ has an upper bound:

$T \le \frac{\left\| w_f \right\|^2 \max\limits_n \left\| x_n \right\|^2}{(\min\limits_n y_n w_f^T x_n)^2}$
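To make the bound concrete, the following R sketch compares the number of corrections PLA actually makes with $\left\| w_f \right\|^2 \max_n \left\| x_n \right\|^2 / (\min_n y_n w_f^T x_n)^2$ on a small dataset. Several things here are assumptions for illustration: the seed, the hand-picked separator `w_f` (the proof only needs some perfect separator to exist), the start at $w_0 = 0$, and the choice to always correct the first mistake found.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x1 <- 1:10
## Columns: x0 = 1, x1, x2. Positives lie above x2 = x1 + 3, negatives below x2 = x1.
X <- cbind(1, rep(x1, 2), c(x1 + runif(10) + 3, x1 - runif(10)))
y <- rep(c(1, -1), each = 10)

## Assumed perfect separator, chosen by hand: the line x2 = x1 + 1.5.
w_f <- c(-1.5, -1, 1)
rho <- min(y * as.vector(X %*% w_f))   # min_n y_n w_f^T x_n
R2 <- max(rowSums(X^2))                # max_n ||x_n||^2
bound <- sum(w_f^2) * R2 / rho^2

## Run PLA from w_0 = 0 and count the corrections.
w <- rep(0, 3)
updates <- 0
repeat {
  mistakes <- which(as.vector(X %*% w) * y <= 0)
  if (length(mistakes) == 0) break
  n <- mistakes[1]
  w <- w + y[n] * X[n, ]
  updates <- updates + 1
}
c(updates = updates, bound = bound)
```

The bound depends on which $w_f$ is plugged in, and it is usually loose; the point is only that the number of corrections can never exceed it.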

Example

Building the dataset

Build a dataset to test the algorithm.

x11 <- 1:10
## Positive class: points above the line x2 = x1 + 3
x21 <- x11 + runif(10, 0, 1) + 3
## Negative class: points below the line x2 = x1
x22 <- x11 - runif(10, 0, 1)
example_data <- data.frame(x1 = rep(x11, 2),
                           x2 = c(x21, x22),
                           label = rep(c(1, -1), each = 10))
example_data$label <- as.factor(example_data$label)
library(ggplot2)
ggplot(data = example_data, aes(
  x = x1,
  y = x2,
  color = label,
  shape = label
)) +
  geom_point()

[Figure: scatter plot of the example dataset, x1 against x2, colored and shaped by label]

PLA algorithm

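## Arguments: the dataset and the name of the label column

PLA_f <- function(dataset, label) {
  ## True labels as numeric -1 / +1 (the column may be a factor)
  real_label <- as.numeric(as.character(dataset[[label]]))
  ## Feature matrix with a leading constant column x0 = 1
  feature_matrix <-
    cbind(x0 = 1, as.matrix(dataset[, setdiff(colnames(dataset), label), drop = FALSE]))
  ## Initial weights are all ones (not zero as in the proof)
  w0 <- rep(1, ncol(feature_matrix))
  ## History of weight vectors, one entry per update
  w_history <- list(w0)
  repeat {
    ## Samples misclassified by the current weights (a score of 0 counts as a mistake)
    mistakes <- which(as.vector(feature_matrix %*% w0) * real_label <= 0)
    if (length(mistakes) == 0) break
    ## Correct the first mistake, then rescan from the first sample
    i <- mistakes[1]
    w0 <- w0 + real_label[i] * feature_matrix[i, ]
    w_history[[length(w_history) + 1]] <- w0
  }
  w_data <- as.data.frame(do.call(rbind, w_history))
  colnames(w_data) <- paste0("x", 0:(ncol(feature_matrix) - 1))
  rownames(w_data) <- NULL
  ## Decision boundary x0 + x1 * X1 + x2 * X2 = 0, written as X2 = intercept + slope * X1
  ## (only meaningful for two features)
  w_data$slope <- -w_data$x1 / w_data$x2
  w_data$intercept <- -w_data$x0 / w_data$x2
  return(w_data)
}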

Solving

w_data <- PLA_f(dataset = example_data, label = "label")
w_data

   x0 x1           x2        slope    intercept
1   1  1  1.000000000   -1.0000000   -1.0000000
2   0  0  0.495471116    0.0000000    0.0000000
3  -1 -1 -0.009057768 -110.4024725 -110.4024725
4   0  0  4.912654036    0.0000000    0.0000000
5  -1 -1  4.408125152    0.2268538    0.2268538
6  -2 -2  3.903596268    0.5123481    0.5123481
7  -3 -4  1.915120282    2.0886417    1.5664812
8  -2 -1  8.363856425    0.1195621    0.2391241
9  -3 -2  7.859327541    0.2544747    0.3817120
10 -4 -4  5.870851555    0.6813322    0.6813322
11 -5 -9  1.747566727    5.1500179    2.8611211
12 -4 -8  6.669278532    1.1995300    0.5997650

Animation

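library(animation)
## Point to the ImageMagick executable. Note that it is magick.exe here; earlier versions apparently all used convert.exe
ani.options(convert = "D:/ImageMagic/ImageMagick-7.0.7-Q16/magick.exe")
saveGIF(
  expr = {
    ## One frame per weight vector recorded in w_data
    for (i in 1:nrow(w_data)) {
      ## Positive class (rows 1 to 10) as red squares
      plot(
        x = example_data$x1[1:10],
        y = example_data$x2[1:10],
        pch = 15,
        col = "red",
        xlim = c(0, 20),
        ylim = c(0, 15),
        xlab = "x1",
        ylab = "x2",
        main = paste0("Picture", i)
      )
      ## Negative class (rows 11 to 20) as blue triangles
      points(x = example_data$x1[11:20],
             y = example_data$x2[11:20],
             pch = 17,
             col = "blue")
      ## Decision boundary for the i-th weight vector
      abline(coef = c(w_data$intercept[i], w_data$slope[i]), lwd = 2)
    }
  },
  ## GIF file name; remember to include the file extension
  movie.name = "PLA.gif",
  ## Interval between frames, in seconds
  interval = 1,
  ## Image size
  ani.width = 600,
  ani.height = 600,
  ## Write the file to the current working directory
  outdir = getwd()
)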