Perceptron
The perceptron is a basic model underlying both neural networks and SVMs (support vector machines). It is a linear classifier, typically used on linearly separable two-class data.
1. Primal form
(i) Distance between a point and a hyperplane
- Since $x_0$ is on the hyperplane, it satisfies $w \cdot x_0 + b = 0$ (assume $\|w\|_2 = 1$), so $b = -w \cdot x_0$.
- Then project the vector $x_1 - x_0$ onto $w$, which gives the distance between $x_1$ and the hyperplane $w \cdot x + b = 0$:
$$w \cdot (x_1 - x_0) = w \cdot x_1 - w \cdot x_0 = w \cdot x_1 + b.$$
- Note that if $x_1$ is on the other side of the hyperplane, the distance takes the form
$$w \cdot (x_0 - x_1) = w \cdot x_0 - w \cdot x_1 = -(w \cdot x_1 + b).$$
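As a quick numeric sanity check, here is a minimal sketch of the signed-distance computation (the particular `w`, `b`, and `x1` below are made-up values; dividing by $\|w\|_2$ covers the case where $w$ is not a unit vector):

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane
b = -5.0                   # offset
x1 = np.array([4.0, 3.0])  # query point

# Signed distance from x1 to the hyperplane w . x + b = 0.
signed_dist = (w @ x1 + b) / np.linalg.norm(w)
print(signed_dist)  # (12 + 12 - 5) / 5 = 3.8
```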
(ii) Perceptron model
Assume that we have a data set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^n$ and $y_i = +1$ or $y_i = -1$. In addition, assume the data set can be separated by a hyperplane. The perceptron model is $f(x) = \operatorname{sign}(w \cdot x + b)$, and we use the total distance from the misclassified data to the hyperplane as the loss function:
$$L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b),$$
where $M$ is the set of misclassified points.
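Here is a minimal sketch of this loss in code, assuming NumPy arrays `X` of shape `(N, n)`, labels `y` in `{+1, -1}`, and current parameters `w`, `b` (the function name is just illustrative):

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    """L(w, b) = -sum of y_i (w . x_i + b) over misclassified points."""
    margins = y * (X @ w + b)   # y_i (w . x_i + b) for every point
    return -margins[margins <= 0].sum()
```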
(iii) Algorithm of perceptron
To minimize the loss function $L(w, b)$, the stochastic gradient descent method is used. Note that the derivatives of $L(w, b)$ are
$$\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i, \qquad \nabla_b L(w, b) = -\sum_{x_i \in M} y_i.$$
- Suppose the current parameters are $(w, b)$.
- Select a data point $(x_i, y_i)$ which is misclassified by the hyperplane $w \cdot x + b = 0$, i.e. $y_i (w \cdot x_i + b) \le 0$; then update the parameters with learning rate $\eta \in (0, 1]$:
$$w \leftarrow w + \eta y_i x_i, \qquad b \leftarrow b + \eta y_i.$$
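Putting the update rule together, a minimal sketch of the primal algorithm might look like the following (assuming linearly separable NumPy data as above; `max_epochs` is just a safety cap, not part of the algorithm itself):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=1000):
    """Primal-form perceptron trained with stochastic gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for i in range(len(X)):
            # Misclassified point: y_i (w . x_i + b) <= 0
            if y[i] * (X[i] @ w + b) <= 0:
                w += eta * y[i] * X[i]  # w <- w + eta y_i x_i
                b += eta * y[i]         # b <- b + eta y_i
                updated = True
        if not updated:  # no misclassified points remain
            break
    return w, b
```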
(iv) Convergence of algorithm
If we denote $\hat{w} = (w^T, b)^T$ and $\hat{x} = (x^T, 1)^T$, the hyperplane can be rewritten as $\hat{w} \cdot \hat{x} = 0$.
Theorem (Novikoff): Let $R = \max_{1 \le i \le N} \|\hat{x}_i\|$, and suppose there exist a unit vector $\hat{w}_{\mathrm{opt}}$ ($\|\hat{w}_{\mathrm{opt}}\| = 1$) and a constant $\gamma > 0$ satisfying $y_i (\hat{w}_{\mathrm{opt}} \cdot \hat{x}_i) \ge \gamma$ for all $i$ (such a $\gamma$ exists since the data set is separable). Then the number of update steps $k$ used in the stochastic gradient descent satisfies
$$k \le \left(\frac{R}{\gamma}\right)^2.$$
Proof: Let $\hat{w}_k$ be the parameter vector after the $k$-th update, with $\hat{w}_0 = 0$.
(1) Combining the updates $\hat{w}_k = \hat{w}_{k-1} + \eta y_i \hat{x}_i$, we have
$$\hat{w}_k \cdot \hat{w}_{\mathrm{opt}} = \hat{w}_{k-1} \cdot \hat{w}_{\mathrm{opt}} + \eta y_i (\hat{w}_{\mathrm{opt}} \cdot \hat{x}_i) \ge \hat{w}_{k-1} \cdot \hat{w}_{\mathrm{opt}} + \eta \gamma \ge \cdots \ge k \eta \gamma.$$
(2) Recall that in the update step the data point $(x_i, y_i)$ is misclassified by the hyperplane $\hat{w}_{k-1} \cdot \hat{x} = 0$, which says $y_i (\hat{w}_{k-1} \cdot \hat{x}_i) \le 0$. Then
$$\|\hat{w}_k\|^2 = \|\hat{w}_{k-1}\|^2 + 2 \eta y_i (\hat{w}_{k-1} \cdot \hat{x}_i) + \eta^2 \|\hat{x}_i\|^2 \le \|\hat{w}_{k-1}\|^2 + \eta^2 R^2 \le \cdots \le k \eta^2 R^2.$$
(3) Combining the two results from (1) and (2), we have
$$k \eta \gamma \le \hat{w}_k \cdot \hat{w}_{\mathrm{opt}} \le \|\hat{w}_k\| \, \|\hat{w}_{\mathrm{opt}}\| = \|\hat{w}_k\| \le \sqrt{k} \, \eta R.$$
So $k \le (R/\gamma)^2$. $\blacksquare$
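To see the bound in action, here is a small numeric check on a made-up separable data set; the unit separator $(1, 1, 0)^T/\sqrt{2}$ is chosen by hand, and any valid unit separator gives a usable $\gamma$:

```python
import numpy as np

# Toy 2-D data separable by x1 + x2 = 0 (illustrative values).
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
X_hat = np.hstack([X, np.ones((len(X), 1))])  # x_hat = (x, 1)

# Run SGD on w_hat = (w, b), counting updates k.
eta, k = 1.0, 0
w_hat = np.zeros(X_hat.shape[1])
while True:
    mis = [i for i in range(len(X_hat)) if y[i] * (X_hat[i] @ w_hat) <= 0]
    if not mis:
        break
    i = mis[0]
    w_hat += eta * y[i] * X_hat[i]
    k += 1

# Bound from the theorem, using the hand-picked unit separator.
w_opt = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
gamma = min(y[i] * (w_opt @ X_hat[i]) for i in range(len(X_hat)))
R = max(np.linalg.norm(x) for x in X_hat)
print(k, "updates; bound (R/gamma)^2 =", (R / gamma) ** 2)  # 1 <= 1.33...
```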
2. Dual form
We can use the dual form to reduce the computation, which means using coefficients $\alpha_i$ (together with $b$) to represent $w$ and $b$; the inner products $x_j \cdot x_i$ can then be precomputed once as a Gram matrix.
In the last section, we saw that the update form of stochastic gradient descent is $w \leftarrow w + \eta y_i x_i$ and $b \leftarrow b + \eta y_i$. If we denote the number of updates triggered by the $i$-th data point as $n_i$ and set $\alpha_i = n_i \eta$, then starting from $w = 0$ and $b = 0$ we have
$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad b = \sum_{i=1}^{N} \alpha_i y_i.$$
So we can rewrite the model as
$$f(x) = \operatorname{sign}\!\left(\sum_{j=1}^{N} \alpha_j y_j (x_j \cdot x) + b\right).$$
We also use the stochastic gradient descent method to update $\alpha = (\alpha_1, \ldots, \alpha_N)^T$ and $b$.
- Denote the current parameters as $\alpha$ and $b$.
- Select a misclassified data point $(x_i, y_i)$, which satisfies
$$y_i \left(\sum_{j=1}^{N} \alpha_j y_j (x_j \cdot x_i) + b\right) \le 0,$$
then update as
$$\alpha_i \leftarrow \alpha_i + \eta, \qquad b \leftarrow b + \eta y_i.$$
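A matching sketch of the dual algorithm, under the same assumptions as the primal version (the Gram matrix of inner products is computed once up front):

```python
import numpy as np

def train_perceptron_dual(X, y, eta=1.0, max_epochs=1000):
    """Dual-form perceptron: learn alpha_i = n_i * eta instead of w."""
    N = len(X)
    gram = X @ X.T  # precompute all inner products x_j . x_i
    alpha = np.zeros(N)
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for i in range(N):
            # Misclassified: y_i (sum_j alpha_j y_j (x_j . x_i) + b) <= 0
            if y[i] * ((alpha * y) @ gram[:, i] + b) <= 0:
                alpha[i] += eta    # alpha_i <- alpha_i + eta
                b += eta * y[i]    # b <- b + eta y_i
                updated = True
        if not updated:
            break
    w = (alpha * y) @ X  # recover w = sum_i alpha_i y_i x_i if needed
    return alpha, w, b
```

With the Gram matrix in hand, each misclassification test costs $O(N)$ multiplications regardless of the input dimension $n$, which is the appeal of the dual form when $n$ is large.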