The Perceptron Learning Algorithm (PLA) is a binary classifier that can partition linearly separable points into two classes.
The Perceptron Convergence Theorem states:
For any finite set of linearly separable labeled examples, the PLA will halt after a finite number of iterations.
But why, and after how many steps, does the perceptron halt?
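For reference, here is a minimal sketch of the PLA update loop that the proof analyzes (assuming NumPy; the function name `pla`, the mistake-selection order, and the `max_iters` safety cap are choices made only for this illustration):

```python
import numpy as np

def pla(X, y, max_iters=10_000):
    """Minimal PLA sketch: cycle through examples, update on any mistake.

    X: (N, d) array of points, y: (N,) array of labels in {-1, +1}.
    Returns the learned weight vector w (a bias can be handled by
    appending a constant 1 feature to each point).
    """
    N, d = X.shape
    w = np.zeros(d)                        # start from w_0 = 0, as in the proof
    for _ in range(max_iters):
        mistakes = [n for n in range(N) if np.sign(w @ X[n]) != y[n]]
        if not mistakes:                   # no misclassified point: halt
            return w
        n = mistakes[0]                    # pick any misclassified example
        w = w + y[n] * X[n]                # update rule: w <- w + y_n * x_n
    return w
```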
Next, we will prove the Perceptron Convergence Theorem step by step.
Notations:
$w_t$: the weight vector at step $t$.
$(x_{n(t)}, y_{n(t)})$: the example point used at step $t$.
$w_f$: the perfect weight corresponding to the target function, which means $y_n = \mathrm{sign}(w_f^T x_n)$ for every example $(x_n, y_n)$.
$\theta_t$: the angle between $w_f$ and $w_t$.
$\cos\theta_t = \dfrac{w_f^T w_t}{\|w_f\|\,\|w_t\|}$: the cosine of the angle between $w_f$ and $w_t$.
$\dfrac{y_n\, w_f^T x_n}{\|w_f\|}$: the margin, i.e. the Euclidean distance of the point $x_n$ from the plane $w_f^T x = 0$, where $y_n\, w_f^T x_n$ is strictly positive since all points are classified correctly by $w_f$.
$\rho = \min_n \dfrac{y_n\, w_f^T x_n}{\|w_f\|}$: the minimal margin relative to the separating hyperplane $w_f^T x = 0$.
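As a concrete reading of the margin definitions, here is a small sketch (the function name `min_margin` is just for illustration) that computes the per-point margins and the minimal margin $\rho$ for a given separating weight $w_f$:

```python
import numpy as np

def min_margin(X, y, w_f):
    """Minimal margin rho = min_n y_n (w_f . x_n) / ||w_f||.

    X: (N, d) points, y: (N,) labels in {-1, +1}, w_f: a weight vector
    that classifies every point correctly, so every margin is positive.
    """
    margins = y * (X @ w_f) / np.linalg.norm(w_f)   # distance of each x_n from the plane
    assert np.all(margins > 0), "w_f does not separate the data"
    return margins.min()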
Assume at step $t$ the example $(x_{n(t)}, y_{n(t)})$ is misclassified, i.e. $\mathrm{sign}(w_t^T x_{n(t)}) \neq y_{n(t)}$; then the weight is updated by
$$w_{t+1} = w_t + y_{n(t)}\, x_{n(t)}.$$
So we have $y_{n(t)}\, w_t^T x_{n(t)} \leq 0$, and
$$y_{n(t)}\, w_f^T x_{n(t)} \;\geq\; \min_n y_n\, w_f^T x_n \;=\; \rho\,\|w_f\| \;>\; 0.$$
Then the numerator of $\cos\theta_{t+1}$ is:
$$w_f^T w_{t+1} = w_f^T\big(w_t + y_{n(t)}\, x_{n(t)}\big) = w_f^T w_t + y_{n(t)}\, w_f^T x_{n(t)} \;\geq\; w_f^T w_t + \rho\,\|w_f\|.$$
After applying the above inequality $n$ times, starting from $w_0 = 0$, we get
$$w_f^T w_n \;\geq\; n\,\rho\,\|w_f\|$$
(here we get a lower bound on the numerator of $\cos\theta_n$). So as $n$ grows, the numerator grows at least linearly in $n$.
Now consider the denominator of $\cos\theta_{t+1}$:
$$\|w_{t+1}\|^2 = \|w_t + y_{n(t)}\, x_{n(t)}\|^2 = \|w_t\|^2 + 2\, y_{n(t)}\, w_t^T x_{n(t)} + \|x_{n(t)}\|^2 \;\leq\; \|w_t\|^2 + R^2,$$
where $R = \max_n \|x_n\|$ and the inequality uses $y_{n(t)}\, w_t^T x_{n(t)} \leq 0$ (the point was misclassified). Applying the above inequality $n$ times, again starting from $w_0 = 0$, we get
$$\|w_n\|^2 \;\leq\; n\, R^2, \qquad\text{i.e.}\qquad \|w_n\| \;\leq\; \sqrt{n}\, R$$
(here we get an upper bound on the denominator of $\cos\theta_n$), so the denominator grows at most like $\sqrt{n}$.
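To make the two per-update facts concrete, here is a small numerical check (a sketch only; the 2-D data, the target weight `w_f`, and the current weight `w` below are made-up examples): after one update on a misclassified point, $w_f^T w$ increases by at least $\rho\,\|w_f\|$, and $\|w\|^2$ increases by at most $R^2$.

```python
import numpy as np

# Made-up 2-D example: w_f separates the data, w currently misclassifies X[0].
w_f = np.array([1.0, 1.0])             # a perfect separating weight
X = np.array([[2.0, 0.5], [-1.0, -2.0], [0.5, 1.5]])
y = np.sign(X @ w_f)                   # labels consistent with w_f

R = np.max(np.linalg.norm(X, axis=1))
rho = np.min(y * (X @ w_f)) / np.linalg.norm(w_f)

w = np.array([-1.0, 0.0])              # a weight that misclassifies X[0]
n = 0
assert np.sign(w @ X[n]) != y[n]       # X[n] is indeed misclassified

w_new = w + y[n] * X[n]                # the PLA update

# Numerator step: w_f . w grows by at least rho * ||w_f|| per update.
assert w_f @ w_new >= w_f @ w + rho * np.linalg.norm(w_f) - 1e-12
# Denominator step: ||w||^2 grows by at most R^2 per update.
assert np.linalg.norm(w_new) ** 2 <= np.linalg.norm(w) ** 2 + R ** 2 + 1e-12
```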
Combining the bounds on the numerator and the denominator of $\cos\theta_n$, we get
$$\cos\theta_n = \frac{w_f^T w_n}{\|w_f\|\,\|w_n\|} \;\geq\; \frac{n\,\rho\,\|w_f\|}{\|w_f\|\,\sqrt{n}\, R} \;=\; \sqrt{n}\,\frac{\rho}{R}.$$
We also know $\cos\theta_n \leq 1$, so
$$\sqrt{n}\,\frac{\rho}{R} \;\leq\; 1,$$
and
$$n \;\leq\; \frac{R^2}{\rho^2}.$$
So the maximum number of update steps is at most $\dfrac{R^2}{\rho^2}$: the PLA must halt after finitely many updates.
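As a sanity check of the result, here is a small experiment (a sketch only; the synthetic data, the random seed, and the margin filter threshold 0.5 are arbitrary choices for illustration) that runs PLA from $w_0 = 0$ on linearly separable data and compares the actual number of updates with the bound $R^2/\rho^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up separable data: labels come from a fixed target weight w_f, and
# points very close to the target boundary are dropped so rho is not tiny.
d, N = 5, 200
w_f = rng.normal(size=d)
X = rng.normal(size=(N, d))
m = X @ w_f
keep = np.abs(m) > 0.5
X, y = X[keep], np.sign(m[keep])

# Theoretical bound R^2 / rho^2 from the proof above.
R = np.max(np.linalg.norm(X, axis=1))
rho = np.min(y * (X @ w_f)) / np.linalg.norm(w_f)
bound = (R / rho) ** 2

# Run PLA from w_0 = 0 and count updates until no mistakes remain.
w, updates = np.zeros(d), 0
while True:
    mistakes = np.flatnonzero(np.sign(X @ w) != y)
    if mistakes.size == 0:
        break
    n = mistakes[0]
    w = w + y[n] * X[n]
    updates += 1

print(f"updates = {updates}, bound = R^2/rho^2 = {bound:.1f}")  # updates <= bound
```

In practice the actual number of updates is usually far below $R^2/\rho^2$; the bound only guarantees finiteness, it does not predict the typical running time.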