start Caltech machine learning, video 6
theory of generalization
8:21 2014-09-24
we're going to bound the growth function
by a polynomial
8:37 2014-09-24
B(N, k) // I give you N points, and k is the break point
8:40 2014-09-24
you can still make an upper-bound statement
8:43 2014-09-24
recursive bound on B(N, k)
8:53 2014-09-24
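the recursion, with its boundary conditions (as given in the lecture):
B(N, 1) = 1
B(1, k) = 2 for k > 1
B(N, k) <= B(N-1, k) + B(N-1, k-1)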
the upper bound on the growth function
9:17 2014-09-24
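solving the recursion gives the analytic bound from the lecture:
mH(N) <= B(N, k) <= sum_{i=0}^{k-1} C(N, i)
the right-hand side is a polynomial in N of degree k-1, which is
the whole point: one break point forces polynomial growth.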
for a given hypothesis set, the break point k
is fixed
9:37 2014-09-24
NN == Neural Network
9:38 2014-09-24
the break point for the neural network is 17
9:38 2014-09-24
2D perceptron
9:44 2014-09-24
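worked example (from lecture 5, as I recall): the 2D perceptron has
break point k = 4, so
mH(N) <= C(N,0) + C(N,1) + C(N,2) + C(N,3)
a cubic in N; at N = 3 this gives 8, matching the 8 dichotomies the
2D perceptron actually achieves on 3 points.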
outline:
* Proof that mH(N) is polynomial
* Proof that mH(N) can replace M
9:46 2014-09-24
How does the growth function mH(N) relate to overlap?
9:50 2014-09-24
space of data sets
9:52 2014-09-24
for the 1st hypothesis, you get this bad region
9:54 2014-09-24
union bound
9:54 2014-09-24
union bound => VC bound
9:55 2014-09-24
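for reference, the union-bound version being replaced (Hoeffding over
M hypotheses):
P[|Ein(g) - Eout(g)| > eps] <= 2 M exp(-2 eps^2 N)
the union bound pretends the M bad regions are disjoint; the VC bound
exploits their overlap to swap M for the growth function.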
now that they're overlapping, the total area,
which is the bad area, is a small fraction
of the whole thing, and I can learn.
9:56 2014-09-24
how is the growth function going to characterize
the overlaps?
9:57 2014-09-24
what the growth function tells you is that,
9:58 2014-09-24
if you take a dichotomy, it's not the full hypothesis,
but the hypothesis restricted to a finite set of points
9:59 2014-09-24
what to do about Eout?
10:01 2014-09-24
instead of picking one sample, I'm going to
pick two samples independently.
they're coming from the same distribution.
10:02 2014-09-24
does Ein(h) track Ein'(h)?
each of them tracks Eout(h).
10:03 2014-09-24
so the mathematical ramification is:
if you characterize using two samples,
then I'm completely in the realm of dichotomies.
10:07 2014-09-24
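the formal step here is the symmetrization lemma (constants are my
rough recollection; the proof in the book's appendix pins them down):
P[sup_h |Ein(h) - Eout(h)| > eps] <= 2 P[sup_h |Ein(h) - Ein'(h)| > eps/2]
where Ein' is the in-sample error on the second, "ghost" sample.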
because now I'm not appealing to Eout(h) any more.
I'm only appealing to what happens in the sample.
10:08 2014-09-24
It's a bigger sample: I now have 2N samples instead
of N samples.
10:08 2014-09-24
now the characterization is full.
I'm ready to go.
10:08 2014-09-24
these are the only two components you
need to worry about as you read the proof.
10:09 2014-09-24
Not quite; rather, because I have two
samples now, the growth function is evaluated at 2N.
10:10 2014-09-24
now we have a polynomial, a bigger polynomial,
but it can still do the job we want.
10:13 2014-09-24
but the basic message is:
here is a statement that holds true for any hypothesis
set that has a break point.
10:15 2014-09-24
you will eventually learn. Ein(h) tracks Eout(h) correctly.
10:15 2014-09-24
The Vapnik-Chervonenkis Inequality
10:16 2014-09-24
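as stated in the lecture:
P[|Ein(g) - Eout(g)| > eps] <= 4 mH(2N) exp(-(1/8) eps^2 N)
three modifications vs. plain Hoeffding: M becomes the growth function,
N becomes 2N inside it (the two samples), and the constants 4 and 1/8
absorb the slack in the proof.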
if you have a break point, learning is guaranteed.
10:39 2014-09-24
for this hypothesis set over the input space, what
is the break point?
10:41 2014-09-24
How many resources do you need for learning?
10:43 2014-09-24
training data, real data
10:49 2014-09-24
* Ein(h) to track Eout(h)
* try to minimize Ein(h)
10:50 2014-09-24
VC inequality
10:50 2014-09-24
B(N, k): maximum number of dichotomies on N points,
with break point k
10:56 2014-09-24
what is the maximum number of dichotomies you can
get without any other constraints?
B(N, k) // use this to bound mH(N) (the growth function)
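a quick sanity check of the B(N, k) recursion against the binomial sum
(a minimal sketch in Python; the function names are mine, not the
lecture's):

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(N, k):
    # recursive upper bound on the number of dichotomies
    # on N points when k is a break point
    if k == 1:
        return 1   # no point can be shattered: one dichotomy
    if N == 1:
        return 2   # a single point has at most 2 dichotomies
    return B(N - 1, k) + B(N - 1, k - 1)

def binom_sum(N, k):
    # closed-form bound: sum_{i=0}^{k-1} C(N, i)
    return sum(comb(N, i) for i in range(k))

# taken with equality, the recursion matches the binomial sum
# exactly (Pascal's identity); both upper-bound mH(N)
for N in range(1, 12):
    for k in range(1, N + 1):
        assert B(N, k) == binom_sum(N, k)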