Scouting:
Scouting is done by testing the classifiers in the pool on a training set $T$ of $N$ multidimensional data points $x_i$ with known labels $y_i \in \{-1, +1\}$.
We test and rank all classifiers in the expert pool by charging a cost $e^{\beta}$ any time a classifier fails (a miss), and a cost $e^{-\beta}$ every time a classifier provides the right label (a success or "hit"). We require $\beta > 0$ so that misses are penalized more heavily than hits. It might seem strange to penalize a hit with a non-zero cost, but as long as the penalty for a success is smaller than the penalty for a miss, everything is fine. This kind of error function, different from the usual squared Euclidean distance to the classification target, is called an exponential loss function.
AdaBoost uses the exponential loss as its error criterion.
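For concreteness, here is a small sketch of this scouting step in Python. The data, the pool of thresholding "stump" classifiers, and the value of $\beta$ are all made up for illustration; the point is only that each classifier is charged $e^{\beta}$ per miss and $e^{-\beta}$ per hit, and the pool is ranked by total cost.

```python
import numpy as np

# Toy training set: N one-dimensional points x_i with labels y_i in {-1, +1}.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = np.where(x + 0.3 * rng.normal(size=20) > 0, 1, -1)

# A toy pool of "experts": decision stumps predicting sign(x - theta).
pool = [lambda x, t=t: np.where(x > t, 1, -1) for t in np.linspace(-1, 1, 9)]

beta = 1.0  # beta > 0, so a miss (cost e^beta) outweighs a hit (cost e^-beta)

def total_cost(classifier, x, y, beta):
    pred = classifier(x)
    # Charge e^beta for every miss and e^{-beta} for every hit.
    return np.sum(np.where(pred != y, np.exp(beta), np.exp(-beta)))

# Rank the whole pool by total exponential cost (lower is better).
ranking = sorted(range(len(pool)), key=lambda j: total_cost(pool[j], x, y, beta))
print("best classifier in the pool:", ranking[0])
```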
The main idea of AdaBoost is to proceed systematically, extracting one classifier from the pool in each of $M$ iterations. The drafting process concentrates on selecting new classifiers for the committee, focusing on those that can help with the examples that are still misclassified. The best team players are those that provide new insights to the committee; classifiers being drafted should complement each other in an optimal way.
Drafting:
In each iteration we need to rank all classifiers so that we can select the current best out of the pool. At the $m$-th iteration we have already included $m-1$ classifiers in the committee and we want to draft the next one. The current linear combination of classifiers is
$$C_{m-1}(x_i) = \alpha_1 k_1(x_i) + \cdots + \alpha_{m-1} k_{m-1}(x_i).$$
We define the total cost, or total error, of the extended classifier $C_m(x_i) = C_{m-1}(x_i) + \alpha_m k_m(x_i)$ as the exponential loss
$$E = \sum_{i=1}^{N} e^{-y_i \left( C_{m-1}(x_i) + \alpha_m k_m(x_i) \right)},$$
where $\alpha_m$ and $k_m$ are yet to be determined in an optimal way. Since our intention is to draft $k_m$, we rewrite the above expression as
$$E = \sum_{i=1}^{N} w_i^{(m)} e^{-y_i \alpha_m k_m(x_i)},$$
where $w_i^{(m)} = e^{-y_i C_{m-1}(x_i)}$ for $i = 1, \ldots, N$. In the first iteration $w_i^{(1)} = 1$ for $i = 1, \ldots, N$. During later iterations, the vector $\mathbf{w}^{(m)} = (w_1^{(m)}, \ldots, w_N^{(m)})$ represents the weights assigned to each data point in the training set at iteration $m$. We can split the sum above into two sums:
$$E = \sum_{y_i = k_m(x_i)} w_i^{(m)} e^{-\alpha_m} + \sum_{y_i \neq k_m(x_i)} w_i^{(m)} e^{\alpha_m}.$$
This means that the total cost is the weighted cost of all hits plus the weighted cost of all misses.
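As a quick numerical check of this split (a sketch with arbitrary weights, $\pm 1$ labels, and predictions; all variable names here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.random(10)                    # current weights w_i^(m)
y = rng.choice([-1, 1], size=10)      # true labels
pred = rng.choice([-1, 1], size=10)   # candidate classifier's predictions k_m(x_i)
alpha = 0.7                           # some value of alpha_m

# The exponential loss written as a single sum over all data points ...
E_sum = np.sum(w * np.exp(-y * alpha * pred))

# ... equals the weighted cost of the hits plus the weighted cost of the misses.
weighted_hits = w[pred == y].sum()
weighted_misses = w[pred != y].sum()
E_split = weighted_hits * np.exp(-alpha) + weighted_misses * np.exp(alpha)

print(np.isclose(E_sum, E_split))  # True
```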
Writing the first summand as $W_c e^{-\alpha_m}$ and the second as $W_e e^{\alpha_m}$, we simplify the notation to
$$E = W_c e^{-\alpha_m} + W_e e^{\alpha_m} = (W_c + W_e)\, e^{-\alpha_m} + W_e \left( e^{\alpha_m} - e^{-\alpha_m} \right).$$
Now, $W_c + W_e$ is the total sum $W$ of the weights of all data points, that is, a constant in the current iteration. The right-hand side of the equation is therefore minimized when, at the $m$-th iteration, we pick the classifier with the lowest total cost $W_e$ (that is, the lowest rate of weighted error). Intuitively this makes sense: the next draftee, $k_m$, should be the one with the lowest penalty given the current set of weights.
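A minimal sketch of this drafting rule, assuming a pool of classifiers given as callables and a current weight vector (the names `draft`, `pool`, and `w` are illustrative): the draftee is simply the pool member whose misclassified points carry the least total weight.

```python
import numpy as np

def draft(pool, x, y, w):
    """Return the index of the pool classifier with the lowest weighted error W_e."""
    weighted_errors = [np.sum(w[clf(x) != y]) for clf in pool]
    return int(np.argmin(weighted_errors))
```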
Weighting:
Having drafted $k_m$, the weighting step determines its coefficient $\alpha_m$ by minimizing the cost $E = W_c e^{-\alpha_m} + W_e e^{\alpha_m}$ above with respect to $\alpha_m$. Since each iteration only requires the single classifier that is best under the current weights, the pool of classifiers does not need to be given in advance; it only needs to ideally exist.
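Putting the three steps together, here is a hedged end-to-end sketch of the loop described in this section. The function `adaboost`, the pool of stump classifiers, and the data are hypothetical (reusing the toy setup from the scouting sketch); the closed form $\alpha_m = \tfrac{1}{2} \ln(W_c / W_e)$ used below is the value that minimizes $E = W_c e^{-\alpha_m} + W_e e^{\alpha_m}$, and the degenerate case $W_e = 0$ is ignored for brevity.

```python
import numpy as np

def adaboost(pool, x, y, M):
    """Draft M classifiers from the pool and weight them, as sketched above."""
    N = len(y)
    w = np.ones(N)                                  # w_i^(1) = 1 in the first iteration
    drafted, alphas = [], []
    for m in range(M):
        # Drafting: pick the classifier with the lowest weighted error W_e.
        weighted_errors = [np.sum(w[clf(x) != y]) for clf in pool]
        k = int(np.argmin(weighted_errors))
        W_e = weighted_errors[k]
        W_c = w.sum() - W_e
        # Weighting: alpha_m = 0.5 * ln(W_c / W_e) minimizes E = W_c e^{-a} + W_e e^{a}.
        alpha = 0.5 * np.log(W_c / W_e)
        # Update the weights: w_i <- w_i * exp(-alpha_m * y_i * k_m(x_i)),
        # so misclassified points gain weight and correctly classified points lose weight.
        w = w * np.exp(-alpha * y * pool[k](x))
        drafted.append(k)
        alphas.append(alpha)
    return drafted, alphas

# Usage with the toy pool and data from the scouting sketch:
# drafted, alphas = adaboost(pool, x, y, M=5)
```

The committee's decision on a new point is then the sign of the weighted vote $\sum_m \alpha_m k_m(x)$, i.e. of the linear combination built up above.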