The winnow algorithm is a technique from machine learning for learning a linear classifier from labeled examples. It is very similar to the perceptron algorithm. However, while the perceptron algorithm uses an additive weight-update scheme, winnow uses a multiplicative weight-update scheme, which allows it to perform much better when many dimensions are irrelevant (hence its name). It is a simple algorithm, but it scales well to high-dimensional spaces. During training, winnow is shown a sequence of positive and negative examples, from which it learns a decision hyperplane. It can also be used in the online learning setting, where the learning and classification phases are not clearly separated.
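As a rough sketch of this contrast (a minimal illustration, assuming 0/1 feature vectors and labels in {-1, +1}; the function names and update factor are chosen here for illustration, not taken from the text): on a mistake, a perceptron-style rule shifts weights additively, while a winnow-style rule rescales the weights of active features multiplicatively.

    def additive_update(w, x, y):
        # Perceptron-style: shift each active feature's weight by the label.
        return [wi + y * xi for wi, xi in zip(w, x)]

    def multiplicative_update(w, x, y, alpha=2.0):
        # Winnow-style: scale the weights of active features up on a
        # mistake on a positive example, down on a negative one.
        factor = alpha if y > 0 else 1.0 / alpha
        return [wi * factor if xi else wi for wi, xi in zip(w, x)]

Because the multiplicative rule shrinks weights geometrically, the weights of irrelevant features can be driven down in few mistakes, which is the intuition behind winnow's advantage when most dimensions are irrelevant.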
15-859(B) Machine Learning Theory                              01/20/10

Online learning contd
 * The Winnow algorithm for disjunctions
 * Winnow for k-of-r functions and general LTFs in terms of L_1 margin
 * If time: Infinite-attribute model, string-valued features

=======================================================================

WINNOW ALGORITHM
================

If you think about the problem of learning an OR-function, we saw an
algorithm: "list all features and cross off bad ones on negative
examples" that makes at most n mistakes. But what if most features are
irrelevant? E.g., if we represent a document as a vector indicating
which words appear in it and which don't, then n is pretty large! What
if the target is an OR of r relevant features, where r is a lot smaller
than n? Can we get a better bound in that case?

What could we do if computation time were no object? How many bits does
it take to describe an OR of r variables, where r << n? Ans: O(r log n),
since naming each of the r relevant variables takes about log_2(n)
bits. So, in principle, we'd like to obtain a bound like this. Winnow
will give us a bound of O(r log n) mistakes efficiently. This means you
only pay a small penalty for "throwing lots of features at the
problem". In general, we will say that an algorithm with only polylog
dependence on n is "attribute-efficient".

Winnow Algorithm: (basic version)

1. Initialize the weights
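The notes break off after the initialization step. As a hedged sketch, here is how the basic version is standardly completed (these specifics follow the usual presentation of Winnow for monotone disjunctions, not the truncated notes above: weights start at 1, the threshold is n, mistakes on positive examples double the active weights, and mistakes on negative examples zero them out):

    def winnow_init(n):
        # Step 1: start every weight at 1 (the standard choice).
        return [1.0] * n

    def winnow_predict(w, x):
        # Predict positive iff the total weight of active features
        # reaches the threshold n (the number of features).
        return sum(wi for wi, xi in zip(w, x) if xi) >= len(w)

    def winnow_step(w, x, label):
        # Update multiplicatively, and only on a mistake.
        if winnow_predict(w, x) == label:
            return w
        if label:  # mistake on a positive: double active weights
            return [wi * 2.0 if xi else wi for wi, xi in zip(w, x)]
        else:      # mistake on a negative: zero out active weights
            return [0.0 if xi else wi for wi, xi in zip(w, x)]

With these choices a relevant feature's weight is never zeroed (if a relevant feature is on, the example must be positive), and each relevant weight can double at most about log_2(n) times before reaching the threshold, which is where the O(r log n) mistake bound comes from.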
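A small usage sketch of the functions above (the target, feature count, and sparsity are made up for illustration): learn an OR of r = 3 out of n = 1000 features in the online setting and count mistakes.

    import random

    random.seed(0)
    n = 1000
    relevant = [3, 141, 592]  # hypothetical target: x_3 OR x_141 OR x_592
    w = winnow_init(n)
    mistakes = 0
    for _ in range(5000):
        x = [random.random() < 0.01 for _ in range(n)]  # sparse examples
        label = any(x[i] for i in relevant)
        if winnow_predict(w, x) != label:
            mistakes += 1
        w = winnow_step(w, x, label)
    print(mistakes)  # should stay far below n, consistent with O(r log n)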