In this class, let's talk about a new large-scale machine learning setting called the online learning setting. It allows us to model problems where we have a continuous stream of data coming in, and we would like an algorithm to learn from it. Today, many of the largest website companies use versions of online learning algorithms to learn from the continuous stream of users visiting their sites.

Suppose you run a shipping service. Users come and ask you to help ship their package from location A to location B. Your website offers to ship the package for some asking price. Based on the price you offer, users sometimes choose to use your shipping service; that's a positive example ($y = 1$). Sometimes they go away and do not choose to purchase your shipping service ($y = 0$). So let's say we want a learning algorithm to help us optimize the asking price we offer to our users.
Specifically, let's say we come up with some features that capture properties of the user: the origin and destination of the package, and the price we happen to offer them for shipping the package. What we want to do is learn the probability that they will elect to ship the package using our shipping service, given these features. Note that the feature vector $x$ also captures the price we're asking. So if we could estimate the chance that they'll agree to use our service for any given price, then we can try to pick a price at which they have a pretty high probability of choosing our service, while simultaneously offering us a fair profit for shipping their package. We could use logistic regression, a neural network, or something else, but let's start with logistic regression.
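As a reminder, logistic regression would model this probability with the sigmoid function:

$p(y = 1 \mid x; \theta) = h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

so the update rule below is just the usual logistic regression gradient step, applied one example at a time.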

Figure 2 shows what an online learning algorithm would do. The "repeat forever" means our website is going to keep on staying up. Occasionally a user will come, and we get a pair $(x, y)$ corresponding to that user on the website. The features $x$ are the origin and destination specified by the user and the price we happened to offer them this time, and $y$ is either 0 or 1 depending on whether or not they chose to use our shipping service. Once we get this $(x, y)$ pair, the online learning algorithm updates the parameters $\theta$ using just this example $(x, y)$:

$\theta_j := \theta_j - \alpha \, (h_\theta(x) - y) \, x_j$, where $j = 0, \dots, n$.

Note that here, for the online learning algorithm, I didn't write the example as $(x^{(i)}, y^{(i)})$; the reason is that we throw the example away after we learn from it. We never use it again.
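Here is a minimal sketch of this loop in Python. The names (the `stream` argument, `sigmoid`, and so on) are hypothetical placeholders, not part of any real serving system; the point is just that each example drives a single gradient step and is then discarded:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_logistic_regression(stream, n_features, alpha=0.01):
    """Take one SGD step per incoming (x, y) example, then discard it."""
    theta = np.zeros(n_features + 1)         # +1 for the intercept term x_0
    for x_raw, y in stream:                  # "repeat forever" in deployment
        x = np.concatenate(([1.0], x_raw))   # prepend x_0 = 1
        h = sigmoid(theta @ x)               # predicted P(y = 1 | x; theta)
        theta -= alpha * (h - y) * x         # theta_j := theta_j - alpha*(h - y)*x_j
        # (x, y) is now thrown away; we never use it again
    return theta
```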
One interesting effect of this sort of online learning algorithm is that it can adapt to changing user preferences. In particular, over time, because of changes in the economy, users may become more price sensitive and less willing to pay high prices, or they may become less price sensitive and willing to pay higher prices. Different things may become more important to users, or new types of users may start coming to your website. If your pool of users changes, these updates will slowly adapt your parameters to whatever your latest pool of users looks like.

Let's look at another online learning example: an application to product search, in which we want a learning algorithm to give good search listings to a user. Let's say you run an online store that sells cell phones, and you have a user interface where a user can come to your website and type in a query like "Android phone 1080p camera". Suppose we have a hundred phones in our store. Because of the way our website is laid out, when a user types in a search query, we would like to show a selection of ten different phones to the user. What we would like is a learning algorithm that helps us figure out which ten of the 100 phones we should return in response to a search query like the one here.
Here's how we can approach the problem:
For each phone, and given a specific user query, we can construct a feature vector $x$. The feature vector $x$ may capture different properties of the phone. It may capture how similar the user's search query is to the phone: how many words in the query match the name of the phone, how many words match the description of the phone, and so on. So the features $x$ capture the properties of the phone and how well the phone matches the user query along different dimensions. What we would like to do is estimate the probability that a user will click on the link for a specific phone, because we want to show the user phones that they have a high probability of clicking on in the web browser. I'll define $y = 1$ if the user clicks on the link and $y = 0$ otherwise. The problem of learning this is called learning the predicted click-through rate (CTR); it just means learning the probability that the user will click on the specific link you offer them. We can compute $p(y = 1 \mid x; \theta)$ for each of the 100 phones and select the 10 phones the user is most likely to click on. This is a pretty reasonable way to decide which ten results to show to the user.
Just to be clear: suppose every time a user does a search, we return ten results. Each search actually gives us 10 $(x, y)$ pairs, one per phone shown, with $y$ indicating whether that link was clicked. So each time a user comes, we get 10 examples, and the online learning algorithm updates the parameters using essentially 10 gradient steps. Then we can throw the data away.
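Putting the two pieces together, here is a minimal sketch of handling one query, again with hypothetical helpers (`phone_features` and `observe_click` stand in for whatever your website actually does): rank all 100 phones by predicted CTR, show the top 10, then take one gradient step per shown phone.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def handle_query(theta, query, phones, phone_features, observe_click, alpha=0.01):
    """Rank phones by predicted CTR, show the top 10, then learn from the clicks."""
    # phone_features(query, phone) builds the feature vector x (with x_0 = 1)
    X = np.array([phone_features(query, p) for p in phones])
    ctr = sigmoid(X @ theta)                 # predicted P(click) for all 100 phones
    top10 = np.argsort(-ctr)[:10]            # indices of the 10 highest-CTR phones
    for i in top10:
        y = observe_click(phones[i])         # 1 if the user clicked this link, else 0
        theta -= alpha * (sigmoid(theta @ X[i]) - y) * X[i]   # one SGD step
    # the 10 (x, y) pairs are now discarded
    return theta
```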
Lastly, I'll quickly mention a few other examples:
- If you have a website and are trying to decide what special offer to show the user.
- If you run a news aggregator website and show different users different articles.
- Closely related to special offers, there are product recommendations.

In fact, if you have a collaborative filtering system, you can even imagine it giving you additional features to feed into a logistic regression classifier, to try to predict the click-through rate for different products you might recommend to a user.