People are often confused about the meaning or purpose of a likelihood function for a continuous probability distribution (say, a Gaussian). What the heck is that? Indeed, any attempt to find some meaning in it is doomed to failure. Why? Because the likelihood itself is meaningless!
By definition, the likelihood of a model is the probability of generating the observed data under that model, so in the discrete case its meaning is obvious, but things get confusing in the continuous case. Given a data point x, in the discrete case we can immediately read off its probability from the distribution function. However, if x lives in some continuous space, the function is actually a probability density, and by definition the probability of generating any single given point is 0! In other words, the likelihood should be 0 in every case... But how could one use a function that is identically 0 as a likelihood?
The trick is to divide the continuous space into infinitesimally small discrete cells. Then the probability is not 0, but an infinitesimally small amount! Formally, given a small enough positive number ε > 0, Pr(x) ≈ ε·p(x), i.e. some tiny amount multiplied by p(x), the density at x. The likelihood still makes no sense to us on its own, but it is no longer 0; it is an infinitesimally small amount. Recall L'Hôpital's rule: it allows us to compare two quantities that both tend to 0. So the point here is that even though we don't know the exact value of the likelihood (an infinitesimally small amount), we can still compare two likelihoods and decide which one is better!
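The approximation Pr(x) ≈ ε·p(x) can be checked numerically. Below is a minimal sketch (the point x = 1.0 and ε = 1e-6 are arbitrary choices for illustration) that compares the exact probability of a tiny interval around x under a standard Gaussian, computed from the CDF, against ε times the density:

```python
import math

def gauss_pdf(x, mu=0.0, sigma=1.0):
    # Density of a Gaussian N(mu, sigma^2) at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # Cumulative distribution function via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

x, eps = 1.0, 1e-6                           # arbitrary point and cell width
exact = gauss_cdf(x + eps) - gauss_cdf(x)    # exact probability of [x, x + eps]
approx = eps * gauss_pdf(x)                  # the eps * p(x) approximation
print(exact, approx)                         # the two agree closely for small eps
```

As ε shrinks, the two values agree to ever more decimal places, which is exactly the sense in which the probability of a cell is "ε times the density."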
So what happens when we compare two infinitesimally small likelihoods? We get a ratio R = ε·p(x) / ε·q(x) = p(x)/q(x), which is essentially what L'Hôpital's rule tells us. So to decide which model's likelihood is larger, we only need the density functions: comparing the densities is equivalent to comparing the probabilities! In other words, since the infinitesimally small factor ε cancels in the division and we still get the right answer, we can, technically, ignore that factor. Only the value of the density function matters!
So, instead of defining the likelihood as a probability, we redefine it in terms of the probability density function. Notice that this definition is itself meaningless: we cannot explain what the value of a density-based likelihood function means. But, mathematically, its correctness for the purpose of comparing different likelihoods derives directly from L'Hôpital's rule. So now we can understand the likelihood in the general case:
Given a sequence of i.i.d. data points x1, x2, ..., xN, and a family of models indexed by a parameter t, what is the likelihood that some model t generated the data?
Traditionally, we would write Pr(x1, x2, ..., xN; t) = Pr(x1; t) Pr(x2; t) ... Pr(xN; t) = p(x1; t) p(x2; t) ... p(xN; t) · ε^N → 0 (as ε → 0).
But now, since we know that only the value of the density matters, we can simply define L(x1, x2, ..., xN | t) = p(x1; t) p(x2; t) ... p(xN; t).
And we also know that the model with the maximum L(x1, x2, ..., xN | t) also has the maximum infinitesimally small probability, compared with the other infinitesimally small probabilities.
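This maximum-likelihood comparison can be sketched directly. The data points and the candidate parameter values below are hypothetical, chosen only to illustrate the idea: each candidate t is a Gaussian mean, L(data | t) is the product of densities as defined above, and we pick the t that maximizes it:

```python
import math

def gauss_pdf(x, mu, sigma=1.0):
    # Density of a Gaussian N(mu, sigma^2) at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu):
    # L(data | mu): product of densities of the i.i.d. points
    L = 1.0
    for x in data:
        L *= gauss_pdf(x, mu)
    return L

data = [1.2, 0.7, 1.9, 1.1, 0.4]        # hypothetical i.i.d. sample (mean 1.06)
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical candidate values of t
best = max(candidates, key=lambda t: likelihood(data, t))
print(best)  # -> 1.0, the candidate closest to the sample mean
```

In practice one maximizes the sum of log-densities instead of the raw product, since a product of many small densities underflows for large N; the maximizer is the same because log is monotone.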