1. Generative vs. Discriminative
Generally speaking,
- A generative learning algorithm models how the data was generated in order to categorize a signal. It asks: under my generative assumptions, which category is most likely to have generated this signal?
- A discriminative algorithm does not care how the data was generated; it simply categorizes a given signal.
In statistical terms, if you have input data x and want to classify it into a category y:
- A generative learning algorithm learns the joint probability distribution P(x, y) and then uses Bayes' rule to transform P(x, y) into P(y|x) for classification.
- A discriminative algorithm learns the conditional probability distribution P(y|x) directly, which is the natural distribution for classifying a given example x into a class y. This is why algorithms that model P(y|x) directly are called discriminative algorithms.
2. Generative learning algorithm
The main goal of a generative learning algorithm is to learn the conditional distribution P(x|y). Then, together with the prior p(y), using Bayes' rule, we have

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)} = \frac{P(x \mid y)\,P(y)}{\sum_{y'} P(x \mid y')\,P(y')},$$

and make predictions by

$$\hat{y} = \arg\max_y P(y \mid x) = \arg\max_y P(x \mid y)\,P(y),$$

where the denominator P(x) can be dropped since it does not depend on y.
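As a quick numeric sanity check of the Bayes-rule step above (all probabilities here are made-up illustration values, not from any dataset):

```python
# Binary class y and one observed discrete feature value; numbers are invented.
p_y = {0: 0.7, 1: 0.3}           # prior p(y)
p_x_given_y = {0: 0.2, 1: 0.9}   # likelihood p(x | y) for the observed x

# Unnormalized posterior: p(x|y) * p(y) for each class.
joint = {y: p_x_given_y[y] * p_y[y] for y in p_y}

# Evidence p(x) = sum_y p(x|y) p(y), used only to normalize.
evidence = sum(joint.values())
posterior = {y: joint[y] / evidence for y in joint}

# The prediction only needs argmax of p(x|y)p(y); p(x) is the same for every y.
y_hat = max(joint, key=joint.get)
```

Note that `y_hat` comes out the same whether we maximize `joint` or `posterior`, which is exactly why the denominator can be dropped.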
3. Two generative learning algorithms
3.1 Gaussian discriminant analysis
3.1.1 Core assumption of GDA
The core assumption of Gaussian discriminant analysis is that the conditional distribution p(x|y) is Gaussian, which implies that x is continuous-valued.
Here we take the 2-class case as an example. Assume

$$x \mid y=0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad x \mid y=1 \sim \mathcal{N}(\mu_1, \Sigma),$$

and set the prior distribution as

$$y \sim \mathrm{Bernoulli}(\phi), \quad \text{i.e.} \quad p(y) = \phi^{y}(1-\phi)^{1-y}.$$

Now we use the maximum likelihood method to estimate the parameters $(\phi, \mu_0, \mu_1, \Sigma)$. The log-likelihood function is

$$\ell(\phi,\mu_0,\mu_1,\Sigma) = \log \prod_{i=1}^{m} p\big(x^{(i)}, y^{(i)}\big) = \log \prod_{i=1}^{m} p\big(x^{(i)} \mid y^{(i)}\big)\, p\big(y^{(i)}\big).$$

Here are the maximum likelihood estimates of those parameters:

$$\phi = \frac{1}{m}\sum_{i=1}^{m} 1\{y^{(i)}=1\}, \qquad \mu_k = \frac{\sum_{i=1}^{m} 1\{y^{(i)}=k\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)}=k\}} \quad (k=0,1),$$

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} \big(x^{(i)} - \mu_{y^{(i)}}\big)\big(x^{(i)} - \mu_{y^{(i)}}\big)^{T}.$$
Then, using Bayes' rule, we get the classification rule

$$\hat{y} = \arg\max_y p(y \mid x) = \arg\max_y p(x \mid y)\, p(y).$$
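The estimates and classification rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function names `gda_fit` / `gda_predict` are my own.

```python
import numpy as np

def gda_fit(X, y):
    """Maximum-likelihood estimates (phi, mu0, mu1, Sigma) for 2-class GDA
    with a shared covariance matrix, following the closed-form formulas."""
    m = len(y)
    phi = np.mean(y == 1)                      # fraction of positive examples
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Center each example by its own class mean, then average outer products.
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = centered.T @ centered / m
    return phi, mu0, mu1, Sigma

def gda_predict(X, phi, mu0, mu1, Sigma):
    """Classify by argmax_y p(x|y) p(y); log scores drop shared constants."""
    inv = np.linalg.inv(Sigma)
    def log_score(x, mu, prior):
        d = x - mu
        return -0.5 * d @ inv @ d + np.log(prior)
    return np.array([
        int(log_score(x, mu1, phi) > log_score(x, mu0, 1 - phi)) for x in X
    ])
```

Because the normalizing constant of the Gaussian is the same for both classes (shared Σ), it cancels in the comparison and only the quadratic term and the log prior remain.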
3.1.2 GDA vs. logistic regression
We give an alternative form of the GDA classification rule:

$$P(y=1 \mid x) = \frac{1}{1 + \exp\!\big(-(\theta^{T} x + \theta_0)\big)},$$

where $\theta = \Sigma^{-1}(\mu_1 - \mu_0)$ and

$$\theta_0 = \frac{1}{2}\big(\mu_0^{T}\Sigma^{-1}\mu_0 - \mu_1^{T}\Sigma^{-1}\mu_1\big) + \log\frac{\phi}{1-\phi}.$$

So the probability $P(y=1 \mid x)$ can be written exactly in the form of the logistic (sigmoid) function.
So far, we have seen that Gaussian discriminant analysis is a special case of the logistic-regression form. More generally, if we assume the conditional distribution P(x|y) belongs to an exponential family, the posterior P(y=1|x) is still of logistic form.
Which is better: GDA or logistic regression?
GDA makes stronger modeling assumptions, and is more data efficient (i.e., requires less training data to learn “well”) when the modeling assumptions are correct or at least approximately correct. Logistic regression makes weaker assumptions, and is significantly more robust to deviations from modeling assumptions. Specifically, when the data is indeed non-Gaussian, then in the limit of large datasets, logistic regression will almost always do better than GDA. For this reason, in practice logistic regression is used more often than GDA.
3.2 Naive Bayes
3.2.1 Core assumption of NB
The core assumption of NB is that the features $x_i$ are conditionally independent given $y$, which says

$$p(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} p(x_i \mid y),$$

and that the features $x_i$ take discrete values. Then the classification rule is

$$\hat{y} = \arg\max_y p(y) \prod_{i=1}^{n} p(x_i \mid y).$$
If we assume $\theta_{ijk} = P(x_i = j \mid y = k)$, where each feature $x_i$ takes 2 values ($j \in \{0, 1\}$) and $k$ takes 2 values, then we need to estimate the parameters $\theta_{ijk}$ together with the class prior $P(y = k)$, and the maximum likelihood estimates are

$$\theta_{ijk} = \frac{\sum_{l=1}^{m} 1\{x_i^{(l)} = j,\ y^{(l)} = k\}}{\sum_{l=1}^{m} 1\{y^{(l)} = k\}}, \qquad P(y=k) = \frac{1}{m}\sum_{l=1}^{m} 1\{y^{(l)} = k\}.$$
Finally, we can classify x to the class

$$\hat{y} = \arg\max_{k} P(y=k) \prod_{i=1}^{n} \theta_{i x_i k}.$$
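The counting estimates and the product rule above can be sketched as follows for binary (0/1) features. This is an unsmoothed illustration with hypothetical function names; the next section explains why the raw MLE can fail.

```python
import numpy as np

def nb_fit(X, y):
    """MLE for binary Naive Bayes. X is an (m, n) 0/1 matrix.
    Returns theta with theta[i, j, k] = P(x_i = j | y = k) and the prior P(y=k)."""
    m, n = X.shape
    theta = np.zeros((n, 2, 2))
    prior = np.array([np.mean(y == 0), np.mean(y == 1)])
    for k in (0, 1):
        p1 = X[y == k].mean(axis=0)   # fraction of class-k examples with x_i = 1
        theta[:, 1, k] = p1
        theta[:, 0, k] = 1 - p1
    return theta, prior

def nb_predict(x, theta, prior):
    """argmax_k P(y=k) * prod_i P(x_i | y=k), computed in log space.
    Warning: without smoothing, a zero theta entry makes log() blow up."""
    n = len(x)
    scores = [np.log(prior[k]) + np.log(theta[np.arange(n), x, k]).sum()
              for k in (0, 1)]
    return int(scores[1] > scores[0])
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow when n is large.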
3.2.2 Laplace smoothing
There is a potential danger in the algorithm described in the last section: it can yield $\theta_{ij0} = 0$ and $\theta_{ij1} = 0$ whenever the training data happens to contain no examples satisfying the condition in the numerator. Then for such an $x$,

$$P(y=k) \prod_{i} \theta_{i x_i k} = 0 \quad \text{for both } k = 0 \text{ and } k = 1,$$

and the classification rule becomes $\hat{y} = \arg\max\{0, 0\}$, so we do not know how to make a prediction. We use Laplace smoothing to overcome this problem, which estimates

$$\theta_{ijk} = \frac{\sum_{l=1}^{m} 1\{x_i^{(l)} = j,\ y^{(l)} = k\} + 1}{\sum_{l=1}^{m} 1\{y^{(l)} = k\} + 2},$$

where the 2 in the denominator is the number of values each feature can take, so the probabilities still sum to 1.
In multinomial classification, where a variable $z$ takes $k$ values, we use

$$\phi_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\} + 1}{m + k}.$$
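The smoothed estimate can be written as one small function (a sketch; `smoothed_theta` and its signature are my own naming, assuming integer-coded features):

```python
import numpy as np

def smoothed_theta(X, y, k, num_values):
    """Laplace-smoothed estimate of P(x_i = j | y = k) for every feature i:
    add 1 to each count and num_values to the denominator, so no estimated
    probability is ever exactly zero. Returns an (n, num_values) array."""
    Xk = X[y == k]                                   # class-k examples
    counts = np.stack([(Xk == j).sum(axis=0)         # count of x_i == j
                       for j in range(num_values)], axis=1)
    return (counts + 1) / (len(Xk) + num_values)
```

Even a feature value never seen in class k now gets probability $1/(m_k + \text{num\_values}) > 0$, so the zero-times-zero argmax problem from the previous section cannot occur.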
Summary: this article compared generative and discriminative learning algorithms and introduced two typical generative learning algorithms, Gaussian discriminant analysis and Naive Bayes, deriving the core principles of both and their parameter-estimation methods.