(二) Linear Discriminant Analysis - Part I


MyProgramingLife


Category: Machine Learning

Article link: https://blog.youkuaiyun.com/MyProgramingLife/article/details/41889585


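This article looks at two generative classifiers, Gaussian discriminant analysis (GDA) and linear discriminant analysis (LDA), and compares them with logistic regression. Worked examples show how to use these techniques for classification, and the article discusses how they behave under different assumptions.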


（一） Basics

Generative classifiers

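Some algorithms learn a model of the form $p(y \mid x)$: given x, they learn the conditional distribution of y, i.e. they try to learn the mapping from the input space X to the labels {0, 1} directly. These are discriminative learning algorithms.

Example: logistic regression (covered later; worth rereading).

Alternatively, we can model $p(x \mid y)$ (and $p(y)$). If y = 0 denotes class one and y = 1 denotes class two, then $p(x \mid y = 0)$ is the feature distribution of class one and $p(x \mid y = 1)$ is the feature distribution of class two. These are generative learning algorithms. Given $p(x \mid y)$ and $p(y)$, Bayes' rule yields the posterior distribution of y given x:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$

When making a prediction, the denominator does not need to be computed, because it does not depend on y:

$$\arg\max_y p(y \mid x) = \arg\max_y \frac{p(x \mid y)\, p(y)}{p(x)} = \arg\max_y p(x \mid y)\, p(y)$$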

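Multivariate normal distribution (n-dimensional) (MVN)

The general form has two parameters, the mean vector $\mu$ and the covariance matrix $\Sigma$:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

The quadratic form in the exponent, $(x - \mu)^T \Sigma^{-1} (x - \mu)$, is the squared Mahalanobis distance between the data point x and the mean $\mu$ (to be explained in more detail later; I do not fully understand it yet).

In 2D the density can be drawn as ellipses: each contour is a curve of equal probability density, the eigenvectors of $\Sigma$ determine the orientation of the ellipse, and the eigenvalues determine how elongated it is.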

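The parameters $\mu$ and $\Sigma$ can be obtained by maximum likelihood estimation (MLE). If we have N iid samples $x_i \sim \mathcal{N}(\mu, \Sigma)$, then

$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$

(For the detailed derivation see Machine Learning: A Probabilistic Perspective, pp. 99-100. Working through the derivation on every reread helps with the deeper material later.)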
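The two estimators can be checked numerically. The following is a minimal sketch on a made-up toy matrix (the data and the variable names `mu_hat`, `sigma_hat`, `sigma_unbiased` are illustrative assumptions, not part of the original post):

```matlab
% Toy data: N = 5 samples in 2 dimensions (made-up numbers, illustration only)
X = [1 2; 2 1; 3 4; 4 3; 5 5];
N = size(X, 1);

% MLE of the mean: the sample average (stored as a column vector)
mu_hat = mean(X, 1)';

% MLE of the covariance: average outer product of the centred samples
Xc = bsxfun(@minus, X, mu_hat');
sigma_hat = (Xc' * Xc) / N;

% cov(X) normalizes by N-1, so it is the MLE rescaled by N/(N-1)
sigma_unbiased = cov(X);
```

The only difference between `sigma_hat` and the built-in `cov(X)` is the normalization (N versus N-1), which matters little for large N.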

（二）GDA(Gaussian discriminant analysis)

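When we have a classification problem whose input features x are continuous random variables, we can use the GDA model, which models $p(x \mid y)$ with a multivariate normal distribution. The model is:

$$y \sim \mathrm{Bernoulli}(\phi), \qquad x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$$

Written out, the distributions are:

$$p(y) = \phi^{y} (1 - \phi)^{1 - y}$$

$$p(x \mid y = 0) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_0)^T \Sigma^{-1} (x - \mu_0)\right)$$

$$p(x \mid y = 1) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right)$$

The log-likelihood of the data is:

$$\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{N} p(x_i, y_i; \phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{N} p(x_i \mid y_i; \mu_0, \mu_1, \Sigma)\, p(y_i; \phi)$$

Setting the derivative with respect to each parameter to zero gives the maximum likelihood estimates:

$$\phi = \frac{1}{N} \sum_{i=1}^{N} 1\{y_i = 1\}, \qquad \mu_0 = \frac{\sum_{i=1}^{N} 1\{y_i = 0\}\, x_i}{\sum_{i=1}^{N} 1\{y_i = 0\}}, \qquad \mu_1 = \frac{\sum_{i=1}^{N} 1\{y_i = 1\}\, x_i}{\sum_{i=1}^{N} 1\{y_i = 1\}}$$

$$\Sigma = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_{y_i})(x_i - \mu_{y_i})^T$$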
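As a rough illustration of the GDA maximum likelihood estimates (class prior, two class means, one shared covariance), here is a small sketch on made-up data with 0/1 labels. The toy matrix and all variable names are assumptions for illustration; the heightWeight data used later labels the classes 1 and 2 instead.

```matlab
% Toy data (made-up): rows are samples, y holds the 0/1 labels
X = [1 2; 2 3; 3 3; 6 5; 7 8; 8 7];
y = [0; 0; 0; 1; 1; 1];
N = size(X, 1);

phi = mean(y == 1);                 % fraction of samples with y = 1
mu0 = mean(X(y == 0, :), 1)';       % class-0 mean (column vector)
mu1 = mean(X(y == 1, :), 1)';       % class-1 mean (column vector)

% Centre every sample on the mean of its own class, then average the
% outer products over all N samples: one covariance shared by both classes
M = zeros(size(X));
M(y == 0, :) = repmat(mu0', sum(y == 0), 1);
M(y == 1, :) = repmat(mu1', sum(y == 1), 1);
Xc = X - M;
sigma = (Xc' * Xc) / N;
```

Note that each sample is centred on its own class mean before averaging, so `sigma` is a within-class covariance and differs from `cov(X)` of the whole sample.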

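Programming practice: implementing the procedure above and plotting the results makes the ideas much easier to understand.

First read the data, a height/weight classification problem, and visualize it (before classifying, it is best to visualize the data and only then choose a classifier):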

%% Gauss density of example

%% read the data, and get the sample x, and y labels
read_data = load('heightWeight'); 
%%struct-> data matrix
names = fieldnames(read_data);
read_data = read_data.(names{1});
%%label Y and samples X 
class_data.Y = read_data(:,1);
class_data.X = [read_data(:,2) read_data(:,3)];

%% plot the sample in different class
number_class = length(unique(class_data.Y));
male_number_index = find(class_data.Y == 1);
female_number_index = find(class_data.Y == 2);
class_number_index = {male_number_index, female_number_index};

% plot in different color and mark
figure; 
colors = 'br';
mark = 'xo';
for c = 1:number_class
    str = sprintf('%s%s',  mark(c), colors(c));
    X = class_data.X(class_number_index{c}, :);
    h = scatter(X(:, 1), X(:, 2), 100, str);   
    hold on;    
end
xlabel('height'); ylabel('weight')
title('red = female, blue=male');

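Using the estimated parameters, draw an equal-probability-density ellipse for each class. The ellipse encloses 95% of the probability mass; in 2D this is the contour at Mahalanobis distance sqrt(chi2inv(0.95, 2)), about 2.45, rather than exactly 2 standard deviations.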

%% MLE of the parameters
for c = 1:number_class
    X = class_data.X(class_number_index{c}, :);
    % mean vector and covariance matrix; cov(X) normalizes by N-1, so it
    % is the MLE rescaled by N/(N-1)
    mu = mean(X)'; sigma = cov(X);
    
    % draw the ellipse that contains 95% of the Gaussian probability mass
    [U, D] = eig(sigma);
    n = 100;
    t = linspace(0, 2*pi, n);
    xy = [cos(t); sin(t)];
    
    % the squared Mahalanobis distance follows a chi-square distribution
    % with 2 degrees of freedom; take the square root of its 95% quantile
    k = sqrt(chi2inv(0.95, 2));
    
    % scale the unit circle by eigenvectors * sqrt(eigenvalues)
    w = (k * U * sqrt(D)) * xy;
    
    % shift the ellipse so that mu is its center
    z = repmat(mu, [1, n]) + w;
    
    plot(z(1, :), z(2, :), colors(c), 'linewidth', 2);
    hold on;
    hh = plot(mu(1, :), mu(2, :), 'x');
    set(hh,'color',colors(c), 'linewidth', 2,  'markersize', 13);
end
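
The ellipse construction can be sanity-checked without the Statistics Toolbox: for 2 degrees of freedom the chi-square quantile has a closed form, and every point of the resulting ellipse should lie at the same Mahalanobis distance k from the mean. A minimal sketch, assuming made-up values for `mu` and `sigma` (they are not estimated from the heightWeight data):

```matlab
% Example parameters (made-up values, not estimated from heightWeight)
mu = [1; 2];
sigma = [4 1.5; 1.5 1];

% For 2 degrees of freedom the chi-square CDF is 1 - exp(-x/2), so the
% 95% quantile has the closed form below (equals chi2inv(0.95, 2))
p = 0.95;
k = sqrt(-2 * log(1 - p));          % about 2.4477

% Same ellipse construction as in the plotting loop
[U, D] = eig(sigma);
t = linspace(0, 2*pi, 100);
z = bsxfun(@plus, mu, k * U * sqrt(D) * [cos(t); sin(t)]);

% Mahalanobis distance of every ellipse point to mu
zc = bsxfun(@minus, z, mu);
d = sqrt(sum(zc .* (sigma \ zc), 1));
```

Every entry of `d` equals `k` up to rounding, which is why scaling the unit circle by the eigenvectors and the square roots of the eigenvalues traces a contour of constant density.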

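Next, classify the data, with both classes sharing one covariance matrix. Note that the code uses cov(class_data.X), the covariance of the whole sample, rather than a pooled within-class covariance, and that it compares the class densities directly, i.e. it assumes equal priors.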

%% make the discrimination analysis, plot the decision boundary
% the two class use the same cov matrix
figure;
for c = 1:number_class
    X = class_data.X(class_number_index{c}, :);
    mu = mean(X)';  sigma = cov(class_data.X);
    
    str = sprintf('%s%s',  mark(c), colors(c));
    h = scatter(X(:, 1), X(:, 2), 100, str);   
    hold on;    
    
    %make the contour grid for each class
    n = 100;
    % axis limits [xmin xmax ymin ymax]; the original post called its own
    % helper myShowWindowAxis(class_data.X) here, which is not shown, so
    % the data range is used instead (an assumption about that helper)
    range = [min(class_data.X(:, 1)), max(class_data.X(:, 1)), ...
             min(class_data.X(:, 2)), max(class_data.X(:, 2))];
    [x, y] = meshgrid(linspace(range(1), range(2), n), linspace(range(3), range(4), n));
    [row , col] = size(x);
    contour_X = [reshape(x, row*col, 1), reshape(y, row*col, 1)];
    
    %compute the Gaussian density at every grid point
    dimension = size(sigma, 2);
    contour_X = bsxfun(@minus, contour_X, mu');  %subtract the mean (as a row vector) from every grid point
    %quadratic form -1/2*(x-mu)'*inv(sigma)*(x-mu), one value per row
    log_exp = -1/2 * sum((contour_X / sigma) .* contour_X, 2);
    log_constant = -1/2*dimension*log(2*pi) - 1/2*log(det(sigma));
    log_probability = log_exp + log_constant;
    probability = exp(log_probability);
    gaussian_prob{c} = reshape(probability, row, col);
    
    contour(x, y, gaussian_prob{c}, colors(c));
end
xlabel('height'); ylabel('weight')
%plot the decision boundary
[cc, decision_line] = contour(x, y, gaussian_prob{1} - gaussian_prob{2}, [0, 0], '-k');
set(decision_line,'linewidth',3);  %draw the boundary with a thicker line
title('tied covariance')

The resulting misclassification rate on the training data is computed below: error_rate = 0.1238 (12.38%)

%% compute the error rate (training error, equal priors assumed)
dimension = size(class_data.X, 2);
sigma = cov(class_data.X);            % shared covariance of the whole sample
for c = 1:number_class
    X = class_data.X(class_number_index{c}, :);
    mu = mean(X)';
    
    % centre every sample on the mean of class c
    full_X = bsxfun(@minus, class_data.X, mu');
    % quadratic form -1/2*(x-mu)'*inv(sigma)*(x-mu), one value per row
    log_exp = -1/2 * sum((full_X / sigma) .* full_X, 2);
    log_constant = -1/2*dimension*log(2*pi) - 1/2*log(det(sigma));
    gaussian_prob_error{c} = exp(log_exp + log_constant);
end
% a sample is misclassified when the density of the other class is larger
gaussian_error_male = gaussian_prob_error{1} - gaussian_prob_error{2};
gaussian_male = gaussian_error_male(male_number_index);
error_index_male = find(gaussian_male < 0);
gaussian_error_female = gaussian_prob_error{2} - gaussian_prob_error{1};
gaussian_female = gaussian_error_female(female_number_index);
error_index_female = find(gaussian_female < 0);
error_rate = (length(error_index_male) + length(error_index_female))/size(gaussian_error_male, 1);
fprintf('the error_rate is %.4f\n', error_rate);

GDA and logistic regression

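GDA has an interesting relationship with logistic regression. If we view the quantity

$$p(y = 1 \mid x; \phi, \mu_0, \mu_1, \Sigma)$$

as a function of x, it can be written in the form

$$p(y = 1 \mid x; \phi, \mu_0, \mu_1, \Sigma) = \frac{1}{1 + \exp(-\theta^T x)},$$

where $\theta$ is a function of $\phi, \mu_0, \mu_1, \Sigma$ (with the convention that x is extended by a constant component $x_0 = 1$). In other words, the decision has exactly the form used by logistic regression: when

$$p(y = 1 \mid x) > 0.5,$$

x is assigned to class y = 1, otherwise to the other class. (This is where the comparison with GDA becomes interesting.)

GDA makes stronger modeling assumptions: it assumes the data in each class are Gaussian. When that assumption holds, at least approximately, GDA fits the data well and does better than logistic regression.

Logistic regression makes weaker assumptions, so it is more robust and less sensitive to an incorrect model, because many different sets of assumptions lead to a posterior of logistic form. For example:

$$x \mid y = 0 \sim \mathrm{Poisson}(\lambda_0), \qquad x \mid y = 1 \sim \mathrm{Poisson}(\lambda_1)$$

In this case $p(y \mid x)$ is also logistic, so logistic regression works well here, whereas GDA would be fitting Gaussians to non-Gaussian data.

Conclusion: for this reason logistic regression is used more widely in practice.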
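The logistic form of the GDA posterior can be verified numerically. The sketch below uses made-up parameter values and keeps the intercept as a separate scalar `theta0` instead of extending x by a constant 1; both the numbers and the names are illustrative assumptions.

```matlab
% Made-up GDA parameters for illustration
phi = 0.3;
mu0 = [0; 0];
mu1 = [2; 1];
sigma = [1 0.5; 0.5 2];
x = [1.5; -0.5];                    % an arbitrary query point

% 2-D Gaussian density with mean m and the shared covariance sigma
gauss = @(x, m) exp(-0.5 * (x - m)' * (sigma \ (x - m))) / ...
                (2*pi * sqrt(det(sigma)));

% Posterior p(y = 1 | x) from Bayes' rule
num = gauss(x, mu1) * phi;
posterior_bayes = num / (num + gauss(x, mu0) * (1 - phi));

% The same posterior written as a logistic function of x
theta = sigma \ (mu1 - mu0);
theta0 = -0.5 * mu1' * (sigma \ mu1) + 0.5 * mu0' * (sigma \ mu0) ...
         + log(phi / (1 - phi));
posterior_logistic = 1 / (1 + exp(-(theta' * x + theta0)));
```

The two posteriors agree to rounding error because the quadratic term in x cancels when both classes share the same covariance, leaving a log-odds that is linear in x.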

（三） Linear Discriminant Analysis (LDA)

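The GDA classifier above is in fact LDA; this section describes it more systematically.

Suppose we have a set of feature samples X with class labels Y.

Let the prior probability (or marginal pmf) of class k be $\pi_k$, with $\sum_{k=1}^{K} \pi_k = 1$. As in naive Bayes, $\pi_k$ is estimated by the empirical class frequency $\hat{\pi}_k = N_k / N$, where $N_k$ is the number of samples in class k.

Writing $f_k(x)$ for the density of x in class k, Bayes' rule (the MAP estimate) and a simplification give:

$$\Pr(Y = k \mid X = x) = \frac{f_k(x)\, \pi_k}{\sum_{l=1}^{K} f_l(x)\, \pi_l}, \qquad \hat{Y}(x) = \arg\max_k \Pr(Y = k \mid X = x) = \arg\max_k f_k(x)\, \pi_k$$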

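Class density estimation

Depending on the algorithm you use, you get a different estimate of the class densities.

In LDA we assume that the density of each class is Gaussian. More generally, there are several options for estimating the class densities:

1) In LDA, all classes share one covariance matrix; in QDA (quadratic discriminant analysis), each class uses its own estimated covariance matrix.

2) Mixtures of Gaussians: a single Gaussian may not describe a class density well enough, so the density of each class is estimated with a mixture of Gaussians.

3) General nonparametric density estimates, for example kernel density estimates.

4) Naive Bayes: assume the density of each class is the product of its marginal densities, i.e. all variables are independent within a class.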
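For reference, the four options can be written as different choices of the class density $f_k(x)$. The notation below ($R_k$ mixture components with weights $w_{kr}$, per-dimension densities $f_{kj}$) is introduced here for illustration and is not taken from the original post:

```latex
\begin{aligned}
\text{LDA:} \quad & f_k(x) = \mathcal{N}(x \mid \mu_k, \Sigma) \\
\text{QDA:} \quad & f_k(x) = \mathcal{N}(x \mid \mu_k, \Sigma_k) \\
\text{Mixture of Gaussians:} \quad & f_k(x) = \sum_{r=1}^{R_k} w_{kr}\, \mathcal{N}(x \mid \mu_{kr}, \Sigma_{kr}),
  \qquad \sum_{r=1}^{R_k} w_{kr} = 1 \\
\text{Naive Bayes:} \quad & f_k(x) = \prod_{j=1}^{n} f_{kj}(x_j)
\end{aligned}
```

In every case the classifier is the same rule, $\arg\max_k f_k(x)\,\pi_k$; only the estimate of $f_k$ changes.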

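Optimal classification

Once the parameters (including the covariance matrix) have been estimated, Bayes' rule tells us to choose the class with the largest posterior probability given the features x. Taking the logarithm and dropping the terms that do not depend on k, the Gaussian density and the prior combine as follows:

$$\hat{Y}(x) = \arg\max_k \log\big(f_k(x)\, \pi_k\big) = \arg\max_k \left[-\log\big((2\pi)^{n/2} |\Sigma|^{1/2}\big) - \frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k) + \log \pi_k\right] = \arg\max_k \left[-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k) + \log \pi_k\right]$$

Since

$$-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k - \frac{1}{2} x^T \Sigma^{-1} x,$$

and the last term is the same for every class, this simplifies to:

$$\hat{Y}(x) = \arg\max_k \left[x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k\right]$$

This is the final classifier. The number of classes is usually small, often just two.

LDA gives a linear boundary because the quadratic term cancels.

Define the linear discriminant function:

$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$$

The decision boundary between classes k and l is the set of points that satisfy:

$$\{x : \delta_k(x) = \delta_l(x)\}$$

which is equivalent to

$$\log \frac{\pi_k}{\pi_l} - \frac{1}{2} (\mu_k + \mu_l)^T \Sigma^{-1} (\mu_k - \mu_l) + x^T \Sigma^{-1} (\mu_k - \mu_l) = 0$$

For example, in a two-class problem, let k = 1 and l = 2 and define the constants

$$a_0 = \log \frac{\pi_1}{\pi_2} - \frac{1}{2} (\mu_1 + \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2), \qquad a = \Sigma^{-1} (\mu_1 - \mu_2),$$

where $\pi_1$ and $\pi_2$ are the prior probabilities of the two classes and $\mu_1$ and $\mu_2$ are the class means.

The classification rule is: assign x to class 1 if $a_0 + a^T x > 0$, and to class 2 otherwise.

When estimating the parameters, the covariance matrix can be divided by N-K to obtain an unbiased estimate:

$$\hat{\Sigma} = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{i:\, y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T$$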
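The two-class rule can be sketched in a few lines: estimate the priors, the class means and the pooled covariance (divided by N-K), then threshold the linear score at zero. The toy data and all variable names below are made-up assumptions for illustration only.

```matlab
% Toy training set (made-up); labels are 1 and 2
X = [1 2; 2 3; 3 3; 6 5; 7 8; 8 7];
y = [1; 1; 1; 2; 2; 2];
N = size(X, 1);
K = 2;

% Priors, class means and the pooled covariance (divided by N - K)
prior = zeros(K, 1);
mu = zeros(size(X, 2), K);
S = zeros(size(X, 2));
for k = 1:K
    Xk = X(y == k, :);
    prior(k) = size(Xk, 1) / N;
    mu(:, k) = mean(Xk, 1)';
    Xc = bsxfun(@minus, Xk, mu(:, k)');
    S = S + Xc' * Xc;
end
sigma = S / (N - K);

% Coefficients of the linear boundary a0 + a' * x = 0
a = sigma \ (mu(:, 1) - mu(:, 2));
a0 = log(prior(1) / prior(2)) - 0.5 * (mu(:, 1) + mu(:, 2))' * a;

% Class 1 where the score is positive, class 2 otherwise
score = a0 + X * a;
predicted = 2 - (score > 0);
```

On this well-separated toy set every training point is assigned to its own class. Unlike the heightWeight listing earlier, this sketch uses the pooled within-class covariance and includes the log prior ratio in `a0`.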

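Notes on the assumptions

LDA makes strong assumptions: the covariance matrix is the same in every class and the class densities are Gaussian. If the true densities violate these assumptions, classification can be very poor.

In the figure below, panel (a) satisfies the assumptions: the two classes have the same covariance matrix and both are Gaussian. Panel (b) violates all of the LDA assumptions: each class is not a single Gaussian distribution but a mixture of two Gaussians, so the whole model is a mixture of four Gaussians, with different covariance matrices.

Even so, the LDA boundary in (b) is very close to the ideal boundary (the ideal boundary is computed with Bayes' rule from the true model).

Another example:

In panel (c) the red class is still a mixture of two Gaussians, while the blue class is split into separate parts. If we still use LDA, the result is very poor. For the boundary in panel (d), the density of each class is estimated with a mixture of two Gaussians and Bayes' rule is applied; the result is two curved boundaries, and the separation improves considerably.

Conclusion: before running a discriminant classifier, it is best to look at how the points are distributed and only then choose the algorithm.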

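LDA with basis expansion

LDA can in fact also produce a quadratic decision boundary if we apply a basis expansion.

Suppose we have two dimensions, $X_1$ and $X_2$. We can extend the input features with $X_1 X_2$, $X_1^2$ and $X_2^2$, which introduces three more dimensions.

We can then use five-dimensional data as the input vector:

$$X = (X_1, X_2, X_1 X_2, X_1^2, X_2^2)$$

Only two dimensions are real, but by introducing these nonlinear basis functions we can treat the data as five-dimensional.

LDA then gives the mean vectors:

[numerical values shown as an image in the original post; not recovered]

and the five-dimensional covariance matrix:

[numerical values shown as an image in the original post; not recovered]

The classification function becomes:

[equation shown as an image in the original post; not recovered]

which gives the classification result:

[figure: decision boundaries of QDA, LDA, and LDA with basis expansion]

The dashed curve is the QDA boundary, the dashed straight line is the boundary of LDA on the two original dimensions, and the solid line is the boundary of LDA with basis expansion (a point is assigned to the blue class when the function above is greater than zero).

This result is better than the two dashed boundaries, at least on the training data.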
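The basis expansion itself is a one-line feature map; LDA is then fitted in the five-dimensional space exactly as in two dimensions. The sketch below uses a made-up toy set (one class near the origin, the other around it) and hypothetical names such as `expand` and `Phi`; it does not reproduce the data or the numbers of the original example.

```matlab
% Toy 2-D inputs (made-up): class 1 near the origin, class 2 around it
X = [0.1 0.2; -0.3 0.1; 0.2 -0.4; -0.1 -0.2; 0.5 0.3; ...
     2 0; -2.5 0.1; 0.1 3; 0 -2; 1.5 1.4; -1.4 -2];
y = [1; 1; 1; 1; 1; 2; 2; 2; 2; 2; 2];
N = size(X, 1);
K = 2;

% Expanded five-dimensional input: (X1, X2, X1*X2, X1^2, X2^2)
expand = @(X) [X(:, 1), X(:, 2), X(:, 1) .* X(:, 2), X(:, 1).^2, X(:, 2).^2];
Phi = expand(X);

% Ordinary LDA estimates, now computed in the expanded space
mu = zeros(5, K);
S = zeros(5);
prior = zeros(K, 1);
for k = 1:K
    Pk = Phi(y == k, :);
    prior(k) = size(Pk, 1) / N;
    mu(:, k) = mean(Pk, 1)';
    Pc = bsxfun(@minus, Pk, mu(:, k)');
    S = S + Pc' * Pc;
end
sigma = S / (N - K);

% The score is linear in Phi, hence quadratic in the original (X1, X2)
a = sigma \ (mu(:, 1) - mu(:, 2));
a0 = log(prior(1) / prior(2)) - 0.5 * (mu(:, 1) + mu(:, 2))' * a;
score = a0 + expand(X) * a;         % class 1 where score > 0
```

Because `score` contains the terms $X_1 X_2$, $X_1^2$ and $X_2^2$, the set where it equals zero is a conic section in the original plane, which is how a linear method ends up with a curved boundary.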

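(四) Summary

1. When classifying with LDA, choose the density estimation method according to how the points are distributed; this gives better results.

2. Other estimation methods still need to be tried and compared for classification; this will be filled in later.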

Reference

1. Linear Discriminant Analysis - Part I: https://onlinecourses.science.psu.edu/stat557/node/35

2. Machine Learning: A Probabilistic Perspective

3. Generative Learning Algorithms - Andrew Ng