Learning the Gaussian Mixture Model (GMM)
Part 1. Outline
The Gaussian mixture model has applications in image segmentation, object recognition, video analysis, and more. For a given set of data samples, we can estimate the probability distribution followed by each sample vector and then classify the samples according to those distributions. In practice, however, the individual distributions are mixed together; by describing each component with a Gaussian function, we can separate the single-component distributions out of the mixture and thereby cluster the sample data.
Part 2. Mathematical Principles
From probability theory, we can write down the density of the multivariate Gaussian (normal) distribution.
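For a $D$-dimensional sample vector $x$ with mean $\mu$ and covariance matrix $\Sigma$, this density is

$$\psi(x,\mu,\Sigma)=\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$

and it is this $\psi$ that appears as each component of the mixture below.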
Besides, the finite mixture probability density takes the following form:

$$f(x\mid\theta)=\sum_{i=1}^{k}p_i\,f_i(x\mid a_i)$$

in which $\sum_{i=1}^{k}p_i=1$ and $a_i=(\mu_i,\Sigma_i)$ are the parameters of the $i$-th component. With Gaussian components this becomes

$$f(x)=\sum_{i=1}^{k}p_i\,\psi(x,\mu_i,\Sigma_i)$$
To estimate the parameters by maximum likelihood, we need to specify the number of clusters $k$ and, for each cluster, its mean $\mu_i$, its covariance $\Sigma_i$, and its prior (mixing) probability $p_i$.
Thus, the expectation estimated in the E step is the posterior probability (the responsibility) of each component for each sample.
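In the standard form of the EM algorithm for this model, with $\gamma_{ni}$ denoting the responsibility of component $i$ for sample $x_n$ and $N$ the number of samples:

E step:

$$\gamma_{ni}=\frac{p_i\,\psi(x_n,\mu_i,\Sigma_i)}{\sum_{j=1}^{k}p_j\,\psi(x_n,\mu_j,\Sigma_j)}$$

M step:

$$N_i=\sum_{n=1}^{N}\gamma_{ni},\qquad p_i=\frac{N_i}{N},\qquad \mu_i=\frac{1}{N_i}\sum_{n=1}^{N}\gamma_{ni}\,x_n,\qquad \Sigma_i=\frac{1}{N_i}\sum_{n=1}^{N}\gamma_{ni}\,(x_n-\mu_i)(x_n-\mu_i)^{T}$$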
Part 3. Algorithm Steps
- Initialization: define the variables, i.e. specify the number of clusters K and the data dimension D.
- Initialize the means, covariances, and prior probability distribution.
- Iterate the E-M steps (a minimal NumPy sketch of this loop follows after the list):
  - E step: calculate the expectations (responsibilities).
  - M step: update the means, covariances, and prior probability distribution.
  - Check whether the stop condition is reached (the maximum number of iterations is hit or the improvement falls below the minimum error); exit the iteration if so, otherwise continue with the E-M steps.
- Print the final classification result.
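The following is a minimal, illustrative NumPy sketch of this loop, not a production implementation; the function name gmm_em, its default parameters, and the initialization choices (random samples as means, identity covariances, uniform priors) are assumptions for illustration rather than part of the original text. Part 4 below uses scikit-learn's GaussianMixture instead.

import numpy as np

def gmm_em(X, K, n_iters=100, tol=1e-6, seed=0):
    # Minimal EM loop for a Gaussian mixture (illustrative sketch only).
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialization: K random samples as means, identity covariances, uniform priors.
    mu = X[rng.choice(N, K, replace=False)].copy()
    sigma = np.array([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E step: weighted Gaussian densities, normalized into responsibilities gamma.
        dens = np.zeros((N, K))
        for i in range(K):
            diff = X - mu[i]
            inv = np.linalg.inv(sigma[i])
            norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(sigma[i]))
            quad = np.einsum('nd,de,ne->n', diff, inv, diff)
            dens[:, i] = pi[i] * norm * np.exp(-0.5 * quad)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate priors, means and covariances from the responsibilities.
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for i in range(K):
            diff = X - mu[i]
            sigma[i] = (gamma[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(D)
        # Stop when the log-likelihood no longer improves by more than tol.
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # Final classification: assign each sample to its most responsible component.
    return pi, mu, sigma, gamma.argmax(axis=1)

For example, calling gmm_em(X_train, 2) on the data generated in Part 4 should recover two components close to the ones used to generate it.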
Part 4. Example Program
Plot the density estimate of a mixture of two Gaussians.
The data are generated from two Gaussians with different centers and covariance matrices.
Details are explained in the comments in the code.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture
n_samples = 300
# generate random sample, two components
np.random.seed(0)
# generate spherical data centered on (20, 20)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])
# generate zero centered stretched Gaussian data
C = np.array([[0., -0.7], [3.5, .7]])
stretched_gaussian = np.dot(np.random.randn(n_samples, 2), C)
# concatenate the two datasets into the final training set
X_train = np.vstack([shifted_gaussian, stretched_gaussian])
# fit a Gaussian Mixture Model with two components
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(X_train)
# display predicted scores by the model as a contour plot
x = np.linspace(-20., 30.)
y = np.linspace(-20., 40.)
X, Y = np.meshgrid(x, y)
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -clf.score_samples(XX)  # negative log-likelihood at each grid point
Z = Z.reshape(X.shape)
CS = plt.contour(X, Y, Z, norm=LogNorm(vmin=1.0, vmax=1000.0),
levels=np.logspace(0, 3, 10))
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.scatter(X_train[:, 0], X_train[:, 1], s=.8)  # overlay the training samples
plt.title('Negative log-likelihood predicted by a GMM')
plt.axis('tight')
plt.show()
Result: a contour plot of the negative log-likelihood predicted by the GMM, with the training samples overlaid.
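As a small follow-up (reusing clf and X_train from the example above; this snippet is an illustrative addition rather than part of the original example), the fitted parameters and per-sample labels can be read off the model, which connects the scikit-learn attributes to the quantities from Part 2:

# mixing probabilities p_i, component means mu_i and covariances Sigma_i
print(clf.weights_)
print(clf.means_)
print(clf.covariances_)
# hard component label for each training sample
print(clf.predict(X_train)[:10])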
Part 5. Comparison between GMM and K-means
GMM:
Step 1. Calculate the responsibility of every data point with respect to each sub-model (Gaussian component).
Step 2. Update the parameters of each sub-model according to these responsibilities.
Step 3. Iterate.
K-means:
Step 1. Calculate the distance from every data point to the K centers, and assign each point to the class of the closest center.
Step 2. According to the assignment from the previous step, update the positions of the centers (the center positions can be regarded as the model parameters).
Step 3. Iterate.
It can be seen that GMM and K-means have a lot in common.
1. Computing the responsibilities of the data points with respect to the Gaussian components in GMM corresponds to the distance calculation in K-means, and re-estimating the Gaussian component parameters from the responsibilities in GMM corresponds to updating the positions of the cluster centers in K-means.
2. Both reach an optimum (in general a local one) through repeated iteration.
The difference is that GMM gives, for each observation, the probability that it was generated by each Gaussian component (a soft assignment), whereas K-means directly assigns each observation to a single cluster (a hard assignment), as the sketch below illustrates.
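To make the soft-versus-hard distinction concrete, here is a small illustrative sketch (the toy 1-D data and variable names are assumptions for illustration, not taken from the text) that fits both models on the same data and queries a point lying between the two centers:

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# toy 1-D data drawn from two overlapping Gaussians
X = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(3.0, 1.0, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

x0 = np.array([[1.5]])          # a point between the two component centers
print(gmm.predict_proba(x0))    # soft assignment: a probability for each component
print(km.predict(x0))           # hard assignment: a single cluster index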
--credit by Dora 2020.5.16