A basic introduction to an area of image processing: the face recognition technique based on PCA

This article takes a close look at the central role of principal component analysis (PCA) in face recognition technology. As a statistical procedure, PCA applies an orthogonal transformation to convert possibly correlated variables into linearly uncorrelated principal components, aiding dimensionality reduction and feature extraction. The article describes how PCA is carried out using the covariance matrix, and why covariance is important for measuring the relationships between variables. It also explains the principle of a face recognition system: identifying or verifying a person by comparing an image sample against the facial features stored in a database.

Yifan Wu (吴祎凡), Glasgow College, University of Electronic Science and Technology of China

Introduction to PCA

Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance; in other words, it accounts for the most variability in the data. Each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors form an uncorrelated orthogonal basis.
It is a useful tool in exploratory data analysis (EDA), and for making predictive models.
PCA can be thought of as fitting an n-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If some axis of the ellipsoid is small, then the variance along that axis is also small, and by omitting that axis and its corresponding principal component from our representation of the dataset, we lose only a commensurately small amount of information. PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
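To make this concrete, here is a minimal sketch of PCA in NumPy, computed by eigendecomposition of the covariance matrix; the function name pca and the variables X and k are illustrative choices, not part of any standard API.

import numpy as np

def pca(X, k):
    """Project the rows of X (samples x features) onto the top k principal components."""
    X_centered = X - X.mean(axis=0)         # remove the mean of each variable
    cov = np.cov(X_centered, rowvar=False)  # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]       # sort directions by decreasing variance
    components = eigvecs[:, order[:k]]      # orthogonal basis of the top k directions
    return X_centered @ components          # coordinates in the new basis

rng = np.random.default_rng(0)
Z = pca(rng.normal(size=(100, 5)), k=2)     # 100 samples reduced from 5 dimensions to 2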
The results of PCA depend on the scaling of the variables. A scale-invariant form of PCA has been developed.
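A small demonstration of this scaling dependence, with arbitrary made-up units: expressing one variable on a much larger numeric scale lets it dominate the leading component, while standardizing the variables removes the effect.

import numpy as np

rng = np.random.default_rng(1)
height_m = rng.normal(1.7, 0.1, size=1000)      # small numeric range
weight_g = rng.normal(70000, 10000, size=1000)  # huge numeric range (grams)
X = np.column_stack([height_m, weight_g])
for data in (X, (X - X.mean(0)) / X.std(0)):    # raw data, then standardized data
    eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))
    print(eigvals / eigvals.sum())              # share of variance per component

On the raw data the weight axis carries essentially all the variance; after standardization the two components are balanced.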
The applicability of PCA is limited by certain assumptions made in its derivation. Another limitation is the mean-removal process performed before constructing the covariance matrix. In fields such as astronomy, all the signals are non-negative, and mean removal forces the mean of some astrophysical exposures to be zero, which consequently creates unphysical negative fluxes; forward modeling then has to be performed to recover the true magnitude of the signals. As an alternative method, non-negative matrix factorization focuses only on the non-negative elements of the matrices, which makes it well suited to astrophysical observations. See more at Relation between PCA and Non-negative Matrix Factorization.

Using covariance to do PCA

In probability theory and statistics, covariance is a measure of the joint variability of two random variables. The mean, the standard deviation, and the variance describe a single set of numbers.
However, the information that can be obtained from these statistics is limited, so covariance is needed to reflect the relationship between two or more variables.
If the two variables tend to become larger or smaller at the same time, the covariance is positive; in the opposite case, if one increases while the other decreases, the covariance is negative. The covariance between two jointly distributed real-valued random variables X and Y with finite second moments is defined as the expected product of their deviations from their individual expected values:
cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Expanding the product and using the linearity of expectation, this simplifies to:
cov(X, Y) = E[XY] − E[X]E[Y]
As mentioned above, a nonzero result indicates that the two variables are related; if the result equals zero, the variables are uncorrelated, although a zero covariance does not rule out a nonlinear dependence. The application of covariance is broad: it appears in many practical mathematical problems, for instance in agriculture. When the scalar variables in the definition of covariance are replaced by random vectors, the concept generalizes to the covariance matrix.
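As a quick numerical sanity check of the two formulas above, the following snippet estimates the covariance of two positively related variables both ways; the sample size and the coefficient 2 are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=10_000)
Y = 2 * X + rng.normal(size=10_000)              # Y tends to grow with X
lhs = np.mean((X - X.mean()) * (Y - Y.mean()))   # E[(X - E[X])(Y - E[Y])]
rhs = np.mean(X * Y) - X.mean() * Y.mean()       # E[XY] - E[X]E[Y]
print(lhs, rhs)                                  # identical estimates, both positive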

The covariance matrix

Variance is a mathematical quantity in probability theory that describes how far a set of random numbers spreads out from its average value. Handling two-dimensional data requires covariance, because covariance is the statistical magnitude that measures the relationship between two random variables. Assume we have a data set with n categories and m samples in each category, so that it can be modeled as an n×m (n, m ≠ 1) matrix. The objective is to determine the relationship between any two categories and how much they affect the final result. For such multidimensional data, variance alone is not sufficient, so the method of covariance is used to turn the raw data matrix into a covariance matrix. The covariance matrix is a useful tool in many different areas. From it, a transformation matrix can be derived, called a whitening transformation, that allows one to completely decorrelate the data or, from a different point of view, to find an optimal basis for representing the data in a compact way. The covariance matrix extends the scalar variance to high-dimensional random vectors: each entry represents the relationship between a pair of the random variables.
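The following sketch, assuming a small synthetic two-dimensional Gaussian data set, builds a covariance matrix and derives a whitening matrix (here the symmetric, ZCA-style variant) from its eigendecomposition; after the transformation the covariance is approximately the identity, i.e., the variables are decorrelated.

import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=5000)
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)               # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # whitening transformation
X_white = X_centered @ W.T                           # decorrelated data
print(np.cov(X_white, rowvar=False))                 # approximately the identity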

Face recognition

Face recognition is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source. There are multiple methods by which facial recognition systems work, but in general they compare selected facial features from a given image with faces in a database. Finding facial features starts as a qualitative idea; when that qualitative idea is converted into a quantitative method, eigenvalues and eigenvectors enter the research. In this part, one of the fundamental methods of face recognition is presented in general terms.
When a computer processes images, the images are treated as vectors. In the simple case of grayscale images, the value of each pixel is a single sample representing only an amount of light, so each pixel can be represented by one value. All these values are lined up in a vector, which may have thousands of dimensions. As mentioned, finding features is the fundamental idea in face recognition, so establishing a facial database is the first step of facial recognition. A mean vector is calculated from the vectors obtained from the facial samples; this mean vector can be viewed as a mean face. After the database is established, other images are compared with this standard to recognize the faces in them.
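For illustration, flattening images into vectors and averaging them takes only a few lines in NumPy; the 64×64 random array below is a stand-in for real grayscale face samples.

import numpy as np

images = np.random.rand(10, 64, 64)         # stand-in for 10 grayscale face images
vectors = images.reshape(len(images), -1)   # each 64x64 image becomes a 4096-dim vector
mean_face = vectors.mean(axis=0)            # the "mean face" of this tiny database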
However, the initial vectors obtained directly from images are hard to process, which is why the covariance matrix is important: it captures the image features. First, calculate the mean vector; then use it to compute the covariance matrix, from which the eigenvalues and eigenvectors are easy to obtain. The eigenvectors yield an orthogonal basis. The next step is principal component analysis: keep the principal eigenvalues and omit the others to achieve dimensionality reduction. The last step is comparison: if the coordinates of the transformed image vector lie close to the mean face, we can conclude that the image is a face; on the contrary, if they lie far away, the image is not a face.
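Below is a rough sketch of this pipeline under stated assumptions: the faces argument, the number of components k, and the threshold are illustrative; the "small matrix" trick for the eigenvectors and the reconstruction-error test (a common eigenface criterion that makes "close to / far from the mean face" precise) are standard eigenface practice rather than the article's exact procedure.

import numpy as np

def build_eigenfaces(faces, k):
    """faces: (n_images, n_pixels) matrix, one flattened grayscale image per row."""
    mean_face = faces.mean(axis=0)
    A = faces - mean_face                        # centered samples
    # trick: eigenvectors of the small (n x n) A A^T yield those of the huge A^T A
    eigvals, eigvecs = np.linalg.eigh(A @ A.T)
    order = np.argsort(eigvals)[::-1][:k]        # keep the k principal components
    eigenfaces = A.T @ eigvecs[:, order]
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
    return mean_face, eigenfaces

def looks_like_face(image, mean_face, eigenfaces, threshold=10.0):
    """Project into face space, reconstruct, and test the residual distance."""
    coords = (image - mean_face) @ eigenfaces    # coordinates in the reduced basis
    reconstruction = mean_face + eigenfaces @ coords
    return np.linalg.norm(image - reconstruction) < threshold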

