Chapter 1.2: Probability Theory
PRML, Oxford University Deep Learning Course, Machine Learning, Pattern Recognition
Christopher M. Bishop, PRML, Chapter 1 Introduction
- Chapter 1.2 Probability Theory
- Uncertainty
- Example discussed through this chapter
- Basic Terminology
- An Important Interpretation of Bayes Theorem
- Bayesian Probability
- Maximum-Likelihood Estimation (MLE) for the Univariate Gaussian Case
- Curve fitting re-visited
- Bayesian Curve fitting
- Curve fitting as an example demonstrating the three approaches (see Ref-1)
- References
1. Uncertainty
A key concept in the field of pattern recognition is that of uncertainty. It arises both through noise on measurements, as well as through the finite size of data sets. Probability theory provides a consistent framework for the quantification and manipulation of uncertainty and forms one of the central foundations for pattern recognition. When combined with decision theory, discussed in Section 1.5 (see PRML), it allows us to make optimal predictions given all the information available to us, even though that information may be incomplete or ambiguous.
2. Example discussed through this chapter
We will introduce the basic concepts of probability theory by considering a simple example. Imagine we have two boxes, one red and one blue, and in the red box we have 2 apples and 6 oranges, and in the blue box we have 3 apples and 1 orange. This is illustrated in Figure 1.9.
Now suppose we randomly pick one of the boxes and from that box we randomly select an item of fruit, and having observed which sort of fruit it is we put it back in the box from which it came. We could imagine repeating this process many times. Let us suppose that in so doing we pick the red box 40% of the time and we pick the blue box 60% of the time, and that when we pick an item of fruit from a box we are equally likely to select any of the pieces of fruit in the box.
In this example, the identity of the box that will be chosen is a random variable, which we shall denote by B. This random variable can take one of two possible values, namely r (corresponding to the red box) or b (corresponding to the blue box). Under the setup above, p(B = r) = 4/10 and p(B = b) = 6/10.
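To make the setup concrete, here is a minimal simulation of this experiment in Python. The box contents and the 40%/60% box probabilities come from the example above; the function and variable names are my own. The empirical fraction of apples converges to p(apple) = (2/8)(0.4) + (3/4)(0.6) = 0.55, anticipating the sum and product rules derived later in this chapter.

```python
import random

# Box contents from the example: red has 2 apples, 6 oranges;
# blue has 3 apples, 1 orange.
boxes = {
    "red":  {"apple": 2, "orange": 6},
    "blue": {"apple": 3, "orange": 1},
}

def draw_fruit(rng):
    """Pick a box with p(B=r)=0.4, p(B=b)=0.6, then a fruit uniformly from it."""
    box = "red" if rng.random() < 0.4 else "blue"
    # Expand counts into one entry per piece of fruit, so each piece
    # is equally likely to be selected.
    fruits = [f for f, n in boxes[box].items() for _ in range(n)]
    return box, rng.choice(fruits)

rng = random.Random(0)
N = 100_000
apples = sum(1 for _ in range(N) if draw_fruit(rng)[1] == "apple")
print(f"empirical p(apple) ≈ {apples / N:.3f}  (exact: 0.55)")
```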
3. Basic Terminology
3.1 Probability densities
- PDF, Probability Density Function: If the probability of a real-valued variable x falling in the interval (x, x + δx) is given by p(x)δx for δx → 0, then p(x) is called the probability density over x. The pdf p(x) must satisfy the two conditions
  $$p(x) \geq 0 \tag{1.25}$$
  $$\int_{-\infty}^{\infty} p(x)\,dx = 1 \tag{1.26}$$
- PMF, Probability Mass Function: Note that if x is a discrete variable, then p(x) is called a probability mass function because it can be regarded as a set of "probability masses" concentrated at the allowed values of x.
- CDF, Cumulative Distribution Function: The probability that x lies in the interval (−∞, z) is given by the cumulative distribution function defined by
  $$P(z) = \int_{-\infty}^{z} p(x)\,dx \tag{1.28}$$
  which satisfies $P'(x) = p(x)$.
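As a quick numerical sanity check of conditions (1.25), (1.26), and (1.28), the sketch below uses a standard Gaussian as a concrete p(x) (this choice of density is an illustrative assumption, not from the text) and verifies non-negativity, normalization, and P′(x) = p(x) on a grid.

```python
import numpy as np

# A concrete p(x): the standard Gaussian density.
def p(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-8.0, 8.0, 20001)   # grid wide enough to capture ~all mass
dx = x[1] - x[0]
pdf = p(x)

# Condition (1.25): p(x) >= 0 everywhere on the grid.
assert (pdf >= 0).all()

# Condition (1.26): p(x) integrates to 1 (Riemann sum over the grid).
print("integral of p(x) ≈", pdf.sum() * dx)          # ~1.0

# CDF (1.28): P(z) = ∫_{-∞}^{z} p(x) dx, then check P'(x) = p(x).
cdf = np.cumsum(pdf) * dx
print("max |P'(x) - p(x)| ≈", np.abs(np.gradient(cdf, dx) - pdf).max())
```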
3.2 Expectations and covariances
- Expectation of f(x): the average value of some function f(x) under a probability distribution p(x) is called the expectation of f(x) and is denoted $E[f]$. It is given by $E[f] = \sum_x p(x) f(x)$ for discrete variables and $E[f] = \int p(x) f(x)\,dx$ for continuous variables.
- Approximating an expectation by sampling: if we are given a finite number N of points drawn from the pdf, then the expectation can be approximated as a finite sum over these points, $E[f] \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n)$ (see the sketch after this list).
- Expectations of functions of several variables: here we can use a subscript to indicate which variable is being averaged over, so that for instance $E_x[f(x,y)]$ denotes the average of the function f(x, y) with respect to the distribution of x. Note that $E_x[f(x,y)]$ will be a function of y.
- Variance of f(x): defined by $\mathrm{var}[f] = E[(f(x) - E[f(x)])^2]$, it provides a measure of how much variability there is in f(x) around its mean value $E[f(x)]$. Expanding out the square gives $\mathrm{var}[f] = E[f(x)^2] - E[f(x)]^2$.
- Variance of the variable x itself: $\mathrm{var}[x] = E[x^2] - E[x]^2$.
- Covariance of two r.v.'s x and y: defined by $\mathrm{cov}[x, y] = E_{x,y}[(x - E[x])(y - E[y])] = E_{x,y}[xy] - E[x]E[y]$, which expresses the extent to which x and y vary together.
- Covariance of two vectors of r.v.'s x and y: defined by $\mathrm{cov}[\mathbf{x}, \mathbf{y}] = E_{\mathbf{x},\mathbf{y}}[(\mathbf{x} - E[\mathbf{x}])(\mathbf{y}^{\mathrm{T}} - E[\mathbf{y}^{\mathrm{T}}])] = E_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^{\mathrm{T}}] - E[\mathbf{x}]E[\mathbf{y}^{\mathrm{T}}]$.
- Covariance of the components of a vector x with each other: then we use the slightly simpler notation $\mathrm{cov}[\mathbf{x}] \equiv \mathrm{cov}[\mathbf{x}, \mathbf{x}]$.
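The sketch below illustrates the sampling approximation and the variance/covariance identities above with Monte Carlo estimates. The choices f(x) = x² under a standard Gaussian and the correlated pair (x, y) are illustrative assumptions, not from the text; for these, the exact values are E[f] = 1, var[f] = 2, and cov[x, y] = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Draw x ~ N(0, 1) and build a correlated y by adding independent noise.
x = rng.standard_normal(N)
y = x + 0.5 * rng.standard_normal(N)

f = x**2  # some function f(x)

# E[f] ≈ (1/N) Σ f(x_n); exact value is 1 for x² under N(0, 1).
print("E[f]     ≈", f.mean())

# var[f] = E[f(x)²] − E[f(x)]²; exact value is E[x⁴] − E[x²]² = 3 − 1 = 2.
print("var[f]   ≈", (f**2).mean() - f.mean() ** 2)

# cov[x, y] = E[xy] − E[x]E[y]; here it equals var[x] = 1.
print("cov[x,y] ≈", (x * y).mean() - x.mean() * y.mean())
```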
3.3 Joint, Marginal, Conditional Probability
In order to derive the rules of probability, consider the following example shown in Figure 1.10 involving two random variables X and Y.
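Before working through the figure, the following sketch shows how joint, marginal, and conditional probabilities relate on a small table of counts (the count matrix here is made up for illustration; Figure 1.10 uses its own numbers): the sum rule obtains a marginal by summing the joint over one variable, and the product rule factorizes the joint as p(X, Y) = p(Y|X) p(X).

```python
import numpy as np

# Hypothetical counts n_ij of jointly observing X = x_i and Y = y_j.
counts = np.array([[3, 1],
                   [2, 4],
                   [5, 5]], dtype=float)
N = counts.sum()

joint = counts / N                     # p(X = x_i, Y = y_j)
p_x = joint.sum(axis=1)                # sum rule: p(X) = Σ_j p(X, y_j)
p_y = joint.sum(axis=0)                # sum rule: p(Y) = Σ_i p(x_i, Y)
cond_y_given_x = joint / p_x[:, None]  # product rule: p(Y|X) = p(X, Y) / p(X)

print("p(x):", p_x)
print("p(y):", p_y)
print("p(y|x):\n", cond_y_given_x)

# Product rule check: p(X, Y) = p(Y|X) p(X) recovers the joint.
assert np.allclose(cond_y_given_x * p_x[:, None], joint)
```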