Machine Learning Study Notes: PRML Chapter 1.2: Probability Theory

Chapter 1.2 : Probability Theory

PRML, Oxford University Deep Learning Course, Machine Learning, Pattern Recognition
Christopher M. Bishop, PRML, Chapter 1 Introduction


1. Uncertainty

A key concept in the field of pattern recognition is that of uncertainty. It arises both through noise on measurements, as well as through the finite size of data sets. Probability theory provides a consistent framework for the quantification and manipulation of uncertainty and forms one of the central foundations for pattern recognition. When combined with decision theory, discussed in Section 1.5 (see PRML), it allows us to make optimal predictions given all the information available to us, even though that information may be incomplete or ambiguous.


2. Example discussed through this chapter

We will introduce the basic concepts of probability theory by considering a simple example. Imagine we have two boxes, one red and one blue, and in the red box we have 2 apples and 6 oranges, and in the blue box we have 3 apples and 1 orange. This is illustrated in Figure 1.9.

[Figure 1.9]

Now suppose we randomly pick one of the boxes and from that box we randomly select an item of fruit, and having observed which sort of fruit it is we put it back in the box from which it came. We could imagine repeating this process many times. Let us suppose that in so doing we pick the red box 40% of the time and we pick the blue box 60% of the time, and that when we pick an item of fruit from a box we are equally likely to select any of the pieces of fruit in the box.

In this example, the identity of the box that will be chosen is a random variable, which we shall denote by B . This random variable can take one of two possible values, namely r (corresponding to the red box) or b (corresponding to the blue box). Similarly, the identity of the fruit is also a random variable and will be denoted by F . It can take either of the values a (for apple) or o (for orange). To begin with, we shall define the probability of an event to be the fraction of times that event occurs out of the total number of trials, in the limit that the total number of trials goes to infinity. Thus the probability of selecting the red box is 4/10 .
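The sampling process just described is easy to check empirically. Below is a quick simulation of it (a sketch with my own variable names, assuming the box proportions from Figure 1.9):

```python
import random

random.seed(0)

# Box contents from Figure 1.9: red box has 2 apples + 6 oranges,
# blue box has 3 apples + 1 orange.
boxes = {"r": ["a"] * 2 + ["o"] * 6,
         "b": ["a"] * 3 + ["o"] * 1}

N = 100_000
red_count = 0
apple_count = 0
for _ in range(N):
    # Pick the red box 40% of the time and the blue box 60% of the time.
    box = "r" if random.random() < 0.4 else "b"
    # Every piece of fruit in the chosen box is equally likely.
    fruit = random.choice(boxes[box])
    red_count += (box == "r")
    apple_count += (fruit == "a")

print(red_count / N)    # fraction of trials with B = r, close to 4/10
print(apple_count / N)  # fraction of trials with F = a
```

As the number of trials grows, the first fraction approaches 4/10, matching the limiting-frequency definition of probability given above.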


3. Basic Terminology

3.1 Probability densities

  • PDF, Probability Density Function: If the probability of a real-valued variable x falling in the interval (x, x + δx) is given by p(x)δx for δx → 0, then p(x) is called the probability density over x,
    and the pdf p(x) must satisfy the two conditions

    p(x) ≥ 0    (1.25)
    ∫_{−∞}^{+∞} p(x) dx = 1    (1.26)

  • PMF, Probability Mass Function: Note that if x is a discrete variable, then p(x) is called a probability mass function because it can be regarded as a set of “probability masses” concentrated at the allowed values of x .

  • CDF, Cumulative Distribution Function: The probability that x lies in the interval (−∞, z) is given by the cumulative
    distribution function defined by

    P(z) = ∫_{−∞}^{z} p(x) dx    (1.28)

    which satisfies P′(x) = p(x) .
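These relationships between p(x) and P(z) can be verified numerically. The sketch below uses a standard Gaussian density purely as an illustrative choice of p(x), with simple Riemann sums and finite differences standing in for the integral and the derivative:

```python
import numpy as np

# Standard Gaussian density as a concrete p(x); the grid is wide enough
# that the tail mass outside it is negligible.
x = np.linspace(-8.0, 8.0, 4001)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
dx = x[1] - x[0]

# (1.26): the density integrates to one (Riemann sum over the grid).
total = np.sum(p) * dx

# (1.28): cumulative distribution P(z) by numerical integration.
P = np.cumsum(p) * dx

# P'(x) = p(x): differentiate the CDF and compare with the density.
P_prime = np.gradient(P, x)

print(total)                         # close to 1.0
print(np.max(np.abs(P_prime - p)))   # small discretization error
```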

3.2 Expectations and covariances

  • Expectation of f(x) : the average value of some function f(x) under a probability distribution p(x) is called the expectation of f(x) and will be denoted by E[f] , given by E[f] = Σ_x p(x)f(x) and E[f] = ∫ p(x)f(x) dx , for discrete variables and continuous variables, respectively.

  • Approximating expectation using sampling methods: if we are given a finite number N of points drawn from the pdf, then the expectation can be approximated as a finite sum over these points

    E[f] ≈ (1/N) Σ_{n=1}^{N} f(x_n)    (1.35)

  • Expectations of functions of several variables: here we can use a subscript to indicate which variable is being averaged over, so that for instance Ex[f(x,y)] denotes the average of the function f(x,y) with respect to the distribution of x . Note that Ex[f(x,y)] will be a function of y .

  • Variance of f(x) : is defined by var[f] = E[(f(x) − E[f(x)])^2] , and provides a measure of how much variability there is in f(x) around its mean value E[f(x)] . Expanding out the square, we get var[f] = E[f(x)^2] − E[f(x)]^2 .

  • Variance of the variable x itself: var[x] = E[x^2] − E[x]^2 .
  • Covariance of two r.v. x and y : is defined by

    cov[x, y] = E_{x,y}[(x − E[x])(y − E[y])] = E_{x,y}[xy] − E[x]E[y]    (1.41)

  • Covariance of two vectors of r.v.’s x and y : is defined by

    cov[x, y] = E_{x,y}[(x − E[x])(y^T − E[y^T])] = E_{x,y}[x y^T] − E[x]E[y^T]    (1.42)

  • Covariance of the components of a vector x with each other: then we use a slightly simpler notation cov[x] ≡ cov[x, x] .
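The sampling-based approximation (1.35) and the variance and covariance identities above can be sanity-checked with drawn samples. A minimal sketch, where the standard-normal x and the constructed y = x + noise are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Draw x from a standard normal; build a y that is correlated with x.
x = rng.standard_normal(N)
y = x + 0.5 * rng.standard_normal(N)

# E[f] ≈ (1/N) Σ f(x_n), here with f(x) = x^2 (true value E[x^2] = 1).
E_f = np.mean(x**2)

# var[x] = E[x^2] − E[x]^2 (true value 1).
var_x = np.mean(x**2) - np.mean(x)**2

# cov[x, y] = E[xy] − E[x]E[y]; for y = x + independent noise
# this equals var[x] = 1.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)

print(E_f, var_x, cov_xy)  # all close to 1
```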

3.3 Joint, Marginal, Conditional Probability

In order to derive the rules of probability, consider the following example shown in Figure 1.10 involving two random variables X and Y . We shall suppose that X can take any of the values xi where i=1,...,M , and Y can take the values yj , where j=1,...,L . Consider a total of N trials in which we sample both of the variables X and Y , and let the number of such trials in which X = xi and Y = yj be nij .
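These counts can be tabulated directly by reusing the box-and-fruit example (X = box, Y = fruit). The sketch below builds a count matrix from simulated trials and recovers joint, marginal, and conditional probabilities from it (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# X = box, taking values x_1 = r, x_2 = b; Y = fruit, with y_1 = a, y_2 = o.
boxes, fruits = ["r", "b"], ["a", "o"]
p_box_r = 0.4                              # p(X = r) from the running example
p_apple_given = {"r": 2 / 8, "b": 3 / 4}   # fraction of apples in each box

n = np.zeros((2, 2))                       # n[i, j] = trials with X = x_i, Y = y_j
for _ in range(N):
    i = 0 if rng.random() < p_box_r else 1
    j = 0 if rng.random() < p_apple_given[boxes[i]] else 1
    n[i, j] += 1

joint = n / N                              # p(X = x_i, Y = y_j) ≈ n_ij / N
marginal = joint.sum(axis=1)               # sum rule: p(X = x_i) = Σ_j p(X = x_i, Y = y_j)
cond = n / n.sum(axis=1, keepdims=True)    # p(Y = y_j | X = x_i) = n_ij / c_i

print(marginal)   # close to [0.4, 0.6]
print(cond[0])    # close to [0.25, 0.75], the fruit fractions in the red box
```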
