Probabilistic Graphical Modeling: Study Notes
0. Learning materials
1. Introduction
Many of the problems we face amount to modeling the real world with some kind of function (the most direct example is estimation and curve fitting; more broadly, our machine learning and deep learning algorithms mostly fit the real problem with a function).
- But most of the time, the measurement involves a significant amount of uncertainty (in other words, "error"). As a result, our measurements actually follow a probability distribution. This introduces Probability Theory.
- And the measurements depend on each other. Sometimes we cannot find the exact expression of these relationships, but we know they exist, and we can learn some of their properties from prior knowledge. This introduces Graph modeling.
As a result, Probabilistic Graphical Modeling (PGM) was conceived to solve such kinds of questions.
There are three main elements in PGM :
- Representation: How do we specify a model? Typically, a Bayesian network is used for a directed acyclic graph (DAG), and a Markov Random Field for an undirected graph representation. (And of course, there are other models.)
- Inference: How do we ask the model questions? For example, marginal inference gives the probability of a given variable when we sum over all the other variables, and maximum a posteriori (MAP) inference gives the most likely assignment of the variables.
- Learning: How do we fit a model to real-world data? Inference and learning are closely linked: inference is a key subroutine of learning.
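To make the two inference queries above concrete, here is a minimal sketch over a tiny two-variable joint distribution (the table numbers are made up for illustration, not taken from any model in these notes):

```python
# Illustrative joint distribution p(x, y) over two binary variables.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.15, (1, 1): 0.45,
}

def marginal_x(joint):
    """Marginal inference: p(x) = sum over y of p(x, y)."""
    p = {}
    for (x, _y), v in joint.items():
        p[x] = p.get(x, 0.0) + v
    return p

def map_assignment(joint):
    """MAP inference: the single most likely joint assignment."""
    return max(joint, key=joint.get)

print(marginal_x(joint))      # p(x=0) = 0.40, p(x=1) = 0.60
print(map_assignment(joint))  # (1, 1)
```

Both queries here brute-force the full table; the point of PGM inference algorithms is to avoid this exponential enumeration by exploiting graph structure.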
2. Representation
2.1 Bayesian network
It is a directed acyclic graph, well suited to variables with causal (i.e., directed) relationships among them.
$$p(x_{1}, x_{2}, x_{3}, \ldots, x_{n}) = p(x_{1})\,p(x_{2} \mid x_{1}) \cdots p(x_{n} \mid x_{n-1}, \ldots, x_{2}, x_{1})$$
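The chain rule above can be checked numerically: starting from an arbitrary joint over three binary variables (the weights below are made up for illustration), the product of the chain-rule factors reproduces the joint for every assignment:

```python
import itertools

# An arbitrary (illustrative) joint over three binary variables,
# built by normalizing made-up weights.
weights = {
    (0, 0, 0): 2, (0, 0, 1): 1, (0, 1, 0): 3, (0, 1, 1): 4,
    (1, 0, 0): 1, (1, 0, 1): 5, (1, 1, 0): 2, (1, 1, 1): 2,
}
Z = sum(weights.values())
p = {xs: w / Z for xs, w in weights.items()}

def marg(p, idx):
    """Marginalize p down to the variables at positions idx."""
    out = {}
    for xs, v in p.items():
        key = tuple(xs[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out

p1 = marg(p, (0,))      # p(x1)
p12 = marg(p, (0, 1))   # p(x1, x2)

# Chain rule: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
for xs in itertools.product([0, 1], repeat=3):
    chain = p1[(xs[0],)] \
        * (p12[(xs[0], xs[1])] / p1[(xs[0],)]) \
        * (p[xs] / p12[(xs[0], xs[1])])
    assert abs(chain - p[xs]) < 1e-12
```

Note that the chain rule holds for any joint distribution and any variable ordering; the Bayesian network's savings come from the simplification introduced next.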
Based on these relationships, a directed graph can be built, and the probability expression follows from the graph: each variable depends only on a subset $A_{i}$ of its ancestors, so each conditional simplifies to

$$p(x_{i} \mid x_{i-1}, \ldots, x_{1}) = p(x_{i} \mid x_{A_{i}})$$
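This sparse factorization can be illustrated with the classic three-node rain/sprinkler/wet-grass network, where each node conditions only on its parents rather than on all predecessors. All CPT numbers below are made-up assumptions for illustration:

```python
# Bayesian network: rain -> sprinkler, (rain, sprinkler) -> grass_wet.
# All conditional probability table (CPT) values are illustrative.
p_rain = {1: 0.2, 0: 0.8}                              # p(r)
p_sprinkler = {1: {1: 0.01, 0: 0.99},                  # p(s | r=1)
               0: {1: 0.40, 0: 0.60}}                  # p(s | r=0)
p_grass = {(0, 0): 0.00, (0, 1): 0.90,                 # p(g=1 | r, s)
           (1, 0): 0.80, (1, 1): 0.99}

def joint(r, s, g):
    """p(r, s, g) = p(r) p(s | r) p(g | r, s):
    each factor conditions only on the node's parents A_i."""
    pg = p_grass[(r, s)]
    return p_rain[r] * p_sprinkler[r][s] * (pg if g else 1.0 - pg)

# Marginal inference: p(grass_wet = 1), summing over rain and sprinkler.
p_wet = sum(joint(r, s, 1) for r in (0, 1) for s in (0, 1))
print(p_wet)
```

Instead of one table with $2^{3}$ entries for the full joint, the network stores three small CPTs; for large $n$ this is exactly where the factorization pays off.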