Introduction
- Jian Tang
- tangjianpku@gmail.com
- History: 1950-1970 logic rules; 1980-1990 knowledge acquisition; since 2010, machine learning
- Deep Learning ⊂ Machine Learning ⊂ Artificial Intelligence
- machine learning
- uses statistical techniques to “learn” from data
- extracts features automatically, instead of relying on domain experts
- learns automatically, instead of being explicitly programmed
- Big Data + Big Computation + Big Model: why deep learning works now
- usage
- …
Probability
Bayes’ Theorem
- $p(Y|X) = \frac{p(X|Y)\,p(Y)}{p(X)}$, where $p(X) = \sum_Y p(X|Y)\,p(Y)$
- posterior ∝ likelihood × prior
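A minimal numeric sketch of the theorem for a discrete Y, in Python (the prior and test probabilities are illustrative assumptions, not from the notes):

```python
# Bayes' theorem for a binary Y: p(Y|X) = p(X|Y) p(Y) / p(X).
# All probabilities below are made-up illustrative values.
prior = {"disease": 0.01, "healthy": 0.99}        # p(Y)
likelihood = {"disease": 0.95, "healthy": 0.05}   # p(X = positive | Y)

# Evidence: p(X) = sum over Y of p(X|Y) p(Y)
p_x = sum(likelihood[y] * prior[y] for y in prior)

# Posterior: p(Y|X) = p(X|Y) p(Y) / p(X) -- likelihood * prior, normalized
posterior = {y: likelihood[y] * prior[y] / p_x for y in prior}
print(posterior)  # {'disease': ~0.16, 'healthy': ~0.84}
```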
variables
- E[f] := the average value of f(X) under the distribution p(x)
- $E[f] = \sum_x p(x)\,f(x)$
- V[f], cov[x, y]
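As a sketch, here are these quantities for a small discrete distribution in plain numpy (the support, $p(x)$, and $f$ are arbitrary examples):

```python
import numpy as np

x = np.array([0, 1, 2, 3])            # support of the discrete variable X
p = np.array([0.1, 0.2, 0.3, 0.4])    # p(x); must sum to 1
f = x ** 2                            # an arbitrary f(x) on the support

E_f = np.sum(p * f)                     # E[f] = sum_x p(x) f(x)
V_f = np.sum(p * (f - E_f) ** 2)        # V[f] = E[(f - E[f])^2]
E_x = np.sum(p * x)
cov_xf = np.sum(p * x * f) - E_x * E_f  # cov[x, f] = E[x f] - E[x] E[f]
print(E_f, V_f, cov_xf)
```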
distributions
- binomial distribution
- $\mathrm{Bin}(m|N,\mu) = \binom{N}{m} \mu^m (1-\mu)^{N-m}$
- $E[m] = N\mu$, $\mathrm{var}[m] = N\mu(1-\mu)$
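A quick simulation check of these moments with numpy (N and μ are arbitrary):

```python
import numpy as np

N, mu = 10, 0.3
m = np.random.binomial(N, mu, size=100_000)  # samples of m ~ Bin(m|N, mu)

print(m.mean(), N * mu)             # E[m]   = N * mu         = 3.0
print(m.var(), N * mu * (1 - mu))   # var[m] = N * mu * (1-mu) = 2.1
```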
multinomial variables
- x can take one of K values; e.g. $x = (0,0,1,0,0,0)^T$ means x takes the third of six possible values
- $\mu = (\mu_1, \mu_2, \ldots, \mu_K)^T$ gives, for each position of x, the probability that it equals 1
- so the probability of a particular x is $p(x|\mu) = \prod_{k=1}^{K} \mu_k^{x_k}$ (i.e., $\mu_k$ for the active component)
- $E[x|\mu] = \sum_x p(x|\mu)\,x = (\mu_1, \mu_2, \ldots, \mu_K)^T = \mu$
maximum likelihood estimation
- $\mu_k = \frac{m_k}{N}$, where $m_k = \sum_{n=1}^{N} x_{nk}$ is the column sum of the observation matrix (the count of times value k was observed)
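A sketch of this estimator on a toy one-hot data matrix (rows are the N observations, columns the K values; the data itself is made up):

```python
import numpy as np

# N = 4 one-hot observations of a variable with K = 4 possible values.
X = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0]])

N = X.shape[0]
m = X.sum(axis=0)   # m_k = sum_n x_nk: the column sums (count per value)
mu_hat = m / N      # MLE: mu_k = m_k / N
print(mu_hat)       # [0.25 0.25 0.5  0.  ]
```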
gaussian univariate distribution (the normal distribution)
- multivariate gaussian distribution
- maximum likelihood estimation
- mixture of Gaussians: can approximate a wide range of other distributions
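A minimal sketch of sampling from a two-component univariate mixture with numpy (the weights, means, and standard deviations are illustrative):

```python
import numpy as np

weights = np.array([0.3, 0.7])   # mixing coefficients; sum to 1
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

# Pick a component per sample, then draw from that Gaussian.
k = np.random.choice(len(weights), size=10_000, p=weights)
samples = np.random.normal(means[k], stds[k])
print(samples.mean())  # close to weights @ means = 1.5
```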
gradient descent
- a way to minimize an objective function $J(\theta)$
- $\eta$: learning rate, which determines the size of the steps we take to reach a local minimum
- update rule: $\theta = \theta - \eta \nabla_\theta J(\theta)$
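A minimal gradient-descent sketch on a toy quadratic objective (the objective and its gradient are illustrative, not from the notes):

```python
def J(theta):
    return (theta - 3.0) ** 2      # toy objective; minimum at theta = 3

def grad_J(theta):
    return 2.0 * (theta - 3.0)     # gradient of J

theta, eta = 0.0, 0.1              # initial parameter and learning rate
for _ in range(100):
    theta = theta - eta * grad_J(theta)  # theta <- theta - eta * grad J(theta)
print(theta, J(theta))             # theta ~ 3.0, J(theta) ~ 0.0
```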

This post reviewed the history of machine learning and its relationship to artificial intelligence, then covered probability fundamentals including Bayes' theorem, several important probability distributions, and maximum likelihood estimation.
