IV. Fundamentals of Information Theory

1. Self-Information

Suppose a random variable $X$ is encoded according to a probability distribution $P(x)$. The self-information $I(x)$ is the amount of information carried by the event $X = x$:
$$I(x) = -\log P(x)$$
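A minimal sketch in Python/NumPy (log base 2 is an assumption for illustration; the text does not fix a base):

```python
import numpy as np

# Self-information of an event with probability p: I = -log p (here in bits).
def self_information(p):
    return -np.log2(p)

print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits: rarer events carry more information
```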

2. Entropy

Entropy measures the average information content of a random variable, i.e., the expectation of its self-information:
$$H(X) = \mathbb{E}_{x}[I(x)] = -\sum_{x\in X} P(x)\log P(x)$$
From this formula, the more uncertain the outcome, the larger the entropy: entropy measures how disordered the information is, and greater disorder means higher entropy.
For a fully determined outcome, i.e., one whose probability is 1 or 0, the entropy is 0; when the variable's distribution is uniform, the entropy is maximal.
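A short sketch illustrating these two extremes, again assuming log base 2:

```python
import numpy as np

# Entropy of a discrete distribution P: H = -sum_x P(x) log P(x).
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # terms with P(x) = 0 contribute 0 by convention
    return -np.sum(p * np.log2(p))

print(entropy([1.0, 0.0]))            # 0.0 : deterministic outcome
print(entropy([0.25] * 4))            # 2.0 : uniform over 4 outcomes (maximal)
print(entropy([0.7, 0.1, 0.1, 0.1]))  # < 2.0 : skewed distribution, lower entropy
```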

3. Joint Entropy and Conditional Entropy

Let discrete random variables $X, Y$ have joint distribution $P(x,y)$. Their joint entropy is:
$$H(X,Y) = -\sum_{x\in X}\sum_{y\in Y} P(x,y)\log P(x,y)$$
The conditional entropy measures the remaining uncertainty about $X$ once $Y$ is known:
$$\begin{aligned} H(X|Y) &= -\sum_{x\in X}\sum_{y\in Y} P(x,y)\log P(x|y)\\ &= -\sum_{x\in X}\sum_{y\in Y} P(x,y)\log \frac{P(x,y)}{P(y)} \\ &= H(X,Y) - H(Y) \end{aligned}$$
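A small numeric check of the identity $H(X|Y) = H(X,Y) - H(Y)$, using a hypothetical 2x2 joint distribution chosen only for illustration:

```python
import numpy as np

# Hypothetical joint distribution P(x, y).
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_xy = H(P_xy.flatten())      # joint entropy H(X, Y)
H_y  = H(P_xy.sum(axis=0))    # marginal entropy H(Y)
H_x_given_y = H_xy - H_y      # conditional entropy via the identity above
print(H_xy, H_y, H_x_given_y)
```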

4. Mutual Information

Mutual information measures how much knowing one variable reduces the uncertainty about the other:
$$I(X,Y) = \sum_{x\in X}\sum_{y\in Y} P(x,y)\log \frac{P(x,y)}{P(x)P(y)}$$
If $X$ and $Y$ are independent, i.e., $X$ provides no information about $Y$ and vice versa, their mutual information is zero. Mutual information can therefore also be written as:
$$I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
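A sketch comparing the direct definition with the entropy identity, reusing the same hypothetical joint distribution as above:

```python
import numpy as np

P_xy = np.array([[0.4, 0.1],   # hypothetical joint distribution P(x, y)
                 [0.1, 0.4]])
P_x = P_xy.sum(axis=1)
P_y = P_xy.sum(axis=0)

# Direct definition: I(X;Y) = sum_{x,y} P(x,y) log [P(x,y) / (P(x)P(y))]
mask = P_xy > 0
I_direct = np.sum(P_xy[mask] * np.log2(P_xy[mask] / np.outer(P_x, P_y)[mask]))

# Identity: I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y)
def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

I_identity = H(P_x) + H(P_y) - H(P_xy.flatten())
print(I_direct, I_identity)    # the two values agree
```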

5. Cross-Entropy

Given two distributions, where $p(x)$ is the true distribution and $q(x)$ is a non-true (approximating) distribution, the average code length obtained by encoding $p(x)$ using a code built for $q(x)$ is the cross-entropy:
$$H(p,q) = \mathbb{E}_{p}[-\log q] = -\sum_{x} p(x)\log q(x)$$
For a fixed $p$, the closer $q$ is to $p$, the smaller the cross-entropy; the further apart they are, the larger it is.
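A brief sketch of this behavior with hypothetical distributions (log base 2 assumed):

```python
import numpy as np

# Cross-entropy H(p, q) = -sum_x p(x) log q(x).
def cross_entropy(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return -np.sum(p * np.log2(q))

p = np.array([0.5, 0.3, 0.2])              # "true" distribution
print(cross_entropy(p, p))                 # equals H(p): the minimum possible value
print(cross_entropy(p, [0.4, 0.4, 0.2]))   # slightly larger: q close to p
print(cross_entropy(p, [0.1, 0.1, 0.8]))   # much larger: q far from p
```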

6. Relative Entropy (KL Divergence)

Relative entropy measures the information lost when the true distribution $p(x)$ is approximated by the non-true distribution $q(x)$:
$$D_{KL}(p\|q) = H(p,q) - H(p) = \sum_{x} p(x)\log \frac{p(x)}{q(x)}$$
The KL divergence quantifies the discrepancy between two probability distributions: it is non-negative, and $D_{KL}(p\|q) = 0$ when $p = q$. It is not symmetric, however, so it is not a true distance metric.
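A small sketch showing non-negativity and asymmetry, with hypothetical distributions chosen so that the two directions differ:

```python
import numpy as np

# KL divergence D_KL(p || q) = sum_x p(x) log [p(x) / q(x)]
# (assumes q(x) > 0 wherever p(x) > 0).
def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = [0.9, 0.05, 0.05]
q = [0.5, 0.25, 0.25]
print(kl(p, p))   # 0.0 when the distributions coincide
print(kl(p, q))   # > 0
print(kl(q, p))   # different from kl(p, q): not symmetric
```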

7. JS Divergence

The JS divergence is a symmetric measure of the similarity between two distributions, defined as:
$$D_{JS}(p\|q) = \frac{1}{2} D_{KL}(p\|m) + \frac{1}{2} D_{KL}(q\|m), \qquad m = \frac{1}{2}(p+q)$$
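A sketch of this definition, reusing the KL helper from above on hypothetical distributions:

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# JS divergence: symmetric, built from two KL terms against the mixture m.
def js(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
print(js(p, q), js(q, p))   # equal: JS divergence is symmetric
```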
