10-Deep Unsupervised Network

This post discusses how a linear autoencoder works and its relationship to singular value decomposition (SVD), explains how the principal subspace can be identified by stochastic gradient descent, and introduces the idea of a compressed representation. It also covers the role of the L1 norm in encouraging sparsity and the potential of autoencoders for denoising. Finally, it explains the concept of Kullback-Leibler divergence and its Bayesian interpretation, and outlines the basic idea of transposed convolution.


Linear Autoencoder

The optimal linear autoencoder projects onto the principal subspace, which is the same result produced by SVD.

The decoder is the transpose of the encoder.

If there are very many data points, running SVD directly may not be practical because the computational cost is too high.

The decoder and encoder are unique only up to an invertible transformation, where A is an arbitrary invertible (regular) matrix.
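Concretely, writing $E$ for the encoder and $D$ for the decoder (symbols used here just for illustration), the reconstruction is unchanged if the latent code is re-parameterised by any invertible $A$:

$\hat{x} = D E x = (D A^{-1})(A E)\, x$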

The decoder and encoder are not necessarily aligned with the eigenbasis, but they do span the principal subspace of the data.

The real distinction is between spanning the principal subspace with some set of basis vectors versus recovering the actual orthogonal basis, usually the eigenbasis.

The point is to identify the principal subspace with a linear autoencoder trained by stochastic gradient descent.

Find the compressed representation.
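Below is a minimal sketch (plain NumPy; names such as E, D, k and the learning rate are illustrative choices, not a definitive implementation) of a linear autoencoder trained with SGD. The learned decoder should span approximately the same subspace as the top-k right singular vectors from SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 10, 3                     # samples, ambient dim, latent (compressed) dim
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))  # data lying near a k-dim subspace
X += 0.01 * rng.normal(size=(n, d))       # small noise
X -= X.mean(axis=0)                       # center the data

E = 0.1 * rng.normal(size=(k, d))         # encoder
D = 0.1 * rng.normal(size=(d, k))         # decoder
lr = 3e-3
for epoch in range(100):
    for x in X:                           # one SGD step per sample
        z = E @ x
        r = D @ z - x                     # reconstruction error
        gD = np.outer(r, z)               # gradient of 0.5*||r||^2 w.r.t. D
        gE = D.T @ np.outer(r, x)         # gradient of 0.5*||r||^2 w.r.t. E
        D -= lr * gD
        E -= lr * gE

# Compare the column space of D with the top-k principal directions from SVD.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
P_svd = Vt[:k].T @ Vt[:k]                 # projector onto the principal subspace
Q, _ = np.linalg.qr(D)                    # orthonormal basis of the learned subspace
P_ae = Q @ Q.T
print("subspace gap:", np.linalg.norm(P_svd - P_ae))   # should be small
```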

An L1 penalty can be applied to encourage sparsity, since the L1 norm tends to produce sparse solutions.
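As a sketch, a sparsity-regularised objective simply adds an L1 term on the latent code to the reconstruction loss (lam below is an illustrative weight, not a recommended value):

```python
import numpy as np

def sparse_ae_loss(x, z, x_hat, lam=1e-3):
    recon = 0.5 * np.sum((x_hat - x) ** 2)   # squared reconstruction error
    sparsity = lam * np.sum(np.abs(z))       # L1 penalty pushes many z_i to exactly 0
    return recon + sparsity
```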

Autoencoders may also have the power to denoise: the signal typically lies on some low-dimensional manifold while white noise does not, so the autoencoder can identify the signal subspace, which is robust under noise.
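A minimal denoising set-up, as a sketch: corrupt the input with Gaussian noise and train the autoencoder to reconstruct the clean signal (encode/decode stand in for any autoencoder, and sigma is an assumed noise level):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoising_loss(x_clean, encode, decode, sigma=0.1):
    x_noisy = x_clean + sigma * rng.normal(size=x_clean.shape)  # corrupt the input
    x_hat = decode(encode(x_noisy))                             # reconstruct from the corrupted copy
    return 0.5 * np.sum((x_hat - x_clean) ** 2)                 # target is the clean signal
```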

Exercise

KL Divergence: also called relative entropy, it is a measure of how one probability distribution diverges from a second, reference probability distribution.

In the Bayesian view of $D_{KL}(P \| Q)$: $Q$ is the prior distribution that is used to approximate the true distribution $P$.

Definition: $D_{KL}(P \| Q)$ is the expectation of the logarithmic difference between $P$ and $Q$, where the expectation is taken with respect to the probabilities of $P$.

$D_{KL}(P \| Q) = \int_{-\infty}^{+\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$
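For a quick numeric check of the definition with discrete distributions (the integral becomes a sum; the example probabilities are arbitrary):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
kl_pq = np.sum(p * np.log(p / q))   # D_KL(P || Q)
kl_qp = np.sum(q * np.log(q / p))   # D_KL(Q || P): note the asymmetry
print(kl_pq, kl_qp)                 # both non-negative, generally different
```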

Transposed Convolution: in contrast with the many-to-one operation of convolution, a transposed convolution establishes a one-to-many operation.
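A small PyTorch illustration (assuming torch is available; the parameter values are arbitrary): a single non-zero input pixel is spread over a kernel-sized patch of the output, i.e. one-to-many.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 4, 4)
x[0, 0, 1, 1] = 1.0                          # one active input location
tconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, bias=False)
with torch.no_grad():
    tconv.weight.fill_(1.0)                  # fixed weights so the pattern is easy to read
    y = tconv(x)
print(y.shape)                               # torch.Size([1, 1, 9, 9]): the output is upsampled
print(y[0, 0])                               # a 3x3 block of ones around the mapped location
```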
Tutorial

ELBO: also called the variational lower bound

  • Problem set-up: given observations X (the training data), we want to estimate some latent variable Z (a distribution with parameters such as a mean or covariance).
  • Goal & challenge: we want to find $P(Z|X)$ (the true posterior distribution), but this is often intractable, so instead we estimate $q_\theta(Z)$ with some parameters $\theta$ to approximate $P(Z|X)$.
  • KL divergence: the difference between the true distribution and the approximate distribution is measured by the KL divergence.
  • Representation of log-likelihood of training data:
    $\log P(x) = \mathcal{L} + D_{KL}(q_\theta(Z) \| p(z|x))$
    where $q_\theta(Z)$ is the approximation and $p(z|x)$ is the true posterior distribution.
  • $\mathcal{L}$ is the ELBO.
  • Note that the KL divergence is always non-negative, so $\mathcal{L}$ is a lower bound on $\log P(x)$ (see the numeric sketch after this list).
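A small numeric check of the identity $\log P(x) = \mathcal{L} + D_{KL}(q_\theta(Z) \| p(z|x))$ for a toy model with a binary latent $z$ (all numbers are made up for illustration):

```python
import numpy as np

p_z = np.array([0.4, 0.6])          # prior p(z)
p_x_given_z = np.array([0.9, 0.2])  # likelihood p(x | z) for the observed x
q_z = np.array([0.7, 0.3])          # some approximate posterior q_theta(z)

p_xz = p_z * p_x_given_z            # joint p(x, z)
log_p_x = np.log(p_xz.sum())        # marginal log-likelihood log P(x)
post = p_xz / p_xz.sum()            # true posterior p(z | x)

elbo = np.sum(q_z * (np.log(p_xz) - np.log(q_z)))   # E_q[log p(x, z) - log q(z)]
kl = np.sum(q_z * (np.log(q_z) - np.log(post)))     # D_KL(q || p(z|x)) >= 0
print(np.isclose(log_p_x, elbo + kl))               # True: the ELBO is a lower bound
```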