Linear Autoencoder
The optimal linear autoencoder projects the data onto the principal subspace, which is the same subspace recovered by the SVD (i.e., PCA).
At the optimum, the decoder is the transpose of the encoder.
When the number of data points is very large, computing the SVD directly may not be practical because its computational cost is too high.
The decoder and encoder are only unique up to an invertible transformation: replacing the decoder $V$ by $VA$ and the encoder $W$ by $A^{-1}W$ leaves the reconstruction unchanged, where $A$ is an arbitrary regular (invertible) matrix.
The learned decoder and encoder are not necessarily aligned with the eigenbasis, but they do span the principal subspace of the data.
The real distinction is between spanning the principal subspace with some arbitrary set of basis vectors versus recovering the actual orthogonal basis, usually the eigenbasis.
The point is that a linear autoencoder trained with stochastic gradient descent can identify the principal subspace.
It finds a compressed representation of the data.
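A minimal sketch of this claim, on synthetic data: the snippet below trains a linear autoencoder with plain stochastic gradient descent (manual gradients, NumPy only) and compares the subspace spanned by the learned decoder with the top-$k$ singular subspace from the SVD. The comparison uses projection matrices, which are invariant to the invertible re-parameterization mentioned above, so the learned basis does not need to be aligned with the eigenbasis. All sizes, learning rates, and step counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 500 points in R^10 that mostly live in a 3-D subspace.
n, d, k = 500, 10, 3
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]          # true 3-D subspace
X = rng.normal(size=(n, k)) @ basis.T + 0.01 * rng.normal(size=(n, d))
X -= X.mean(axis=0)                                        # center the data

# Linear autoencoder: code z = x W^T, reconstruction x_hat = z V^T.
W = 0.1 * rng.normal(size=(k, d))   # encoder
V = 0.1 * rng.normal(size=(d, k))   # decoder
lr = 0.05

for step in range(2000):
    batch = X[rng.choice(n, size=64, replace=False)]       # stochastic mini-batch
    Z = batch @ W.T
    E = Z @ V.T - batch                                     # reconstruction error
    grad_V = 2 / len(batch) * E.T @ Z
    grad_W = 2 / len(batch) * V.T @ E.T @ batch
    V -= lr * grad_V
    W -= lr * grad_W

# Principal subspace from the SVD of the centered data matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_svd = Vt[:k].T @ Vt[:k]                                   # projector onto top-k subspace

# Projector onto the span of the learned decoder columns.  Projection matrices
# are unchanged by an invertible re-parameterization V -> V A, so this comparison
# does not require the learned basis to be the eigenbasis.
P_ae = V @ np.linalg.inv(V.T @ V) @ V.T

print("subspace mismatch:", np.linalg.norm(P_ae - P_svd))   # ~0 when training succeeds
```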
An L1 penalty can be applied to encourage sparsity, since L1 regularization tends to produce sparse solutions.
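A hedged sketch of how such a penalty could look, reusing the encoder/decoder convention from the snippet above; `lam` is a hypothetical regularization weight, not a value from the notes.

```python
import numpy as np

def sparse_ae_loss(batch, W, V, lam=1e-3):
    """Reconstruction loss plus an L1 penalty on the code z = x W^T.
    The L1 term pushes individual code entries toward exactly zero."""
    Z = batch @ W.T
    recon = Z @ V.T
    mse = np.mean(np.sum((recon - batch) ** 2, axis=1))
    l1 = lam * np.mean(np.sum(np.abs(Z), axis=1))
    return mse + l1
```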
Autoencoders can also denoise: the signal typically lies on some low-dimensional manifold while white noise does not, so the autoencoder can identify the signal subspace, which is robust under noise.
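A small illustration of that idea, under the assumption that the signal subspace is linear: projecting noisy observations onto the estimated principal subspace (taken here from the SVD, standing in for a trained linear autoencoder) removes the noise component that lies outside the subspace. The dimensions and noise level below are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean signals live in a 2-D subspace of R^20; observations add white noise.
d, k, n = 20, 2, 200
B = np.linalg.qr(rng.normal(size=(d, k)))[0]
clean = rng.normal(size=(n, k)) @ B.T
noisy = clean + 0.3 * rng.normal(size=(n, d))

# Estimate the signal subspace from the noisy data and project onto it.
_, _, Vt = np.linalg.svd(noisy - noisy.mean(axis=0), full_matrices=False)
P = Vt[:k].T @ Vt[:k]
denoised = noisy @ P

print("noisy error:   ", np.mean((noisy - clean) ** 2))
print("denoised error:", np.mean((denoised - clean) ** 2))   # much smaller
```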
Exercise
KL Divergence: also called relative entropy, it measures how one probability distribution diverges from a second, reference probability distribution.
In the Bayesian view of $D_{KL}(P \| Q)$: $Q$ is the prior distribution that is used to approximate the true distribution $P$.
Definition: $D_{KL}(P \| Q)$ is the expectation of the logarithmic difference between $P$ and $Q$, where the expectation is taken with respect to $P$: $D_{KL}(P \| Q) = \mathbb{E}_{x \sim P}\left[\log P(x) - \log Q(x)\right]$.
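A quick numerical illustration of this definition for discrete distributions; the distributions P and Q below are arbitrary examples.

```python
import numpy as np

# D_KL(P || Q) = sum_x P(x) * (log P(x) - log Q(x))
P = np.array([0.5, 0.3, 0.2])   # "true" distribution
Q = np.array([0.4, 0.4, 0.2])   # approximating distribution

kl_pq = np.sum(P * (np.log(P) - np.log(Q)))
kl_qp = np.sum(Q * (np.log(Q) - np.log(P)))

print(kl_pq, kl_qp)   # both non-negative, and asymmetric: D_KL(P||Q) != D_KL(Q||P)
```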
Transposed Convolution: in contrast with the many-to-one operation of convolution, a transposed convolution establishes a one-to-many operation.
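A rough 1-D illustration of that difference, not tied to any particular framework: the sketch below implements a plain "valid" convolution and its transpose. The transposed version scatters each input element across several output positions, which is the one-to-many behaviour and is why it is commonly used for upsampling.

```python
import numpy as np

def conv1d(x, w):
    """Ordinary 'valid' convolution (cross-correlation): many-to-one,
    each output element is a weighted sum of several input elements."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def conv1d_transpose(y, w):
    """Transposed convolution: one-to-many, each input element is scattered
    across several output positions, enlarging the signal."""
    k = len(w)
    out = np.zeros(len(y) + k - 1)
    for i, v in enumerate(y):
        out[i:i + k] += v * w
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.5])
y = conv1d(x, w)                  # length 3: the convolution shrinks the input
print(y, conv1d_transpose(y, w))  # length 4: maps back to the original size
```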
Tutorial
ELBO: the evidence lower bound, also called the variational lower bound.
- Problem set-up: given observations X (the training data), we want to estimate some latent variable Z (a distribution with parameters such as a mean or covariance).
- Goal & Challenge: we want to find P(Z|X) P ( Z | X ) (the true posterior distribution), but often it’s impossible, thus instead we estimate Pθ(Z) P θ ( Z ) with some parameter θ θ to approximate P(Z|X) P ( Z | X )
- KL divergence: the difference between the true and the approximating probability distributions is measured by the KL divergence.
- Representation of log-likelihood of training data:
$\log P(x) = \mathcal{L} + D_{KL}(q_\theta(Z) \| p(z|x))$, where $q_\theta(Z)$ is the approximation and $p(z|x)$ is the true posterior distribution.
- $\mathcal{L}$ is the ELBO.
- Note that the KL divergence is always non-negative, so $\mathcal{L}$ is a lower bound on $\log P(x)$; a small numerical check follows below.
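A minimal numerical check of the identity $\log P(x) = \mathcal{L} + D_{KL}(q_\theta(Z) \| p(z|x))$ on a toy discrete model with a binary latent variable; the prior, likelihood, and approximate posterior values below are arbitrary choices for the demo.

```python
import numpy as np

# Tiny discrete latent-variable model: z in {0, 1}, one fixed observation x.
p_z = np.array([0.6, 0.4])            # prior p(z)
p_x_given_z = np.array([0.2, 0.7])    # likelihood p(x | z) for the observed x

p_x = np.sum(p_z * p_x_given_z)       # evidence p(x)
p_z_given_x = p_z * p_x_given_z / p_x # true posterior p(z | x)

q = np.array([0.5, 0.5])              # an arbitrary approximate posterior q(z)

# ELBO = E_q[log p(x, z) - log q(z)]
elbo = np.sum(q * (np.log(p_z * p_x_given_z) - np.log(q)))
# KL(q(z) || p(z | x))
kl = np.sum(q * (np.log(q) - np.log(p_z_given_x)))

print(np.log(p_x), elbo + kl)   # identical up to floating point
print(elbo <= np.log(p_x))      # ELBO is a lower bound because KL >= 0
```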