Square Loss Function in Frequentist and Bayesian View

This post examines how the square loss function is handled under the frequentist and Bayesian perspectives. In the frequentist view, the risk decomposes into the mean squared error (MSE), namely variance plus squared bias. In the Bayesian view, we treat the parameter θ as random and compute a posterior expectation, arriving at an analogous decomposition. The manipulations differ between the two views, but both reveal the key components of the loss.

Suppose we have $X_1, \dots, X_n \sim N(\theta, \sigma_0^2)$.
Loss Function: Square Loss
$$L(\delta(\vec{x}) - \theta) = (\delta(\vec{x}) - \theta)^2$$
The parameter we want to estimate is $\theta$.

Under the frequentist perspective, the risk function can be written as:

$$R(\delta(\vec{x}), \theta) = E_X[(\delta(\vec{x}) - \theta)^2]$$

(here we take the expectation with respect to $X$). This is exactly the MSE, so we can decompose it into variance + bias²:
$$
\begin{aligned}
\mathrm{MSE} = E_X[(\delta(\vec{x}) - \theta)^2] &= E_X[(\delta(\vec{x}) - E(\delta(\vec{x})) + E(\delta(\vec{x})) - \theta)^2] \\
&= E_X[(\delta(\vec{x}) - E(\delta(\vec{x})))^2] + [E(\delta(\vec{x})) - \theta]^2 \\
&= \mathrm{Var}(\delta(\vec{x})) + [E(\delta(\vec{x})) - \theta]^2 \\
&= \mathrm{Var}(\delta(\vec{x})) + \mathrm{Bias}^2
\end{aligned}
$$

(The cross term vanishes because $E_X[\delta(\vec{x}) - E(\delta(\vec{x}))] = 0$.)
The frequentist derivation above shows that the random variable is the statistic $\delta(\vec{x})$. Once we find its variance and bias, the computation is finished.
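The decomposition can be checked numerically. As a sketch (the shrinkage estimator $\delta(\vec{x}) = c\,\bar{x}$ and all parameter values below are illustrative choices, not from the original post), we simulate many datasets, compute the empirical MSE of the estimator, and compare it against $\mathrm{Var}(\delta) + \mathrm{Bias}^2$ from the closed forms:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma0, n = 2.0, 1.0, 10
c = 0.8  # illustrative shrinkage factor: delta(x) = c * xbar is deliberately biased

# Simulate many datasets X_1,...,X_n ~ N(theta, sigma0^2) and apply the estimator
reps = 200_000
samples = rng.normal(theta, sigma0, size=(reps, n))
delta = c * samples.mean(axis=1)

# Empirical MSE versus the theoretical decomposition Var + Bias^2
mse_empirical = np.mean((delta - theta) ** 2)
var_delta = c**2 * sigma0**2 / n       # Var(c * xbar) = c^2 * sigma0^2 / n
bias_sq = ((c - 1) * theta) ** 2       # E[c * xbar] - theta = (c - 1) * theta
print(mse_empirical, var_delta + bias_sq)  # the two values agree closely
```

Note that the expectation is taken over repeated samples $X$ while $\theta$ stays fixed, which is exactly the frequentist setup.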

Under the Bayesian perspective, the posterior expected loss can be written as:

$$
\begin{aligned}
E_{\theta|X}[(\delta(\vec{x}) - \theta)^2] &= E_{\theta|X}[(\theta - E_{\theta|X}(\theta) + E_{\theta|X}(\theta) - \delta(\vec{x}))^2] \\
&= E_{\theta|X}[(\theta - E_{\theta|X}(\theta))^2] + [E_{\theta|X}(\theta) - \delta(\vec{x})]^2 \\
&= \mathrm{Var}_{\theta|X}(\theta) + [E_{\theta|X}(\theta) - \delta(\vec{x})]^2
\end{aligned}
$$
The Bayesian derivation above shows that this time $\theta$ is treated as the random variable. Once we find its posterior mean and variance, the computation is done.
Here I just want to mention that the manipulation differs between the two scenarios. In the frequentist case, since we treat the statistic $\delta(\vec{x})$ as the random variable, we add and subtract $E(\delta(\vec{x}))$. In the Bayesian case, on the other hand, we add and subtract $E_{\theta|X}(\theta)$.
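The Bayesian decomposition can also be verified numerically. As a sketch under an assumed conjugate normal-normal model (the prior $\theta \sim N(\mu, \tau^2)$ and all parameter values below are illustrative, not from the original post), the posterior of $\theta$ is itself normal with a closed-form mean and variance, so we can compare a Monte Carlo estimate of the posterior expected loss against $\mathrm{Var}_{\theta|X}(\theta) + [E_{\theta|X}(\theta) - \delta(\vec{x})]^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative conjugate setup: theta ~ N(mu, tau^2), X_i | theta ~ N(theta, sigma0^2)
mu, tau = 0.0, 2.0
sigma0, n = 1.0, 5
x = rng.normal(1.5, sigma0, size=n)  # some observed data
xbar = x.mean()

# Closed-form posterior for the normal-normal model
prec = 1 / tau**2 + n / sigma0**2
post_mean = (mu / tau**2 + n * xbar / sigma0**2) / prec
post_var = 1 / prec

# Posterior expected square loss of an arbitrary decision delta(x),
# estimated by drawing theta from the posterior
delta = 1.0  # illustrative decision value
draws = rng.normal(post_mean, np.sqrt(post_var), size=500_000)
loss_mc = np.mean((delta - draws) ** 2)
loss_formula = post_var + (post_mean - delta) ** 2
print(loss_mc, loss_formula)  # the two values agree closely
```

This also makes the well-known corollary visible: the posterior expected loss is minimized by choosing $\delta(\vec{x}) = E_{\theta|X}(\theta)$, leaving only the posterior variance.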
