Applications of regression:
(1) Stock market forecasting
(2) Self-driving cars
(3) Recommendation systems
Example application: predicting a Pokémon's CP value after evolution
Step 1: Model
Linear model:
$y = b + \sum_i w_i x_i$
- $x_i$: an attribute of the input $x$ (feature)
- $w_i$: weight
- $b$: bias
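As a quick sketch of what this model computes (the feature values, weights and bias below are made-up numbers purely for illustration):

```python
# Linear model: y = b + sum_i w_i * x_i
# (feature values, weights and bias are made-up illustrative numbers)
def linear_model(x, w, b):
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))

x = [612.0, 1.0]   # attributes of one input, e.g. current CP plus one more feature (hypothetical)
w = [1.5, 0.2]     # weights (hypothetical)
b = 10.0           # bias (hypothetical)
print(linear_model(x, w, b))   # 928.2
```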
Step 2: Goodness of Function
Source:https://www.openintro.org/stat/data/?data=pokemon
Training Data: 10 Pokémon
$(x^1,\hat y^1),\ (x^2,\hat y^2),\ \dots,\ (x^{10},\hat y^{10})$
Define the loss function $L$:
$L(f)=L(w,b)=\displaystyle\sum_{n=1}^{10}\left(\hat y^n-(b+w\cdot x_{cp}^n)\right)^2$
Input: a function
Output: how bad it is
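A minimal sketch of evaluating this loss in code, reusing the ten $(x_{cp}^n, \hat y^n)$ pairs that appear in the demo at the end of these notes:

```python
# L(w, b) = sum_{n=1}^{10} (y_hat^n - (b + w * x_cp^n))^2
# (the ten training pairs are the ones used in the demo at the end of these notes)
x_cp  = [338, 333, 328, 207, 226, 25, 179, 60, 208, 606]
y_hat = [640, 633, 619, 393, 428, 27, 193, 66, 226, 1591]

def loss(w, b):
    return sum((y - (b + w * x)) ** 2 for x, y in zip(x_cp, y_hat))

print(loss(w=2.0, b=0.0))   # input: the function y = 2 * x_cp; output: how bad it is
```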
Pick the “Best” Function
$f^*=\displaystyle\argmin_f L(f)$
$w^*,b^*=\displaystyle\argmin_{w,b}L(w,b)=\displaystyle\argmin_{w,b}\sum_{n=1}^{10}\left(\hat y^n-(b+w\cdot x_{cp}^n)\right)^2$
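Before turning to gradient descent, a brute-force way to picture this argmin is to scan a grid of candidate $(w,b)$ pairs and keep the one with the smallest loss; a rough sketch (the grid ranges and step sizes are arbitrary illustrative choices):

```python
# Approximate argmin_{w,b} L(w,b) by grid search
# (grid ranges/steps are arbitrary; data as in the demo at the end of these notes)
x_cp  = [338, 333, 328, 207, 226, 25, 179, 60, 208, 606]
y_hat = [640, 633, 619, 393, 428, 27, 193, 66, 226, 1591]

def loss(w, b):
    return sum((y - (b + w * x)) ** 2 for x, y in zip(x_cp, y_hat))

best = min(((loss(w / 10, b), w / 10, b)
            for w in range(-50, 51)           # w from -5.0 to 5.0 in steps of 0.1
            for b in range(-200, 201, 5)),    # b from -200 to 200 in steps of 5
           key=lambda t: t[0])
print(best)   # (smallest loss found, w*, b*)
```

Gradient descent (next step) finds the minimizer far more efficiently whenever the loss is differentiable.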
Step 3: Gradient Descent
As long as the loss function is differentiable, gradient descent can be used to search for good parameters.
Consider a loss function $L(w)$ with a single parameter $w$:
$w^*=\displaystyle\argmin_w L(w)$
- (Randomly) pick an initial value $w^0$
- Compute $\left.\frac{dL}{dw}\right|_{w=w^0}$, then update $w^1=w^0-k\left.\frac{dL}{dw}\right|_{w=w^0}$ ($k$: learning rate)
- Compute $\left.\frac{dL}{dw}\right|_{w=w^1}$, then update $w^2=w^1-k\left.\frac{dL}{dw}\right|_{w=w^1}$
- ... after many iterations, $w$ approaches a (local) minimum
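A minimal sketch of this one-parameter update loop (the toy loss, initial value and learning rate below are arbitrary illustrative choices, not the Pokémon loss):

```python
# w^{t+1} = w^t - k * dL/dw |_{w = w^t}
# (toy loss L(w) = (w - 3)^2 with minimum at w = 3; w^0 and k are illustrative choices)
def dL_dw(w):
    return 2 * (w - 3)

w = 0.0   # (randomly) picked initial value w^0
k = 0.1   # learning rate
for _ in range(100):        # many iterations
    w = w - k * dL_dw(w)
print(w)  # approximately 3, the minimizer of the toy loss
```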
The above is for a single parameter; the case with two parameters ($w$ and $b$) is analogous: compute both partial derivatives and update the two parameters simultaneously at each step.
In the figure below (loss contour plot), bluer regions correspond to smaller loss.
In linear regression, the loss function $L$ is convex (no local optima).
- Formulation: the partial derivatives of $L$ with respect to $w$ and $b$ are
$\dfrac{\partial L}{\partial w}=\displaystyle\sum_{n=1}^{10}2\left(\hat y^n-(b+w\cdot x_{cp}^n)\right)\left(-x_{cp}^n\right)$
$\dfrac{\partial L}{\partial b}=\displaystyle\sum_{n=1}^{10}2\left(\hat y^n-(b+w\cdot x_{cp}^n)\right)\left(-1\right)$
Fit a linear regression to the 10 Pokémon.
Average Error on Training Data $=\displaystyle\sum_{n=1}^{10}e^n$, where $e^n$ is the error on the $n$-th example.
But what we really care about is: what is the error on new data (testing data)?
The average error on the testing data is larger than the average error on the training data, which suggests this model may not fit well enough.
Select another model: $y=b+w_1\cdot x_{cp}+w_2\cdot(x_{cp})^2$
or a cubic one: $y=b+w_1\cdot x_{cp}+w_2\cdot(x_{cp})^2+w_3\cdot(x_{cp})^3$
Overfitting: A more complex model does not always lead to better performance on testing data.
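A rough sketch of how this can be observed in code, fitting polynomials of increasing degree with numpy.polyfit and comparing average errors; the train/test split below is an arbitrary illustration, not the split used in the lecture:

```python
import numpy as np

# The ten examples from the demo at the end of these notes, rescaled for numerical stability
x = np.array([338, 333, 328, 207, 226, 25, 179, 60, 208, 606], dtype=float) / 100.0
y = np.array([640, 633, 619, 393, 428, 27, 193, 66, 226, 1591], dtype=float)

# Arbitrary illustrative split: first 7 examples for training, last 3 for testing
x_tr, y_tr, x_te, y_te = x[:7], y[:7], x[7:], y[7:]

for degree in (1, 2, 3, 4):
    coeffs = np.polyfit(x_tr, y_tr, degree)                       # y = w_d x^d + ... + w_1 x + b
    train_err = np.mean(np.abs(np.polyval(coeffs, x_tr) - y_tr))  # average training error
    test_err  = np.mean(np.abs(np.polyval(coeffs, x_te) - y_te))  # average testing error
    print(degree, round(train_err, 1), round(test_err, 1))
# Training error keeps shrinking as the degree grows; if the testing error stops
# improving or blows up, the more complex model is overfitting.
```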
What are the hidden factors? Consider differences between species.
Back to step 1: Redesign the model
To keep this as a single linear model, introduce indicator functions, for example as sketched below:
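A sketch of such a species-dependent model written as one linear model with indicator functions $\delta(\cdot)$ (the species names here are only illustrative examples):

$y = b_1\cdot\delta(x_s=\text{Pidgey}) + w_1\cdot\delta(x_s=\text{Pidgey})\cdot x_{cp} + b_2\cdot\delta(x_s=\text{Weedle}) + w_2\cdot\delta(x_s=\text{Weedle})\cdot x_{cp} + \dots$

Here $\delta(x_s=\text{Pidgey})$ equals 1 if the input's species $x_s$ is Pidgey and 0 otherwise, so each species effectively gets its own weight and bias while the whole expression stays linear in the parameters.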
Even more complex models could be considered... but overfitting becomes very likely.
Back to step 2: Regularization
$y=b+\sum_i w_ix_i$
$L=\displaystyle\sum_n\left(\hat y^n-\left(b+\sum_i w_ix_i\right)\right)^2+\lambda\sum_i(w_i)^2$ ($\lambda$: regularization weight)
Functions with smaller $w_i$ are better: smaller weights mean the output is less sensitive to noise in the input, i.e. the function is smoother.
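A minimal sketch of running gradient descent on this regularized loss in the single-feature case (data as in the demo below; the learning rate, iteration count and $\lambda$ are arbitrary illustrative choices):

```python
# Regularized loss: L = sum_n (y_hat^n - (b + w * x^n))^2 + lam * w^2
# (data as in the demo below; lr, iterations and lam are illustrative choices;
#  the bias b is not regularized)
x_data = [338, 333, 328, 207, 226, 25, 179, 60, 208, 606]
y_data = [640, 633, 619, 393, 428, 27, 193, 66, 226, 1591]

w, b = -4.0, -120.0
lr, lam = 0.0000001, 100.0
for _ in range(100000):
    w_grad = sum(-2.0 * (y - (b + w * x)) * x for x, y in zip(x_data, y_data)) + 2.0 * lam * w
    b_grad = sum(-2.0 * (y - (b + w * x)) for x, y in zip(x_data, y_data))
    w, b = w - lr * w_grad, b - lr * b_grad
print(w, b)   # larger lam pushes w closer to 0 (a smoother, less sensitive function)
```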
Regression demo
import numpy as np
import matplotlib.pyplot as plt

x_data = [338, 333, 328, 207, 226, 25, 179, 60, 208, 606]
y_data = [640, 633, 619, 393, 428, 27, 193, 66, 226, 1591]
# ydata = b + w * xdata

b = -120        # initial b
w = -4          # initial w
lr = 0.0000001  # learning rate
iteration = 100000

# Store parameter history for plotting the descent trajectory
b_history = [b]
w_history = [w]

for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    for n in range(len(x_data)):
        # gradients of the squared error w.r.t. b and w
        b_grad = b_grad - 2.0 * (y_data[n] - b - w * x_data[n]) * 1.0
        w_grad = w_grad - 2.0 * (y_data[n] - b - w * x_data[n]) * x_data[n]
    # update parameters
    b = b - lr * b_grad
    w = w - lr * w_grad
    b_history.append(b)
    w_history.append(w)

# Loss surface over a grid of (b, w) values for the contour plot
x = np.arange(-200, 200, 1)    # candidate b values
y = np.arange(-5, 5, 0.1)      # candidate w values
z = np.zeros((len(y), len(x)))
for i in range(len(x)):
    for j in range(len(y)):
        z[j][i] = sum((y_data[n] - y[j] * x_data[n] - x[i]) ** 2 for n in range(len(x_data)))

plt.contourf(x, y, z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')  # optimal (b*, w*)
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')          # gradient descent path
plt.xlim(-200, 200)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
Reference: https://www.bilibili.com/video/BV1Ht411g7Ef?p=3