(1). Univariate Linear Regression and Gradient Descent

  1. Definition: a problem with only one feature (input) variable is called a univariate linear regression problem. The hypothesis can be written as $h_{\theta}(x)=\theta_{0}+\theta_{1} x$, where $\theta_{i}, i=0,1$ are the parameters to be determined.

To find the line that best fits the given data, we need to define a cost function.

  1. Cost function:
    $J\left(\theta_{0}, \theta_{1}\right)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$
    where m is the number of training examples.
    We want to minimize the gap between the values the model predicts and the actual values in the training set, i.e. make the cost function as small as possible.
    Our goal is to find the parameters $\theta_{i}, i=0,1$ that minimize $J\left(\theta_{0}, \theta_{1}\right)$; a minimal NumPy sketch of the cost and the update rules appears after this list.
  2. Gradient descent is an algorithm for finding the minimum of a function; we will use it to find the minimum of the cost function $J\left(\theta_{0}, \theta_{1}\right)$.
    The idea is this: start from a randomly chosen combination of parameters $\left(\theta_{0}, \theta_{1}, \ldots, \theta_{n}\right)$, compute the cost function, and then look for the next parameter combination that decreases the cost function the most. We repeat this until we reach a local minimum. Because we have not tried every combination of parameters, we cannot be certain that the local minimum we found is the global minimum; different initial parameter combinations may lead to different local minima.
    Gradient descent algorithm: repeat until convergence
    $\theta_{j}:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J\left(\theta_{0}, \theta_{1}\right) \quad (\text{for } j=0 \text{ and } j=1)$
    Linear regression model:
    $h_{\theta}(x)=\theta_{0}+\theta_{1} x$
    $J\left(\theta_{0}, \theta_{1}\right)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$
    To apply gradient descent to linear regression, the key is to work out the derivative of the cost function: $\frac{\partial}{\partial \theta_{j}} J\left(\theta_{0}, \theta_{1}\right)=\frac{\partial}{\partial \theta_{j}} \frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$. Applying the chain rule (the factor of 2 from the square cancels the $\frac{1}{2}$) gives:
    when $j=0$: $\frac{\partial}{\partial \theta_{0}} J\left(\theta_{0}, \theta_{1}\right)=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)$
    when $j=1$: $\frac{\partial}{\partial \theta_{1}} J\left(\theta_{0}, \theta_{1}\right)=\frac{1}{m} \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \cdot x^{(i)}\right)$
    The algorithm can therefore be rewritten as: repeat until convergence
    $\theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)$
    $\theta_{1}:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \cdot x^{(i)}\right)$
    (with $\theta_{0}$ and $\theta_{1}$ updated simultaneously)
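As referenced above, here is a minimal NumPy sketch of the univariate cost function and the two scalar update rules. The toy data, learning rate, and iteration count are illustrative assumptions, not values from the notes.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    h = theta0 + theta1 * x                  # hypothesis for every training example
    return np.sum((h - y) ** 2) / (2 * m)

def univariate_gradient_descent(x, y, theta0=0.0, theta1=0.0,
                                alpha=0.01, num_iters=1000):
    """Batch gradient descent using the two scalar updates derived above."""
    m = len(y)
    for _ in range(num_iters):
        h = theta0 + theta1 * x
        grad0 = np.sum(h - y) / m            # dJ/d(theta0)
        grad1 = np.sum((h - y) * x) / m      # dJ/d(theta1)
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Illustrative toy data (assumed): y is roughly 2 + 3x
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.1, 4.9, 8.2, 10.9, 14.1])
t0, t1 = univariate_gradient_descent(x_train, y_train, alpha=0.05, num_iters=5000)
print(t0, t1, cost(t0, t1, x_train, y_train))
```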

Reference: Dr. Huang Haiguang (黄海广), Personal Machine Learning Notes, complete edition v5.5.

Below are simple Python code examples that use the numpy library to implement the cost function and gradient descent for univariate and multivariate linear regression, as well as logistic regression using the logistic loss (sigmoid cross-entropy) as the cost function with gradient descent:

1. **Univariate linear regression (Linear Regression - Cost Function and Gradient Descent)**

```python
import numpy as np

def linear_regression_cost(x, y, w):
    # mean-squared-error cost: J(w) = 1/(2m) * sum((h - y)^2)
    m = len(y)
    h = np.dot(x, w)
    J = (1 / (2 * m)) * np.sum((h - y) ** 2)
    return J

def gradient_descent(x, y, initial_w, learning_rate, num_iters):
    m = x.shape[0]
    w = initial_w
    for _ in range(num_iters):
        h = np.dot(x, w)
        dw = (1 / m) * np.dot(x.T, (h - y))   # gradient of J with respect to w
        w -= learning_rate * dw
    return w
```

2. **Multivariate linear regression (Multiple Linear Regression - Cost Function and Gradient Descent)**

```python
def multivariate_linear_regression_cost(X, Y, W):
    m = Y.shape[0]
    h = np.dot(X, W)
    J = (1 / (2 * m)) * np.sum(np.square(h - Y))
    return J

def multivariate_gradient_descent(X, Y, initial_W, learning_rate, num_iters):
    m = X.shape[0]
    W = initial_W
    for _ in range(num_iters):
        h = np.dot(X, W)
        dw = (1 / m) * np.dot(X.T, (h - Y))
        W -= learning_rate * dw
    return W
```

3. **Logistic regression (Logistic Regression - Cost Function and Gradient Descent using Sigmoid Cross Entropy)**

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_regression_cost(X, y, w):
    m = y.size
    z = np.dot(X, w)
    hypothesis = sigmoid(z)
    # average cross-entropy loss over the m training examples
    J = (1 / m) * (-y.T @ np.log(hypothesis) - (1 - y).T @ np.log(1 - hypothesis))
    return J

def logistic_regression_grad_descent(X, y, initial_w, learning_rate, num_iters):
    m = X.shape[0]
    w = initial_w
    for _ in range(num_iters):
        z = np.dot(X, w)
        hypothesis = sigmoid(z)
        dw = (1 / m) * np.dot(X.T, (hypothesis - y))
        w -= learning_rate * dw
    return w
```
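A brief, hypothetical usage example for the functions above; the toy arrays, learning rate, and iteration counts are assumptions made purely for illustration. Note that a bias column of ones is prepended to the features so that w[0] plays the role of θ0.

```python
import numpy as np

# Toy regression data (assumed): y is roughly 1 + 2x
x_raw = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
X = np.column_stack([np.ones_like(x_raw), x_raw])   # bias column + feature

w = gradient_descent(X, y, initial_w=np.zeros(2),
                     learning_rate=0.05, num_iters=5000)
print("fitted w:", w)
print("cost:", linear_regression_cost(X, y, w))

# Toy binary labels (assumed) for the logistic-regression functions
y_cls = np.array([0.0, 0.0, 1.0, 1.0])
w_cls = logistic_regression_grad_descent(X, y_cls, initial_w=np.zeros(2),
                                         learning_rate=0.1, num_iters=5000)
print("logistic cost:", logistic_regression_cost(X, y_cls, w_cls))
```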