
Regularization

Concept:

In some applications an outcome depends on many parameters. For example, a comprehensive evaluation of a student involves a whole pile of criteria. When fitting such data, two failure modes can appear: underfitting and overfitting.

Underfitting: the model leans too heavily on one parameter, in the extreme treating that single parameter as the whole standard, and fails to capture the data.

Overfitting: the model fits every parameter, even ones that are actually unimportant (they look significant but contribute very little to the result), which drives the complexity of the function way up.

Regularization is the tool for dealing with overfitting (choose its strength carefully, though, or it will push the model back toward underfitting).
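To make these two failure modes concrete, here is a minimal sketch on made-up data (plain numpy, separate from the experiment below): given ten noisy samples of a sine wave, a degree-1 polynomial underfits while a degree-9 polynomial chases the noise:

import numpy as np

# Hypothetical data: ten noisy samples of a sine wave
rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 10)
ys = np.sin(2 * np.pi * xs) + rng.normal(0, 0.1, xs.shape)

underfit = np.polyfit(xs, ys, 1)  # too simple: a straight line misses the curve
overfit = np.polyfit(xs, ys, 9)   # one coefficient per sample: interpolates the noise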


We regularize by adding a penalty term. The original cost function is

$$\mathrm{cost} = \frac{1}{2m}\sum\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Now we add the regularization term:

$$\mathrm{cost} = \frac{1}{2m}\sum\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum\theta_j^2$$

The penalty term shrinks all of the feature weights. Here we keep every feature, and instead control the size of $\lambda$ to decide how closely the curve fits the training samples.

The larger $\lambda$ is, the heavier the penalty: since we want cost to stay small, the $\theta$ values are driven smaller. At the same time each $\theta$ still carries weight in the data-fitting term, so adjusting $\theta$ moves both terms, and $\lambda$ mediates the trade-off between them. In the end we still obtain a $\theta$ vector, and we can drop the entries that are nearly zero to simplify the model.
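To make the shrinking visible, here is a closed-form ridge-regression sketch on hypothetical data (the experiment below takes the iterative route instead): as lambda grows, the learned feature weight moves toward zero:

import numpy as np

# Hypothetical demo: closed-form regularized least squares
X_demo = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])  # bias column + one feature
y_demo = np.array([1.0, 2.0, 2.5, 4.0])

for lam in (0.0, 1.0, 10.0):
    reg = lam * np.eye(X_demo.shape[1])
    reg[0, 0] = 0.0  # theta_0 (the bias) is conventionally not penalized
    theta_demo = np.linalg.solve(X_demo.T @ X_demo + reg, X_demo.T @ y_demo)
    print(lam, theta_demo)  # the feature weight shrinks as lam grows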

So when we bring regularization into linear regression, the partial derivative with respect to each $\theta$ is:

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\left(\sum\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \lambda\theta_j\right)$$

Doing gradient descent, i.e. $\theta = \theta - \alpha\,\frac{\partial J}{\partial \theta}$, gives:

$$\theta_0 = \theta_0 - \alpha\frac{1}{m}\sum\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\theta_j = \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \frac{\alpha}{m}\sum\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
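These update rules translate almost line-for-line into code. A minimal sketch for regularized linear regression, assuming X carries a leading bias column of ones (gradient_descent_reg is a name made up for this illustration):

import numpy as np

def gradient_descent_reg(X, y, theta, alpha, lam, iters):
    m = len(y)
    for _ in range(iters):
        err = X.dot(theta) - y  # h_theta(x^(i)) - y^(i) for every sample
        theta_0 = theta[0] - alpha / m * err.sum()  # bias term: no penalty
        theta_rest = theta[1:] * (1 - alpha * lam / m) - alpha / m * X[:, 1:].T.dot(err)
        theta = np.concatenate(([theta_0], theta_rest))
    return theta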


Regularization experiment:
Load the data:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op

data = np.loadtxt('ex2data2.txt', delimiter=',')
X = data[:, 0:2]
y = data[:, 2]
m, n = X.shape

Visualize the data:
# plot data
def plotData(X, y):
    X_1 = X[y == 1]
    X_0 = X[y == 0]
    plt.plot(X_1[:, 0], X_1[:, 1], 'k+')
    plt.plot(X_0[:, 0], X_0[:, 1], 'yo')
    plt.xlabel('Microchip Test 1')
    plt.ylabel('Microchip Test 2')
    plt.legend(['y=1', 'y=0'])
    plt.show()


plotData(X, y)
_ = input('Press [enter] to continue')
[Figure: scatter plot of the two microchip test scores, y=1 vs. y=0 (lg-reg-data)]


Our task is to draw a decision boundary. Clearly a simple one- or two-term linear split will not work here, so we map the features into a higher-dimensional space.

Initialize the variables:
def mapFeature(X_1, X_2):
    degree = 6
    col = int(degree * (degree + 1) / 2 + degree + 1) # number of terms the full degree-6 expansion will have
    out = np.ones((np.size(X_1, 0), col))
    count = 1
    for i in range(1, degree + 1): # start from 1: the degree-0 term is just the column of ones
        for j in range(i + 1):
            out[:, count] = np.power(X_1, (i - j)) * np.power(X_2, j)
            count += 1
    return out

The final out holds terms of every degree from 0 through degree, which are then combined linearly through a vector of theta values.
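As a quick sanity check of the expansion (input values chosen arbitrarily):

print(mapFeature(np.array([1.0]), np.array([2.0])).shape)  # (1, 28): terms of degree 0 through 6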

X = mapFeature(X[:, 0], X[:, 1])
initial_theta = np.zeros(np.size(X, 1), )
lamd = 1

Cost function and gradient:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def costFunctionReg(initial_theta, X, y, lamd):
    grad = np.zeros(np.size(initial_theta))
    z = X.dot(initial_theta)
    # note: theta_0 (the bias term) is excluded from the penalty
    J = 1 / m * (-y.dot(np.log(sigmoid(z))) - (1 - y).dot(np.log(1 - sigmoid(z)))) \
        + lamd * initial_theta[1:].dot(initial_theta[1:]) / (2 * m)
    grad[1:] = 1 / m * (X[:, 1:].T.dot(sigmoid(z) - y) + lamd * initial_theta[1:])
    grad[0] = 1 / m * X[:, 0].T.dot(sigmoid(z) - y)
    return J, grad

Test:
cost, grad = costFunctionReg(initial_theta, X, y, lamd)
print('Cost at initial theta(zeros):%f' % cost)
print('Expected cost (approx) 0.693')
print('Gradient at initial theta(zeros)- first five values only:')
print(grad[0:5])
print('Expected gradients (approx) - first five values only')
print('[0.0085 0.0188 0.0001 0.0503 0.0115]')
_ = input('Press [enter] to continue')

# compute and display cost and gradient
test_theta = np.ones(np.size(X, 1), )
cost, grad = costFunctionReg(test_theta, X, y, 10)
print('Cost at test theta(with lambda = 10):%f' % cost)
print('Expected cost (approx) 3.16')
print('Gradient at test theta(with lambda=10)- first five values only:')
print(grad[0:5])
print('Expected gradients (approx) - first five values only')
print('[0.3460 0.1614 0.1948 0.2269 0.0922]')
_ = input('Press [enter] to continue')

The output:

Cost at initial theta(zeros):0.693147
Expected cost (approx) 0.693
Gradient at initial theta(zeros)- first five values only:
[8.47457627e-03 1.87880932e-02 7.77711864e-05 5.03446395e-02
 1.15013308e-02]
Expected gradients (approx) - first five values only
[0.0085 0.0188 0.0001 0.0503 0.0115] # the third value matches too: 7.78e-05 rounds to 0.0001
Press [enter] to continue

Cost at test theta(with lambda = 10):3.206882 # this figure came from a version that also penalized theta_0; excluding it (as in the corrected code above) removes lambda/(2m) ≈ 0.042 and gives ≈ 3.16
Expected cost (approx) 3.16
Gradient at test theta(with lambda=10)- first five values only:
[0.34604507 0.16135192 0.19479576 0.22686278 0.09218568] # these all match
Expected gradients (approx) - first five values only
[0.3460 0.1614 0.1948 0.2269 0.0922]
Press [enter] to continue

Gradient descent:
def costFun(initial_theta, X, y, lamd): # cost-only wrapper for the optimizer
    return costFunctionReg(initial_theta, X, y, lamd)[0]


def gradient(initial_theta, X, y, lamd):
    return costFunctionReg(initial_theta, X, y, lamd)[1]
# Regularization and Accuracies
initial_theta = np.zeros(np.size(X, 1), )
lamd = 1
options = op.minimize(fun=costFun, x0=initial_theta, method='TNC', jac=gradient, args=(X, y, lamd))
theta = options.x

Plot the boundary:
def plotDecisionBoundary(theta, x, y):
    pos = np.where(y == 1)
    neg = np.where(y == 0)
    p1 = plt.scatter(x[pos, 1], x[pos, 2], marker='+', s=60, color='r')
    p2 = plt.scatter(x[neg, 1], x[neg, 2], marker='o', s=60, color='y')
    u = np.linspace(-1, 1.5, 50)
    v = np.linspace(-1, 1.5, 50)
    z = np.zeros((np.size(u, 0), np.size(v, 0)))
    for i in range(np.size(u, 0)):
        for j in range(np.size(v, 0)):
            z[i, j] = mapFeature(np.array([u[i]]), np.array([v[j]])).dot(theta)[0]
    z = z.T
    [um, vm] = np.meshgrid(u, v)
    plt.contour(um, vm, z, levels=[0], linewidths=2)
    plt.legend((p1, p2), ('y = 1', 'y = 0'), loc='upper right', fontsize=8)
    plt.xlabel('Microchip Test 1')
    plt.ylabel('Microchip Test 2')
    plt.title('lambda = 1')
    plt.show()


plotDecisionBoundary(theta, X, y)
The result:

[Figure: decision boundary at lambda = 1 over the training data (lg-reg-boundary)]


Accuracy check:
def predict(theta, x):
    return np.round(sigmoid(x.dot(theta)))


p = predict(theta, X)
print('Train Accuracy: %f' % (np.mean(np.double(p == y)) * 100))
print('Expected accuracy (approx): 83.1')
_ = input('Press [enter] to continue')
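As an optional extension beyond the original script, one can rerun the fit with other lambda values; in the course exercise, lambda = 0 should produce an overfitted, wiggly boundary and lambda = 100 an underfitted one:

# Optional: vary lambda and compare training accuracy
for lamd in (0, 100):
    res = op.minimize(fun=costFun, x0=np.zeros(np.size(X, 1)),
                      method='TNC', jac=gradient, args=(X, y, lamd))
    print('lambda = %g, train accuracy: %f' % (lamd, np.mean(predict(res.x, X) == y) * 100))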