Problem
Lasso regression is a linear regression model whose objective function is:
$$\min_{w} \frac{1}{2n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|_1$$
where $w$ is the vector of model parameters, $x_i$ is the input feature vector, $y_i$ is the output label, and $\lambda$ is the regularization parameter.
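For concreteness, the objective itself is easy to evaluate in NumPy. The sketch below is illustrative only: the function name lasso_objective and the parameter lam (standing in for $\lambda$) are our own, not part of the reference code further down.

import numpy as np

def lasso_objective(w: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> float:
    # (1/2n) * sum_i (y_i - w^T x_i)^2 + lam * ||w||_1
    n = X.shape[0]
    residuals = y - X @ w
    return float(residuals @ residuals / (2 * n) + lam * np.abs(w).sum())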
The key to this algorithm is L1 regularization of the weights: at every iteration, the weights are penalized by their L1 norm. The gradient-descent update rule is:
$$w_{t+1} = w_t - \eta \nabla f(w_t)$$
where $w_t$ is the weight vector at iteration $t$, $\eta$ is the learning rate, and $\nabla f(w_t)$ is the gradient of the objective at $w_t$; here $f$ is exactly the Lasso objective above. Because the L1 norm is not differentiable at zero, $\mathrm{sign}(w_t)$ serves as a subgradient there. The (sub)gradient is:
$$\nabla f(w_t) = \frac{1}{n} \sum_{i=1}^{n} (w_t^T x_i - y_i)\, x_i + \lambda\, \mathrm{sign}(w_t)$$
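As a sanity check against the reference code below, the (sub)gradient and a single update step can be written directly. This is a minimal sketch under the notation above; lasso_subgradient, lam, and eta are illustrative names, not part of the reference code.

import numpy as np

def lasso_subgradient(w: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    # (1/n) * X^T (Xw - y) + lam * sign(w); np.sign returns 0 at w_j == 0,
    # which is a valid subgradient choice for |.| at zero
    n = X.shape[0]
    return (X.T @ (X @ w - y)) / n + lam * np.sign(w)

# One update step: w_{t+1} = w_t - eta * grad f(w_t)
eta = 0.01
X = np.eye(3)
y = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
w = w - eta * lasso_subgradient(w, X, y, lam=0.1)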
The reference code is as follows:
import numpy as np

def l1_regularization_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float = 0.1,
                                       learning_rate: float = 0.01, max_iter: int = 1000,
                                       tol: float = 1e-4) -> tuple:
    n_samples, n_features = X.shape
    # Initialize weights and bias to zero
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(max_iter):
        # Predicted values
        y_pred = np.dot(X, weights) + bias
        # Residual error
        error = y_pred - y
        # Subgradient for the weights, including the L1 penalty (alpha is the lambda above)
        grad_w = (1 / n_samples) * np.dot(X.T, error) + alpha * np.sign(weights)
        # Gradient for the bias (the bias is not penalized)
        grad_b = (1 / n_samples) * np.sum(error)
        # Gradient-descent update
        weights -= learning_rate * grad_w
        bias -= learning_rate * grad_b
        # Stop when the subgradient is small; with the L1 term this norm
        # need not vanish exactly at the optimum, so max_iter usually decides
        if np.linalg.norm(grad_w, ord=1) < tol:
            break
    return [round(w, 3) for w in weights], round(bias, 3)
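A quick way to exercise the function above is a synthetic regression in which only the first feature matters. The data, seed, and hyperparameter values below are illustrative assumptions; the L1 penalty should drive the uninformative second weight toward zero while shrinking the first somewhat below its true value of 3.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=200)

weights, bias = l1_regularization_gradient_descent(
    X, y, alpha=0.1, learning_rate=0.05, max_iter=5000)
print(weights, bias)  # expect roughly [a value just below 3, near 0] and a bias near 0.5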