I. k-Nearest Neighbors
1. Computing the distance matrix with two loops
Each row of the training data X_train and of the test data X is one sample point. Each row of the distance matrix dists holds the distances from one point in X to every point in X_train.
The compute_distances_two_loops() function in the k_nearest_neighbor file:
def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in tqdm(range(num_test)):
        for j in range(num_train):
            # Euclidean distance between test point i and training point j
            dists[i, j] = np.sqrt(np.sum((X[i] - self.X_train[j])**2))
    return dists
2. Implementing the classification function
Use numpy.argsort() to find the k nearest neighbors and numpy.bincount() to tally their votes.
The predict_labels() function in the k_nearest_neighbor file:
def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        # Labels of the k training points closest to the i-th test point
        closest_y = self.y_train[np.argsort(dists[i])[:k]]
        # Majority vote; on ties, argmax picks the smallest label
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred
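As a quick standalone illustration of the argsort/bincount voting (a toy sketch, not part of the assignment code; the distances and labels are made up):

import numpy as np

dists_row = np.array([0.9, 0.1, 0.5, 0.3, 0.7])  # distances from one test point
y_train = np.array([2, 1, 1, 0, 1])              # labels of the 5 training points

k = 3
nearest = np.argsort(dists_row)[:k]    # indices of the 3 smallest distances -> [1, 3, 2]
votes = np.bincount(y_train[nearest])  # vote counts per label -> [1, 2]
pred = np.argmax(votes)                # label 1 wins the vote
print(nearest, votes, pred)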
3. Computing the distance matrix with one loop
Take one point from X at a time and compute its distances to every point in X_train.
Here the benefit of storing one sample per row becomes apparent: after indexing a single row, the resulting vector automatically satisfies numpy's broadcasting rules against the other matrix (see the shape check after the code below).
The compute_distances_one_loop() function in the k_nearest_neighbor file:
def compute_distances_one_loop(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in tqdm(range(num_test)):
        # X[i] has shape (D,) and broadcasts against X_train of shape (num_train, D)
        dists[i, :] = np.sqrt(np.sum((X[i] - self.X_train)**2, axis=1))
    return dists
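A quick shape check of the broadcasting at work here (a standalone sketch with arbitrary sizes, not part of the assignment code):

import numpy as np

X_train = np.random.randn(500, 3072)  # 500 training points, 3072 features each
x = np.random.randn(3072)             # one test point, shape (3072,)

diff = x - X_train                      # (3072,) broadcasts against (500, 3072)
row = np.sqrt(np.sum(diff**2, axis=1))  # shape (500,): distances to all training points
print(diff.shape, row.shape)            # (500, 3072) (500,)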
4. Computing the distance matrix with full vectorization (no loops)
Assume each point is a row vector, and first consider the squared distance between two points $\vec{x}$ and $\vec{y}$:
$$\|\vec{x} - \vec{y}\|^2 = (\vec{x}-\vec{y})\cdot(\vec{x}-\vec{y})^T = \vec{x}\vec{x}^T - 2\vec{x}\vec{y}^T + \vec{y}\vec{y}^T = \|\vec{x}\|^2 - 2\vec{x}\vec{y}^T + \|\vec{y}\|^2$$
Now let the two point sets be
$$X = \left[ \begin{array}{c} -\,\, \vec{x}_1 \,\,-\\ -\,\, \vec{x}_2 \,\,-\\ \vdots \\ - \,\, \vec{x}_n \,\,- \end{array} \right] \qquad Y = \left[ \begin{array}{c} -\,\, \vec{y}_1 \,\,-\\ -\,\, \vec{y}_2 \,\,-\\ \vdots \\ - \,\, \vec{y}_m \,\,- \end{array} \right]$$
Then the $\|\vec{x}\|^2$ and $\|\vec{y}\|^2$ terms can be replaced by the row-wise squared norms of $X$ and $Y$, and $\vec{x}\vec{y}^T$ can be replaced by $XY^T$. When assembling the distance matrix, the term coming from $X$ is broadcast along the columns and the term coming from $Y$ along the rows.
The compute_distances_no_loops() function in the k_nearest_neighbor file:
def compute_distances_no_loops(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    # Cross term X Y^T, shape (num_test, num_train)
    XY = np.dot(X, self.X_train.T)
    # Row-wise squared norms of the test and training points
    X_norm2 = np.sum(np.square(X), axis=1, keepdims=True)             # (num_test, 1)
    Y_norm2 = np.sum(np.square(self.X_train), axis=1, keepdims=True)  # (num_train, 1)
    dists = np.sqrt(X_norm2 - 2*XY + Y_norm2.T)
    return dists
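One way to sanity-check the fully vectorized formula is to compare it against scipy.spatial.distance.cdist on a small random input; a minimal standalone sketch (not part of the assignment code):

import numpy as np
from scipy.spatial.distance import cdist

# Small random problem (sizes are arbitrary)
X_tr = np.random.randn(50, 32)
X_te = np.random.randn(20, 32)

# Same computation as compute_distances_no_loops, written standalone
XY = X_te.dot(X_tr.T)
X_norm2 = np.sum(X_te**2, axis=1, keepdims=True)
Y_norm2 = np.sum(X_tr**2, axis=1, keepdims=True)
dists = np.sqrt(X_norm2 - 2*XY + Y_norm2.T)

print(np.allclose(dists, cdist(X_te, X_tr)))  # expect True (up to floating-point error)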
Here the keepdims argument tells numpy's reduction functions (functions such as np.sum() that collapse a set of values into a single value) to keep the result with the same number of dimensions as the input array, as the snippet below shows.
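For example (a standalone snippet):

import numpy as np

A = np.arange(6).reshape(2, 3)
print(np.sum(A, axis=1).shape)                 # (2,)   -> one dimension dropped
print(np.sum(A, axis=1, keepdims=True).shape)  # (2, 1) -> column vector, ready to broadcast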
5. Choosing the hyperparameter with k-fold cross-validation
- Splitting the training data into folds.
Use the np.array_split() function, which splits along the rows (axis 0) by default.
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
- Sweeping the hyperparameter $k$.
For each value of $k$, train on num_folds-1 folds and validate on the remaining fold, which yields num_folds accuracy values per $k$.
for k in k_choices:
    if k in k_to_accuracies:
        continue
    else:
        k_to_accuracies[k] = []
    for foldIndx in range(num_folds):
        # Use every fold except foldIndx for training, fold foldIndx for validation
        X_train_cv = np.vstack(X_train_folds[0:foldIndx] + X_train_folds[foldIndx+1:])
        y_train_cv = np.hstack(y_train_folds[0:foldIndx] + y_train_folds[foldIndx+1:])
        classifier.train(X_train_cv, y_train_cv)
        dists = classifier.compute_distances_no_loops(X_train_folds[foldIndx])
        y_pred = classifier.predict_labels(dists, k)
        # Compute and record the fraction of correctly predicted examples
        num_correct = np.sum(y_pred == y_train_folds[foldIndx])
        accuracy = float(num_correct) / y_train_folds[foldIndx].shape[0]
        k_to_accuracies[k].append(accuracy)
The resulting accuracies are plotted below.
[Figure: cross-validation accuracy for each value of k]
Within k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100], the accuracy is highest at k=10. With this value, the k-nearest-neighbor model reaches roughly 28.2% accuracy on the test set; a sketch of how the per-fold results can be aggregated and the final model evaluated follows below.
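A minimal sketch of that aggregation and final evaluation, assuming the k_to_accuracies dictionary, the classifier object, and the X_test / y_test arrays from the assignment notebook:

import numpy as np

# Average the per-fold accuracies and pick the best k
mean_acc = {k: np.mean(v) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)

# Retrain on the full training set and evaluate on the held-out test set
classifier.train(X_train, y_train)
dists_test = classifier.compute_distances_no_loops(X_test)
y_test_pred = classifier.predict_labels(dists_test, k=best_k)
test_acc = np.mean(y_test_pred == y_test)
print('best k = %d, test accuracy = %.3f' % (best_k, test_acc))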
II. Support Vector Machine
For a single sample $(\vec{x}, y)$, suppose the model's scores over the classes are $\vec{s}=\{s_i\}=\vec{x}\cdot W$ (a row vector). The loss is
$$L = \sum_{i\ne y}\max(0,\; s_i - s_y+1)$$
and the gradient, column by column, is
$$\begin{array}{lcl} \mathrm{col}_i \left. \dfrac{dL}{dW} \right|_{i\ne y} & = & \vec{x}^T\cdot (s_i-s_y+1>0) \\[2mm] \mathrm{col}_y \dfrac{dL}{dW} & = & -\vec{x}^T\cdot\displaystyle\sum_{i\ne y}(s_i-s_y+1>0) \end{array}$$
where $(\,\cdot > 0)$ denotes the 0/1 indicator of the condition.
1. Computing the loss and gradient naively (with loops)
The svm_loss_naive() function in the linear_svm file:
def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i, :].T
                dW[:, y[i]] -= X[i, :].T
    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train
    # Add regularization to the loss and the gradient.
    loss += reg * np.sum(W * W)
    dW += reg * 2 * W
    return loss, dW
2. Computing the loss and gradient with vectorization
In fact, working out the naive implementation first is very helpful for organizing the vectorized one.
The svm_loss_vectorized() function in the linear_svm file:
def svm_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    num_train = X.shape[0]
    scores = X.dot(W)
    # Score of the correct class for each sample, as a column vector
    correct_scores = scores[np.arange(num_train), y][:, np.newaxis]
    scores = scores - correct_scores + 1
    scores[range(num_train), y] = 0
    scores = np.maximum(0, scores)
    loss = np.sum(scores) / num_train
    loss += reg * np.sum(W * W)
    # Gradient: each positive margin adds +x to its own column and -x to column y
    scores[scores > 0] = 1
    scores[range(num_train), y] -= np.sum(scores, axis=1)
    dW = dW + X.T.dot(scores)
    dW /= num_train
    dW += reg * 2 * W
    return loss, dW
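A quick way to gain confidence in both implementations is to compare them against each other and against a numerical gradient on a small random problem; a minimal sketch (the shapes and regularization value are arbitrary, not from the assignment notebook):

import numpy as np

np.random.seed(0)
W = np.random.randn(32, 10) * 0.01
X = np.random.randn(20, 32)
y = np.random.randint(10, size=20)

loss_n, dW_n = svm_loss_naive(W, X, y, reg=0.1)
loss_v, dW_v = svm_loss_vectorized(W, X, y, reg=0.1)
print(abs(loss_n - loss_v), np.max(np.abs(dW_n - dW_v)))  # both should be ~0

# Centered-difference check on a few randomly chosen entries of W
h = 1e-5
for _ in range(5):
    idx = (np.random.randint(W.shape[0]), np.random.randint(W.shape[1]))
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += h
    Wm[idx] -= h
    num_grad = (svm_loss_vectorized(Wp, X, y, 0.1)[0] - svm_loss_vectorized(Wm, X, y, 0.1)[0]) / (2 * h)
    print(num_grad, dW_v[idx])  # should agree to several decimal places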
3. Stochastic gradient descent
The LinearClassifier.train() function in the linear_classifier file:
def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
    num_train, dim = X.shape
    num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
        # lazily initialize W
        self.W = 0.001 * np.random.randn(dim, num_classes)
    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        # Sample batch_size elements (with replacement, which is faster)
        idx = np.random.choice(num_train, batch_size)
        X_batch = X[idx, :]
        y_batch = y[idx]
        # evaluate loss and gradient
        loss, grad = self.loss(X_batch, y_batch, reg)
        loss_history.append(loss)
        # perform parameter update
        self.W -= learning_rate * grad
        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))
    return loss_history
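A minimal usage sketch, assuming the LinearSVM subclass from the assignment and preprocessed X_train / y_train arrays (the learning rate and regularization strength below are placeholders, not tuned values):

import matplotlib.pyplot as plt

svm = LinearSVM()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)

# The loss should decrease noisily, since each step sees only one mini-batch
plt.plot(loss_hist)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.show()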
4. The prediction function
The LinearClassifier.predict() function in the linear_classifier file uses numpy.argmax() to pick the index of the highest score.
def predict(self, X):
    # Highest-scoring class for each sample
    y_pred = np.argmax(X.dot(self.W), axis=1)
    return y_pred
5. Hyperparameter selection
for lr in learning_rates:
    for rs in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=lr, reg=rs,
                              num_iters=1000, verbose=False)
        y_val_pred = svm.predict(X_val)
        y_train_pred = svm.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        val_acc = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (train_acc, val_acc)
        if val_acc > best_val:
            best_val = val_acc
            best_svm = svm
III. Softmax Classifier
The classifier's output is the probability of the sample belonging to each class: for a single sample $(\vec{x}, y)$, suppose the model's class probabilities are
$$\vec{p}=\{p_i\}=e^{\vec{x}\cdot W}/\|e^{\vec{x}\cdot W}\|_{l=1}$$
(a row vector, normalized by the $l_1$ norm, i.e. the sum of the exponentiated scores). The loss is
$$L = -\log p_y=\log \|e^{\vec{x}\cdot W}\|_{l=1}-\vec{x}\cdot\mathrm{col}_y W$$
Since $\partial L/\partial s_i = p_i - (i=y)$ for the scores $\vec{s}=\vec{x}\cdot W$, the gradient is
$$\mathrm{col}_i \frac{dL}{dW} = \frac{e^{\vec{x}\cdot \mathrm{col}_i W}}{\|e^{\vec{x}\cdot W}\|_{l=1}}\vec{x}^T-\vec{x}^T\cdot(i=y) = \left[p_i-(i=y)\right]\cdot \vec{x}^T$$
where $(i=y)$ is again the 0/1 indicator.
1. Computing the loss and gradient naively (with loops)
The softmax_loss_naive() function in the softmax file:
def softmax_loss_naive(W, X, y, reg):
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_class = W.shape[1]
    for i in range(num_train):
        probs = np.exp(X[i, :].dot(W))
        probs = probs / np.sum(probs)
        loss -= np.log(probs[y[i]])
        for j in range(num_class):
            if j == y[i]:
                dW[:, j] += (probs[j] - 1) * X[i, :].T
            else:
                dW[:, j] += probs[j] * X[i, :].T
    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train
    # Add regularization to the loss and the gradient.
    loss += reg * np.sum(W * W)
    dW += reg * 2 * W
    return loss, dW
2. Computing the loss and gradient with vectorization
The softmax_loss_vectorized() function in the softmax file:
def softmax_loss_vectorized(W, X, y, reg):
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_class = W.shape[1]
    # Class probabilities, one row per sample
    probs = np.exp(X.dot(W))
    probs = probs / np.sum(probs, axis=1)[:, np.newaxis]
    loss = loss - np.sum(np.log(probs[range(num_train), y]))
    loss /= num_train
    loss += reg * np.sum(W * W)
    # dL/dscores = probs, with 1 subtracted at the correct class
    probs[range(num_train), y] = probs[range(num_train), y] - 1
    dW = dW + X.T.dot(probs)
    dW /= num_train
    dW += reg * 2 * W
    return loss, dW
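One caveat with both versions above: np.exp(X.dot(W)) can overflow when the scores are large. A common remedy, not part of the original code, is to subtract the per-row maximum score before exponentiating, which leaves the probabilities unchanged; a minimal sketch:

import numpy as np

def stable_softmax_probs(scores):
    # Subtracting the row-wise max does not change the softmax output,
    # but keeps the exponentials in a safe numeric range.
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)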
3. Hyperparameter selection
for lr in learning_rates:
    for rs in regularization_strengths:
        softmaxClassfier = Softmax()
        loss_hist = softmaxClassfier.train(X_train, y_train, learning_rate=lr, reg=rs,
                                           num_iters=1000, verbose=False)
        y_val_pred = softmaxClassfier.predict(X_val)
        y_train_pred = softmaxClassfier.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        val_acc = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (train_acc, val_acc)
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = softmaxClassfier
IV. Two-Layer Neural Network
1. Forward and backward passes
The network uses the same loss function as the Softmax classifier, so the forward pass is straightforward, and once the backpropagation equations are derived the backward pass is straightforward as well.
The loss() function in the neural_net file:
def loss(self, X, y=None, reg=0.0):
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape
    # Forward pass: ReLU hidden layer followed by an affine output layer
    h1 = np.maximum(0, X.dot(W1) + b1)
    scores = h1.dot(W2) + b2
    # If the targets are not given then jump out, we're done
    if y is None:
        return scores
    # Compute the softmax loss
    eps = 1e-12  # small constant to avoid log(0)
    probs = np.exp(scores)
    probs = probs / np.sum(probs, axis=1, keepdims=True)
    loss = -np.sum(np.log(probs[range(N), y] + eps))
    loss /= N
    loss += reg * (np.sum(W1*W1) + np.sum(W2*W2))
    # Backward pass: compute gradients
    grads = {}
    # dL/dscores = probs with 1 subtracted at the correct class
    probs[range(N), y] = probs[range(N), y] - 1
    gradW2 = h1.T.dot(probs)
    gradW2 /= N
    gradW2 += reg * 2 * W2
    gradB2 = np.sum(probs, axis=0)
    gradB2 /= N
    # Backpropagate through the second layer and the ReLU
    gradH1 = probs.dot(W2.T)
    gradW1 = X.T.dot((h1 > 0) * gradH1)
    gradW1 /= N
    gradW1 += reg * 2 * W1
    gradB1 = np.sum((h1 > 0) * gradH1, axis=0)
    gradB1 /= N
    grads["W1"] = gradW1
    grads["W2"] = gradW2
    grads["b1"] = gradB1
    grads["b2"] = gradB2
    return loss, grads
2. Training, prediction, and hyperparameter selection
These follow the same pattern as the other classifiers; a sketch of a possible hyperparameter search is given below.
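A minimal sketch of such a search over hidden-layer size and learning rate, assuming the TwoLayerNet class, its train()/predict() interface, and the validation split follow the assignment (the candidate values, input_size, and num_classes are placeholders):

best_val_acc, best_net = -1, None
for hidden_size in [50, 100, 150]:
    for lr in [5e-4, 1e-3]:
        net = TwoLayerNet(input_size, hidden_size, num_classes)
        net.train(X_train, y_train, X_val, y_val,
                  num_iters=1000, batch_size=200,
                  learning_rate=lr, reg=0.25, verbose=False)
        val_acc = (net.predict(X_val) == y_val).mean()
        if val_acc > best_val_acc:
            best_val_acc, best_net = val_acc, net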
V. Feature Engineering
With a two-layer neural network, the following parameters achieve roughly 60% accuracy on the test set (a training sketch follows the parameter list):
learning_rate = 1.5
regularization_strength = 0.001
num_iters = 5000
batch_size = 500
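A minimal sketch of plugging these values in, assuming the extracted feature matrices (e.g. X_train_feats, X_val_feats, X_test_feats), input_dim, hidden_dim, num_classes, and the TwoLayerNet interface from the feature-engineering notebook:

net = TwoLayerNet(input_dim, hidden_dim, num_classes)
net.train(X_train_feats, y_train, X_val_feats, y_val,
          learning_rate=1.5, reg=0.001,
          num_iters=5000, batch_size=500, verbose=True)
test_acc = (net.predict(X_test_feats) == y_test).mean()
print('test accuracy: %.3f' % test_acc)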