While working on the kNN part of the cs231n assignments, I borrowed code from this Zhihu post: https://zhuanlan.zhihu.com/p/28204173
Below I mainly paste the parts where I ran into trouble.
From the main program:
# **************** Cross-validation ****************
num_folds = 5
k_choices = [1, 3, 5, 9, 11, 13, 15, 20, 50]
x_train_folds = []
y_train_folds = []
y_train = y_train.reshape(-1, 1)
x_train_folds = np.array_split(x_train, num_folds)  # 1
y_train_folds = np.array_split(y_train, num_folds)
k_to_accuracies = {}  # 2
for k in k_choices:
    k_to_accuracies.setdefault(k, [])
for i in range(num_folds):  # 3
    classifier = KNearestNeighbor()
    # 3.1: stack every fold except the i-th to form the training split
    x_val_train = np.vstack(x_train_folds[0:i] + x_train_folds[i + 1:])
    y_val_train = np.vstack(y_train_folds[0:i] + y_train_folds[i + 1:])
    y_val_train = y_val_train[:, 0]  # the comma matters: this flattens (n, 1) to (n,)
    classifier.train(x_val_train, y_val_train)
    for k in k_choices:
        # 3.2: the held-out i-th fold serves as the validation set
        y_val_pred = classifier.predict(x_train_folds[i], k=k)
        num_correct = np.sum(y_val_pred == y_train_folds[i][:, 0])
        accuracy = float(num_correct) / len(y_val_pred)
        k_to_accuracies[k] = k_to_accuracies[k] + [accuracy]
for k in sorted(k_to_accuracies):  # 4
    sum_accuracy = 0
    for accuracy in k_to_accuracies[k]:
        print('k=%d, accuracy=%f' % (k, accuracy))
        sum_accuracy += accuracy
    print('the average accuracy is :%f' % (sum_accuracy / num_folds))

## Accuracy curve for the different values of k
for k in k_choices:
    accuracies = k_to_accuracies[k]
    plt.scatter([k] * len(accuracies), accuracies)
accuracies_mean = np.array([np.mean(v) for k, v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k, v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('cross-validation on k')
plt.xlabel('k')
plt.ylabel('cross-validation accuracy')
plt.show()

# ************ Predict with the best k value
# best_k = int(input("Please enter the best k value\t"))  # input() returns a string, so cast to int
best_k = 10
classifier = KNearestNeighbor()
classifier.train(x_train, y_train)
y_test_pred = classifier.predict(x_test, k=best_k)
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
Where the error occurs:

def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        closest_y = []
        # sort dists[i] in ascending order and return the indices
        y_indicies = np.argsort(dists[i, :], axis=0)
        # labels of the k nearest neighbours
        closest_y = self.y_train[y_indicies[:k]]
        print(closest_y)
        y_pred[i] = np.argmax(np.bincount(closest_y))  ### error occurs here
    return y_pred
Cross-validation on its own runs fine, and computing the accuracy with a fixed k on its own also runs fine. But when I first run cross-validation to find the best k and then predict with that best k, I get the error ValueError: object too deep for desired array.
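The error is easy to reproduce in isolation (a minimal sketch of my own, not code from the assignment): np.bincount only accepts 1-D arrays of non-negative integers, and feeding it a column vector raises exactly this message.

```python
import numpy as np

labels_1d = np.array([3, 1, 3])        # shape (3,): bincount accepts this
labels_2d = labels_1d.reshape(-1, 1)   # shape (3, 1): a 2-D column vector

print(np.bincount(labels_1d))          # counts per label value
try:
    np.bincount(labels_2d)             # 2-D input is rejected
except ValueError as e:
    print(e)                           # "object too deep for desired array"
```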
The help documentation confirms the constraint: x must be 1-dimensional and non-negative.
help(numpy.bincount)
bincount(...)
bincount(x, weights=None, minlength=0)
Parameters
----------
x : array_like, 1 dimension, nonnegative ints
Input array.
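For context, this is how predict_labels uses bincount when the input is well-formed: count each label among the k nearest neighbours, then take the label with the highest count (ties go to the smallest label). The labels below are hypothetical.

```python
import numpy as np

closest_y = np.array([2, 5, 2, 7, 2])  # hypothetical labels of the 5 nearest neighbours
counts = np.bincount(closest_y)        # counts[i] = occurrences of label i
pred = np.argmax(counts)               # most frequent label
print(pred)                            # 2
```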
I added print(closest_y) to the program to inspect it:
During cross-validation it prints as a plain 1-D array.
When called on the test set, i.e. when the error occurs, closest_y prints as a 2-D (column) array.
So intuitively the array really is one dimension too deep. The key question is: why does this particular call fail when all the earlier calls succeeded? I spent a whole morning trying to fix it on this basis without success, so I am recording it here first; if any readers know how to fix it, advice and discussion are welcome.
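My reading of the code above (an assumption of mine, not confirmed in the original post) is that the asymmetry comes from the reshape at the top of the main program: y_train is turned into a column vector, and only the cross-validation path strips that extra dimension again with [:, 0] before calling train. The final run trains on the 2-D y_train directly, so self.y_train[indices] produces a (k, 1) array. A toy sketch:

```python
import numpy as np

y_train = np.arange(10)
y_train = y_train.reshape(-1, 1)           # now shape (10, 1), as in the main program

# Cross-validation path: [:, 0] flattens back to 1-D before training
y_val_train = y_train[0:8][:, 0]
print(y_val_train.shape)                   # (8,)  -> bincount is happy

# Final-prediction path: trains on y_train as-is, so indexing with the
# neighbour indices yields a 2-D column array -> bincount fails
print(y_train[np.array([0, 1, 2])].shape)  # (3, 1)
```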
Here is a link to another blogger's implementation that takes a different approach: https://blog.youkuaiyun.com/stalbo/article/details/79281901
****************** Update, May 6 ***************
Today, while working on the SVM assignment, I came across the function numpy.squeeze() and suddenly thought it might help here. It removes single-dimensional entries from an array's shape, i.e. drops any axis whose size is 1 (I found this in a reference blog post). I used it to rework the code as follows:
def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        closest_y = []
        # sort dists[i] in ascending order and return the indices
        y_indicies = np.argsort(dists[i, :], axis=0)
        # labels of the k nearest neighbours
        closest_y = self.y_train[y_indicies[:k]]
        if np.shape(np.shape(closest_y))[0] != 1:  ############ added
            closest_y = np.squeeze(closest_y)      ############ added
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred
Applying shape twice like this, np.shape(np.shape(closest_y))[0], gives the number of dimensions of the array (the same as closest_y.ndim): it returns 1 for a 1-D array and more than 1 for a multi-dimensional one. So the added if statement only calls np.squeeze to drop the extra dimension when the array is not already 1-D, and after that np.bincount() no longer raises the error.
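The fix can be checked on a toy array shaped like the failing case (labels below are hypothetical):

```python
import numpy as np

closest_y = np.array([[4], [4], [1]])        # (3, 1), like the failing test-set case
if np.shape(np.shape(closest_y))[0] != 1:    # ndim is 2 here, so the branch fires
    closest_y = np.squeeze(closest_y)        # -> shape (3,)
print(np.argmax(np.bincount(closest_y)))     # 4, the majority label
```

Note that np.squeeze removes every size-1 axis, not just the trailing one; for the (k, 1) arrays produced here, that is exactly what is needed.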