First, here is an example:
from numpy import array, zeros
import matplotlib.pyplot as plt

# Read every line of the tab-separated data file
with open("datingTestSet.txt") as fr:
    lines = fr.readlines()
n = len(lines)
datingDataset = zeros((n, 3))  # one row of three features per sample
datingLabels = []              # integer class label of each sample
for i, line in enumerate(lines):
    fdata = line.strip().split('\t')
    datingDataset[i, 0:3] = [float(v) for v in fdata[0:3]]
    datingLabels.append(int(fdata[-1]))

# Plot the second feature against the third; the labels set marker size and color
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataset[:,1],datingDataset[:,2],20.0*array(datingLabels),20.0*array(datingLabels))
plt.show()
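Note: datingTestSet.txt is a sample set. Each line is one 4-dimensional sample with tab-separated fields: the first three are the sample's features, and the fourth is its class label (three classes: 1, 2, 3).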
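To make the file layout concrete, here is a minimal sketch of the parsing step that runs on a small in-memory sample instead of the real file. The three rows in `sample_text` and the helper name `parse_samples` are made up for illustration; only the layout (three tab-separated features followed by an integer label) comes from the description above.

```python
import io

# Hypothetical rows in the layout described above:
# three tab-separated feature values, then an integer class label (1, 2, or 3).
sample_text = (
    "40920\t8.3\t0.95\t3\n"
    "14488\t7.2\t1.67\t2\n"
    "26052\t1.4\t0.81\t1\n"
)

def parse_samples(lines):
    """Return (features, labels) parsed from tab-separated lines."""
    features, labels = [], []
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        features.append([float(v) for v in fields[0:3]])
        labels.append(int(fields[-1]))
    return features, labels

features, labels = parse_samples(io.StringIO(sample_text))
print(features[0], labels)
```

Passing an open file object instead of `io.StringIO(sample_text)` should work the same way, since both yield one line per iteration.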
The visualization uses only the second and third features of each sample:
ax.scatter(datingDataset[:,1],datingDataset[:,2],20.0*array(datingLabels),20.0*array(datingLabels))
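Here, the two 20.0*array(datingLabels) arguments use the class labels to distinguish the classes: the third argument sets the marker size of each point and the fourth sets its color, so points of different classes differ in both. The result is shown in the figure: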
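The same idea can be written with the keyword arguments `s` (marker size) and `c` (color) of `scatter`, which makes the role of each array explicit. This is only a sketch on made-up points, not the data from datingTestSet.txt; the names `x`, `y`, `labels`, `sizes`, and `colors` are chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: six points with class labels 1, 2, 3 (not from datingTestSet.txt).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0, 4.5])
labels = np.array([1, 1, 2, 2, 3, 3])

sizes = 20.0 * labels   # marker area in points^2: 20, 40, or 60 depending on the class
colors = 20.0 * labels  # numeric values, mapped to colors through the default colormap

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=colors)
# plt.show()  # uncomment to display the figure
```

Because both arrays have one entry per point, every class gets its own marker size and its own color. The factor 20.0 only scales the sizes to something visible; for the colors, the relative values are what matter, since they are normalized before the colormap is applied.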