数据集的官方下载地址:
CIFAR-10 and CIFAR-100 datasets (utoronto.ca)
CIFAR-10
Size: 32×32 RGB图像 ,数据集本身是 BGR 通道
Num: 训练集 50000 和 测试集 10000,一共60000张图片
Classes: plane(飞机), car(汽车),bird(鸟),cat(猫),deer(鹿),dog(狗),frog(蛙类),horse(马),ship(船),truck(卡车)
解压后的内容如下
数据集读取
数据集读取的官网方法(python3 version):
def unpickle(file):
import pickle
with open(file, 'rb') as fo:
dict = pickle.load(fo, encoding='bytes')
return dict
返回的是一个 字典
2)将字典打印出来
dict = unpickle('./data_batch_1')
print(dict)
{b'batch_label': b'training batch 1 of 5',
b'labels': [6, 9 ... 1, 5],
b'data': array([[ 59, 43, 50, ..., 140, 84, 72],
...
[ 62, 61, 60, ..., 130, 130, 131]], dtype=uint8),
b'filenames': [b'leptodactylus_pentadactylus_s_000004.png', b'camion_s_000148.png',
...
b'estate_car_s_001433.png', b'cur_s_000170.png']}
b’batch_label’
: 所属文件集b’labels’
: 图片标签b’data’
:图片数据b’filename’
:图片名称
3)打印类型
print(type(dict[b'batch_label']))
print(type(dict[b'labels']))
print(type(dict[b'data']))
print(type(dict[b'filenames']))
<class 'bytes'>
<class 'list'>
<class 'numpy.ndarray'>
<class 'list'>
4)打印图片类型
img = dict[b'data']
print(img.shape)
(10000, 3072)
其中 3072 = 32 * 32 * 3 (图片 size)
5)绘制图片
show_image = img[666]
img_reshape = show_image.reshape(3, 32, 32)
pic = img_reshape.transpose(1, 2, 0) # (3, 32, 32) --> (32, 32, 3)
plt.imshow(pic)
plt.show()
label = dict[b'labels']
image_label = label[666]
print(image_label)
9
卡车