How to use CIFAR-10 in Python
The first step: download CIFAR-10 using a shell script
#!/usr/bin/env bash
if ! [ -d "cifar-10-batches-py" ]; then
    wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
    tar xvzf cifar-10-python.tar.gz
    rm -f cifar-10-python.tar.gz
fi
The first line declares that this is a bash shell script.
The second line checks whether a cifar-10-batches-py folder already exists in the current folder. The commands inside the if block run only when it does not.
The third line downloads the compressed archive from the University of Toronto server.
The fourth line extracts the archive into the current folder.
The fifth line deletes the compressed file, which is no longer needed after extraction.
The last line (fi) closes the if block, and the script is finished.
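If wget or tar is not available, the same steps can be sketched in Python with the standard library only. This is a rough equivalent of the script above, not a replacement for it; the function name download_cifar10 and the dest_dir parameter are made up for this illustration.

```python
import os
import tarfile
import urllib.request

CIFAR10_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"


def download_cifar10(dest_dir=".", url=CIFAR10_URL):
    """Download and extract CIFAR-10 into dest_dir unless it is already there.

    Hypothetical helper: returns True if a download happened, False if skipped.
    """
    target = os.path.join(dest_dir, "cifar-10-batches-py")
    if os.path.isdir(target):
        return False  # same check as: if ! [ -d "cifar-10-batches-py" ]

    os.makedirs(dest_dir, exist_ok=True)
    archive = os.path.join(dest_dir, "cifar-10-python.tar.gz")
    urllib.request.urlretrieve(url, archive)      # like wget
    with tarfile.open(archive, "r:gz") as tar:    # like tar xvzf
        tar.extractall(dest_dir)
    os.remove(archive)                            # like rm -f
    return True
```

Calling download_cifar10("data") a second time should return False without touching the network, because the folder check comes first.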
Suppose you have created a folder named data and run the script inside it. If you then cd into data, you will see the cifar-10-batches-py folder.
If you cd into cifar-10-batches-py, you will see the batch files: data_batch_1 through data_batch_5 are the training data, and test_batch is the testing data.
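Before loading anything, it can help to confirm that the six batch files are really there. The following is a minimal sketch; the helper name missing_cifar10_files is hypothetical, and the default path assumes the data folder described above.

```python
import os

# The five training batches plus the test batch.
EXPECTED_FILES = ["data_batch_%d" % i for i in range(1, 6)] + ["test_batch"]


def missing_cifar10_files(folder="./data/cifar-10-batches-py"):
    """Return the expected batch files that are not present in folder."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(folder, name))]
```

An empty list means every expected file was found; otherwise the list names what is missing.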
How to split CIFAR-10 into training data and testing data
A model is trained on the training data and evaluated on the testing data, so the dataset has to be split into these two parts. Fortunately, CIFAR-10 is already split into training and testing data, so all you need to do is load each part.
import pickle
import numpy as np

# There are five training files in the folder (data_batch_1 to data_batch_5), so nbbatch=5.
def load_cifar10_2(nbbatch=5):
    all_data = []     # training data
    all_labels = []   # training labels
    test_data = []    # testing data
    test_labels = []  # testing labels
    ########
    # This section loads the training data.
    for i in range(nbbatch):
        # Open the files in sequence. The mode is 'rb' (read-only, binary)
        # because pickle files are binary.
        with open("./data/cifar-10-batches-py/data_batch_%s" % (i + 1), 'rb') as f:
            # With encoding='bytes', pickle.load returns a dict whose keys are bytes objects.
            batch = pickle.load(f, encoding='bytes')
        data = batch[b'data']
        # Convert the list of labels to a column array of shape (N, 1).
        labels = np.asarray(batch[b'labels']).reshape((-1, 1))
        all_data.append(data)
        all_labels.append(labels)
    ########
    # This section loads the testing data.
    with open("./data/cifar-10-batches-py/test_batch", 'rb') as f:
        batch = pickle.load(f, encoding='bytes')
    data = batch[b'data']
    labels = np.asarray(batch[b'labels']).reshape((-1, 1))
    test_data.append(data)
    test_labels.append(labels)
    # Concatenate the per-batch data and labels into single arrays.
    all_data = np.concatenate(all_data, axis=0)
    all_labels = np.concatenate(all_labels, axis=0)
    test_data = np.concatenate(test_data, axis=0)
    test_labels = np.concatenate(test_labels, axis=0)
    return (all_data, all_labels, test_data, test_labels)
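The loader indexes each batch with b'data' and b'labels' rather than plain strings. The sketch below shows why on a small synthetic file: it is a stand-in written here with bytes keys to mimic what the real batches look like after pickle.load(..., encoding='bytes'), and it contains made-up pixel values, not CIFAR-10 images.

```python
import os
import pickle
import tempfile

# Synthetic stand-in for one batch: two "images" of 3072 values (3 * 32 * 32) each.
fake_batch = {b"data": [[0] * 3072, [255] * 3072], b"labels": [3, 7]}

path = os.path.join(tempfile.mkdtemp(), "data_batch_1")
with open(path, "wb") as f:
    pickle.dump(fake_batch, f)

with open(path, "rb") as f:
    batch = pickle.load(f, encoding="bytes")

print(sorted(batch.keys()))      # [b'data', b'labels']
print(len(batch[b"data"][0]))    # 3072
print("data" in batch)           # False: the str key does not match the bytes key
```

A lookup with the plain string 'data' would raise a KeyError here, which is the error to expect if the b prefix is forgotten.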
How to convert the data into a more convenient shape
def cifar10_proper_array(data):
    all_red = data[:, :1024].reshape(-1, 32, 32)
    all_green = data[:, 1024:2048].reshape(-1, 32, 32)
    all_blue = data[:, 2048:].reshape(-1, 32, 32)
    return np.stack([all_red, all_green, all_blue], axis=1) / 255.0
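The snippet above reshapes and normalizes the data. Each row holds 3072 values: the first 1024 are the red channel, the next 1024 the green channel, and the last 1024 the blue channel of a 32x32 image. The function turns every row into an array of shape (3, 32, 32) and divides by 255.0 so that the pixel values lie between 0 and 1.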
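To see the channel layout concretely, here is a small worked example on one synthetic row (made-up values, not a CIFAR-10 image). It also shows a transpose to channels-last order, which plotting libraries usually expect; that extra step is an addition for illustration and is not part of cifar10_proper_array.

```python
import numpy as np

# One synthetic row: 1024 red values, then 1024 green, then 1024 blue.
row = np.concatenate([
    np.full(1024, 255, dtype=np.uint8),  # red plane
    np.full(1024, 128, dtype=np.uint8),  # green plane
    np.zeros(1024, dtype=np.uint8),      # blue plane
])
data = row.reshape(1, 3072)

# Equivalent to slicing out the three planes and stacking them on axis 1.
channels_first = data.reshape(-1, 3, 32, 32) / 255.0
# Move the channel axis to the end: (N, 3, 32, 32) -> (N, 32, 32, 3).
channels_last = channels_first.transpose(0, 2, 3, 1)

print(channels_first.shape)  # (1, 3, 32, 32)
print(channels_last.shape)   # (1, 32, 32, 3)
```

In channels_last, indexing [0, y, x] gives the red, green, and blue values of one pixel together.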
data, labels, test_data, test_label = load_cifar10_2()
labels = labels.reshape(-1)
test_label = test_label.reshape(-1)
data = cifar10_proper_array(data)
test_data = cifar10_proper_array(test_data)
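The code above is the main part of the script: it loads the training and testing data, flattens the label arrays to one dimension, and converts the image data to the (3, 32, 32) layout.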
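A quick sanity check on the resulting arrays can catch a wrong path or a skipped conversion early. The helper below is a hypothetical sketch (check_cifar10_arrays is not part of the code above), demonstrated on small random arrays so it runs without the dataset.

```python
import numpy as np


def check_cifar10_arrays(images, labels, expected_count):
    """Raise AssertionError if the arrays do not look like converted CIFAR-10 data."""
    assert images.shape == (expected_count, 3, 32, 32), images.shape
    assert labels.shape == (expected_count,), labels.shape
    assert images.min() >= 0.0 and images.max() <= 1.0, "pixels should be in [0, 1]"
    assert labels.min() >= 0 and labels.max() <= 9, "CIFAR-10 labels are 0..9"
    return True


# Demo on synthetic arrays (random values, not real images).
demo_images = np.random.rand(4, 3, 32, 32)
demo_labels = np.array([0, 3, 5, 9])
print(check_cifar10_arrays(demo_images, demo_labels, 4))  # True
```

With the real dataset, the expected counts are 50000 for the training arrays and 10000 for the testing arrays, for example check_cifar10_arrays(data, labels, 50000).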