How to download CIFAR-10 and split it into training and testing files in Python

Step 1: download CIFAR-10 with a shell script

#!/usr/bin/env bash
if ! [ -d "cifar-10-batches-py" ]; then
        wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
        tar xvzf cifar-10-python.tar.gz
        rm -f cifar-10-python.tar.gz
fi

The first line declares that this is a bash shell script.
The second line checks whether a cifar-10-batches-py folder already exists in the current directory; the download only runs if it does not.
The third line downloads the compressed archive from the University of Toronto server.
The fourth line unpacks the archive into the current directory, which creates the cifar-10-batches-py folder.
The fifth line deletes the compressed archive, since it is no longer needed.
The last line (fi) closes the if block, and the script is finished.
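
If wget or tar is not available (for example on Windows), roughly the same steps can be done from Python with the standard library. This is only a sketch and not part of the original script: the function name download_cifar10 and its root parameter are made up for illustration.

```python
import os
import tarfile
import urllib.request

CIFAR10_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"


def download_cifar10(root="."):
    """Download and unpack CIFAR-10 under root unless it is already there."""
    target = os.path.join(root, "cifar-10-batches-py")
    # Same guard as the `if ! [ -d ... ]` test in the shell script.
    if os.path.isdir(target):
        return target
    archive = os.path.join(root, "cifar-10-python.tar.gz")
    # Equivalent of wget: fetch the archive into root.
    urllib.request.urlretrieve(CIFAR10_URL, archive)
    # Equivalent of tar xzf: unpack next to the archive.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(root)
    # Equivalent of rm -f: remove the archive after unpacking.
    os.remove(archive)
    return target
```

Calling download_cifar10("data") a second time returns immediately, because the folder already exists.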

Suppose you have created a folder named data and run the script inside it. If you cd into data, you will see a folder named cifar-10-batches-py.
If you then cd into cifar-10-batches-py, you will see the batch files: data_batch_1 to data_batch_5 are the training data, and test_batch is the testing data.
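
To check what one of these batch files contains before writing the loader, you can unpickle it and look at the keys and sizes. The helper below is a small sketch; the name describe_batch is hypothetical and was chosen for this example.

```python
import pickle

import numpy as np


def describe_batch(path):
    """Return the keys and the data/label sizes of one CIFAR-10 batch file."""
    with open(path, "rb") as f:
        # encoding="bytes" keeps the dict keys as bytes, e.g. b"data".
        batch = pickle.load(f, encoding="bytes")
    data = np.asarray(batch[b"data"])
    return {
        "keys": sorted(batch.keys()),
        "data_shape": data.shape,
        "data_dtype": str(data.dtype),
        "num_labels": len(batch[b"labels"]),
    }


# Usage, assuming the dataset was unpacked under ./data:
# print(describe_batch("./data/cifar-10-batches-py/data_batch_1"))
```

For a real batch file, the data array should have 10000 rows of 3072 unsigned 8-bit values, with one label per row.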

How to split CIFAR-10 into training data and testing data

A model is fitted on the training data and evaluated on the testing data, so the two sets must be kept apart. Conveniently, CIFAR-10 is already split into training batches and a testing batch, so all you need to do is load each part.

import pickle

import numpy as np

# There are five training files in the folder (see above), so nbbatch=5.
def load_cifar10_2(nbbatch=5):
    all_data = []     # training data, one array per batch
    all_labels = []   # training labels, one array per batch
    ########
    # This section reads the training data.
    for i in range(nbbatch):
        # Open the files in sequence. The mode is 'rb' (read-only, binary)
        # because the batches are pickled binary files.
        with open("./data/cifar-10-batches-py/data_batch_%s" % (i + 1), 'rb') as f:
            # encoding='bytes' makes pickle.load return a dict whose keys are bytes.
            batch = pickle.load(f, encoding='bytes')
        data = batch[b'data']
        # Convert the label list to a column array of shape (n, 1).
        labels = np.asarray(batch[b'labels']).reshape((-1, 1))
        all_data.append(data)
        all_labels.append(labels)
    ########
    # This section reads the testing data, which is a single file.
    with open("./data/cifar-10-batches-py/test_batch", 'rb') as f:
        batch = pickle.load(f, encoding='bytes')
    test_data = batch[b'data']
    test_labels = np.asarray(batch[b'labels']).reshape((-1, 1))

    # Concatenate the per-batch training arrays along the sample axis.
    all_data = np.concatenate(all_data, axis=0)
    all_labels = np.concatenate(all_labels, axis=0)
    return (all_data, all_labels, test_data, test_labels)

How to convert the data into a more convenient format


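def cifar10_proper_array(data):
    # Each row holds 3072 values: 1024 red, then 1024 green, then 1024 blue.
    # Each channel is stored row by row as a 32x32 image.
    all_red = data[:, :1024].reshape(-1, 32, 32)
    all_green = data[:, 1024:2048].reshape(-1, 32, 32)
    all_blue = data[:, 2048:].reshape(-1, 32, 32)
    # Stack the channels to get shape (N, 3, 32, 32) and scale pixels to [0, 1].
    return np.stack([all_red, all_green, all_blue], axis=1) / 255.0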

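The snippet above reshapes each flat row of 3072 values into a (3, 32, 32) array (channel, row, column) and normalizes the pixel values from the 0 to 255 range down to 0 to 1.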
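
The result is in channels-first order, (N, 3, 32, 32). Some tools, such as matplotlib's imshow, expect the color channel last, so a transpose may be needed before displaying an image. The sketch below shows this on a tiny synthetic array; to_channels_last is a hypothetical helper name, and the inline reshape is equivalent to the slicing and stacking done in cifar10_proper_array.

```python
import numpy as np


def to_channels_last(images):
    """Convert (N, 3, 32, 32) images to (N, 32, 32, 3)."""
    return np.transpose(images, (0, 2, 3, 1))


# Synthetic example: one image whose red channel is all 255.
flat = np.zeros((1, 3072), dtype=np.uint8)
flat[:, :1024] = 255
# Same result as cifar10_proper_array, written inline so this runs on its own.
chw = flat.reshape(-1, 3, 32, 32) / 255.0
hwc = to_channels_last(chw)
print(chw.shape, hwc.shape)
# The top-left pixel, as (red, green, blue) values in [0, 1].
print(hwc[0, 0, 0])
```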

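data, labels, test_data, test_label = load_cifar10_2()
# Flatten the (n, 1) label columns into 1-D arrays.
labels = labels.reshape(-1)
test_label = test_label.reshape(-1)

# Convert the flat pixel rows into normalized (N, 3, 32, 32) arrays.
data = cifar10_proper_array(data)
test_data = cifar10_proper_array(test_data)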

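The code above is the main script: it loads the four arrays, flattens the labels to one dimension, and converts the image data into normalized arrays.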
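
After running the main script, it can be worth checking that the arrays look as expected before training on them. The function below is a sketch of such a check; check_cifar10_arrays is a hypothetical name, and the label range 0 to 9 assumes the ten CIFAR-10 classes.

```python
import numpy as np


def check_cifar10_arrays(data, labels, test_data, test_label):
    """Raise AssertionError if the arrays do not look like prepared CIFAR-10."""
    for images, targets in ((data, labels), (test_data, test_label)):
        # Images: channels-first, 32x32, three color channels.
        assert images.ndim == 4 and images.shape[1:] == (3, 32, 32)
        # Labels: one-dimensional, one entry per image.
        assert targets.ndim == 1 and len(targets) == len(images)
        # Pixels were divided by 255, so they must lie in [0, 1].
        assert images.min() >= 0.0 and images.max() <= 1.0
        # Ten classes, numbered 0 to 9.
        assert targets.min() >= 0 and targets.max() <= 9
    return data.shape, test_data.shape
```

With the full dataset, the returned shapes should be (50000, 3, 32, 32) for the training data and (10000, 3, 32, 32) for the testing data.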
