Learning objective: become proficient with mindspore.dataset
mindspore.dataset provides commonly used open-source vision, text, and audio datasets for download.
- Understand mindspore.dataset
- Apply mindspore.dataset in practice
- Extend it to custom datasets
MindSpore platform study time log:
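1. About mindspore.dataset
The mindspore.dataset module provides APIs for loading and processing common datasets such as MNIST, CIFAR-10, CIFAR-100, VOC, COCO, ImageNet, CelebA, and CLUE. It also supports loading datasets in industry-standard formats, including MindRecord, TFRecord, and Manifest. In addition, users can use this module to define and load their own datasets.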
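To make "define and load their own datasets" concrete, here is a minimal sketch of a random-access data source in plain Python. The class name SquaresDataset and the column names "x" and "y" are hypothetical and chosen only for illustration. MindSpore's GeneratorDataset accepts a source object that implements __getitem__ and __len__, so an object of this shape could be wrapped as shown in the final comment. Depending on the MindSpore version, samples may need to be NumPy arrays rather than plain Python integers.

```python
class SquaresDataset:
    """Hypothetical random-access source: sample i is the pair (i, i * i)."""

    def __init__(self, size):
        self._size = size

    def __getitem__(self, index):
        # Raising IndexError past the end lets callers detect the boundary.
        if not 0 <= index < self._size:
            raise IndexError(index)
        return index, index * index

    def __len__(self):
        return self._size


source = SquaresDataset(5)
samples = [source[i] for i in range(len(source))]
print(samples)

# Assumption: with MindSpore installed, the source could be wrapped like this:
#   import mindspore.dataset as ds
#   dataset = ds.GeneratorDataset(source, column_names=["x", "y"], shuffle=False)
```

The index-based protocol is used here because it gives the data pipeline a known length and allows sampling in any order.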
1.1 Download URLs for common datasets
The URLs of the open-source datasets are as follows:
1. MNIST: url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/MNIST_Data.zip"
2. CIFAR-10: url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz"
3. CIFAR-100: url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-100-python.tar.gz"
4. ImageNet: url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/vit_imagenet_dataset.zip"
5. DogCroissants (dog vs. croissant classification dataset): url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/beginner/DogCroissants.zip"
6. COCO 2017: url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/ssd_datasets.zip"
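Since these URLs share a common prefix and differ in archive type, it can help to keep them in one place. The sketch below covers a subset of the URLs listed above. The dictionary keys and the helper names dataset_url and archive_kind are hypothetical. The returned strings "zip" and "tar.gz" are meant to match the values that the download package's kind argument is understood to accept, which is an assumption worth checking against the installed version.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

BASE_URL = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/"

# Hypothetical short names mapped to the archive paths listed above.
DATASET_FILES = {
    "mnist": "MNIST_Data.zip",
    "cifar10": "cifar-10-binary.tar.gz",
    "imagenet": "vit_imagenet_dataset.zip",
    "dog_croissants": "beginner/DogCroissants.zip",
    "coco2017": "ssd_datasets.zip",
}


def dataset_url(name):
    """Return the full download URL for a short dataset name."""
    return BASE_URL + DATASET_FILES[name]


def archive_kind(url):
    """Infer the archive type from the file name at the end of the URL."""
    filename = PurePosixPath(urlparse(url).path).name
    if filename.endswith(".zip"):
        return "zip"
    if filename.endswith((".tar.gz", ".tgz")):
        return "tar.gz"
    raise ValueError(f"unrecognized archive type: {filename}")


for name in DATASET_FILES:
    print(name, archive_kind(dataset_url(name)))
```

Inferring the archive type from the URL avoids hard-coding kind="zip" for archives such as CIFAR-10, which is distributed as tar.gz.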
1.2 Downloading datasets programmatically
Method 1: from download import download
Install the dependency download:
pip install download
Method 2: from mindvision.dataset import DownLoad
Install the dependency mindvision:
pip install mindvision
Example:
from download import download
from mindvision.dataset import DownLoad


def downloadData1(url="https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/beginner/DogCroissants.zip"):
    # Method 1: download and unzip the archive into ./datasets,
    # overwriting any existing copy (replace=True).
    path = download(url, "./datasets", kind="zip", replace=True)
    return path


def downloadData2(url):
    # Method 2: download the archive and extract it into the current directory.
    dl = DownLoad()
    dl.download_and_extract_archive(url, "./")


if __name__ == "__main__":
    url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/" \
          "notebook/datasets/MNIST_Data.zip"
    downloadData1()     # Method 1: download DogCroissants
    downloadData2(url)  # Method 2: download MNIST
Result: the datasets are downloaded successfully.
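If neither package is available, the same download-and-extract step can be approximated with the Python standard library. The following is only a sketch and is not how download or mindvision work internally. The function name fetch_and_extract and its parameters are hypothetical. Unlike replace=True above, it skips the download when the archive file already exists. The demo at the bottom builds a toy archive locally, so no network access is needed to try it.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, target_dir, filename):
    """Download url to target_dir/filename unless it exists, then unpack it.

    Returns the sorted relative paths of all files under target_dir,
    excluding the archive itself.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive = target / filename
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    # The archive format (.zip, .tar.gz, ...) is inferred from the extension.
    shutil.unpack_archive(str(archive), str(target))
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file() and p != archive
    )


# Offline demo: place a toy archive where the download would land,
# so the download step is skipped and only extraction runs.
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(Path(tmp) / "toy.zip", "w") as zf:
        zf.writestr("toy/readme.txt", "hello")
    extracted = fetch_and_extract("https://example.invalid/toy.zip", tmp, "toy.zip")
    content = (Path(tmp) / "toy" / "readme.txt").read_text()

print(extracted, content)
```

Returning the list of extracted files gives a simple way to confirm that a download succeeded, instead of relying on the absence of an error.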
Method 3: from mindvision.dataset import Mnist
Usage:
from mindvision.dataset import Mnist
download_train = Mnist(path="./mnist", split="train", batch_size=32