Dataset Roundup with Links (Deep Learning)

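These are datasets I have come across before, collected in one place so they are easy to find. The list is probably incomplete, and I will keep updating it.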

1. Image Datasets

| Dataset | Link |
| --- | --- |
| MNIST | http://yann.lecun.com/exdb/mnist/ |
| CIFAR-100 | http://www.cs.utoronto.ca/~kriz/cifar.html |
| ImageNet | http://www.image-net.org/ |
| Caltech 101 | http://www.vision.caltech.edu/Image_Datasets/Caltech101/ |
| Caltech 256 | http://www.vision.caltech.edu/Image_Datasets/Caltech256/ |
| PASCAL VOC | https://pjreddie.com/projects/pascal-voc-dataset-mirror/ |
| COCO | http://cocodataset.org/ |
| COIL100 | http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php |
| STL-10 | http://www.stanford.edu/~acoates//stl10/ |
| Google Open Images | https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html |
| LabelMe | http://labelme.csail.mit.edu/Release3.0/browserTools/php/dataset.php |
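As a starting point for the first entry in this list, here is a minimal sketch of reading MNIST's IDX files with only the Python standard library. It assumes the files have already been downloaded and decompressed into bytes; the function names `parse_idx_images` and `parse_idx_labels` are made up for this example and are not part of any library.

```python
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions (count, rows, cols)
LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension (count)


def parse_idx_images(data: bytes):
    """Parse uncompressed IDX3 image bytes into a list of images (lists of rows)."""
    # The header is four big-endian 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number {magic}")
    size = rows * cols
    if len(data) < 16 + count * size:
        raise ValueError("file is shorter than its header claims")
    images = []
    for i in range(count):
        start = 16 + i * size
        flat = data[start:start + size]
        # Pixels are stored row by row, one unsigned byte per pixel.
        images.append([list(flat[r * cols:(r + 1) * cols]) for r in range(rows)])
    return images


def parse_idx_labels(data: bytes):
    """Parse uncompressed IDX1 label bytes into a list of integer labels."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected magic number {magic}")
    if len(data) < 8 + count:
        raise ValueError("file is shorter than its header claims")
    return list(data[8:8 + count])
```

The downloads are gzip-compressed, so something like `gzip.open(path, "rb").read()` would produce the bytes these functions expect. For real training work you would most likely convert the result to an array type from your framework of choice; plain lists are used here only to keep the sketch dependency-free.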

2. Speech Datasets

| Dataset | Link |
| --- | --- |
| Google AudioSet | https://research.google.com/audioset/dataset/index.html |
| TIMIT | http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1 |
| VoxForge | http://www.voxforge.org/ |
| 2000 HUB5 English | https://catalog.ldc.upenn.edu/LDC2002T43 |
| LibriSpeech | http://www.openslr.org/12/ |
| VoxCeleb | http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ |
| OpenSLR | https://www.openslr.org/51 |
| CALLHOME American English Speech | https://catalog.ldc.upenn.edu/LDC97S42 |

3. Text Datasets

| Dataset | Link |
| --- | --- |
| English Broadcast News | https://catalog.ldc.upenn.edu/LDC97S44 |
| SQuAD | https://rajpurkar.github.io/SQuAD-explorer/ |
| Billion Word Dataset | http://www.statmt.org/lm-benchmark/ |
| 20 Newsgroups | http://qwone.com/~jason/20Newsgroups/ |
| Google Books Ngrams | https://aws.amazon.com/datasets/google-books-ngrams/ |
| UCI Spambase | https://archive.ics.uci.edu/ml/datasets/Spambase |
| Common Crawl | http://commoncrawl.org/the-data/ |
| Yelp Open Dataset | https://www.yelp.com/dataset |

4. Natural Language Datasets

| Dataset | Link |
| --- | --- |
| Web 1T 5-gram | https://catalog.ldc.upenn.edu/LDC2006T13 |
| Blizzard Challenge 2018 | https://www.synsig.org/index.php/Blizzard_Challenge_2018 |
| Flickr personal taxonomies | https://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html |
| Multi-Domain Sentiment Dataset | http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ |
| Enron Email Dataset | https://www.cs.cmu.edu/~./enron/ |
| Blogger Corpus | http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm |
| Wikipedia Links Data | https://code.google.com/archive/p/wiki-links/downloads |
| Gutenberg eBooks List | http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs |
| SMS Spam Collection | http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/ |
| UCI's Spambase data | https://archive.ics.uci.edu/ml/datasets/Spambase |

5. Geospatial Datasets

| Dataset | Link |
| --- | --- |
| OpenStreetMap | https://www.openstreetmap.org |
| Landsat 8 | https://landsat.gsfc.nasa.gov/landsat-8/ |
| NEXRAD | https://www.ncdc.noaa.gov/data-access/radar-data/nexrad |
| Esri Open Data | https://hub.arcgis.com/pages/open-data |
| USGS EarthExplorer | https://earthexplorer.usgs.gov/ |
| OpenTopography | https://opentopography.org/ |
| NASA SEDAC | https://sedac.ciesin.columbia.edu/ |
| NASA Earth Observations | https://neo.sci.gsfc.nasa.gov/ |
| Terra Populus | https://terra.ipums.org/ |

6. Recommender Systems Datasets

| Dataset | Link |
| --- | --- |
| MovieLens | https://grouplens.org/datasets/movielens/ |
| Million Song Dataset | https://www.kaggle.com/c/msdchallenge |
| Last.fm | https://grouplens.org/datasets/hetrec-2011/ |
| Book-Crossing Dataset | http://www2.informatik.uni-freiburg.de/~cziegler/BX/ |
| Jester | https://goldberg.berkeley.edu/jester-data/ |
| Netflix Prize | https://www.netflixprize.com/ |
| Pinterest Fashion Compatibility | http://cseweb.ucsd.edu/~jmcauley/datasets.html#pinterest |
| Amazon Question and Answer Data | http://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_qa |
| Social Circles Data | http://cseweb.ucsd.edu/~jmcauley/datasets.html#socialcircles |

7. Economics and Finance Datasets

| Dataset | Link |
| --- | --- |
| Quandl | https://www.quandl.com/ |
| World Bank Open Data | https://data.worldbank.org/ |
| IMF Data | https://www.imf.org/en/Data |
| Financial Times Market Data | https://markets.ft.com/data/ |
| Google Trends | https://trends.google.com/trends/?q=google&ctab=0&geo=all&date=all&sort=0 |
| American Economic Association | https://www.aeaweb.org/resources/data/us-macro-regional |
| US Stock Data | https://github.com/eliangcs/pystock-data |
| World Factbook | https://www.cia.gov/library/publications/download/ |
| Dow Jones Index Data Set | http://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index |

8. Autonomous Vehicles Datasets

| Dataset | Link |
| --- | --- |
| BDD100K | https://bdd-data.berkeley.edu/ |
| Baidu ApolloScape | http://apolloscape.auto/ |
| Comma.ai | https://archive.org/details/comma-dataset |
| Oxford RobotCar | https://robotcar-dataset.robots.ox.ac.uk/ |
| Cityscapes Dataset | https://www.cityscapes-dataset.com/ |
| CCSAD Dataset | http://aplicaciones.cimat.mx/Personal/jbhayet/ccsad-dataset |
| KUL Belgium Traffic Sign Dataset | http://www.vision.ee.ethz.ch/~timofter/traffic_signs/ |
| LISA | http://cvrr.ucsd.edu/LISA/datasets.html |
| Bosch Small Traffic Light | https://hci.iwr.uni-heidelberg.de/node/6132 |
| LaRa Traffic Light Recognition | http://www.lara.prd.fr/benchmarks/trafficlightsrecognition |
| WPI Datasets | http://computing.wpi.edu/dataset.html |
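Since the point of this post is to keep the links handy in one place, the same idea can be carried into code. The sketch below holds a handful of entries copied from the lists above in a plain dictionary and adds two small helpers. `CATALOG`, `find_dataset`, and `plain_http_links` are hypothetical names chosen for this example, and only a few entries are included to keep it short.

```python
from urllib.parse import urlparse

# A small sample of the entries listed in this post, grouped by category.
CATALOG = {
    "image": [
        ("MNIST", "http://yann.lecun.com/exdb/mnist/"),
        ("COCO", "http://cocodataset.org/"),
    ],
    "speech": [
        ("LibriSpeech", "http://www.openslr.org/12/"),
        ("VoxForge", "http://www.voxforge.org/"),
    ],
    "recommender": [
        ("MovieLens", "https://grouplens.org/datasets/movielens/"),
    ],
}


def find_dataset(query):
    """Return (category, name, url) for every entry whose name contains query."""
    needle = query.lower()
    return [
        (category, name, url)
        for category, entries in CATALOG.items()
        for name, url in entries
        if needle in name.lower()
    ]


def plain_http_links():
    """Return the names of entries whose link uses http rather than https."""
    return [
        name
        for entries in CATALOG.values()
        for name, url in entries
        if urlparse(url).scheme == "http"
    ]


if __name__ == "__main__":
    for category, name, url in find_dataset("libri"):
        print(f"{category}: {name} -> {url}")
    print("Links still on plain http:", ", ".join(plain_http_links()))
```

Listing the plain `http` links is one cheap way to spot entries that may have moved or gone stale since the list was first compiled, without making any network requests.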

Reference:
"A review of deep learning with special emphasis on architectures, applications and recent trends"
