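sklearn Datasets (datasets)

This article introduces the datasets built into the sklearn library, which fall into two groups: sample generators and sample loaders. The sample generators covered are moons, circles, blobs, and a linear regression dataset; the loaders cover classic datasets such as Boston housing prices, handwritten digits, iris, and the 20 newsgroups topic classification set, with examples showing how to use each.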


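1. Introduction

sklearn ships its built-in datasets in two groups, [Samples generator] and [Loaders]:
1. [Samples generator]: tools that generate synthetic data
2. [Loaders]: datasets that can be loaded directly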
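
The difference between the two groups also shows up in what they return. The following is a minimal sketch, assuming `make_moons` and `load_iris` as representatives of each group; the variable names and the `random_state=0` value are illustrative choices, not part of the original article.

```python
from sklearn.datasets import load_iris, make_moons

# A sample generator synthesizes data on each call and returns plain arrays.
X_gen, y_gen = make_moons(n_samples=100, noise=0.2, random_state=0)
print(type(X_gen).__name__, X_gen.shape, y_gen.shape)

# A loader reads a dataset bundled with scikit-learn and returns a Bunch,
# a dict-like object whose keys are also available as attributes.
bunch = load_iris()
print(type(bunch).__name__, sorted(bunch.keys()))

# Loaders can also hand back the same (X, y) pair directly.
X_load, y_load = load_iris(return_X_y=True)
print(X_load.shape, y_load.shape)
```

Passing `return_X_y=True` is convenient when only the feature matrix and labels are needed; the Bunch form is more useful when feature names or the dataset description matter.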

2. Sample generators

2.1. Moons

from sklearn.datasets import make_moons
import matplotlib.pyplot as mp
X, y = make_moons(noise=.2)
mp.scatter(X[:, 0], X[:, 1], s=40, c=y)
mp.show()
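
Because the generator draws random noise, each call to the code above produces a different plot. A small sketch of how to make the output reproducible, assuming an arbitrary seed of 42 and 200 samples (both illustrative values):

```python
import numpy as np
from sklearn.datasets import make_moons

# random_state fixes the random draw, so repeated calls give identical data.
X1, y1 = make_moons(n_samples=200, noise=0.2, random_state=42)
X2, y2 = make_moons(n_samples=200, noise=0.2, random_state=42)
print(np.array_equal(X1, X2), np.array_equal(y1, y2))

# The samples are split evenly between the two half-moons (labels 0 and 1).
print(np.bincount(y1))
```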


2.2. Circles

from sklearn.datasets import make_circles
import matplotlib.pyplot as mp
X, y = make_circles(noise=.2, factor=.4)
mp.scatter(X[:, 0], X[:, 1], s=40, c=y)
mp.show()
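
The `factor` argument sets the ratio between the inner and outer circle radii. The sketch below checks this numerically; it turns the noise off (an illustrative choice that differs from the plot above) so that every point lies exactly on its circle.

```python
import numpy as np
from sklearn.datasets import make_circles

# Without noise, each point sits exactly on one of the two circles.
X, y = make_circles(n_samples=100, noise=None, factor=0.4, random_state=0)
radius = np.linalg.norm(X, axis=1)

outer_radius = radius[y == 0]  # label 0: outer circle, radius 1
inner_radius = radius[y == 1]  # label 1: inner circle, radius equal to factor
print(np.unique(outer_radius.round(6)), np.unique(inner_radius.round(6)))
```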


2.3. Blobs

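from sklearn.datasets import make_blobs
import matplotlib.pyplot as mp
from mpl_toolkits import mplot3d  # 3D axes support (needed on older Matplotlib)
X, y = make_blobs(centers=[[-1, -1, -1], [1, 1, 1]], cluster_std=1)
# Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib,
# so create the 3D axes through add_subplot instead
ax = mp.figure().add_subplot(projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], s=99, c=y)
mp.show()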
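
When `centers` is given as explicit coordinates, the number of features follows from their length and label `k` marks the samples drawn around `centers[k]`. A rough sketch of that relationship, assuming 1000 samples and a fixed seed (illustrative values) so the per-cluster means are stable:

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = np.array([[-1, -1, -1], [1, 1, 1]])
X, y = make_blobs(n_samples=1000, centers=centers, cluster_std=1, random_state=0)
print(X.shape)  # three features, inferred from the center coordinates

# The mean of each cluster should land close to the center it was drawn around.
cluster_means = np.array([X[y == label].mean(axis=0) for label in range(len(centers))])
print(cluster_means.round(2))
```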


2.4. Linear regression

from sklearn.datasets import make_regression
import matplotlib.pyplot as mp
# Generate the data; coef=True also returns the true coefficient
X, y, coef = make_regression(n_features=1, noise=9, coef=True)
x = X.reshape(-1)
# Visualize the samples and the underlying line
mp.scatter(x, y, c='g', alpha=0.3)
mp.plot(x, coef * x)
mp.show()
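
The returned `coef` is the slope of the underlying linear model, which is why the plotted line passes through the point cloud. The sketch below illustrates this under two assumptions that are not in the original: noise is switched off for an exact check, and `LinearRegression` is used as one possible way to recover the slope from noisy data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Without noise (and with the default bias of 0), y equals coef * x exactly.
X0, y0, coef0 = make_regression(n_features=1, noise=0, coef=True, random_state=0)
print(np.allclose(y0, coef0 * X0[:, 0]))

# With noise, a least-squares fit should recover a slope close to coef.
X, y, coef = make_regression(n_samples=1000, n_features=1, noise=9,
                             coef=True, random_state=0)
model = LinearRegression().fit(X, y)
print(float(coef), model.coef_[0], model.intercept_)
```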


3. Sample loaders

3.1. Boston housing prices

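[Figure not preserved.] Note: load_boston, the loader for this dataset, was deprecated in scikit-learn 1.0 and removed in 1.2, so it is not available in current releases.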

3.2. Handwritten digits

from sklearn.datasets import load_digits
import matplotlib.pyplot as mp
digits = load_digits()
images, target, data = digits.images, digits.target, digits.data
print(images.shape, target.shape, data.shape)
# (1797, 8, 8) (1797,) (1797, 64)
for i in range(10):
    mp.subplot(1, 10, i + 1)
    mp.axis('off')
    mp.imshow(images[i], cmap=mp.cm.gray_r)
    mp.title(target[i])
mp.show()
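
The three shapes printed above are related: `data` is simply each 8x8 image flattened into 64 values. A brief sketch that checks this and looks at the value range and class distribution (the variable name `flattened` is an illustrative choice):

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Flattening each 8x8 image row by row reproduces the 64-column data matrix.
flattened = digits.images.reshape(len(digits.images), -1)
print(np.array_equal(flattened, digits.data))

# Pixel intensities are integers stored as floats, ranging from 0 to 16.
print(digits.data.min(), digits.data.max())

# Number of samples for each digit 0..9.
print(np.bincount(digits.target))
```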


3.3. Iris

from sklearn.datasets import load_iris
import matplotlib.pyplot as mp
import seaborn
bunch = load_iris()
X = bunch.data
y = bunch.target
names = bunch.feature_names
mp.figure(figsize=(8, 7))
length = len(names)
for i in range(length):  # subplot row
    for j in range(length):  # subplot column
        mp.subplot(length, length, i * length + j + 1)
        if i == j:
            # Violin plot of feature i per class; recent seaborn releases
            # require x and y to be passed as keyword arguments
            seaborn.violinplot(x=y, y=X[:, i])
        else:
            # Column feature on the x-axis, row feature on the y-axis
            mp.scatter(X[:, j], X[:, i], s=10, c=y)
        if i == length - 1:
            mp.xlabel(names[j].replace('(cm)', ''))
        if j == 0:
            mp.ylabel(names[i].replace('(cm)', ''))
mp.tight_layout()
mp.show()
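
The colors in the plot correspond to the three iris species stored in `target`. As a small sketch, the Bunch returned by `load_iris` can be inspected without plotting to see the feature names and how many samples belong to each class:

```python
import numpy as np
from sklearn.datasets import load_iris

bunch = load_iris()
print(bunch.data.shape)     # 150 samples, 4 features
print(bunch.feature_names)  # names used for the axis labels

# target holds class indices; target_names maps each index to a species name.
class_counts = np.bincount(bunch.target)
for index, name in enumerate(bunch.target_names):
    print(index, name, class_counts[index])
```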


3.4. 20 Newsgroups topic classification

# 20 Newsgroups topic classification
from sklearn.datasets import fetch_20newsgroups
data_train = fetch_20newsgroups()  # training subset by default; downloaded and cached on first use
data = data_train.data  # text data
target = data_train.target  # label indices
target_names = data_train.target_names  # names of the 20 newsgroups
print(data[0])
print('Index and group name:', target[0], target_names[target[0]])
print('Sample count:', len(data), target.shape)

Output:

From: lerxst@wam.umd.edu (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15

 I was wondering if anyone out there could enlighten me on this car I saw
the other day. It was a 2-door sports car, looked to be from the late 60s/
early 70s. It was called a Bricklin. The doors were really small. In addition,
the front bumper was separate from the rest of the body. This is 
all I know. If anyone can tellme a model name, engine specs, years
of production, where this car is made, history, or whatever info you
have on this funky looking car, please e-mail.

Thanks,
- IL
   ---- brought to you by your neighborhood Lerxst ----

Index and group name: 7 rec.autos
Sample count: 11314 (11314,)

4. Appendix

Official documentation: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets

Notes

English / Chinese glossary:
violin: 小提琴
iris: 鸢尾花 (鸢 is pronounced yuān)
sepal: 花萼
petal: 花瓣