[Machine Learning] Splitting training and validation sets for cross-validation training in three lines of code

LandH's Blog

License: CC 4.0 BY-SA

Category: Traditional Machine Learning. Tags: machine learning

Article link: https://blog.youkuaiyun.com/u013084616/article/details/79410337

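This post introduces a quick way to split data into training and validation sets using numpy.random.choice() and set(), and shows how the same approach can be used to pick batches for batch training.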

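Quickly splitting a dataset for cross-validation training with numpy.random.choice() and set()

Previously, whenever I split data into training and validation sets, I generated the random indices by hand, which was clumsy.

The new approach I learned is as follows: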

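import numpy as np
# Generate the raw data from a normal distribution (mean 1, std 0.1, 100 samples)
x = np.random.normal(1, 0.1, 100)
# Split the data 8:2: draw 80% of the indices without replacement
x_train_index = np.random.choice(len(x), round(len(x) * 0.8), replace=False)
# The validation indices are whatever the training indices left out
x_valid_index = np.array(list(set(range(len(x))) - set(x_train_index)))

x_train = x[x_train_index]
x_valid = x[x_valid_index]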
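
As a quick sanity check, the sketch below repeats the split and confirms that the two index sets do not overlap and together cover every sample. The fixed seed and the names `is_disjoint`, `covers_all` and `valid_idx_sorted` are assumptions added for illustration. `np.setdiff1d` is shown only as a possible alternative to the set() subtraction; it is not part of the original method.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make the run repeatable

x = np.random.normal(1, 0.1, 100)
n_train = round(len(x) * 0.8)

# Same split as above: sample training indices without replacement,
# then take the set difference for validation.
train_idx = np.random.choice(len(x), n_train, replace=False)
valid_idx = np.array(list(set(range(len(x))) - set(train_idx)))

# The two index sets should not overlap and should cover all samples.
is_disjoint = set(train_idx).isdisjoint(valid_idx)
covers_all = set(train_idx) | set(valid_idx) == set(range(len(x)))

# Alternative (not from the original post): setdiff1d returns the
# left-out indices as a sorted array.
valid_idx_sorted = np.setdiff1d(np.arange(len(x)), train_idx)

print(len(train_idx), len(valid_idx), is_disjoint, covers_all)
```

The set() version makes no promise about the order of the validation indices, which does not matter for a random split. If a sorted result is preferred, `np.setdiff1d` may be the more convenient choice.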

Summary 1: np.random.choice()

Definition : choice(a, size=None, replace=True, p=None)

Type : Function of the numpy.random module

Parameters
a : 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)
size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
replace : boolean, optional
Whether the sample is with or without replacement
That is, whether the sample may contain repeated elements
p : 1-D array-like, optional
The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.
That is, the probability distribution used to pick elements; the default is uniform

Returns
samples : 1-D ndarray, shape (size,)
The generated random samples
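
The parameters above can be tried out in a few lines. This is a minimal sketch; the seed, the sample sizes and the variable names are assumptions chosen for illustration.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make the run repeatable

# a as an int: sample from np.arange(5). With replace=False and size=5
# the result is a random permutation of 0..4.
perm = np.random.choice(5, size=5, replace=False)

# Default replace=True: the same element may be drawn more than once.
with_rep = np.random.choice(5, size=10)

# size as a tuple gives an array of that shape.
grid = np.random.choice(5, size=(2, 3))

# p: non-uniform probabilities, one per entry of a (they must sum to 1).
weighted = np.random.choice(['a', 'b', 'c'], size=1000, p=[0.8, 0.1, 0.1])

# size=None (the default) returns a single value rather than an array.
single = np.random.choice(5)

print(perm, with_rep.shape, grid.shape, (weighted == 'a').mean(), single)
```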

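Summary 2: set()

A Python set, like sets in other languages, is an unordered collection of unique elements. Its basic uses include membership testing and removing duplicate elements.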
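
The set difference used in the split can be seen on a small, hand-written example. The index values here are made up for illustration.

```python
# All indices of a 10-element dataset.
all_index = set(range(10))

# Pretend these 8 indices were chosen for training.
train_index = {0, 2, 3, 5, 6, 7, 8, 9}

# Set difference: everything in all_index that is not in train_index.
valid_index = all_index - train_index

# Building a set from a list drops the duplicates.
deduped = set([3, 1, 3, 2, 1])

print(valid_index, deduped, 4 in valid_index)
```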

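Summary 3: batch training

Batch training can pick its data the same way:

batch_size = 25
for epoch in range(100):
    # Draw batch_size random indices (with replacement, the default)
    rand_index = np.random.choice(len(x_train), size=batch_size)
    rand_x = x_train[rand_index]
    # y_train holds the labels aligned with x_train; it is not defined in this post
    rand_y = y_train[rand_index]
    ...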
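
For a version that runs on its own, the sketch below fills in the missing pieces with made-up data. The labels `y_train = 2.0 * x_train`, the seed and the `batch_shapes` list are assumptions for illustration only, and the actual training step is left as a comment. It also passes `replace=False`, which differs from the snippet above, so that no sample appears twice within a batch.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make the run repeatable

# Hypothetical training data: 80 inputs and made-up labels aligned with them.
x_train = np.random.normal(1, 0.1, 80)
y_train = 2.0 * x_train

batch_size = 25
batch_shapes = []
for step in range(100):
    # Pick batch_size distinct positions, then index x and y with the
    # same positions so inputs and labels stay paired.
    rand_index = np.random.choice(len(x_train), size=batch_size, replace=False)
    rand_x = x_train[rand_index]
    rand_y = y_train[rand_index]
    batch_shapes.append((rand_x.shape, rand_y.shape))
    # A real training step on (rand_x, rand_y) would go here.

print(len(batch_shapes), batch_shapes[0])
```

Indexing both arrays with the same `rand_index` is what keeps each input matched to its label.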