sklearn.datasets.make_regression



```python
sklearn.datasets.make_regression(
    n_samples=100,
    n_features=100,
    n_informative=10,
    n_targets=1,
    bias=0.0,
    effective_rank=None,
    tail_strength=0.5,
    noise=0.0,
    shuffle=True,
    coef=False,
    random_state=None)
```

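Generate a random regression problem. The inputs are either well conditioned (the default) or have a low-rank, fat-tailed singular profile. The outputs come from applying a random linear model, optionally with a bias term, whose nonzero coefficients belong to the n_informative features, and then adding optional centered Gaussian noise.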
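Here is a minimal sketch of a basic call, assuming a recent scikit-learn release. The sizes in the second call are arbitrary example values. With the default n_targets=1, y comes back as a 1-D array:

```python
from sklearn.datasets import make_regression

# Default-sized problem: 100 samples, 100 features, 10 of them informative.
X, y = make_regression(random_state=0)
print(X.shape)  # (100, 100)
print(y.shape)  # (100,): 1-D because n_targets defaults to 1

# A smaller problem with explicit (arbitrary) sizes.
X_small, y_small = make_regression(
    n_samples=50, n_features=5, n_informative=3, random_state=0
)
print(X_small.shape, y_small.shape)  # (50, 5) (50,)
```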

Returns:

X : array of shape [n_samples, n_features]

The input samples.

y : array of shape [n_samples] or [n_samples, n_targets]

The output values.

coef : array of shape [n_features] or [n_features, n_targets], optional

The coefficients of the underlying linear model. Returned only if coef is True.

Parameters:

n_samples : int, optional (default=100)

The number of samples.

n_features : int, optional (default=100)

The number of features.

n_informative : int, optional (default=10)

The number of informative features, i.e., the number of features used to build the linear model that generates the output.
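To see n_informative in action, the sketch below requests the ground-truth coefficients (coef=True) and counts how many are nonzero. Only the informative features receive a nonzero weight. The sizes are arbitrary illustration values:

```python
import numpy as np
from sklearn.datasets import make_regression

# Ask for the true coefficients so we can inspect which features matter.
X, y, coef = make_regression(
    n_samples=200, n_features=20, n_informative=5, coef=True, random_state=0
)

# Only n_informative coefficients are nonzero; the other 15 features do not affect y.
n_nonzero = np.count_nonzero(coef)
print(n_nonzero)  # 5
```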

n_targets : int, optional (default=1)

The number of regression targets, i.e., the dimension of the output vector y associated with each sample. By default, the output is a scalar.
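This short sketch shows the shapes when n_targets > 1, using arbitrary sizes. Both y and the returned coefficients gain a second axis with one column per target:

```python
from sklearn.datasets import make_regression

X, y, coef = make_regression(
    n_samples=100, n_features=8, n_informative=4, n_targets=3,
    coef=True, random_state=0,
)
print(X.shape)     # (100, 8)
print(y.shape)     # (100, 3): one column per target
print(coef.shape)  # (8, 3): one coefficient column per target
```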

bias : float, optional (default=0.0)

The bias term in the underlying linear model.
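With noise=0.0, the target is exactly the linear model plus the bias. Subtracting X @ coef should therefore leave only the constant bias. Here is a minimal check, with bias=42.0 as an arbitrary example value:

```python
import numpy as np
from sklearn.datasets import make_regression

# Noiseless data: y == X @ coef + bias (up to floating-point error).
X, y, coef = make_regression(
    n_samples=100, n_features=10, n_informative=3,
    bias=42.0, noise=0.0, coef=True, random_state=0,
)
offset = y - X @ coef
print(np.allclose(offset, 42.0))  # True
```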

effective_rank : int or None, optional (default=None)

if not None:

The approximate number of singular vectors required to explain most of the input data by linear combinations. Using this kind of singular spectrum in the input allows the generator to reproduce the correlations often observed in practice.

if None:

The input set is well conditioned, centered and gaussian with unit variance.
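The effect of effective_rank shows up in the singular values of X. This sketch measures how much of the squared singular-value mass the top 5 directions carry, both with and without effective_rank=5. `top_k_variance_share` is a hypothetical helper written for this example and is not part of scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_regression

def top_k_variance_share(X, k):
    """Fraction of the squared singular-value mass carried by the top k singular values."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, in descending order
    return (s[:k] ** 2).sum() / (s ** 2).sum()

# Default: well-conditioned Gaussian inputs, mass spread over many directions.
X_full, _ = make_regression(n_samples=200, n_features=50, random_state=0)
# Low-rank inputs: mass concentrates in roughly the first effective_rank directions.
X_low, _ = make_regression(
    n_samples=200, n_features=50, effective_rank=5, random_state=0
)

share_full = top_k_variance_share(X_full, 5)
share_low = top_k_variance_share(X_low, 5)
print(f"top-5 share: default={share_full:.2f}, effective_rank=5: {share_low:.2f}")
```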

tail_strength : float between 0.0 and 1.0, optional (default=0.5)

The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None.
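tail_strength only matters when effective_rank is set. A larger value gives more weight to the slowly decaying tail of the singular-value profile, so the leading directions carry a smaller share of X. `top_k_variance_share` is a hypothetical helper written for this sketch:

```python
import numpy as np
from sklearn.datasets import make_regression

def top_k_variance_share(X, k):
    """Fraction of the squared singular-value mass carried by the top k singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return (s[:k] ** 2).sum() / (s ** 2).sum()

shares = {}
for tail in (0.1, 0.9):
    X, _ = make_regression(
        n_samples=200, n_features=50, effective_rank=5,
        tail_strength=tail, random_state=0,
    )
    shares[tail] = top_k_variance_share(X, 5)

# A weak tail (0.1) leaves most mass in the top directions; a strong tail (0.9) flattens it.
print({t: round(v, 2) for t, v in shares.items()})
```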

noise : float, optional (default=0.0)

The standard deviation of the gaussian noise applied to the output.
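noise is the standard deviation of the Gaussian noise added to y. On a large sample, the residuals y - X @ coef should therefore have a standard deviation close to noise (the default bias=0.0 is kept here). A minimal sketch:

```python
import numpy as np
from sklearn.datasets import make_regression

X, y, coef = make_regression(
    n_samples=10_000, n_features=5, n_informative=5,
    noise=5.0, coef=True, random_state=0,
)
# bias is 0.0, so the residuals isolate the added noise.
residuals = y - X @ coef
print(round(residuals.std(), 2))  # close to 5.0
```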

shuffle : boolean, optional (default=True)

Shuffle the samples and the features.
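This sketch shows what shuffle changes. It relies on how current scikit-learn versions build the data, so treat the column positions as an implementation detail rather than a documented guarantee. With shuffle=False, the informative features occupy the first n_informative columns. The default shuffle=True permutes the samples and the features, and the coefficients are permuted along with the features:

```python
import numpy as np
from sklearn.datasets import make_regression

# Without shuffling, the informative features are the first n_informative columns.
X, y, coef = make_regression(
    n_samples=100, n_features=10, n_informative=3,
    shuffle=False, coef=True, random_state=0,
)
print(np.flatnonzero(coef))  # [0 1 2]

# With shuffling (the default), the informative columns land at random positions.
X_s, y_s, coef_s = make_regression(
    n_samples=100, n_features=10, n_informative=3,
    shuffle=True, coef=True, random_state=0,
)
print(np.flatnonzero(coef_s))
```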

coef : boolean, optional (default=False)

If True, the coefficients of the underlying linear model are returned.
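One common use of coef=True is to check how well a model recovers the true weights. This sketch fits `sklearn.linear_model.LinearRegression` on noiseless data, where ordinary least squares should recover both the coefficients and the bias up to floating-point error. The sizes and bias=3.0 are arbitrary example values:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y, true_coef = make_regression(
    n_samples=200, n_features=10, n_informative=4,
    bias=3.0, noise=0.0, coef=True, random_state=0,
)

# Fit OLS with an intercept; on noiseless, well-conditioned data it recovers the generator.
model = LinearRegression().fit(X, y)
print(np.allclose(model.coef_, true_coef, atol=1e-6))  # True
print(np.isclose(model.intercept_, 3.0))               # True
```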

random_state : int, RandomState instance or None (default)

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See the random_state entry in the scikit-learn Glossary.
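This short sketch shows reproducibility with random_state. The same integer gives identical data. A shared RandomState instance advances between calls and therefore produces different data each time:

```python
import numpy as np
from sklearn.datasets import make_regression

# Same integer seed: identical datasets across calls.
X1, y1 = make_regression(n_samples=50, n_features=4, random_state=42)
X2, y2 = make_regression(n_samples=50, n_features=4, random_state=42)
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))  # True

# A different seed gives a different dataset.
X3, _ = make_regression(n_samples=50, n_features=4, random_state=7)
print(np.array_equal(X1, X3))  # False

# A shared RandomState instance: its state advances, so consecutive calls differ.
rng = np.random.RandomState(0)
Xa, _ = make_regression(n_samples=50, n_features=4, random_state=rng)
Xb, _ = make_regression(n_samples=50, n_features=4, random_state=rng)
print(np.array_equal(Xa, Xb))  # False
```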