The score function of scikit-learn's linear regression model returns the coefficient of determination R^2

License: CC 4.0 BY-SA


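This article introduces the coefficient of determination R^2 used in linear regression and how it is computed. R^2 measures how good a predictive model is. Its value usually lies between 0 and 1 (it can drop below 0 for a model that does worse than always predicting the mean), and the closer it is to 1, the better the fit.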


http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score

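The score function of a linear regression model returns the coefficient of determination R^2 computed on its predictions.

Source code of LinearRegression's score function: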

def score(self, X, y, sample_weight=None):
    """Returns the coefficient of determination R^2 of the prediction.

    The coefficient R^2 is defined as (1 - u/v), where u is the residual
    sum of squares ((y_true - y_pred) ** 2).sum() and v is the total
    sum of squares ((y_true - y_true.mean()) ** 2).sum().
    The best possible score is 1.0 and it can be negative (because the
    model can be arbitrarily worse). A constant model that always
    predicts the expected value of y, disregarding the input features,
    would get a R^2 score of 0.0.

    Parameters
    ----------
    X : array-like, shape = (n_samples, n_features)
        Test samples.

    y : array-like, shape = (n_samples) or (n_samples, n_outputs)
        True values for X.

    sample_weight : array-like, shape = [n_samples], optional
        Sample weights.

    Returns
    -------
    score : float
        R^2 of self.predict(X) wrt. y.
    """

    from .metrics import r2_score
    return r2_score(y, self.predict(X), sample_weight=sample_weight,
                    multioutput='variance_weighted')
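
The definition in the docstring can be checked by hand. The following is a minimal sketch in plain Python, not scikit-learn's implementation: the helper name `r2` and the sample data are made up for illustration, and sample weights and multiple outputs are ignored. It reproduces the three cases the docstring mentions: a perfect prediction, a constant prediction equal to the mean, and a prediction worse than the mean.

```python
def r2(y_true, y_pred):
    """Unweighted R^2 = 1 - u/v, following the docstring's definition.

    Hypothetical helper for illustration only; it does not handle
    sample weights, multiple outputs, or a constant y_true (v == 0).
    """
    mean_y = sum(y_true) / len(y_true)
    # u: residual sum of squares between true and predicted values
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares between true values and their mean
    v = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [3.0, 5.0, 7.0, 9.0]  # illustrative data, mean is 6.0

perfect = r2(y_true, y_true)                 # u is 0, so the score is 1.0
constant = r2(y_true, [6.0] * len(y_true))   # always predict the mean: u equals v
worse = r2(y_true, [9.0, 7.0, 5.0, 3.0])     # reversed predictions: u exceeds v

print(perfect, constant, worse)
```

With this data the three scores come out as 1.0, 0.0 and a negative number, which matches the docstring's statement that the best score is 1.0 and that the score has no lower bound.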

The coefficient of determination R^2

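The coefficient of determination (决定系数, rendered as 判定系数 in some textbooks) is also known as the goodness of fit.

It reflects what percentage of the variation in y can be described by the variation in x, that is, what share of the variability of the dependent variable Y can be explained by the controlled independent variable X.

Interpretation: the larger the goodness of fit, the better x explains y. The variation attributable to the independent variable then makes up a larger share of the total variation, and the observed points cluster more tightly around the regression line.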
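
The "share of the variation explained" reading can be made concrete. The sketch below is an illustration under stated assumptions, not part of scikit-learn: it fits a one-feature least-squares line with an intercept in plain Python (the helper `fit_line`, the name `ss_reg` and the data are all hypothetical), then evaluates the fit on the same data it was trained on. Under exactly those conditions the total variation splits into an explained part and a residual part, so R^2 equals the explained fraction.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(xs, ys)
preds = [a * x + b for x in xs]
mean_y = sum(ys) / len(ys)

ss_tot = sum((y - mean_y) ** 2 for y in ys)            # total variation of y
ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # variation left unexplained
ss_reg = sum((p - mean_y) ** 2 for p in preds)         # variation explained by the line

r_squared = 1.0 - ss_res / ss_tot
explained_fraction = ss_reg / ss_tot

# For this training-set fit both numbers agree (about 0.6 here),
# up to floating-point rounding.
print(r_squared, explained_fraction)
```

The equality `ss_tot = ss_reg + ss_res` is a property of least squares with an intercept on the training data. On held-out data, or for other models, only the `1 - ss_res / ss_tot` form is the one the score function uses.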

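After running a linear regression on the data we obtain the coefficients of the corresponding function. How do we then judge how well the equation built from these coefficients actually describes the data?

For this we use the coefficient of determination, which measures how well the regression equation fits.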

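It is defined as $R^{2} = 1 - \frac{SS_{res}}{SS_{tot}}$, which is the (1 - u/v) of the docstring above.
$SS_{res}$ is the residual sum of squares: the squared error between the true data and the estimated (regression) data.
$SS_{tot}$ is the total sum of squares: the squared error between the true data and their mean.
$SS_{res}$ is generally smaller than $SS_{tot}$, so the result generally lies between 0 and 1 (this always holds for a least-squares fit with an intercept evaluated on its own training data). $SS_{tot}$ is fixed once the data are given. The less accurate the estimate, the larger $SS_{res}$ and the smaller $R^{2}$: it reaches 0 for a model no better than always predicting the mean and becomes negative for a worse one. The more accurate the estimate, the closer $R^{2}$ is to 1.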
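
As a small worked example (the numbers are made up for illustration), take $SS_{res}$ as the residual sum of squares between true and predicted values and $SS_{tot}$ as the total sum of squares around the mean, as in the score docstring. Let the true values be $y = (1, 2, 3)$ with mean $\bar{y} = 2$, and compare a fairly accurate estimate $\hat{y} = (1.5, 2, 2.5)$ with a poor one $\hat{y}' = (2.5, 2, 1.5)$:

$$
\begin{aligned}
SS_{tot} &= (1-2)^{2} + (2-2)^{2} + (3-2)^{2} = 2 \\
SS_{res} &= (1-1.5)^{2} + (2-2)^{2} + (3-2.5)^{2} = 0.5, & R^{2} &= 1 - \frac{0.5}{2} = 0.75 \\
SS_{res}' &= (1-2.5)^{2} + (2-2)^{2} + (3-1.5)^{2} = 4.5, & R'^{2} &= 1 - \frac{4.5}{2} = -1.25
\end{aligned}
$$

$SS_{tot}$ is the same in both cases because it depends only on the data. Only $SS_{res}$ changes with the quality of the estimate, and once it exceeds $SS_{tot}$ the score becomes negative.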