Training, validation, and test sets: the difference, explained so it finally makes sense

梅花14

The confusion

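Most of us know exactly what the training set is for; what we tend to mix up is how the validation set and the test set relate to each other and how they differ. I read plenty of explanations online without really getting it. Today I ran into the topic again and finally found an easy-to-follow answer on Stack Overflow, so I am sharing it here.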

Both the training set and the validation set are used during the training phase!

The following pseudocode should make it clear right away:
----------------------------
for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
        calculate the accuracy over training data
    for each validation data instance
        calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        exit training
    else
        continue training

Once training is finished, you run the model against your test set and verify that the accuracy is sufficient.
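
To make the pseudocode above concrete, here is a minimal runnable sketch in plain Python. It is only an illustration under my own assumptions, not the code from the Stack Overflow answer: the toy task (label is 1 when x0 + x1 > 1), the tiny logistic-regression "network", and the names `make_data`, `predict`, `accuracy`, and `train` are all made up for this example. The point to notice is where each data set is touched: weights change only inside the training loop, the validation set only decides when to stop, and the test set is used once at the very end.

```python
import math
import random


def make_data(n, rng):
    """Toy binary task (an assumption for illustration): label is 1 when x0 + x1 > 1."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        data.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return data


def predict(weights, x):
    """Probability of class 1 from a two-input logistic-regression model."""
    z = weights[0] * x[0] + weights[1] * x[1] + weights[2]
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, data):
    """Fraction of examples classified correctly; this never changes the weights."""
    correct = sum(1 for x, y in data if (predict(weights, x) >= 0.5) == (y == 1))
    return correct / len(data)


def train(train_set, val_set, threshold=0.95, max_epochs=200, lr=0.5):
    """Train until the validation accuracy reaches `threshold` or `max_epochs` is hit."""
    weights = [0.0, 0.0, 0.0]
    epochs_run = 0
    train_acc = accuracy(weights, train_set)
    val_acc = accuracy(weights, val_set)
    for epoch in range(1, max_epochs + 1):
        epochs_run = epoch
        # Training set: the only place where the weights are adjusted.
        for x, y in train_set:
            error = predict(weights, x) - y
            weights[0] -= lr * error * x[0]
            weights[1] -= lr * error * x[1]
            weights[2] -= lr * error
        train_acc = accuracy(weights, train_set)
        # Validation set: measured every epoch, used only for the stop decision.
        val_acc = accuracy(weights, val_set)
        if val_acc >= threshold:
            break
    return weights, epochs_run, train_acc, val_acc


rng = random.Random(0)
train_set = make_data(300, rng)
val_set = make_data(100, rng)
test_set = make_data(100, rng)

weights, epochs_run, train_acc, val_acc = train(train_set, val_set)
# Test set: touched exactly once, after training has finished.
test_acc = accuracy(weights, test_set)
print(f"epochs={epochs_run} train={train_acc:.2f} val={val_acc:.2f} test={test_acc:.2f}")
```

The threshold of 0.95 and the cap of 200 epochs are arbitrary values chosen for the sketch; in a real project you would pick them for your own task.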

Detailed explanation

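Training set: this data set is used to adjust the weights of the neural network.
Validation set: this data set is used to minimize overfitting. You do not adjust the weights of the network with it; you only verify that any increase in accuracy over the training set also yields an increase in accuracy over data the network has not been trained on (i.e. the validation set). If the accuracy over the training set increases but the accuracy over the validation set stays the same or decreases, the network is overfitting and you should stop training.
Test set: this data set is used only to test the final solution, in order to confirm the actual predictive power of the network.
(The explanations above are translated from the original Stack Overflow answer.)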
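
As a rough sketch of how the three sets and the overfitting check might look in code, here are two small helpers. Everything in them is an assumption of mine for illustration: the names `split_dataset` and `looks_overfit`, the 70/15/15 split, and the `patience` window are not part of the original answer.

```python
import random


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle a copy of `samples` and cut it into train / validation / test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_val - n_test
    train_part = shuffled[:n_train]
    val_part = shuffled[n_train:n_train + n_val]
    test_part = shuffled[n_train + n_val:]
    return train_part, val_part, test_part


def looks_overfit(train_acc, val_acc, patience=3):
    """True when training accuracy rose over the last `patience` epochs
    while validation accuracy stayed the same or dropped."""
    if len(train_acc) <= patience or len(val_acc) <= patience:
        return False  # not enough history yet
    train_improved = train_acc[-1] > train_acc[-1 - patience]
    val_improved = val_acc[-1] > val_acc[-1 - patience]
    return train_improved and not val_improved


train_part, val_part, test_part = split_dataset(range(100))
print(len(train_part), len(val_part), len(test_part))  # 70 15 15

# Training accuracy keeps climbing while validation accuracy is flat: stop training.
print(looks_overfit([0.80, 0.85, 0.90, 0.95], [0.80, 0.80, 0.79, 0.80]))  # True
```

Comparing only the first and last epoch of the window is a deliberately crude rule; it is just meant to mirror the sentence "training accuracy increases, validation accuracy stays the same or decreases".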

The original answer is reproduced below.

Link: https://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-netwo

The training and validation sets are used during training.

for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
        calculate the accuracy over training data
    for each validation data instance
        calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        exit training
    else
        continue training

Once you’re finished training, then you run against your testing set and verify that the accuracy is sufficient.

Training Set: this data set is used to adjust the weights on the neural network.

Validation Set: this data set is used to minimize overfitting. You’re not adjusting the weights of the network with this data set, you’re just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before, or at least the network hasn’t trained on it (i.e. validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you’re overfitting your neural network and you should stop training.

Testing Set: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.