Notes on Part 3 of Andrew Ng's Deep Learning Course (Structuring Machine Learning Projects)

These notes cover choosing and optimizing evaluation metrics in machine learning, including single-number evaluation metrics such as the F1 score and balancing model performance with satisficing metrics. They also cover how to split the training, dev, and test sets, and how to improve a model via error analysis and the decomposition into bias and variance.


1-Using a single number evaluation metric:
eg. trade-off between Precision & Recall --> F1 Score (harmonic mean of the two);
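As a quick sketch (all numbers below are made up), the F1 score combines precision and recall into one number; a classifier with a balanced trade-off beats one with high precision but poor recall:

```python
# F1 score: harmonic mean of precision and recall,
# giving a single number for comparing classifiers.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Two hypothetical classifiers (illustrative numbers):
f1_a = f1_score(0.95, 0.90)  # balanced precision and recall
f1_b = f1_score(0.98, 0.20)  # high precision, poor recall
# f1_a is clearly higher, so classifier A wins on this metric.
```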

2-Satisficing & optimizing metrics:
one optimizing metric (to be optimized as far as possible) with multiple satisficing metrics (each only needs to meet a specified threshold);
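A minimal sketch of this selection rule, assuming accuracy is the optimizing metric and a hypothetical 100 ms runtime budget is the satisficing constraint (model stats are made up):

```python
# Keep only models that satisfy the constraint, then pick the one
# with the best optimizing metric among them.
models = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500},  # best accuracy, but too slow
]
feasible = [m for m in models if m["runtime_ms"] <= 100]   # satisficing filter
best = max(feasible, key=lambda m: m["accuracy"])          # optimize accuracy
# Model C is excluded despite its accuracy; B is chosen.
```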

3-Dev set & test set:
dev set (hold-out cross-validation set) + one single evaluation metric;
make sure the dev set and the test set come from the same distribution;

4-Sizes of the training, dev, and test sets:
-small total data (<10,000 examples): 60% + 20% + 20%;
-large total data (millions of examples): 98% + 1% + 1%;
-sometimes there are only a training set and a dev set (beware of overfitting the dev set);
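The rules of thumb above can be sketched as follows (the 10,000 cutoff follows these notes; in practice the boundary is fuzzy):

```python
def split_sizes(n):
    # Heuristic split: classic 60/20/20 for small datasets,
    # 98/1/1 when data is plentiful (the dev/test sets only need
    # to be big enough to compare models reliably).
    if n < 10_000:
        return int(n * 0.6), int(n * 0.2), int(n * 0.2)
    return int(n * 0.98), int(n * 0.01), int(n * 0.01)

small = split_sizes(1_000)      # small dataset: 600 / 200 / 200
large = split_sizes(1_000_000)  # large dataset: 980000 / 10000 / 10000
```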

5-Changing the metric and/or the dev/test data to fit the application:
-orthogonalization: first define an evaluation metric (place the target), then separately optimize performance on that metric (aim at the target);
eg. in cat recognition, give a much larger penalty to porn images misclassified as cats;
eg. user-uploaded images are blurrier than the images in the training data;
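One way to implement the heavier penalty from the cat/porn example is a weighted error, where mistakes on flagged images count `w` times more; the weight, data, and flag below are all illustrative:

```python
# Weighted classification error: errors on flagged (eg. pornographic)
# examples are weighted w times more than ordinary errors.
def weighted_error(predictions, labels, flagged, w=10):
    total_weighted_errors = 0.0
    weight_sum = 0.0
    for pred, y, flag in zip(predictions, labels, flagged):
        wt = w if flag else 1
        weight_sum += wt
        if pred != y:
            total_weighted_errors += wt
    return total_weighted_errors / weight_sum

preds  = [1, 1, 0, 1]                  # 1 = predicted "cat"
labels = [1, 0, 0, 1]                  # second example is not a cat...
flags  = [False, True, False, False]   # ...and is a flagged (porn) image
err = weighted_error(preds, labels, flags)  # that single mistake dominates
```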

6-Bayes Optimal Error(best possible error) & human-level performance;

7-Avoidable bias = training error - Bayes error;
Variance = dev error - training error;
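With illustrative error percentages, the decomposition looks like this:

```python
# Illustrative errors (%) using human-level error as a proxy for Bayes error:
human_error    = 1.0
training_error = 5.0
dev_error      = 6.0

avoidable_bias = training_error - human_error  # 4.0 -> bias is the bigger problem
variance       = dev_error - training_error    # 1.0
```

Since the avoidable bias (4.0) dominates the variance (1.0) here, bias-reduction tactics should come first.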

8-Human-level error as a proxy for Bayes error;

9-To reduce avoidable bias:
train a bigger model;
train longer / use better optimization algorithms (momentum, RMSprop, Adam);
NN architecture / hyperparameter search;
-To reduce variance:
more data;
regularization (L2, dropout, data augmentation);
NN architecture / hyperparameter search;

10-Error Analysis:
manually examine a sample of misclassified examples from the dev (or test) set, categorize the error types (false positives / false negatives), and count how many examples fall into each category;
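A sketch of the counting step, using `collections.Counter` over hypothetical hand-assigned error tags:

```python
from collections import Counter

# Error categories assigned while manually reviewing misclassified
# dev-set examples (tags are illustrative):
reviewed = ["blurry", "dog", "blurry", "mislabeled", "blurry",
            "dog", "great_cat", "blurry"]
counts = Counter(reviewed)

# The most frequent category gives the best ceiling on improvement,
# so prioritize fixing it first.
top_category, top_count = counts.most_common(1)[0]
```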

11-Correcting incorrectly labeled data (for the dev/test sets):
-compare the errors due to incorrect labels with the overall dev/test set error via error analysis;
-apply the same correction process to both the dev set and the test set so that they still come from the same distribution;
-consider examining examples your algorithm got right as well as ones it got wrong;
-Train and dev/test data may now come from slightly different distributions;

12-Training & testing on different distributions;

13-Bias & variance with mismatched data distributions:
train-dev set: same distribution as the training set, but not used for training;
-> if the train-dev error is almost as high as the dev error, there is a variance (generalization) problem;
-> if the train-dev error is almost as low as the training error (while the dev error is much higher), there is a data-mismatch problem;

training error - human-level error = avoidable bias;
train-dev error - training error = variance;
dev error - train-dev error = data mismatch;
test error - dev error = degree of overfitting to the dev set;
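Putting the four gaps together with illustrative error percentages:

```python
# Errors (%) on the four splits plus human level (illustrative numbers):
human     = 1.0
train     = 2.0
train_dev = 2.5
dev       = 7.0
test      = 7.2

avoidable_bias = train - human       # 1.0
variance       = train_dev - train   # 0.5
data_mismatch  = dev - train_dev     # 4.5 -> the dominant problem here
overfit_dev    = test - dev          # 0.2
```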

14-Addressing data mismatch:
-do manual error analysis to understand the differences between the training set and the dev/test sets;
-collect more training data similar to the dev/test sets;
-artificial data synthesis (eg. adding car noise to clean audio; beware of overfitting to the synthesized data);

15-Transfer learning:
pre-training & fine-tuning;
-low level features from the pre-trained network could be helpful for the fine-tuned network; 
-useful when you have lots of data for the problem you're transferring from 
& usually relatively less data for the problem you're transferring to; 
eg. take a trained image-recognition network, replace the output layer (or the last few layers) and retrain those layers for a different specific task,
such as transferring a cat-recognition network trained on 1 million photos to an X-ray diagnosis network;
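A minimal sketch of the replace-the-last-layer idea, assuming NumPy is available; the "pretrained" layer below is random for illustration, and closed-form least squares stands in for gradient-descent fine-tuning of the new output layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained lower layers are kept frozen; only a new output layer
# is fitted on the (small) target-task dataset. All numbers are made up.
W_frozen = rng.normal(size=(4, 8))        # pretrained feature extractor

def features(x):
    return np.maximum(0.0, x @ W_frozen)  # frozen ReLU features

X_target = rng.normal(size=(20, 4))       # small target-task dataset
y_target = rng.normal(size=(20, 1))

# "Fine-tune" only the output layer: least squares on the frozen features.
F = features(X_target)
W_out, *_ = np.linalg.lstsq(F, y_target, rcond=None)
y_pred = F @ W_out
```

The key point is that `W_frozen` is never updated, so the low-level features learned on the data-rich source task are reused unchanged.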

16-Multi-task learning:
one example can carry multiple labels (eg. tagging several objects in a single image);
-Training on a set of tasks that could benefit from having shared lower-level features;

-Can train a big enough network to do well on all the tasks;
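A sketch of a multi-task (multi-label) loss: sum the per-task binary cross-entropies, skipping tasks that are unlabeled (`None` here), as the lectures suggest for partially labeled data; the task names and probabilities below are illustrative:

```python
import math

# Multi-task loss for one example: sum binary cross-entropies over
# all tasks, ignoring tasks whose label is missing (None).
def multitask_loss(y_hat, y):
    loss = 0.0
    for p, t in zip(y_hat, y):
        if t is None:  # unlabeled task contributes nothing
            continue
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss

# One image, four tasks (eg. car, pedestrian, sign, light);
# the third task was never labeled:
loss = multitask_loss([0.9, 0.2, 0.5, 0.8], [1, 0, None, 1])
```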


17-End-to-end learning:
-traditional pipeline vs the end-to-end approach;
-let the data speak;
-less hand-designing of components needed;
-may need a large amount of data;

-excludes potentially useful hand-designed components (human knowledge).


Author: GottdesKrieges