【Machine Learning】【Andrew Ng】- Quiz2(Week 11)

塔希提

License: CC 4.0 BY-SA

Category: Machine Learning - Andrew Ng. Tags: machine learning

Article link: https://blog.youkuaiyun.com/sundy0808/article/details/79003456

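This post looks at how to evaluate and improve the individual components of a machine learning system: running a sliding window detector, estimating data labelling cost, and the value of ceiling analysis and how to carry it out. Concrete quiz questions show how to find the bottleneck in a system and act on it.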

1、Suppose you are running a sliding window detector to find text in images. Your input images are 1000x1000 pixels. You will run your sliding windows detector at two scales, 10x10 and 20x20 (i.e., you will run your classifier on lots of 10x10 patches to decide if they contain text or not; and also on lots of 20x20 patches), and you will “step” your detector by 2 pixels each time. About how many times will you end up running your classifier on a single 1000x1000 test set image?
A. 250,000
B. 100,000
C. 500,000
D. 1,000,000
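Answer: C.
With a step of 2 pixels there are about 1000/2 = 500 window positions along each axis, so one pass over the image takes roughly 1000*1000/(2*2) = 500*500 = 250,000 classifier runs.
Two scales means two passes: 500*500*2 = 500,000 runs.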
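
The estimate above ignores the window size at the image border. As a rough check, here is a minimal sketch that counts the window positions exactly, assuming no padding and a square image; the function name `window_count` is made up for illustration.

```python
def window_count(image_size: int, window: int, stride: int) -> int:
    """Number of window positions in a square image, assuming no padding."""
    per_axis = (image_size - window) // stride + 1
    return per_axis * per_axis

# Exact count over both scales, stepping 2 pixels at a time.
exact = sum(window_count(1000, w, 2) for w in (10, 20))

# The back-of-the-envelope estimate used in the answer.
approx = 2 * (1000 // 2) ** 2

print(exact, approx)  # 487097 500000
```

The exact count is a little under 500,000, so option C is still the closest choice.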

2、Suppose that you just joined a product team that has been developing a machine learning application, using m=1,000 training examples. You discover that you have the option of hiring additional personnel to help collect and label data.
You estimate that you would have to pay each of the labellers 10 dollars per hour, and that each labeller can label 4 examples per minute. About how much will it cost to hire labellers to label 10,000 new training examples?
A. 10,000 dollars
B. 400 dollars
C. 600 dollars
D. 250 dollars
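Answer: B.
10 dollars (one hour) buys 4*60 = 240 labelled examples.
10,000 examples need 10,000/240 ≈ 41.7 hours, which is about 417 dollars, so 400 dollars is the closest option.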
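
The same arithmetic as a small sketch, with the quiz figures passed in as parameters; `labelling_cost` is a hypothetical helper name, not something from the course.

```python
def labelling_cost(n_examples: int, examples_per_minute: float, dollars_per_hour: float) -> float:
    """Cost of labelling n_examples at a fixed labelling rate and hourly wage."""
    hours = n_examples / (examples_per_minute * 60)
    return hours * dollars_per_hour

cost = labelling_cost(10_000, 4, 10)

# Pick the quiz option closest to the computed cost.
options = [10_000, 400, 600, 250]
closest = min(options, key=lambda option: abs(option - cost))

print(round(cost, 2), closest)  # 416.67 400
```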

3、What are the benefits of performing a ceiling analysis? Check all that apply.
A. It can help indicate that certain components of a system might not be worth a significant amount of work improving, because even if it had perfect performance its impact on the overall system may be small.
B. If we have a low-performing component, the ceiling analysis can tell us if that component has a high bias problem or a high variance problem.
C. It is a way of providing additional training data to the algorithm.
D. It helps us decide on allocation of resources in terms of which component in a machine learning pipeline to spend more effort on.
Answer: A, D. In short, ceiling analysis saves time and effort by showing which components are worth working on.

4、Suppose you are building an object classifier that takes as input an image and recognizes that image as either containing a car (y=1) or not (y=0). For example, here are a positive example and a negative example:
After carefully analyzing the performance of your algorithm, you conclude that you need more positive (y=1) training examples. Which of the following might be a good way to get additional positive examples?
A. Apply translations, distortions, and rotations to the images already in your training set.
B. Select two car images and average them to make a third example.
C. Take a few images from your training set, and add random Gaussian noise to every pixel.
D. Make two copies of each image in the training set; this immediately doubles your training set size.
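Answer: A.
A is correct.
B is wrong: after averaging two images, the result probably no longer looks like a car.
C is wrong: adding white noise to every pixel is as good as adding nothing.
D is wrong: training on copies does not change the parameters. The current parameters were already fit to these examples, so training on their copies again only re-checks what the model has seen.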
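
To make option A concrete, here is a minimal sketch of generating extra positives by translating and rotating an existing image. It is an illustration under stated assumptions, not the course's method: images are plain lists of rows, the shift pads with zeros, and rotation is limited to 90-degree steps so that no image library is needed. A real pipeline would more likely use small-angle rotations and distortions. All function names are hypothetical.

```python
import random

def shift_right(image, dx, fill=0.0):
    """Shift every row right by dx pixels (dx >= 0), padding the left edge with fill."""
    width = len(image[0])
    return [[fill] * dx + row[: width - dx] for row in image]

def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image, rng):
    """Return a randomly shifted and rotated copy of a square image."""
    out = shift_right(image, rng.randint(0, 2))
    for _ in range(rng.randint(0, 3)):
        out = rotate90(out)
    return out

rng = random.Random(0)
# Stand-in for a 4x4 grayscale car image; real data would come from the training set.
car = [[float(4 * r + c) for c in range(4)] for r in range(4)]
extra_positives = [augment(car, rng) for _ in range(5)]
```

Each new image keeps the label y=1 because the transformations preserve what the image depicts, which is exactly what averaging (B) or plain duplication (D) fails to offer.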

5、Suppose you have a Photo OCR system, where you have the following pipeline:
[Figure: Photo OCR pipeline]
You have decided to perform a ceiling analysis on this system, and find the following:

Which of the following statements are true?
A. There is a large gain in performance possible in improving the character recognition system.
B. Performing the ceiling analysis shown here requires that we have ground-truth labels for the text detection, character segmentation and the character recognition systems.
C. The least promising component to work on is the character recognition system, since it is already obtaining 100% accuracy.
D. The most promising component to work on is the text detection system, since it has the lowest performance (72%) and thus the biggest potential gain.
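Answer: A, B.
The figure listed for each component is the final accuracy the system would reach if that component produced the correct output. The analysis therefore compares the system with and without that component being perfect. Clearly, accuracy improves more once character segmentation is correct.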
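
A ceiling analysis table is read by taking the difference between consecutive rows, not the absolute values. The sketch below shows that calculation. The numbers are an assumption: the original table is missing from this post, only the 72% and 100% figures appear in the answer options, and the other two values are made up for illustration.

```python
# Hypothetical ceiling analysis (accuracy in percent). Only 72 and 100 are
# taken from the answer options; 70 and 82 are illustrative assumptions.
ceiling = [
    ("overall system", 70),
    ("text detection", 72),
    ("character segmentation", 82),
    ("character recognition", 100),
]

def gains(rows):
    """Accuracy gained by making each component perfect, given the previous row."""
    return {name: acc - prev for (_, prev), (name, acc) in zip(rows, rows[1:])}

potential = gains(ceiling)
most_promising = max(potential, key=potential.get)

print(potential)
print(most_promising)
```

With these assumed numbers, text detection offers only a 2-point gain despite having the lowest figure, which is why option D is wrong, and the component whose row reads 100% can still be the one with the largest gain, which is why option C is wrong.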