【Machine Learning】【Andrew Ng】- Quiz1(Week 8)

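This post walks through K-means clustering: where it applies, how it works, and its key steps. It uses examples to show how to choose a suitable number of clusters and discusses the cost function the algorithm relies on.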


1. For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply.
A. Given a database of information about your users, automatically group them into different market segments.
B. Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say are frequently purchased together) and thus should be put on the same shelf.
C. Given historical weather records, predict the amount of rainfall tomorrow (this would be a real-valued output)
D. Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.
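Answer: AB. K-means is mainly used for clustering, that is, grouping unlabeled examples. C and D ask for a predicted real value, which makes them supervised regression problems rather than clustering. The optional exercise in the programming assignment also uses K-means for image compression: a 128×128 image stored at 24 bits per pixel (128×128×24 bits) is reduced to 16 colors, which takes 16×24 + 128×128×4 bits.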
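
The storage figures above can be checked with a few lines of arithmetic. This is only a sketch of the bit counting; the constant names are made up for illustration, and it assumes each pixel stores a 4-bit index into a 16-entry palette of 24-bit colors.

```python
# Storage for a 128x128 image, in bits, before and after reducing it to 16 colors.
HEIGHT, WIDTH = 128, 128
BITS_PER_PIXEL_RGB = 24   # 8 bits each for R, G and B
NUM_COLORS = 16           # K, the number of clusters (palette entries)
BITS_PER_INDEX = 4        # log2(16) bits to name one of the 16 colors

original_bits = HEIGHT * WIDTH * BITS_PER_PIXEL_RGB
palette_bits = NUM_COLORS * BITS_PER_PIXEL_RGB   # the 16 centroid colors
index_bits = HEIGHT * WIDTH * BITS_PER_INDEX     # one palette index per pixel
compressed_bits = palette_bits + index_bits

print(original_bits)     # 393216
print(compressed_bits)   # 65920
print(round(original_bits / compressed_bits, 2))  # 5.97
```

Under these assumptions the compressed image is roughly one sixth of the original size.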

2. Suppose we have three cluster centroids [image: centroid values] and [image: centroid value]. Furthermore, we have a training example [image: value of x(i)]. After a cluster assignment step, what will c(i) be?
A. c(i) = 1
B. c(i) is not assigned
C. c(i) = 3
D. c(i) = 2
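Answer: C. Compute the distance from x to each cluster centroid. The distance to the third centroid is √2, which is the smallest, so the example is assigned to cluster 3.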
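
The assignment step can be sketched as follows. The quiz's actual numbers were in images that did not survive, so the centroids and the example below are hypothetical values, chosen only so that the third centroid is closest at distance √2 as in the answer above. `assign_cluster` is an illustrative helper name, not something from the course code.

```python
import math

# Hypothetical values (NOT taken from the quiz images).
centroids = [(1.0, 2.0), (-3.0, 0.0), (4.0, 2.0)]  # mu_1, mu_2, mu_3
x = (3.0, 1.0)                                      # training example x(i)

def assign_cluster(point, centroids):
    """Return the 1-based index of the centroid closest to point."""
    distances = [math.dist(point, mu) for mu in centroids]
    return distances.index(min(distances)) + 1

c_i = assign_cluster(x, centroids)
print(c_i)  # 3
```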

3. K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?
A. Randomly initialize the cluster centroids.
B. Test on the cross-validation set.
C. The cluster assignment step, where the parameters c(i) are updated.
D. Move the cluster centroids, where the centroids μk are updated.
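Answer: CD. Randomly initializing the cluster centroids (A) is an important step of K-means, but it happens before the inner loop, not inside it. Repeating the random initialization several times mainly helps when the number of clusters is small, where it guards against the poor local optima that a bad starting point can cause. Testing on a cross-validation set (B) is not part of K-means at all.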
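
To make the structure concrete, here is a minimal sketch of the loop in plain Python. It is not the course's implementation: the function names, the stopping rule (stop when no assignment changes) and the handling of empty clusters are all assumptions made for this example.

```python
import math
import random

def closest(point, centroids):
    """0-based index of the centroid nearest to point."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Initialization happens once, outside the inner loop.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: update every c(i).
        new_assignments = [closest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # no example changed cluster, so the centroids would not move
        assignments = new_assignments
        # Move centroid step: update every mu_k.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return assignments, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments, centroids)
```

Only the two commented steps sit inside the `for` loop; the call to `rng.sample` runs exactly once per call to `kmeans`.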

4. Suppose you have an unlabeled dataset {x(1),…,x(m)}. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data. What is the recommended way for choosing which one of these 50 clusterings to use?
A. Compute the distortion function J(c(1),…,c(m),μ(1),…,μ(k)), and pick the one that minimizes this.
B. Plot the data and the cluster centroids, and pick the clustering that gives the most “coherent” cluster centroids.
C. Use the elbow method.
D. Manually examine the clusterings, and pick the best one.
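Answer: A. Pick the clustering with the lowest distortion. B does not describe a well-defined criterion. C and D are ways of choosing the number of clusters, whereas in this question the number of clusters is already fixed.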
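
As a small illustration of option A, the sketch below computes the distortion J = (1/m) Σ ||x(i) − μ_c(i)||² for two candidate clusterings and keeps the one with the lower value. The data and both candidates are hypothetical, written by hand rather than produced by actual K-means runs, and cluster indices are 0-based here.

```python
import math

def distortion(points, assignments, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    m = len(points)
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, assignments)) / m

points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Two hypothetical K=2 results, each given as (assignments, centroids).
candidates = [
    ([0, 0, 1, 1], [(0.0, 0.5), (4.0, 0.5)]),  # splits left from right
    ([0, 1, 0, 1], [(2.0, 0.0), (2.0, 1.0)]),  # splits bottom from top
]

costs = [distortion(points, a, mu) for a, mu in candidates]
best_assignments, best_centroids = candidates[costs.index(min(costs))]
print(costs)             # [0.25, 4.0]
print(best_assignments)  # [0, 0, 1, 1]
```

Both candidates are stable under the assignment and move steps, so K-means could stop at either; comparing J is what tells them apart.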

5. Which of the following statements are true? Select all that apply.
A. On every iteration of K-means, the cost function J(c(1),…,c(m),μ(1),…μ(k)) (the distortion function) should either stay the same or decrease; in particular, it should not increase.
B. A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples.
C. K-Means will always give the same results regardless of the initialization of the centroids.
D. Once an example has been assigned to a particular centroid, it will never be reassigned to another different centroid
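Answer: AB. C is false: the result of K-means depends heavily on how the centroids are initialized. D is false: the centroids generally keep moving from one iteration to the next, so an example can be reassigned to a different centroid; the algorithm has converged once the assignments stop changing.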
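
Statements A and B can be checked empirically with a small experiment on 1-D data. This is a sketch under assumptions, not the course code: `cost` and `kmeans_1d` are illustrative names, the data is made up, and a small tolerance is allowed for floating-point rounding when comparing successive values of J.

```python
import random

def cost(xs, assign, centroids):
    """Distortion J: mean squared distance from each x to its assigned centroid."""
    return sum((x - centroids[c]) ** 2 for x, c in zip(xs, assign)) / len(xs)

def kmeans_1d(xs, k, seed, max_iters=100):
    """Run K-means on 1-D data; return (sorted centroids, J after every step)."""
    rng = random.Random(seed)
    centroids = rng.sample(xs, k)  # statement B: start from K distinct examples
    assign, history = None, []
    for _ in range(max_iters):
        # Assignment step: each x moves to its nearest centroid.
        new_assign = [min(range(k), key=lambda j: (x - centroids[j]) ** 2) for x in xs]
        if new_assign == assign:
            break
        assign = new_assign
        history.append(cost(xs, assign, centroids))
        # Move step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [x for x, c in zip(xs, assign) if c == j]
            if members:
                centroids[j] = sum(members) / len(members)
        history.append(cost(xs, assign, centroids))
    return sorted(centroids), history

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
for seed in range(5):
    centroids, history = kmeans_1d(xs, k=3, seed=seed)
    non_increasing = all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
    print(seed, centroids, round(history[-1], 3), non_increasing)
```

The last column should read `True` for every seed, which is statement A. Whether the final centroids agree across seeds depends on the data and the starting points, which is why statement C does not hold in general.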
