2022 Andrew Ng Machine Learning (Deeplearning.ai) Course Programming Assignment C3_W1: KMeans_Assignment

K-means Clustering

In this exercise, you will implement the K-means algorithm and use it for image compression.

  • You will start with a sample dataset that will help you gain an intuition of how the K-means algorithm works.
  • After that, you will use the K-means algorithm for image compression by reducing the number of colors that occur in an image to only those that are most common in that image.

import numpy as np
import matplotlib.pyplot as plt
from utils import *

%matplotlib inline

1 - Implementing K-means

The K-means algorithm is a method to automatically cluster similar
data points together.

  • Concretely, you are given a training set $\{x^{(1)}, ..., x^{(m)}\}$, and you want
    to group the data into a few cohesive “clusters”.

  • K-means is an iterative procedure that

    • Starts by guessing the initial centroids, and then
    • Refines this guess by
      • Repeatedly assigning examples to their closest centroids, and then
      • Recomputing the centroids based on the assignments.
  • In pseudocode, the K-means algorithm is as follows:

    # Initialize centroids
    # K is the number of clusters
    centroids = kMeans_init_centroids(X, K)
    
    for iter in range(iterations):
        # Cluster assignment step: 
        # Assign each data point to the closest centroid. 
        # idx[i] corresponds to the index of the centroid 
        # assigned to example i
        idx = find_closest_centroids(X, centroids)
    
        # Move centroid step: 
        # Compute means based on centroid assignments
        centroids = compute_centroids(X, idx, K)
    
  • The inner loop of the algorithm repeatedly carries out two steps:

    • (i) Assigning each training example $x^{(i)}$ to its closest centroid, and
    • (ii) Recomputing the mean of each centroid using the points assigned to it.
  • The K-means algorithm will always converge to some final set of means for the centroids.

  • However, the converged solution may not always be ideal, and it depends on the initial setting of the centroids.

    • Therefore, in practice the K-means algorithm is usually run a few times with different random initializations.
    • One way to choose between these different solutions from different random initializations is to choose the one with the lowest cost function value (distortion).
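The restart strategy described above can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the helper names assign_points, update_centroids, distortion and kmeans_with_restarts are hypothetical, the initial centroids are assumed to be K distinct examples drawn at random from X, and distortion is assumed to be the mean squared distance between each example and its assigned centroid.

```python
import numpy as np

def assign_points(X, centroids):
    # Squared distance from every example to every centroid, shape (m, K).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def update_centroids(X, idx, centroids):
    # Move each centroid to the mean of its assigned examples.
    # Assumption: a centroid with no assigned examples stays where it is.
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        members = X[idx == k]
        if len(members) > 0:
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

def distortion(X, idx, centroids):
    # Assumed cost: mean squared distance to the assigned centroid.
    return np.mean(np.sum((X - centroids[idx]) ** 2, axis=1))

def kmeans_with_restarts(X, K, iterations=10, restarts=5, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        # Random initialization: pick K distinct examples as centroids.
        centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
        for _ in range(iterations):
            idx = assign_points(X, centroids)
            centroids = update_centroids(X, idx, centroids)
        idx = assign_points(X, centroids)
        cost = distortion(X, idx, centroids)
        # Keep the run with the lowest distortion seen so far.
        if best is None or cost < best[0]:
            best = (cost, centroids, idx)
    return best

# Two well-separated groups of three points each.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
cost, centroids, idx = kmeans_with_restarts(X_demo, K=2)
print(cost, idx)
```

Fixing the seed makes the runs reproducible; in practice the number of restarts trades extra computation for a lower chance of ending up in a poor local optimum.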

You will implement the two phases of the K-means algorithm separately
in the next sections.

  • You will start by completing find_closest_centroids and then proceed to complete compute_centroids.

1.1 Finding closest centroids

In the “cluster assignment” phase of the K-means algorithm, the
algorithm assigns every training example $x^{(i)}$ to its closest
centroid, given the current positions of centroids.

Exercise 1

Your task is to complete the code in find_closest_centroids.

  • This function takes the data matrix X and the locations of all
    centroids inside centroids
  • It should output a one-dimensional array idx (with one element per example in X) that holds, for every training example, the index of the closest centroid (a value in $\{1, ..., K\}$, where $K$ is the total number of centroids).
  • Specifically, for every example $x^{(i)}$ we set
    $c^{(i)} := j \quad \mathrm{that \; minimizes} \quad ||x^{(i)} - \mu_j||^2$,
    where $c^{(i)}$ is the index of the centroid closest to $x^{(i)}$ (stored in idx) and $\mu_j$ is the position of the $j$-th centroid (stored in centroids).
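
One possible way to write the assignment rule above is sketched below. It is only an illustration under a few assumptions: X is a NumPy array of shape (m, n), centroids is an array of shape (K, n), and the demo arrays X_demo and centroids_demo are made up for this example. Because NumPy indices are 0-based, the returned values lie in {0, ..., K-1} rather than {1, ..., K}.

```python
import numpy as np

def find_closest_centroids(X, centroids):
    K = centroids.shape[0]
    idx = np.zeros(X.shape[0], dtype=int)
    for i in range(X.shape[0]):
        # Squared Euclidean distance from example i to each centroid j.
        distances = [np.sum((X[i] - centroids[j]) ** 2) for j in range(K)]
        # c^(i) is the j that minimizes the squared distance.
        idx[i] = np.argmin(distances)
    return idx

# Hypothetical data: three examples, three centroids.
X_demo = np.array([[1.0, 1.0], [5.0, 4.0], [9.0, 9.0]])
centroids_demo = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
print(find_closest_centroids(X_demo, centroids_demo))  # prints [0 1 2]
```

The explicit loop mirrors the formula one example at a time; a broadcasted version would be faster on large inputs but is harder to map back to the equation.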