Clustering as a Mixture of Gaussians

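This article discusses model-based clustering, which represents clusters mathematically with parametric distributions and tries to optimize the fit between the data and the model. It focuses on the use of mixtures of Gaussians for clustering, covering how the method works, its advantages, and how the likelihood function is maximised. It also works through an example showing how the Expectation-Maximization algorithm is used to fit a mixture of Gaussians.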

Introduction to Model-Based Clustering
There’s another way to deal with clustering problems: a model-based approach, which uses explicit models for the clusters and tries to optimize the fit between the data and the model.
In practice, each cluster can be mathematically represented by a parametric distribution, like a Gaussian (continuous) or a Poisson (discrete). The entire data set is therefore modelled by a mixture of these distributions. An individual distribution used to model a specific cluster is often referred to as a component distribution.

A mixture model with high likelihood tends to have the following traits:

  • component distributions have high “peaks” (data in one cluster are tight);
  • the mixture model “covers” the data well (dominant patterns in the data are captured by component distributions).

Main advantages of model-based clustering:

  • well-studied statistical inference techniques available;
  • flexibility in choosing the component distribution;
  • a density estimate is obtained for each cluster;
  • a “soft” classification is available.

Mixture of Gaussians
The most widely used clustering method of this kind is the one based on learning a mixture of Gaussians: we can regard clusters as Gaussian distributions centred on their barycentres.

[Figure: clusters drawn as Gaussian distributions; the grey circle represents the first variance of the distribution.]

The model assumes that each data point is generated in this way:

  • a component (a Gaussian) $\omega_i$ is chosen at random with probability $P(\omega_i)$;
  • a point is sampled from $N(\mu_i, \sigma^2 I)$.
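The two-step generative process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the components are spherical Gaussians with a shared standard deviation `sigma`, and the function name `sample_mixture`, the mixing weights, and the centres are hypothetical values chosen for the example.

```python
import random

def sample_mixture(weights, centres, sigma, n, seed=0):
    """Draw n points from a mixture of spherical Gaussians."""
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        # Step 1: choose a component i at random with probability weights[i].
        i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        # Step 2: sample a point around centre i, one coordinate at a time
        # (independent coordinates with equal variance = spherical Gaussian).
        point = [rng.gauss(m, sigma) for m in centres[i]]
        points.append(point)
        labels.append(i)
    return points, labels

# Hypothetical mixture: two components in two dimensions.
weights = [0.3, 0.7]
centres = [[0.0, 0.0], [5.0, 5.0]]
points, labels = sample_mixture(weights, centres, sigma=1.0, n=1000)
```

Clustering runs this process in reverse: only `points` are observed, and the centres (and possibly the weights) have to be recovered.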

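Suppose we have:

  • $x_1, x_2, \dots, x_N$
  • $P(\omega_1), \dots, P(\omega_K), \sigma$

We can obtain the likelihood of the sample: $P(x \mid \omega_i, \mu_1, \mu_2, \dots, \mu_K)$.
What we really want to maximise is $P(x \mid \mu_1, \mu_2, \dots, \mu_K)$ (the probability of a datum given the centres of the Gaussians).

$$P(x \mid \mu_1, \dots, \mu_K) = \sum_{i=1}^{K} P(x \mid \omega_i, \mu_1, \dots, \mu_K)\, P(\omega_i)$$

is the basis for writing the likelihood function:

$$L = P(x_1, \dots, x_N \mid \mu_1, \dots, \mu_K) = \prod_{k=1}^{N} \sum_{i=1}^{K} P(\omega_i)\, P(x_k \mid \omega_i, \mu_1, \dots, \mu_K)$$

Now we should maximise the likelihood function by solving $\partial L / \partial \mu_i = 0$, but the sum inside the product means these equations have no closed-form solution. That’s why we use an iterative algorithm called EM (Expectation-Maximization).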
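To make the quantity being maximised concrete, here is a minimal sketch that evaluates the log of the mixture likelihood for a given set of centres. It assumes spherical Gaussian components with a shared `sigma`; the function names and the toy data are hypothetical.

```python
import math

def spherical_gaussian_pdf(x, centre, sigma):
    """Density of N(centre, sigma^2 I) evaluated at the point x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, centre))
    norm = (2.0 * math.pi * sigma ** 2) ** (d / 2.0)
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm

def log_likelihood(data, weights, centres, sigma):
    """Sum over points of the log of the weighted sum of component densities."""
    total = 0.0
    for x in data:
        mix = sum(w * spherical_gaussian_pdf(x, c, sigma)
                  for w, c in zip(weights, centres))
        total += math.log(mix)
    return total

# Toy data: one point near (0, 0) and two near (5, 5).
data = [[0.1, -0.2], [4.8, 5.1], [5.3, 4.9]]
weights = [0.5, 0.5]
good = log_likelihood(data, weights, [[0.0, 0.0], [5.0, 5.0]], sigma=1.0)
bad = log_likelihood(data, weights, [[10.0, 10.0], [-10.0, -10.0]], sigma=1.0)
```

Centres that sit on the data give a higher log-likelihood than centres far from it, which is the sense in which a good mixture "covers" the data. For points very far from every centre the densities can underflow to zero, so a production version would work with log-densities throughout.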

The EM Algorithm
The algorithm which is used in practice to find the mixture of Gaussians that can model the data set is called EM (Expectation-Maximization) (Dempster, Laird and Rubin, 1977). Let’s see how it works with an example.

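Suppose $x_k$ are the marks obtained by the students of a class, with these probabilities:

x1 = 30             $P(x_1) = \frac{1}{2}$

x2 = 18             $P(x_2) = \mu$

x3 = 0               $P(x_3) = 2\mu$

x4 = 23             $P(x_4) = \frac{1}{2} - 3\mu$

First case: we observe that the marks are distributed among the students as follows:

x1 : a students
x2 : b students
x3 : c students
x4 : d students

The probability of this observation is

$$P(a, b, c, d \mid \mu) \propto \left(\tfrac{1}{2}\right)^{a} \mu^{b}\, (2\mu)^{c} \left(\tfrac{1}{2} - 3\mu\right)^{d}$$

We should maximise this function by solving $\partial P / \partial \mu = 0$. Let’s instead take the logarithm of the function and maximise it:

$$\log P = a \log\tfrac{1}{2} + b \log\mu + c \log 2\mu + d \log\left(\tfrac{1}{2} - 3\mu\right) + \text{const}$$

$$\frac{\partial \log P}{\partial \mu} = \frac{b}{\mu} + \frac{c}{\mu} - \frac{3d}{\tfrac{1}{2} - 3\mu} = 0 \quad\Rightarrow\quad \mu = \frac{b + c}{6\,(b + c + d)}$$

Supposing a = 14, b = 6, c = 9 and d = 10 we can calculate that $\mu = \frac{1}{10}$.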
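The arithmetic of the first case can be checked with a short script. It assumes the probabilities 1/2, μ, 2μ and 1/2 − 3μ for the four marks, which give the closed-form estimate μ = (b + c) / (6 (b + c + d)); the function names are hypothetical.

```python
from fractions import Fraction
import math

def mle_mu(b, c, d):
    """Closed-form maximum-likelihood estimate of mu (exact arithmetic)."""
    return Fraction(b + c, 6 * (b + c + d))

def marks_log_likelihood(mu, a, b, c, d):
    """Log-likelihood of the counts, up to an additive constant."""
    return (a * math.log(0.5) + b * math.log(mu)
            + c * math.log(2 * mu) + d * math.log(0.5 - 3 * mu))

a, b, c, d = 14, 6, 9, 10
mu_hat = mle_mu(b, c, d)
print(mu_hat)  # 1/10

# Sanity check: the log-likelihood is lower on either side of the estimate.
at_peak = marks_log_likelihood(0.10, a, b, c, d)
left = marks_log_likelihood(0.09, a, b, c, d)
right = marks_log_likelihood(0.11, a, b, c, d)
```

Note that `a` does not influence the estimate: the probability of the first mark does not depend on μ.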

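Second case: we observe that the marks are distributed among the students as follows:

x1 + x2 : h students
x3 : c students
x4 : d students

Now a and b are no longer observed separately, only their sum h. If we knew $\mu$ we could compute the expected values of a and b, and if we knew b we could compute the maximum-likelihood value of $\mu$. We have thus obtained a circularity, which splits into two steps:

  • expectation: $a = \dfrac{\frac{1}{2}}{\frac{1}{2} + \mu}\, h, \qquad b = \dfrac{\mu}{\frac{1}{2} + \mu}\, h$
  • maximization: $\mu = \dfrac{b + c}{6\,(b + c + d)}$

This circularity can be solved iteratively: start from a guess for $\mu$ and alternate the two steps until the values stop changing.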
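The alternation can be sketched as a short loop. This assumes the mark probabilities 1/2, μ, 2μ and 1/2 − 3μ, so that the expected split of h is proportional to 1/2 and μ; the value h = 20, the starting guess, and the function name `em_marks` are hypothetical.

```python
def em_marks(h, c, d, mu=0.05, iterations=50):
    """Alternate expectation and maximization for the hidden-count example."""
    a = b = 0.0
    for _ in range(iterations):
        # Expectation: split the h students between x1 and x2
        # in proportion to the current probabilities 1/2 and mu.
        a = h * 0.5 / (0.5 + mu)
        b = h * mu / (0.5 + mu)
        # Maximization: re-estimate mu as if b had been observed.
        mu = (b + c) / (6.0 * (b + c + d))
    return mu, a, b

# Hypothetical counts: h is invented for the example, c and d reuse the first case.
mu, a, b = em_marks(h=20, c=9, d=10)

# At convergence one more round of the two steps leaves mu unchanged.
b_next = 20 * mu / (0.5 + mu)
mu_next = (b_next + 9) / (6.0 * (b_next + 9 + 10))
```

The loop uses a fixed number of iterations for simplicity; stopping when the change in `mu` falls below a tolerance would be the more usual choice.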

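Let’s now see how the EM algorithm works for a mixture of Gaussians (parameters estimated at the pth iteration are marked by a superscript (p)):

  1. Initialize parameters:

     $$\lambda^{(0)} = \{\mu_1^{(0)}, \dots, \mu_K^{(0)},\; p_1^{(0)}, \dots, p_K^{(0)}\}$$

  2. E-step:

     $$P(\omega_j \mid x_k, \lambda^{(p)}) = \frac{P(x_k \mid \omega_j, \mu_j^{(p)}, \sigma)\; p_j^{(p)}}{\sum_{i=1}^{K} P(x_k \mid \omega_i, \mu_i^{(p)}, \sigma)\; p_i^{(p)}}$$

  3. M-step:

     $$\mu_j^{(p+1)} = \frac{\sum_{k} P(\omega_j \mid x_k, \lambda^{(p)})\, x_k}{\sum_{k} P(\omega_j \mid x_k, \lambda^{(p)})}$$

     $$p_j^{(p+1)} = \frac{\sum_{k} P(\omega_j \mid x_k, \lambda^{(p)})}{R}$$

    where $p_j$ is the current estimate of $P(\omega_j)$ and R is the number of records.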
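The three steps can be sketched for one-dimensional data as follows. This is an illustration under stated assumptions rather than a reference implementation: the standard deviation `sigma` is taken as known and shared by all components, only the means and the mixing probabilities are updated, and the function names, the synthetic data, and the starting values are hypothetical.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def em_gaussian_mixture(data, mus, priors, sigma, iterations=100):
    """EM for a 1-D Gaussian mixture with a fixed, shared sigma."""
    mus, priors = list(mus), list(priors)   # step 1: initial parameters
    R, K = len(data), len(mus)              # R records, K components
    for _ in range(iterations):
        # E-step: posterior probability of each component for each record.
        resp = []
        for x in data:
            joint = [priors[j] * gaussian_pdf(x, mus[j], sigma) for j in range(K)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: posterior-weighted mean and average posterior per component.
        for j in range(K):
            weight = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / weight
            priors[j] = weight / R
    return mus, priors

# Synthetic data: 300 points around 0 and 700 points around 6.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(6.0, 1.0) for _ in range(700)])
mus, priors = em_gaussian_mixture(data, mus=[1.0, 4.0], priors=[0.5, 0.5], sigma=1.0)
```

With well-separated clusters the estimated means end up close to 0 and 6 and the mixing probabilities close to 0.3 and 0.7. EM only finds a local maximum of the likelihood, so a poor initialization can lead to a different answer.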

Bibliography

from: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/mixture.html