[ML] {ud120} Lesson 4: Decision Trees

This article digs into the core concepts of the decision tree algorithm, including linearly separable data, decision tree construction, and the key impurity measures entropy and Gini, and walks through a concrete email author identification mini-project to show how decision trees are used in practice.

Linearly Separable Data
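Data is linearly separable when a single straight line (a hyperplane in higher dimensions) can divide the classes; that is the case a linear classifier handles directly. The lesson's motivating example is data that is not linearly separable, which is where decision trees come in.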

Multiple Linear Questions
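A decision tree handles such data by asking multiple linear questions in sequence, each one splitting the feature space with an axis-parallel boundary. As a rough sketch on a made-up XOR-style dataset (my own, not from the lecture), which no single line can separate:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# XOR-style data: no single straight line separates the two classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# The tree asks a sequence of linear questions (x1 <= t, then x2 <= t)
# and classifies all four points correctly
print(clf.predict(X))   # [0 1 1 0]
print(clf.score(X, y))  # 1.0
```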

Constructing a Decision Tree: First Split
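Among all candidate boundaries on all features, the first split is the one that best separates the classes (formally, the one with the highest information gain, defined below). As a hedged illustration on a made-up dataset, sklearn's export_text prints the fitted tree with the chosen first split at the root:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy 2-feature dataset (my own, for illustration): class 0 has small x1
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [6.0, 9.0], [1.2, 0.5], [5.5, 7.5]])
y = np.array([0, 0, 1, 1, 0, 1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# The root line of this printout is the tree's first split
print(export_text(clf, feature_names=["x1", "x2"]))
```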

Coding a Decision Tree
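The coding pattern is the same as in the earlier Naive Bayes and SVM lessons: create the classifier, fit it on the training data, and predict on the test set. A minimal sketch, assuming the course's usual features_train / labels_train variables:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def classify(features_train, labels_train):
    """Fit a decision tree on the training data and return it."""
    clf = DecisionTreeClassifier()
    clf.fit(features_train, labels_train)
    return clf

# Usage (assumes the features/labels are already prepared as in the course code):
# clf = classify(features_train, labels_train)
# pred = clf.predict(features_test)
# print(accuracy_score(labels_test, pred))
```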

Decision Tree Parameters
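The parameter the lesson focuses on is min_samples_split, which sets how many samples a node must contain before it may be split again: the default of 2 lets the tree keep splitting until leaves are nearly pure, which tends to overfit, while a larger value such as 50 stops growth earlier. A hedged sketch of the comparison (the features_train / labels_train variables are assumed to exist, as in the course exercises):

```python
from sklearn.tree import DecisionTreeClassifier

# min_samples_split=2 (the default) allows splits down to 2 samples,
# producing a deep tree that can overfit; 50 stops splitting much earlier.
clf_2 = DecisionTreeClassifier(min_samples_split=2)
clf_50 = DecisionTreeClassifier(min_samples_split=50)

# clf_2.fit(features_train, labels_train)
# clf_50.fit(features_train, labels_train)
# Compare test-set accuracies to see the effect:
# print(clf_2.score(features_test, labels_test))
# print(clf_50.score(features_test, labels_test))
```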

Data Impurity and Entropy
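Entropy is the measure of impurity used here: it is 0 when every example in a node has the same label, and maximal (1 bit for two classes) when the labels are evenly mixed. Splits are chosen to make the resulting nodes as pure as possible.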

Formula of Entropy

There is an error in the entropy formula written on this slide. There should be a negative (-) sign preceding the sum:

Entropy = -\sum_i p_i \log_2 p_i
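A quick numeric check of the formula (plain numpy, not the course code): a node that is an even two-class mix, like the lecture's slow/slow/fast/fast example, has p_i = 0.5 for both classes and entropy 1.

```python
import numpy as np

def entropy(p):
    """Entropy = -sum_i p_i * log2(p_i), skipping zero probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log2(0) is taken as 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))  # 1.0: a maximally impure two-class node
print(entropy([1.0]))       # -0.0, i.e. zero: a pure node has no entropy
```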

Information gain measures how much a split reduces entropy: the entropy of the parent node minus the weighted average entropy of the children,

IG = Entropy(parent) - \sum_k \frac{n_k}{n} Entropy(child_k)

The tree picks the split with the largest information gain. The split worked through in the lecture separates the classes perfectly, so it achieves the maximum:

IG = 1
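A check of that calculation on a made-up perfect split: a 50/50 parent divided into two pure children.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

parent = entropy([0.5, 0.5])                             # 1.0
children = 0.5 * entropy([1.0]) + 0.5 * entropy([1.0])   # both children pure
print(parent - children)                                 # IG = 1.0
```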

Tuning Criterion Parameter

Gini impurity is another measurement of purity: like entropy, it is 0 for a pure node and largest for an even class mix, computed as Gini = 1 - \sum_i p_i^2. In sklearn it is selected through the criterion parameter of DecisionTreeClassifier, where it is the default; pass criterion="entropy" to split on information gain instead.
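For example, a minimal sketch of switching the criterion (classifier construction only; fitting works the same as above):

```python
from sklearn.tree import DecisionTreeClassifier

clf_gini = DecisionTreeClassifier()                        # criterion="gini" is the default
clf_entropy = DecisionTreeClassifier(criterion="entropy")  # split on information gain instead
```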

Decision Tree Mini-Project

In this project, we will again try to identify the authors in a body of emails, this time using a decision tree. The starter code is in decision_tree/dt_author_id.py.

Get the data for this mini project from here.

Once again, you'll do the mini-project on your own computer and enter your answers in the web browser. You can find the instructions for the decision tree mini-project here.
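A hedged sketch of the flow in dt_author_id.py, following the course starter code; email_preprocess lives in the repo's tools/ directory, and min_samples_split=40 is the value I believe the project instructions specify:

```python
import sys
from time import time
sys.path.append("../tools/")  # location of email_preprocess in the course repo
from email_preprocess import preprocess
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the preprocessed Enron email data (TF-IDF features, selected percentile)
features_train, features_test, labels_train, labels_test = preprocess()

clf = DecisionTreeClassifier(min_samples_split=40)

t0 = time()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

pred = clf.predict(features_test)
print("accuracy:", accuracy_score(labels_test, pred))

# The project also asks how many features the data has:
print("number of features:", len(features_train[0]))
```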

Reposted from: https://www.cnblogs.com/ecoflex/p/10987754.html
