SAS Module 5 Classification Analysis

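This article covers the basic concepts of classification analysis in SAS: logistic regression models for probability estimation, choosing a classification threshold to balance false positives and false negatives, evaluating model performance with sensitivity and specificity, and assessing model quality with lift charts. It also discusses multiple logistic regression and the confounding issues that can arise with it.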

Classification:

  • Similar to a regression model, but the dependent variable is a categorical attribute
  • A common special case is binary classification
  • Generally, we are more interested in estimating the probability that the output belongs to each category level (e.g., 65% for class 1, 35% for class 0)

Logistic Regression Model: calculates these class probabilities from the input variables.

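The original figure for this formula is not available. As a reference sketch, the standard single-regressor logistic model can be written as below; the symbols $\beta_0$, $\beta_1$ and $x$ are notation assumed here, not taken from the source.

```latex
P(Y = 1 \mid X = x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
                    = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},
\qquad
\log \frac{P(Y = 1 \mid X = x)}{1 - P(Y = 1 \mid X = x)} = \beta_0 + \beta_1 x
```

The second form shows that the model is linear in the log-odds, which is why the output always stays between 0 and 1.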
Classifier:

  • We always need to select a classification threshold probability that determines how entities are assigned to predicted classes. For example, if P(Y=1|X) > 0.5, the classifier predicts Y = 1; otherwise it predicts Y = 0.
  • We also need to trade off false positives against false negatives when choosing the threshold, so that the misclassification rate is minimized.
  • To make this trade-off, calculate the Sensitivity and Specificity at each candidate cutoff, draw the ROC chart, and calculate the ROC separation (KS-Youden). The cutoff with the largest KS-Youden value is the best threshold to select.
    Sensitivity = TP/(TP+FN): the proportion of actual positives that are correctly identified
    Specificity = TN/(TN+FP): the proportion of actual negatives that are correctly identified
    KS-Youden = Sensitivity - (1 - Specificity)
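The threshold search described above can be sketched in plain Python. This is a minimal illustration, not SAS output: the labels, predicted probabilities, candidate cutoffs, and function names are all hypothetical, and the code assumes both classes are present in the data.

```python
# Hypothetical example data: true labels and predicted P(Y=1|X).
y_true = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
probs = [0.95, 0.85, 0.80, 0.65, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]


def confusion_counts(labels, scores, threshold):
    """Return (TP, FP, TN, FN) when predicting 1 if score > threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, scores):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_table(labels, scores, thresholds):
    """Return (threshold, sensitivity, specificity, KS-Youden) per cutoff."""
    rows = []
    for t in thresholds:
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        sensitivity = tp / (tp + fn)  # assumes at least one actual positive
        specificity = tn / (tn + fp)  # assumes at least one actual negative
        rows.append((t, sensitivity, specificity, sensitivity - (1 - specificity)))
    return rows


# Candidate cutoffs 0.1, 0.2, ..., 0.9 (an arbitrary grid for this sketch).
thresholds = [k / 10 for k in range(1, 10)]
table = youden_table(y_true, probs, thresholds)

# Pick the cutoff with the largest KS-Youden value.
best_threshold, best_sens, best_spec, best_j = max(table, key=lambda row: row[3])
print(best_threshold, best_sens, best_spec, round(best_j, 3))
```

With this made-up data the default cutoff of 0.5 is not the best choice: lowering it recovers one more true positive without adding a false positive.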

Lift Chart: a chart that evaluates how much better the selected model is than random selection and how far it is from the best possible model. The closer the model's lift line is to the best-model line, the better the selected model.
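One common way to compute the points on such a chart is cumulative lift: rank entities by predicted probability, take the top k, and divide their positive rate by the overall positive rate. The sketch below assumes that definition; the data and the function name `cumulative_lift` are hypothetical.

```python
# Hypothetical example data: true labels and predicted P(Y=1|X).
y_true = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
probs = [0.95, 0.85, 0.80, 0.65, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]


def cumulative_lift(labels, scores, top_k):
    """Positive rate among the top_k highest-scored entities / overall positive rate."""
    # Rank labels by score, highest score first.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    top_rate = sum(ranked[:top_k]) / top_k
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


# Lift of the model at several depths (top 2, 4, 6, 8, 10 entities).
model_lift = [cumulative_lift(y_true, probs, k) for k in (2, 4, 6, 8, 10)]

# The best possible model ranks every actual positive first; using the true
# labels as scores reproduces that ranking. Random selection has lift 1.0.
best_lift = [cumulative_lift(y_true, y_true, k) for k in (2, 4, 6, 8, 10)]

print(model_lift)
print(best_lift)
```

Both lines end at a lift of 1.0 once every entity is selected, so the comparison that matters is at shallow depths, where a good model stays close to the best-model line.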

Multiple Logistic Regression:

  • Similar to single logistic regression, but the model includes multiple regressors, each with its own coefficient

  • Sometimes there are confounding issues: a regressor behaves differently in a single logistic regression than in a multiple logistic regression, because some regressors in the multiple model may be correlated in a way that distorts the true relationship

  • In SAS, we can use the “Group By” function to run the analysis separately for each level of a categorical variable
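To see how confounding and per-group analysis interact, the sketch below compares a pooled odds ratio with odds ratios computed within each level of a grouping variable. For a single binary regressor, the exponentiated logistic regression coefficient equals the odds ratio of the 2x2 table, so the pooled value plays the role of the single-regressor fit and the per-group values mimic a grouped analysis. All counts and names (`X`, `Z`, `tables`) are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical counts: (positives, negatives) of the outcome Y
# for X=1 and X=0, within each level of a grouping variable Z.
tables = {
    "Z=a": {"x1": (90, 10), "x0": (800, 200)},
    "Z=b": {"x1": (300, 700), "x0": (20, 80)},
}


def odds_ratio(x1, x0):
    """Odds of Y=1 when X=1 divided by odds of Y=1 when X=0."""
    (pos1, neg1), (pos0, neg0) = x1, x0
    return (pos1 * neg0) / (neg1 * pos0)


# Analysis separated by level of Z (the "group by" view).
stratum_or = {z: odds_ratio(t["x1"], t["x0"]) for z, t in tables.items()}

# Pooled analysis that ignores Z (the single-regressor view).
pooled_x1 = tuple(sum(t["x1"][i] for t in tables.values()) for i in range(2))
pooled_x0 = tuple(sum(t["x0"][i] for t in tables.values()) for i in range(2))
pooled_or = odds_ratio(pooled_x1, pooled_x0)

print(stratum_or)
print(round(pooled_or, 3))
```

In this made-up data X is associated with higher odds of Y = 1 inside every group, yet the pooled odds ratio is below 1, because X = 1 is concentrated in the group with the lower baseline rate. That reversal is the kind of distortion the confounding bullet describes.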
