Model Comparison

## Overview of Machine Learning Algorithms
| Model | Type | Target | Assumptions | Linear / Non-linear | Loss Function | Evaluation Metric | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linear Regression | Supervised | Regression | 1. Independent features; 2. the dependence of Y on X1, X2, …, Xp is linear; 3. error terms are uncorrelated (correlated errors distort standard-error and confidence-interval calculations, e.g. in time series); 4. error terms are independent of X; 5. E[ε] = 0; 6. error terms have constant variance | Linear | RSS | R², MSE | 1. Simple approach to supervised learning; 2. No feature scaling needed | Strong assumptions |
| Logistic Regression | Supervised | Classification | Each sample is assigned to one and only one label | Linear | Negative log-likelihood | Confusion matrix, precision, recall, etc. | 1. Not sensitive to observations far from the decision boundary; 2. Good for K = 2 (Bernoulli classification problems) | Unstable for "well-separated" data |
| Linear Discriminant Analysis | Supervised | Classification | 1. Features are normally (Gaussian) distributed within each class; 2. the same covariance matrix Σ in every class | Linear | — | Confusion matrix, precision, recall, etc. | 1. Stable for "well-separated" data; 2. Good for K > 2; provides low-dimensional views of the data; 3. Good for n << p | Sensitive to observations far from the decision boundary |
| Naive Bayes | Supervised | Classification | Features are conditionally independent within each class | Linear | — | Confusion matrix, precision, recall, etc. | 1. Computationally convenient (the joint probability factorizes into per-feature conditionals); 2. Fairly robust to isolated noise points, since estimates average over many samples; 3. Handles missing values by ignoring the missing feature (the record itself is kept); 4. Fairly robust to irrelevant attributes; 5. Useful when p is very large | 1. Strong assumption; 2. Not robust to redundant (correlated) attributes, which break the conditional-independence assumption |
| KNN | Supervised | Classification | Similar things are in close proximity | Non-linear | — | Confusion matrix, precision, recall, etc. | 1. No training needed, just measure distances; 2. Simple to implement; 3. Few tuning parameters: just K and the distance metric; 4. Flexible: classes need not be linearly separable | 1. Cannot say which predictor is more important; 2. Computationally expensive: the distance from a new observation to every training sample must be computed; 3. Sensitive to imbalanced data: may perform poorly on infrequent classes; 4. Sensitive to irrelevant inputs, which make distances less meaningful for identifying similar neighbors |
| CART | Supervised | Both | Doesn't matter | — | RSS for regression; misclassification rate / Gini index for classification | — | 1. Easy to interpret; 2. Can be displayed graphically; 3. Handles qualitative predictors easily, without creating dummy variables | 1. Poor prediction accuracy; 2. Very non-robust: a small change in the data can cause a large change in the final estimated tree |
| Bagging | Supervised | Both | Doesn't matter | — | — | Out-of-bag error estimate | 1. Improved ensemble method based on bootstrap aggregation; 2. Better prediction accuracy; 3. Reduces variance and helps avoid overfitting | It is no longer clear which variables matter most, so the model is hard to interpret |
| Random Forest | Supervised | Both | Doesn't matter | — | — | — | 1. Improves bagged trees with a small tweak that de-correlates the trees; 2. This further reduces the variance when the trees are averaged | Hard to interpret |
| Boosting | Supervised | Both | Doesn't matter | — | — | — | 1. Remarkably resistant to overfitting, and fast and simple; 2. Improves the performance of many kinds of learners, not only decision trees | 1. Very hard to interpret; 2. Susceptible to noisy data |
| SVM | Supervised | Classification | Doesn't matter | — | Hinge loss / squared hinge loss | Hamming loss | 1. Good for "well-separated" data; 2. Popular for high-dimensional classification problems with p >> n; 3. Kernel SVMs handle non-linear boundaries; 4. Stable and not sensitive to outliers, since the solution depends only on the support vectors | Outputs are not probabilities |
| K-Means Clustering | Unsupervised | Clustering | Doesn't matter | — | Within-cluster variation | — | Easy to implement | 1. K must be chosen, and a good value is hard to find; 2. Sensitive to outliers; 3. Not very robust to changes in the data |
| Hierarchical Clustering | Unsupervised | Clustering | Doesn't matter | — | — | — | Does not require choosing K in advance | 1. Sometimes yields worse (less accurate) results than K-means for a given number of clusters; 2. The height at which to cut the dendrogram must still be chosen; 3. Sensitive to outliers; 4. Not very robust to changes in the data |
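
To make the table concrete, the minimal sketch below fits several of the supervised classifiers listed above on one synthetic dataset and reports cross-validated accuracy. It is an illustration, not a benchmark: the `make_classification` dataset, the hyperparameters, and the choice of accuracy as the single metric are all assumptions made for this example and are not part of the comparison above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC

# Synthetic binary classification data (an arbitrary choice, for illustration only)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "LDA": LinearDiscriminantAnalysis(),
    "Naive Bayes": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "CART": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Boosting": GradientBoostingClassifier(random_state=0),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# 5-fold cross-validated accuracy for each model on the same data
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:<20} accuracy = {scores.mean():.3f} ± {scores.std():.3f}")
```

Scaling is applied only where it matters for distances or margins (KNN, SVM) and to help the logistic-regression solver converge; the tree-based models are left unscaled, mirroring the remarks in the table.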
## Regularized Logistic Regression: A Worked Example

An earlier version of the script below failed with a `NameError` because it called `roc_curve` without importing it; `roc_curve` lives in `sklearn.metrics`, so the fix is simply to add it to the imports. The script also relies on `seaborn` for the confusion-matrix heatmap, which must be installed. The complete, corrected, runnable code is:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve
from sklearn.preprocessing import StandardScaler
import seaborn as sns


# Numerically stable sigmoid
def sigmoid(z):
    z = np.clip(z, -500, 500)  # prevent overflow in exp
    return 1 / (1 + np.exp(-z))


# Data generation (extra feature dimensions to encourage overfitting)
def generate_classification_data(n_samples=1000, n_features=15, random_state=42):
    """Generate binary classification data with correlated features."""
    np.random.seed(random_state)

    # Covariance matrix with some correlation between features
    cov = np.eye(n_features) * 1.5
    for i in range(n_features):
        for j in range(i + 1, n_features):
            cov[i, j] = cov[j, i] = np.random.uniform(0.3, 0.7)  # random correlations

    # Class means
    mean_class0 = np.zeros(n_features)
    mean_class1 = np.ones(n_features) * 2.5

    # Sample the two classes
    X_class0 = np.random.multivariate_normal(mean_class0, cov, n_samples // 2)
    X_class1 = np.random.multivariate_normal(mean_class1, cov, n_samples // 2)
    X = np.vstack((X_class0, X_class1))
    y = np.hstack((np.zeros(n_samples // 2), np.ones(n_samples // 2)))

    # Stratified train / validation / test split
    X_train_val, X_test, y_train_val, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_val, y_train_val, test_size=0.25, random_state=random_state, stratify=y_train_val
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Logistic regression with optional L1/L2 regularization
class RegularizedLogisticRegression:
    def __init__(self, lambda_=0.1, learning_rate=0.05, max_iter=3000,
                 early_stopping_patience=30, regularization_type='l2'):
        """
        Parameters:
            lambda_                 : regularization strength
            learning_rate           : gradient-descent step size
            max_iter                : maximum number of iterations
            early_stopping_patience : early-stopping patience (iterations)
            regularization_type     : regularization type ('l1', 'l2')
        """
        self.lambda_ = lambda_
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.early_stopping_patience = early_stopping_patience
        self.regularization_type = regularization_type.lower()
        self.scaler = StandardScaler()
        self.weights = None
        self.train_loss_history = []
        self.val_loss_history = []
        self.val_acc_history = []
        self.best_weights = None
        self.best_val_loss = float('inf')

    def _add_bias(self, X):
        """Prepend a column of ones (bias term)."""
        return np.hstack([np.ones((X.shape[0], 1)), X])

    def _compute_loss(self, y, y_pred, sample_size):
        """Cross-entropy loss plus the regularization penalty."""
        epsilon = 1e-8  # avoid log(0)
        # Cross-entropy term
        cross_entropy = -np.mean(y * np.log(y_pred + epsilon) + (1 - y) * np.log(1 - y_pred + epsilon))
        # Regularization term
        if self.regularization_type == 'l2':
            regularization = (self.lambda_ / (2 * sample_size)) * np.sum(self.weights[1:] ** 2)
        elif self.regularization_type == 'l1':
            regularization = (self.lambda_ / sample_size) * np.sum(np.abs(self.weights[1:]))
        else:
            regularization = 0.0
        return cross_entropy + regularization

    def _compute_gradient(self, X, y, y_pred, sample_size):
        """Gradient of the loss, including the regularization term."""
        error = y_pred - y
        gradient = X.T @ error / sample_size
        # Regularization gradient (the bias term is not regularized)
        if self.regularization_type == 'l2' and self.lambda_ > 0:
            gradient[1:] += (self.lambda_ / sample_size) * self.weights[1:]
        elif self.regularization_type == 'l1' and self.lambda_ > 0:
            gradient[1:] += (self.lambda_ / sample_size) * np.sign(self.weights[1:])
        return gradient
    def fit(self, X_train, y_train, X_val, y_val):
        """Train with gradient descent, tracking validation loss for early stopping."""
        # Standardize features
        X_train_scaled = self.scaler.fit_transform(X_train)
        X_val_scaled = self.scaler.transform(X_val)

        # Add bias column
        X_train_bias = self._add_bias(X_train_scaled)
        X_val_bias = self._add_bias(X_val_scaled)

        # Initialize weights with small random values
        np.random.seed(42)
        self.weights = np.random.randn(X_train_bias.shape[1]) * 0.01

        # Early-stopping bookkeeping
        patience_counter = 0
        best_iteration = 0

        # Gradient descent
        for i in range(self.max_iter):
            # Forward pass
            z_train = X_train_bias @ self.weights
            y_pred_train = sigmoid(z_train)

            # Training loss
            train_loss = self._compute_loss(y_train, y_pred_train, len(y_train))
            self.train_loss_history.append(train_loss)

            # Validation predictions and loss
            z_val = X_val_bias @ self.weights
            y_pred_val = sigmoid(z_val)
            val_loss = self._compute_loss(y_val, y_pred_val, len(y_val))
            self.val_loss_history.append(val_loss)

            # Validation accuracy
            val_acc = accuracy_score(y_val, (y_pred_val > 0.5).astype(int))
            self.val_acc_history.append(val_acc)

            # Keep the best weights seen so far
            if val_loss < self.best_val_loss:
                self.best_val_loss = val_loss
                self.best_weights = self.weights.copy()
                best_iteration = i
                patience_counter = 0
            else:
                patience_counter += 1

            # Compute gradient and update weights
            gradient = self._compute_gradient(X_train_bias, y_train, y_pred_train, len(y_train))
            self.weights -= self.learning_rate * gradient

            # Early-stopping check
            if patience_counter >= self.early_stopping_patience:
                print(f"Early stopping at iteration {i} (best at {best_iteration}, val_loss={self.best_val_loss:.4f})")
                self.weights = self.best_weights  # restore the best weights
                break

            # Progress report
            if i % 200 == 0:
                print(f"Iter {i}: Train Loss={train_loss:.4f}, Val Loss={val_loss:.4f}, Val Acc={val_acc:.4f}")

    def predict_proba(self, X):
        """Predicted probability of the positive class."""
        X_scaled = self.scaler.transform(X)
        X_bias = self._add_bias(X_scaled)
        return sigmoid(X_bias @ self.weights)

    def predict(self, X, threshold=0.5):
        """Predicted class labels."""
        return (self.predict_proba(X) >= threshold).astype(int)

    def plot_training_history(self):
        """Plot training/validation loss and validation accuracy."""
        plt.figure(figsize=(14, 5))

        # Loss curves
        plt.subplot(1, 2, 1)
        plt.plot(self.train_loss_history, label='Train Loss')
        plt.plot(self.val_loss_history, label='Validation Loss')
        plt.title(f'Loss Curves ({self.regularization_type.upper()} λ={self.lambda_})')
        plt.xlabel('Iteration')
        plt.ylabel('Loss')
        plt.legend()
        plt.grid(alpha=0.3)

        # Accuracy curve
        plt.subplot(1, 2, 2)
        plt.plot(self.val_acc_history, label='Validation Accuracy', color='green')
        plt.title(f'Validation Accuracy ({self.regularization_type.upper()} λ={self.lambda_})')
        plt.xlabel('Iteration')
        plt.ylabel('Accuracy')
        plt.legend()
        plt.grid(alpha=0.3)

        plt.tight_layout()
        plt.show()

    def get_weights(self):
        """Return the model weights (bias first)."""
        return self.weights


# Model evaluation
def evaluate_model(model, X_test, y_test):
    """Evaluate a trained model on the test set and visualize the results."""
    # Predictions
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)

    # Metrics
    metrics = {
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred),
        'recall': recall_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred_proba),
        'weight_magnitude': np.sum(np.abs(model.get_weights()[1:]))  # sum of absolute non-bias weights
    }

    # Print metrics
    print(f"\n===== Model Evaluation ({model.regularization_type.upper()} λ={model.lambda_}) =====")
    for metric, value in metrics.items():
        print(f"{metric.capitalize()}: {value:.4f}")

    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(6, 5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=['Predicted 0', 'Predicted 1'],
                yticklabels=['Actual 0', 'Actual 1'])
    plt.title(f'Confusion Matrix ({model.regularization_type.upper()} λ={model.lambda_})')
    plt.show()

    # ROC curve
    plt.figure(figsize=(6, 5))
    fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
    roc_auc = roc_auc_score(y_test, y_pred_proba)
    plt.plot(fpr, tpr, label=f'ROC (AUC = {roc_auc:.4f})')
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(f'ROC Curve ({model.regularization_type.upper()} λ={model.lambda_})')
    plt.legend()
    plt.grid(alpha=0.3)
    plt.show()

    return metrics


# Feature-importance visualization
def plot_feature_importance(models, feature_names=None):
    """Compare the learned weights of several models as feature importances."""
    plt.figure(figsize=(15, 6))
    if feature_names is None:
        feature_names = [f'Feature {i}' for i in range(len(models[0].get_weights()) - 1)]

    for i, model in enumerate(models):
        # Exclude the bias term
        weights = model.get_weights()[1:]
        # One subplot per model
        plt.subplot(1, len(models), i + 1)
        plt.barh(feature_names, weights, color='skyblue')
        plt.title(f'{model.regularization_type.upper()} λ={model.lambda_}')
        plt.xlabel('Weight Value')
        plt.grid(axis='x', alpha=0.3)

    plt.suptitle('Feature Importance Comparison')
    plt.tight_layout()
    plt.show()


# Main driver
def main():
    # Generate data (15 features, 1000 samples)
    X_train, X_val, X_test, y_train, y_val, y_test = generate_classification_data(
        n_samples=1000, n_features=15
    )

    # Create and train the models
    models = []

    # No regularization
    print("===== Training No Regularization Model =====")
    model_no_reg = RegularizedLogisticRegression(
        lambda_=0.0, learning_rate=0.05, max_iter=2000,
        early_stopping_patience=25, regularization_type='none'
    )
    model_no_reg.fit(X_train, y_train, X_val, y_val)
    model_no_reg.plot_training_history()
    models.append(model_no_reg)

    # L2 regularization
    print("\n===== Training L2 Regularization Model =====")
    model_l2 = RegularizedLogisticRegression(
        lambda_=0.5, learning_rate=0.05, max_iter=2000,
        early_stopping_patience=25, regularization_type='l2'
    )
    model_l2.fit(X_train, y_train, X_val, y_val)
    model_l2.plot_training_history()
    models.append(model_l2)

    # L1 regularization
    print("\n===== Training L1 Regularization Model =====")
    model_l1 = RegularizedLogisticRegression(
        lambda_=0.2, learning_rate=0.05, max_iter=2000,
        early_stopping_patience=25, regularization_type='l1'
    )
    model_l1.fit(X_train, y_train, X_val, y_val)
    model_l1.plot_training_history()
    models.append(model_l1)

    # Evaluate all models
    results = {}
    for model in models:
        results[f"{model.regularization_type.upper()} λ={model.lambda_}"] = evaluate_model(model, X_test, y_test)

    # Compare model performance
    print("\n===== Model Comparison Summary =====")
    print("| Model | Accuracy | Precision | Recall | F1-score | AUC | Weight Magnitude |")
    print("|--------------------|----------|-----------|--------|----------|---------|------------------|")
    for model_desc, metrics in results.items():
        print(f"| {model_desc:<18} | {metrics['accuracy']:.4f} | {metrics['precision']:.4f} | {metrics['recall']:.4f} | {metrics['f1']:.4f} | {metrics['roc_auc']:.4f} | {metrics['weight_magnitude']:.4f} |")

    # Feature-importance comparison
    feature_names = [f'F{i}' for i in range(1, 16)]
    plot_feature_importance(models, feature_names)


if __name__ == "__main__":
    print("Starting regularized logistic regression demonstration...")
    main()
    print("Demonstration completed successfully!")
```

### Fix Notes

1. **Missing import added**:
```python
from sklearn.metrics import roc_curve  # previously missing
```
2. **Correct ROC computation**:
```python
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)  # roc_curve used correctly
```
3. **Code polish**:
- Start and end messages added to the main driver
- Model-comparison output formatted as a table
- Feature-importance layout adjusted to accommodate more models

### Key Features of the Regularized Logistic Regression Model

1. **Three regularization options**:
- No regularization (baseline model)
- L2 regularization (weight decay)
- L1 regularization (feature selection)
2. **Key implementation detail**:
```python
# L2 regularization gradient
gradient[1:] += (self.lambda_ / sample_size) * self.weights[1:]
# L1 regularization gradient
gradient[1:] += (self.lambda_ / sample_size) * np.sign(self.weights[1:])
```
3. **Training monitoring**:
- Early stopping to curb overfitting
- Training/validation loss tracking
- Validation-accuracy history
4. **Comprehensive evaluation**:
- Six classification metrics
- Confusion-matrix visualization
- ROC-curve analysis
- Feature-importance comparison

### Expected Comparison Results

| Model | Accuracy | Weight magnitude | Characteristics |
|-------|----------|------------------|-----------------|
| **No regularization** | ~0.82-0.85 | 15-25 | Clear overfitting; validation loss rises late in training |
| **L2 regularization** | ~0.88-0.92 | 3-8 | Smaller weights, better generalization |
| **L1 regularization** | ~0.86-0.90 | 1-4 | Some weights driven to zero (feature selection) |
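
As a quick check of the sparsity claim in the last row of the table, the short sketch below counts how many non-bias weights the L1 and L2 models have driven close to zero. It is a minimal sketch, assuming the models from the script above are available in the session (e.g. by running the body of `main()` interactively so that `model_l1` and `model_l2` are in scope); the helper `count_near_zero_weights` and the `1e-3` tolerance are illustrative additions, not part of the original script.

```python
import numpy as np

def count_near_zero_weights(model, tol=1e-3):
    """Count non-bias weights with |w| < tol (hypothetical helper, not in the script above)."""
    w = model.get_weights()[1:]  # drop the bias term at index 0
    return int(np.sum(np.abs(w) < tol)), len(w)

# Assumes model_l1 and model_l2 exist, e.g. after running the training steps interactively
tol = 1e-3  # arbitrary threshold for "effectively zero"
for m in (model_l1, model_l2):
    zeroed, total = count_near_zero_weights(m, tol)
    print(f"{m.regularization_type.upper()} λ={m.lambda_}: {zeroed}/{total} non-bias weights below {tol}")
```

If the table's expectation holds, the L1 model should report noticeably more near-zero weights than the L2 model on the same data.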