Calculate Leave-One-Out Prediction for GLM

RoQuant

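This article describes how to use leave-one-out prediction as a cross-validation method during GLM model development. Worked examples show how the loo_predict function calculates leave-one-out predicted values, which can be used to assess a model's stability and predictive power.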

In model development, leave-one-out (LOO) prediction is a form of cross-validation. It is calculated as follows:
1. After a model is developed, each observation used in the model development is removed in turn, and the model is refitted with the remaining observations.
2. Each refitted model is used to calculate the out-of-sample prediction for the observation that was removed. Collected one by one, these predictions form the LOO, i.e. leave-one-out, predicted values for the whole model development sample.
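The two steps above can be sketched sequentially in base R. This is a minimal illustration, not the routine used in the rest of the article: the name loo_predict_seq is hypothetical, and the built-in mtcars data with a Gamma GLM of mpg on wt and hp is an assumption chosen only so that the example runs without extra packages.

```r
# Sequential leave-one-out prediction for a fitted GLM, base R only.
# loo_predict_seq is a hypothetical name used for this illustration.
loo_predict_seq <- function(obj) {
  dat <- obj$data  # glm() stores the data frame passed via `data =`
  vapply(seq_len(nrow(dat)), function(i) {
    # Step 1: refit the same model without observation i.
    refit <- update(obj, data = dat[-i, , drop = FALSE])
    # Step 2: predict the held-out observation on the response scale.
    unname(predict(refit, newdata = dat[i, , drop = FALSE], type = "response"))
  }, numeric(1))
}

# Assumed example data: mtcars ships with R.
fit <- glm(mpg ~ wt + hp, data = mtcars, family = Gamma(link = "log"))
loo <- loo_predict_seq(fit)
head(loo)
```

The sketch relies on the model having been fitted with an explicit data argument, because it reads the development sample back from obj$data.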
The loo_predict() function below is a general routine that calculates the LOO prediction for any GLM object. The result can then be used to investigate model stability and predictive power.

> pkgs <- c('doParallel', 'foreach')
> lapply(pkgs, require, character.only = TRUE)
[[1]]
[1] TRUE

[[2]]
[1] TRUE

> registerDoParallel(cores = 8)
>
> data(AutoCollision, package = "insuranceData")

> # A GAMMA GLM #
> model1 <- glm(Severity ~ Age + Vehicle_Use, data = AutoCollision, family = Gamma(link = "log"))
> # A POISSON GLM #
> model2 <- glm(Claim_Count ~ Age + Vehicle_Use, data = AutoCollision, family = poisson(link = "log"))

> loo_predict <- function(obj) {
+   yhat <- foreach(i = 1:nrow(obj$data), .combine = rbind) %dopar% {
+     predict(update(obj, data = obj$data[-i, ]), obj$data[i, ], type = "response")
+   }
+   return(data.frame(result = yhat[, 1], row.names = NULL))
+ }

> # TEST CASE 1
> test1 <- loo_predict(model1)
> test1$result
 [1] 303.7393 328.7292 422.6610 375.5023 240.9785 227.6365 288.4404 446.5589
 [9] 213.9368 244.7808 278.7786 443.2256 213.9262 243.2495 266.9166 409.2565
[17] 175.0334 172.0683 197.2911 326.5685 187.2529 215.9931 249.9765 349.3873
[25] 190.1174 218.6321 243.7073 359.9631 192.3655 215.5986 233.1570 348.2781

> # TEST CASE 2
> test2 <- loo_predict(model2)
> test2$result
 [1]  11.15897  37.67273  28.76127  11.54825  50.26364 152.35489 122.23782
 [8]  44.57048 129.58158 465.84173 260.48114 107.23832 167.40672 510.41127
[15] 316.50765 121.75804 172.56928 546.25390 341.03826 134.04303 359.30141
[22] 977.29107 641.69934 251.32547 248.79229 684.86851 574.13994 238.42209
[29] 148.77733 504.12221 422.75047 167.61203
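
One way to put LOO predictions to work when judging predictive power is to compare the in-sample error with the LOO error. The sketch below does this under stated assumptions: it uses R's built-in warpbreaks data instead of AutoCollision, a plain for loop instead of loo_predict(), and a small rmse helper that is defined here purely for illustration.

```r
# Poisson GLM on the built-in warpbreaks data (assumed example data;
# the article itself uses AutoCollision from the insuranceData package).
fit <- glm(breaks ~ wool + tension, data = warpbreaks,
           family = poisson(link = "log"))

# Leave-one-out predictions: refit without row i, then predict row i.
n <- nrow(warpbreaks)
loo <- numeric(n)
for (i in seq_len(n)) {
  refit <- glm(formula(fit), data = warpbreaks[-i, ],
               family = poisson(link = "log"))
  loo[i] <- predict(refit, newdata = warpbreaks[i, ], type = "response")
}

# rmse is an illustrative helper, not part of any package.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

y <- warpbreaks$breaks
in_sample_rmse <- rmse(y, fitted(fit))  # error on the data the model saw
loo_rmse <- rmse(y, loo)                # error on held-out observations
c(in_sample = in_sample_rmse, loo = loo_rmse)
```

The LOO error is usually somewhat larger than the in-sample error. A wide gap between the two suggests that the in-sample fit overstates how well the model predicts new observations.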