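The randomForest Function in R

This article introduces the randomForest function in R, covering its documentation as shown in RStudio, detailed explanations of its parameters, and examples. The function handles classification and regression tasks and offers features such as variable importance assessment and random feature selection. The article also compares it with the Python implementation of random forests and summarizes the main characteristics of R's randomForest function.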

The randomForest function

Documentation for randomForest as shown in RStudio:

randomForest {randomForest}    R Documentation
Classification and Regression with Random Forest
Description
randomForest implements Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification and regression. It can also be used in unsupervised mode for assessing proximities among data points.

Usage
## S3 method for class 'formula'
randomForest(formula, data=NULL, ..., subset, na.action=na.fail)
## Default S3 method:
randomForest(x, y=NULL,  xtest=NULL, ytest=NULL, ntree=500,
             mtry=if (!is.null(y) && !is.factor(y))
             max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
             replace=TRUE, classwt=NULL, cutoff, strata,
             sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),
             nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
             maxnodes = NULL,
             importance=FALSE, localImp=FALSE, nPerm=1,
             proximity, oob.prox=proximity,
             norm.votes=TRUE, do.trace=FALSE,
             keep.forest=!is.null(y) && is.null(xtest), corr.bias=FALSE,
             keep.inbag=FALSE, ...)
## S3 method for class 'randomForest'
print(x, ...)
Arguments
data    
an optional data frame containing the variables in the model. By default the variables are taken from the environment which randomForest is called from.

subset    
an index vector indicating which rows should be used. (NOTE: If given, this argument must be named.)

na.action    
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.)

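The default is na.action = na.fail, which stops with an error if any NA is present. If the data set has no missing values, this argument does not need to be changed.

In practice, if the data set does contain missing values, set na.action = na.omit (drop the rows that contain NAs) or na.action = na.roughfix (a quick fill using column medians for numeric variables and the most frequent level for factors).

rfImpute() can produce better imputed values, because it refines the fill using the proximities from a random forest.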
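The three ways of handling missing values can be compared with a minimal sketch, assuming the randomForest package is installed. The data set iris_na is a hypothetical copy of iris with NAs injected into the predictors purely for illustration; it does not come from the original article.

```r
library(randomForest)

# Hypothetical data: a copy of iris with 15 values per predictor set to NA
set.seed(111)
iris_na <- iris
for (i in 1:4) iris_na[sample(150, 15), i] <- NA

# Option 1: drop every row that contains an NA before fitting
fit_omit <- randomForest(Species ~ ., data = iris_na, na.action = na.omit)

# Option 2: quick fill with column medians (numeric) or modes (factor)
fit_fix <- randomForest(Species ~ ., data = iris_na, na.action = na.roughfix)

# Option 3: proximity-based imputation, then fit on the completed data
# (rfImpute requires a response without NAs)
iris_imputed <- rfImpute(Species ~ ., data = iris_na)
fit_imp <- randomForest(Species ~ ., data = iris_imputed)
```

Printing each fitted object shows its OOB error estimate, which gives a rough way to compare the three strategies on a given data set.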

x, formula    
a data frame or a matrix of predictors, or a formula describing the model to be fitted (for the print method, a randomForest object).

y    
A response vector. If a factor, classification is assumed, otherwise regression is assumed. If omitted, randomForest will run in unsupervised mode.
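How the type of y selects the mode can be checked with a short sketch, assuming the randomForest package is installed. The built-in iris and mtcars data sets are used here only as convenient examples.

```r
library(randomForest)
set.seed(42)

x <- iris[, 1:4]

# Factor response: classification
fit_class <- randomForest(x, y = iris$Species)

# Numeric response: regression
fit_reg <- randomForest(mtcars[, -1], y = mtcars$mpg)

# No response: unsupervised mode (useful for proximities)
fit_unsup <- randomForest(x)

# The fitted object records which mode was used
c(fit_class$type, fit_reg$type, fit_unsup$type)
```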

xtest    
a data frame or matrix (like x) containing predictors for the test set.

ytest    
response for the test set.

ntree    
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.

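Note that ntree is the total number of trees in the forest, i.e. the value of B. The default is 500 in R, whereas the Python (scikit-learn) implementation defaults to n_estimators=100.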
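The effect of ntree can be inspected with a small sketch, assuming the randomForest package is installed. The choice of 50 trees is arbitrary and only for illustration.

```r
library(randomForest)
set.seed(42)

# A deliberately small forest and one with the default number of trees
fit_small   <- randomForest(Species ~ ., data = iris, ntree = 50)
fit_default <- randomForest(Species ~ ., data = iris)

fit_small$ntree
fit_default$ntree

# oob.times counts how often each row was out-of-bag, i.e. how many
# trees contributed to its OOB prediction; too few trees makes this small
min(fit_small$oob.times)
min(fit_default$oob.times)

# For classification, err.rate has one row per tree; the "OOB" column
# traces the OOB error as trees are added
oob_curve <- fit_default$err.rate[, "OOB"]
```

Plotting oob_curve against the tree index is a common way to judge whether the error has stabilized for the chosen ntree.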

mtry    
Number of variables randomly sampled as candidates at each split. Note that the default values are different for classification (sqrt(p) where p is number of variables in x) and regression (p/3)

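Here mtry is the size of the random subset of features (RSF size) considered at each split. Its default depends on M, the total number of predictors (p in the documentation above).

For a categorical Y the default is RSF size = floor(sqrt(M)); for a continuous Y the default is RSF size = max(floor(M/3), 1).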
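The default rule can be reproduced in base R without fitting a model. The helper default_mtry below is hypothetical (it is not part of the randomForest package); it simply mirrors the default mtry expression from the Usage section.

```r
# Hypothetical helper mirroring the default mtry expression in Usage
default_mtry <- function(x, y = NULL) {
  if (!is.null(y) && !is.factor(y)) {
    max(floor(ncol(x) / 3), 1)   # numeric response: regression
  } else {
    floor(sqrt(ncol(x)))         # factor response or no response
  }
}

# iris: 4 predictors, factor response
mtry_class <- default_mtry(iris[, 1:4], iris$Species)   # floor(sqrt(4)) = 2

# mtcars: 10 predictors once mpg is removed, numeric response
mtry_reg <- default_mtry(mtcars[, -1], mtcars$mpg)      # floor(10 / 3) = 3

c(classification = mtry_class, regression = mtry_reg)
```

The max(..., 1) guard matters only for regression with fewer than three predictors, where floor(M/3) would otherwise be 0.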

replace    
Should sampling of cases be done with or
