The randomForest Function in R

海色天蓝

Article link: https://blog.youkuaiyun.com/SLANG006/article/details/120402168

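This article covers the randomForest function in R: its documentation as shown in RStudio, a detailed explanation of its parameters, and examples. The function handles classification and regression tasks and offers variable-importance assessment, random feature selection, and related features. The article also compares it with the RandomForest implementation in Python and summarizes the main characteristics of randomForest in R.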

The randomForest function

The documentation for randomForest as shown in RStudio:

randomForest {randomForest}   R Documentation
Classification and Regression with Random Forest
Description
randomForest implements Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification and regression. It can also be used in unsupervised mode for assessing proximities among data points.

Usage
## S3 method for class 'formula'
randomForest(formula, data=NULL, ..., subset, na.action=na.fail)
## Default S3 method:
randomForest(x, y=NULL, xtest=NULL, ytest=NULL, ntree=500,
mtry=if (!is.null(y) && !is.factor(y))
max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
replace=TRUE, classwt=NULL, cutoff, strata,
sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),
nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
maxnodes = NULL,
importance=FALSE, localImp=FALSE, nPerm=1,
proximity, oob.prox=proximity,
norm.votes=TRUE, do.trace=FALSE,
keep.forest=!is.null(y) && is.null(xtest), corr.bias=FALSE,
keep.inbag=FALSE, ...)
## S3 method for class 'randomForest'
print(x, ...)
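
As a minimal sketch of the two calling conventions in the Usage section, the example below fits the same classifier through the formula interface and through the default (x, y) interface. The built-in iris data set and the seed are choices made here for illustration; they do not come from the help page.

```r
library(randomForest)

set.seed(42)  # make the bootstrap samples reproducible

# Formula interface: Species is a factor, so classification is assumed.
rf_formula <- randomForest(Species ~ ., data = iris)

# Default (x, y) interface: the same model, given as predictors and response.
rf_default <- randomForest(x = iris[, 1:4], y = iris$Species)

print(rf_formula)  # dispatches to the S3 print method for class 'randomForest'
rf_default$type    # "classification"
rf_default$ntree   # 500, the default number of trees
```

The formula interface is usually more convenient for data frames, while the (x, y) interface avoids building a model frame.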
Arguments
data
an optional data frame containing the variables in the model. By default the variables are taken from the environment which randomForest is called from.

subset
an index vector indicating which rows should be used. (NOTE: If given, this argument must be named.)

na.action
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.)

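The default is na.action = na.fail, which means NAs are not allowed. If the data set has no missing values, this argument does not need to be changed.

In practice, if the data set does contain missing values, set na.action = na.omit (drop the rows that contain NAs) or na.action = na.roughfix (a simple fill: the column median for numeric variables, the most frequent level for factors).

rfImpute() generally gives better imputed values, because it refines the rough fix using the proximities from a fitted forest.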
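
The options for missing values can be sketched as follows. The data set iris_na is a hypothetical example built here by blanking out ten values of iris; the seed and the choice of column are arbitrary.

```r
library(randomForest)

set.seed(1)
iris_na <- iris
# Blank out ten predictor values to simulate missing data (illustrative only).
iris_na[sample(nrow(iris_na), 10), "Sepal.Length"] <- NA

# Default na.action = na.fail: the call stops with an error.
fit_try <- try(randomForest(Species ~ ., data = iris_na), silent = TRUE)
failed <- inherits(fit_try, "try-error")

# na.omit: rows containing NA are dropped before fitting.
rf_omit <- randomForest(Species ~ ., data = iris_na, na.action = na.omit)

# na.roughfix: NAs are replaced by the column median (numeric) or mode (factor).
rf_rough <- randomForest(Species ~ ., data = iris_na, na.action = na.roughfix)

# rfImpute: starts from the rough fix and refines it with forest proximities.
# The response (Species) must itself be free of NAs.
iris_imputed <- rfImpute(Species ~ ., data = iris_na)
rf_imputed <- randomForest(Species ~ ., data = iris_imputed)
```

rfImpute() returns a data frame with the response in the first column, so it can be passed straight back to randomForest().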

x, formula
a data frame or a matrix of predictors, or a formula describing the model to be fitted (for the print method, an randomForest object).

y
A response vector. If a factor, classification is assumed, otherwise regression is assumed. If omitted, randomForest will run in unsupervised mode.

xtest
a data frame or matrix (like x) containing predictors for the test set.

ytest
response for the test set.
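
A short sketch of how xtest and ytest might be used: the forest is evaluated on a held-out set while it is being grown. The 100/50 split of iris and the seed are assumptions made for this example.

```r
library(randomForest)

set.seed(42)
train_idx <- sample(nrow(iris), 100)  # hypothetical 100/50 train/test split

rf <- randomForest(
  x     = iris[train_idx, 1:4],
  y     = iris$Species[train_idx],
  xtest = iris[-train_idx, 1:4],
  ytest = iris$Species[-train_idx]
)

# Test-set results are stored in the 'test' component of the fitted object.
head(rf$test$predicted)
rf$test$confusion

# With xtest supplied, keep.forest defaults to FALSE, so the trees are not kept.
is.null(rf$forest)  # TRUE
```

Because the forest is discarded by default in this mode, pass keep.forest = TRUE if predict() is needed on further data later.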

ntree
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.

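Note that ntree is the total number of trees, i.e. the value of B. The default is 500 in R, whereas in Python (n_estimators in scikit-learn) the default is 100.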
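
One way to judge whether ntree is large enough is to look at the out-of-bag (OOB) error as trees are added, and at how often each row was left out of a bootstrap sample. The sketch below assumes the iris data set and a fixed seed, both chosen here for illustration.

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# err.rate has one row per tree; the "OOB" column is the cumulative OOB error
# of the forest built from the first k trees.
oob <- rf$err.rate[, "OOB"]
oob[c(10, 100, 500)]

# oob.times counts how many trees left each row out of their bootstrap sample,
# i.e. how many times that row received an OOB prediction.
min(rf$oob.times)
```

If the OOB error is still moving at the last tree, or some rows have very few OOB predictions, a larger ntree is worth trying.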

mtry
Number of variables randomly sampled as candidates at each split. Note that the default values are different for classification (sqrt(p) where p is number of variables in x) and regression (p/3)

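Here mtry is the RSF size, i.e. the number of candidate variables drawn at random at each split.

Writing M for the number of predictors, ncol(x): for a categorical Y the default is RSF size = floor(sqrt(M)); for a continuous Y the default is RSF size = max(floor(M/3), 1).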
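
The default rule for mtry can be checked by computing it by hand from the Usage section and comparing it with the value stored in a fitted model. The data sets (iris for classification, mtcars for regression) are choices made for this sketch.

```r
library(randomForest)

set.seed(42)

# Classification: iris has 4 predictors, so the default is floor(sqrt(4)) = 2.
p_class <- ncol(iris) - 1
mtry_class <- floor(sqrt(p_class))
rf_class <- randomForest(Species ~ ., data = iris)

# Regression: mtcars has 10 predictors for mpg, so max(floor(10 / 3), 1) = 3.
p_reg <- ncol(mtcars) - 1
mtry_reg <- max(floor(p_reg / 3), 1)
rf_reg <- randomForest(mpg ~ ., data = mtcars)

# The fitted objects record the mtry that was actually used.
c(by_hand = mtry_class, fitted = rf_class$mtry)  # 2 and 2
c(by_hand = mtry_reg,   fitted = rf_reg$mtry)    # 3 and 3
```

These defaults are only starting points; the package's tuneRF() function can search over mtry using the OOB error.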

replace
Should sampling of cases be done with or without replacement?