[R] R package "benchmark"

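This article explains Benchmark, a performance evaluation tool for R: its core functionality, its parameters, and how to read the results it returns, so that readers can use it to compare the performance of R code.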


The Benchmark package (published on CRAN as rbenchmark) consists of only one function, benchmark, which is a simple wrapper around system.time for timing R expressions.
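To make the "wrapper of system.time" idea concrete, here is a minimal base-R sketch of the kind of measurement such a wrapper automates. It is not the package's implementation; the matrix `m`, the repetition count `reps`, and the two expressions being compared are illustrative assumptions.

```r
# Illustrative data: a 100 x 100 matrix of random numbers (an assumption).
m <- matrix(runif(1e4), nrow = 100)

# Repeat each expression a fixed number of times and time the whole loop.
reps <- 100
t_apply    <- system.time(for (i in seq_len(reps)) apply(m, 2, mean))
t_colmeans <- system.time(for (i in seq_len(reps)) colMeans(m))

# system.time returns a named numeric vector (class "proc_time").
# Collect the elapsed (wall-clock) seconds to compare the two expressions.
elapsed <- c(apply = t_apply[["elapsed"]], colMeans = t_colmeans[["elapsed"]])
print(elapsed)
```

Doing this by hand for every expression quickly gets repetitive, which is the bookkeeping a single benchmark call takes over.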

Usage:

benchmark(...,
    columns = c("test", "replications", "elapsed", "relative",
                "user.self", "sys.self", "user.child", "sys.child"),
    order = "test", replications = 100,
    environment = parent.frame(), relative = "elapsed")

This evaluates each expression passed in ... (replications times each) and returns the results in the columns "test", "replications", "elapsed", "relative", "user.self", "sys.self", "user.child" and "sys.child". Use the columns argument to choose which of these columns are returned.
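As a hedged usage sketch, the call below compares two ways of taking square roots and keeps only four of the columns. It assumes the function comes from the CRAN package rbenchmark (hence the `requireNamespace` guard); the vector `x` and the labels `sapply_sqrt` and `vector_sqrt` are made up for illustration. Labels should not reuse argument names such as `order` or `columns`, since they would be matched as arguments instead of expressions.

```r
# Run only if the rbenchmark package is installed (assumed source of benchmark()).
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  x <- runif(1e4)  # illustrative input

  res <- rbenchmark::benchmark(
    sapply_sqrt = sapply(x, sqrt),  # element-by-element call
    vector_sqrt = sqrt(x),          # vectorised call
    columns = c("test", "replications", "elapsed", "relative"),
    order = "elapsed",              # sort rows by elapsed time
    replications = 50
  )
  print(res)
}
```

With very fast expressions the elapsed time can round to zero, so increasing replications or the input size gives more meaningful comparisons.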

The meaning of each column is as follows:

  • test is the expression (R code) that was evaluated.
  • replications is the number of times the expression was evaluated within each individual benchmark.
  • elapsed is the total elapsed (wall-clock) time taken by all replications of the expression.
  • user.self is the "user time", the CPU time charged for the execution of user instructions of the calling process.
  • sys.self is the "system time", the CPU time charged for execution by the system on behalf of the calling process.
  • user.child and sys.child are the corresponding times for child processes; they are not available on Windows and will always be given as NA.
  • relative is the timing of each expression relative to the fastest one, computed from the column named by the relative argument (elapsed by default).

So we can compare expressions by their elapsed time, and the relative column shows that comparison directly.
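The sketch below shows how a relative column can be derived from elapsed times by dividing by the smallest value. This is my understanding of the idea rather than the package's exact code, and the timings and method names are hypothetical values chosen for illustration, not measurements.

```r
# Hypothetical elapsed times in seconds (illustrative values, not measurements).
elapsed <- c(method_a = 0.50, method_b = 2.00, method_c = 1.25)

# Scale every timing by the fastest one, so the fastest expression gets 1.
relative <- elapsed / min(elapsed)
print(relative)  # method_a = 1, method_b = 4, method_c = 2.5

# The expression with relative == 1 is the fastest.
fastest <- names(which.min(relative))
print(fastest)   # "method_a"
```

A relative value of 4 therefore reads as "four times slower than the fastest expression".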

From the blog,
the result of benchmark is

From the column relative, we see that

  • rbindlist is the fastest way