xgboost-spark-scala

Article link: https://blog.youkuaiyun.com/maokunnn/article/details/89077791

This post records the author's experience tuning XGBoost models with Scala and Spark, including the key steps and code examples.


Today I'm learning to write Scala, so I'll try it out on xgboost.

First, some notes on the key points of xgboost parameter tuning:

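The more important xgboost parameters
(1) objective [ default=reg:linear ]: defines the learning task and the corresponding learning objective. The available objective functions are: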

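“reg:linear” – linear regression (squared-error loss; later XGBoost releases rename this objective to reg:squarederror).
“reg:logistic” – logistic regression.
“binary:logistic” – logistic regression for binary classification; the output is a probability.
“binary:logitraw” – logistic regression for binary classification; the output is the raw score w^T x before the logistic transformation.
“count:poisson” – Poisson regression for count data; the output is the mean of the Poisson distribution. In Poisson regression, max_delta_step defaults to 0.7 (used to safeguard optimization).
“multi:softmax” – makes XGBoost use the softmax objective for multiclass classification; the parameter num_class (number of classes) must also be set.
“multi:softprob” – same as softmax, but the output is a vector of length ndata * nclass, which can be reshaped into a matrix with ndata rows and nclass columns. Each row holds the probabilities of that sample belonging to each class.
“rank:pairwise” – sets XGBoost to perform a ranking task by minimizing the pairwise loss.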
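
To make the output formats above a bit more concrete, here is a minimal sketch in plain Scala with no XGBoost or Spark dependency. It only illustrates how the raw score of binary:logitraw relates to the probability of binary:logistic, and how a flat multi:softprob vector can be reshaped and turned into a multi:softmax-style class label. The object and function names (ObjectiveOutputs, sigmoid, reshape, predictedClass) are made up for this example, the numbers are invented, and a row-major layout (one sample after another) is assumed.

```scala
object ObjectiveOutputs {
  // binary:logitraw reports the raw margin; binary:logistic reports
  // the sigmoid of that margin, i.e. a probability in (0, 1).
  def sigmoid(margin: Double): Double = 1.0 / (1.0 + math.exp(-margin))

  // Reshape a flat vector of length ndata * nclass into ndata rows of
  // nclass probabilities (assumes the samples are stored one after another).
  def reshape(flat: Array[Double], numClass: Int): Array[Array[Double]] = {
    require(numClass > 0 && flat.length % numClass == 0,
      "flat.length must be a multiple of numClass")
    flat.grouped(numClass).toArray
  }

  // A multi:softmax-style label: the index of the largest probability in a row.
  def predictedClass(row: Array[Double]): Int = row.indexOf(row.max)

  def main(args: Array[String]): Unit = {
    // Invented probabilities for 2 samples and 3 classes (illustration only).
    val flat = Array(0.7, 0.2, 0.1, 0.1, 0.3, 0.6)
    val rows = reshape(flat, numClass = 3)
    rows.foreach { r =>
      println(s"${r.mkString(", ")} -> class ${predictedClass(r)}")
    }
    // A raw margin of 0 corresponds to a probability of one half.
    println(sigmoid(0.0))
  }
}
```

In other words, multi:softmax hands back only the class index per sample, while multi:softprob keeps the whole probability row, which is handy when you want to pick your own threshold or look at the runner-up class.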
(2) eval_metric: the evaluation metric. The choices are listed below:

“rmse”: root mean square error
“logloss”: negative log-likelihood
“error”: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negativ