Machine Learning: Random Forest Algorithm in Practice

This article walks through a classification example using the random forest algorithm from Apache Spark's MLlib library. It covers preprocessing the dataset, training the model, evaluating prediction accuracy, and making predictions.
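To run the example below, Spark Core and MLlib need to be on the classpath. A minimal build.sbt sketch follows; the Spark and Scala versions are illustrative assumptions, not taken from the original post.

// build.sbt sketch -- version numbers are illustrative assumptions
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.8",
  "org.apache.spark" %% "spark-mllib" % "2.4.8"
)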

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest

object ForestTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ForestTest").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Load the data: drop the first field, use the last field as the label,
    // and turn everything in between into the feature vector
    val data = sc.textFile("f://rf.csv").map { line =>
      val fields = line.split(",")
      val label = fields(fields.length - 1).toDouble
      val features = fields.slice(1, fields.length - 1).map(_.toDouble)
      LabeledPoint(label, Vectors.dense(features))
    }
    val labels = data.map(_.label)

    // Configure and train the random forest: 9 classes, no categorical features,
    // 20 trees, "auto" feature-subset strategy, entropy impurity, maxDepth = 30, maxBins = 300
    val model = RandomForest.trainClassifier(data, 9, Map[Int, Int](), 20, "auto", "entropy", 30, 300)

    // Score every point with the trained model
    val predictions = data.map(point => model.predict(point.features))
    predictions.foreach(println)

    // Accuracy: fraction of predictions that match the true label
    val acc = labels.zip(predictions).filter { case (l, p) => l == p }.count() / labels.count().toDouble
    println("Random forest accuracy for predicting patients' hospital costs:")
    println(acc)
  }
}
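Note that the code above scores the same data it was trained on, which usually overstates accuracy. Below is a minimal sketch of a held-out evaluation, assuming the `data` RDD built in ForestTest is in scope (for example in the same main method or a spark-shell session); the 70/30 split ratio and the seed are arbitrary illustrative choices, not from the original post.

import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.tree.RandomForest

// Sketch: hold out 30% of the rows for testing instead of scoring the training set.
// `data` is the RDD[LabeledPoint] built above; split ratio and seed are illustrative.
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

val heldOutModel = RandomForest.trainClassifier(
  trainingData, 9, Map[Int, Int](), 20, "auto", "entropy", 30, 300)

// Pair each prediction with its true label, then compute accuracy on the held-out set
val predictionsAndLabels = testData.map(p => (heldOutModel.predict(p.features), p.label))
val testAcc = predictionsAndLabels.filter { case (p, l) => p == l }.count().toDouble / testData.count()
println(s"Held-out accuracy: $testAcc")

// MulticlassMetrics also provides per-class precision/recall and the confusion matrix
val metrics = new MulticlassMetrics(predictionsAndLabels)
println(metrics.confusionMatrix)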

rf.csv has the following format:

1,1,1,30,2201,0,1,20,210105,51,3,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.2,0,3.2,3.2,3.2,3.2,0,1,0,1,1,0,10,4680,1,1,80,0,1,2,2,5,0,1,2101,1,1,2101,3,1,1,1,1,1,1,0,0,0,0,1,5467,3253,300133,300133,13,1,300102,4535,0,0,0,0,75357,8907.53,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,55
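The parser in ForestTest drops the first field of each row, treats the last field (55 in the sample above) as the class label, and converts the remaining fields into a dense feature vector. A standalone sketch of that parsing step, using a hypothetical shortened row in the same layout:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Illustrative only: a shortened row in the same layout as rf.csv
// (first field dropped, last field is the label, the rest are features)
val sampleLine = "1,1,1,30,2201,0,1,20,55"
val fields = sampleLine.split(",")

val label = fields(fields.length - 1).toDouble                    // 55.0
val features = fields.slice(1, fields.length - 1).map(_.toDouble)

val point = LabeledPoint(label, Vectors.dense(features))
println(point)  // (55.0,[1.0,1.0,30.0,2201.0,0.0,1.0,20.0])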
