Spark Machine Learning: Air Quality Prediction and Evaluation

This article walks through a linear regression model for evaluating air quality. The model is trained on pollutant measurements such as SO2, CO, NO2, and AQI, and is then used to predict the quality grade for new air-quality records.


The task: evaluate air quality from measured pollutant data.

Below is a sample of the data:

ID,DAYTIME,CITYCODE,SO2,CO,NO2,O3,PM10,PM2_5,AQI,MEASURE,TIMEPOINT

0:110000:20141120,20141120,110000,31,3.939,141,8,368,301,351,6,2014-11-20
0:110000:20141208,20141208,110000,32,1.431,65,37,89,60,82,2,2014-12-08
0:110000:20141220,20141220,110000,10,0.478,25,48,18,9,32,1,2014-12-20
0:110000:20150108,20150108,110000,53,3.305,101,12,176,143,190,4,2015-01-08
0:110000:20150120,20150120,110000,45,2.029,76,23,112,85,113,3,2015-01-20
0:110000:20150212,20150212,110000,17,0.832,47,74,49,36,59,2,2015-02-12

More data: https://pan.baidu.com/s/1uVSpjx4-yQe1gXVpnzHNeQ

The fields are comma-separated. MEASURE is the air-quality grade (1: excellent, 2: good, 3: light pollution, 4: moderate pollution, 5: heavy pollution, 6: severe pollution).
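For reference, here is a quick sketch (not part of the original post) of how one record maps to the array indices used by the parsing code further down; the sample line is taken from the data above.

// Hypothetical illustration: split one sample record and note which index
// holds which field (the training code below reads e(3) through e(10)).
val line = "0:110000:20141120,20141120,110000,31,3.939,141,8,368,301,351,6,2014-11-20"
val e = line.split(",")
// e(0)=ID, e(1)=DAYTIME, e(2)=CITYCODE,
// e(3)=SO2, e(4)=CO, e(5)=NO2, e(6)=O3,
// e(7)=PM10, e(8)=PM2_5, e(9)=AQI, e(10)=MEASURE, e(11)=TIMEPOINT
println(s"SO2=${e(3)}, AQI=${e(9)}, MEASURE=${e(10)}")   // SO2=31, AQI=351, MEASURE=6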

Implementation: train a model on this data and use it to evaluate air quality.

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object TestML {

  def main(args: Array[String]): Unit = {
    val dataDir = "file:///d:/docment/air/data/logs.txt"
    val sess = SparkSession.builder()
      .appName("wangjk")
      .master("local[2]")
      .config("spark.testing.memory", "2147480000")
      .getOrCreate()
    val sc = sess.sparkContext

    // Case class describing one air-quality record
    case class Air(SO2: Double, CO: Double,
                   NO2: Double, O3: Double, PM10: Double,
                   PM2_5: Double, AQI: Double, MEASURE: Double)

    // Parse each comma-separated line into an Air record
    val rd1 = sc.textFile(dataDir).map(_.split(",")).map(e =>
      Air(e(3).toDouble, e(4).toDouble, e(5).toDouble, e(6).toDouble,
        e(7).toDouble, e(8).toDouble, e(9).toDouble, e(10).toDouble)
    )

    // Convert the RDD to a DataFrame with "label" and "features" columns
    import sess.implicits._
    val trainDF = rd1.map(w => (
      w.MEASURE, Vectors.dense(w.SO2, w.CO, w.NO2, w.O3, w.PM10, w.PM2_5, w.AQI)
    )).toDF("label", "features")

    trainDF.show()

    // Create the linear regression estimator
    val lr = new LinearRegression()
    // Maximum number of iterations
    lr.setMaxIter(20)
    // Fit the model on the training data
    val model = lr.fit(trainDF)

    // Test data
    val testDF = sess.createDataFrame(Seq(
      (6.0, Vectors.dense(31, 3.939, 141, 8, 368, 301, 351)),
      (2.0, Vectors.dense(32, 1.431, 65, 37, 89, 60, 82)),
      (1.0, Vectors.dense(10, 0.478, 25, 48, 18, 9, 32)))).toDF("label", "features")

    // Save the trained model
    model.write.overwrite().save("file:///d:/docment/air/model/")

    // Run predictions on the test data
    val tested = model.transform(testDF).select("features", "label", "prediction")
    tested.show()
  }
}

The output is as follows:

+--------------------+-----+------------------+
|            features|label|        prediction|
+--------------------+-----+------------------+
|[31.0,3.939,141.0...|  6.0| 6.424981475397239|
|[32.0,1.431,65.0,...|  2.0|2.1950808612353887|
|[10.0,0.478,25.0,...|  1.0|1.2091385222200972|
+--------------------+-----+------------------+

Here label is the original evaluation from the data set, i.e. the ground-truth grade, and prediction is the model's predicted value.
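A possible next step, sketched here as an assumption rather than something from the original post, is to quantify how close the predictions are to the labels with Spark ML's RegressionEvaluator, applied to the tested DataFrame produced above:

import org.apache.spark.ml.evaluation.RegressionEvaluator

// Evaluate the predictions against the labels (sketch; "tested" is the
// DataFrame returned by model.transform(testDF) above).
val evaluator = new RegressionEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("rmse")        // other options: "mse", "r2", "mae"
println(s"RMSE = ${evaluator.evaluate(tested)}")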


Next, the saved model is loaded and used to score data:

The test data is the same as above, but the label values are chosen arbitrarily: the previous 6.0, 2.0, 1.0 are replaced with 1.0, 6.0, 6.0. The point is to check whether the label has any effect on the prediction.


import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.sql.SparkSession

object MLTest {

  def main(args: Array[String]): Unit = {

    val sess = SparkSession.builder()
      .appName("wangjk")
      .master("local[2]")
      .config("spark.testing.memory", "2147480000")
      .getOrCreate()

    // Load the saved model
    val model = LinearRegressionModel.load("file:///d:/docment/air/model/")

    // Test data: same features as before, but with deliberately wrong labels
    val testDF = sess.createDataFrame(Seq(
      (1.0, Vectors.dense(31, 3.939, 141, 8, 368, 301, 351)),
      (6.0, Vectors.dense(32, 1.431, 65, 37, 89, 60, 82)),
      (6.0, Vectors.dense(10, 0.478, 25, 48, 18, 9, 32)))).toDF("label", "features")

    // Run predictions on the test data
    val tested = model.transform(testDF).select("features", "label", "prediction")
    tested.show()
  }
}

The output is as follows:

+--------------------+-----+------------------+
|            features|label|        prediction|
+--------------------+-----+------------------+
|[31.0,3.939,141.0...|  1.0| 6.424981475397239|
|[32.0,1.431,65.0,...|  6.0|2.1950808612353887|
|[10.0,0.478,25.0,...|  6.0|1.2091385222200972|
+--------------------+-----+------------------+

The prediction values are identical to those above, which confirms that the label column has no influence on the predictions.
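Since MEASURE is a discrete grade from 1 to 6 while linear regression produces a continuous value, one possible post-processing step (a sketch, not part of the original code) is to round and clamp the prediction back to a grade:

import org.apache.spark.sql.functions.{col, round, lit, least, greatest}

// Round the continuous prediction to the nearest grade and clamp it to [1, 6]
// ("tested" is the DataFrame produced by model.transform above).
val graded = tested.withColumn(
  "grade",
  least(greatest(round(col("prediction")), lit(1.0)), lit(6.0))
)
graded.show()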


