Spark 2.4.0编程指南--Spark DataSources

本文详细介绍Spark2.4.0中各种数据源的读写操作,包括parquet、orc、csv、json等格式文件,以及通过SparkSQL直接运行文件,使用BucketyBy和PartitionBy进行文件读写,合并dataSet,和mysql的JDBC读写操作。

Spark 2.4.0编程指南–Spark DataSources

更多资源

视频

width="800" height="500" src="//player.bilibili.com/player.html?aid=38193405&cid=68636905&page=5" scrolling="no" border="0" allowfullscreen="true">

前置条件

  • 已安装好java(选用的是java 1.8.0_191)
  • 已安装好scala(选用的是scala 2.11.121)
  • 已安装好hadoop(选用的是Hadoop 2.9.2)
  • 已安装好hive(选用的是apache-hive-3.1.1-bin)
  • 已安装好spark(选用的是spark-2.4.0-bin-hadoop2.7)

技能标签

  • parquet、orc、csv、json、text、avro格式文件的读、写
  • spark.sql直接运行文件
  • BucketyBy,PartitionBy 读写文件
  • mergining dataSet
  • jdbc(mysql)读写操作
  • Hive操作(create drop database ,create,insert,show,truncate,drop table)
  • 官网: http://spark.apache.org/docs/2.4.0/sql-data-sources.html

常规 Load/Save (parquet)

读取parquest格式文件

  • 读取parquet格式文件users.parquet
  • users.parquet 直接打开是十六进制数据
spark.read.load("hdfs://standalone.com:9000/home/liuwen/data/parquest/users.parquet").show

//+------+--------------+----------------+
//|  name|favorite_color|favorite_numbers|
//+------+--------------+----------------+
//|Alyssa|          null|  [3, 9, 15, 20]|
//|   Ben|           red|              []|
//+------+--------------+----------------+

保存parquest格式文件

  • 读取parquet格式文件users.parquet
  • users.parquet 直接打开是十六进制数据
  • 保存的数据会在这个目录下namesAndFavColors.parquet
val usersDF = spark.read.load("hdfs://standalone.com:9000/home/liuwen/data/parquest/users.parquet")
usersDF.select("name", "favorite_color").write.save("hdfs://standalone.com:9000/home/liuwen/data/parquest/namesAndFavColors.parquet")

spark.read.load("hdfs://m0:9000/home/liuwen/data/parquest/namesAndFavColors.parquet").show


//+------+--------------+
//|  name|favorite_color|
//+------+--------------+
//|Alyssa|          null|
//|   Ben|           red|
//+------+--------------+

Load/Save (json)

读取json格式文件

  • 读取json格式文件people.json
  • people.json存储的是json格式的数据
  • 注意,json格式存储的文件,每行中都包含字段名称信息,比较占空间,不推荐使用,可以考虑用默认的 parquet格式存储
spark.read.format("json").load("hdfs://standalone.com:9000/home/liuwen/data/json/people.json").show

//+----+-------+
//| age|   name|
//+----+-------+
//|null|Michael|
//|  30|   Andy|
//|  19| Justin|
//+----+-------+

保存json格式文件

  • 读取json格式文件people.json
  • people.json存储的是json格式的数据
spark.read.format("json").load("hdfs://standalone.com:9000/home/liuwen/data/json/people.json").show

//+----+-------+
//| age|   name|
//+----+-------+
//|null|Michael|
//|  30|   Andy|
//|  19| Justin|
//+----+-------+


//保存json格式数据到hdfs上面
ds.select("name", "age").write.format("json").save("hdfs://standalone.com:9000/home/liuwen/output/json/namesAndAges.json")


//读取保存的数据
 spark.read.format("json").load("hdfs://standalone.com:9000/home/liuwen/output/json/namesAndAges.json").show
 
//+----+-------+
//| age|   name|
//+----+-------+
//|null|Michael|
//|  30|   Andy|
//|  19| Justin|
//+----+-------+

  • HDFS查看保存的文件信息
 hdfs dfs -ls -R hdfs://standalone.com:9000/home/liuwen/output/json
 
// drwxr-xr-x   - liuwen supergroup          0 2018-12-18 17:44 //hdfs://standalone.com:9000/home/liuwen/output/json/namesAndAges.json
//-rw-r--r--   1 liuwen supergroup          0 2018-12-18 17:44 //hdfs://standalone.com:9000/home/liuwen/output/json/namesAndAges.json/_SUCCESS
//-rw-r--r--   1 liuwen supergroup         71 2018-12-18 17:44 //hdfs://standalone.com:9000/home/liuwen/output/json/namesAndAges.json/part-00000-6690fee8-33d3-413c-8364-927f02593ff2-c000.json


 hdfs dfs -cat hdfs://standalone.com:9000/home/liuwen/output/json/namesAndAges.json/*
//数据在文件 namesAndAges.json/part-00000-6690fee8-33d3-413c-8364-927f02593ff2-c000.json
//{"name":"Michael"}
//{"name":"Andy","age":30}
//{"name":"Justin","age":19}
//[liuwen@standalone ~]$ 

Load/Save (csv)

读取csv格式文件

 val peopleDFCsv = spark.read.format("csv").option("sep", ";").option("inferSchema", "true").option("header", "true").load("hdfs://m0:9000/home/liuwen/data/csv/people.csv")
 //peopleDFCsv: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
 peopleDFCsv.show
 
// +-----+---+---------+
//| name|age|      job|
//+-----+---+---------+
//|Jorge| 30|Developer|
//|  Bob| 32|Developer|
//+-----+---+---------+


写csv格式文件

//保存json格式数据到hdfs上面
peopleDFCsv.select("name", "age").write.format("csv").save("hdfs://standalone.com:9000/home/liuwen/output/csv/people.csv")

spark.read.format("csv").option("sep", ",").option("inferSchema", "true").option("header", "true").load("hdfs://standalone.com:9000//home/liuwen/output/csv/people.csv").show

//+-----+---+
//|Jorge| 30|
//+-----+---+
//|  Bob| 32|
//+-----+---+
  • 查看hdfs上csv文件
hdfs dfs -ls -R   hdfs://m0:9000/home/liuwen/output/csv/

//drwxr-xr-x   - liuwen supergroup          0 2018-12-18 18:04 hdfs://m0:9000/home/liuwen/output/csv/people.csv
//-rw-r--r--   1 liuwen supergroup          0 2018-12-18 18:04 hdfs://m0:9000/home/liuwen/output/csv/people.csv/_SUCCESS
//-rw-r--r--   1 liuwen supergroup         16 2018-12-18 18:04 hdfs://m0:9000/home/liuwen/output/csv/people.csv/part-00000-d6ad5563-5908-4c0e-8e6f-f13cd0ff445e-c000.csv

hdfs dfs -text hdfs://m0:9000/home/liuwen/output/csv/people.csv/part-00000-d6ad5563-5908-4c0e-8e6f-f13cd0ff445e-c000.csv

//Jorge,30
//Bob,32

Load/Save (orc)

写orc格式文件

val usersDF = spark.read.load("hdfs://standalone.com:9000/home/liuwen/data/parquest/users.parquet")
usersDF.show
//+------+--------------+----------------+
//|  name|favorite_color|favorite_numbers|
//+------+--------------+----------------+
//|Alyssa|          null|  [3, 9, 15, 20]|
//|   Ben|           red|              []|
//+------+--------------+----------------+

usersDF.write.format("orc").option("orc.bloom.filter.columns", "favorite_color").option("orc.dictionary.key.threshold", "1.0").save("hdfs://standalone.com:9000/home/liuwen/output/orc/users_with_options.orc")


读orc格式文件


 spark.read.format("orc").load("hdfs://standalone.com:9000/home/liuwen/output/orc/users_with_options.orc").show
 
//+------+--------------+----------------+
//|  name|favorite_color|favorite_numbers|
//+------+--------------+----------------+
//|Alyssa|          null|  [3, 9, 15, 20]|
//|   Ben|           red|              []|
//+------+--------------+----------------+

直接在文件上运行sql

  • 直接在文件上运行sql
val sqlDF = spark.sql("SELECT * FROM parquet.`hdfs://standalone.com:9000/home/liuwen/data/parquest/users.parquet`")

sqlDF.show
//+------+--------------+----------------+
//|  name|favorite_color|favorite_numbers|
//+------+--------------+----------------+
//|Alyssa|          null|  [3, 9, 15, 20]|
//|   Ben|           red|              []|
//+------+--------------+----------------+

saveAsTable

  • 把数据保存为Hive表
val sqlDF = spark.read.format("json").load("hdfs://standalone.com:9000/home/liuwen/output/json/employ.json")
    sqlDF.show
    //+----+-------+
    //| age|   name|
    //+----+-------+
    //|null|Michael|
    //|  30|   Andy|
    //|  19| Justin|

    sqlDF.write.saveAsTable("people_bucketed")

    val sqlDF2 = spark.sql("select * from people_bucketed")

  • 读取hive表中的数据
 val sqlDF = spark.sql("select * from people_bucketed")

bucket

  • 把数据保存为Hive表,bucketBy 分桶
val sqlDF = spark.read.format("json").load("hdfs://standalone.com:9000/home/liuwen/output/json/employ.json")
    sqlDF.show
    //+----+-------+
    //| age|   name|
    //+----+-------+
    //|null|Michael|
    //|  30|   Andy|
    //|  19| Justin|

    sqlDF.write.bucketBy(42, "name").sortBy("salary")
      .saveAsTable("people_bucketed3")

    val sqlDF2 = spark.sql("select * from people_bucketed3")
    sqlDF2.show

  • 读取hive表中的数据
 val sqlDF = spark.sql("select * from people_bucketed3")

partitionBy

  • 把数据保存为Hive表,partitionBy 按字段分区
  val spark = sparkSession(true)
    val usersDF = spark.read.load("hdfs://standalone.com:9000/home/liuwen/data/parquest/users.parquet")

    usersDF.show()
    //+------+--------------+----------------+
    //|  name|favorite_color|favorite_numbers|
    //+------+--------------+----------------+
    //|Alyssa|          null|  [3, 9, 15, 20]|
    //|   Ben|           red|              []|
    //+------+--------------+----------------+


    //保存在HDFS上 hdfs://standalone.com:9000/user/liuwen/namesPartByColor.parquet
    usersDF.write.partitionBy("favorite_color").format("parquet").save("namesPartByColor.parquet")

  • 读取hive表中的数据
 val sqlDF = spark.sql("select * from namesPartByColor.parquet")

dataFrame的合并

  • 把两个dataSet合并,就是把两个dataSet先保存到hdfs的文件上,这两个dataSet的文件在同一个目录上,再读这个目录下的文件

    import spark.implicits._
    val df1 = Seq(1,2,3,5).map(x => (x,x * x)).toDF("a","b")
    val df2 = Seq(10,20,30,50).map(x => (x,x * x)).toDF("a","b")

    df1.write.parquet("data/test_table/key=1")
    df1.show()
//    +---+---+
//    |  a|  b|
//    +---+---+
//    |  1|  1|
//    |  2|  4|
//    |  3|  9|
//    |  5| 25|
//    +---+---+
    df2.write.parquet("data/test_table/key=2")
    df2.show()
//    +---+----+
//    |  a|   b|
//    +---+----+
//    | 10| 100|
//    | 20| 400|
//    | 30| 900|
//    | 50|2500|
//    +---+----+

    val mergedDF = spark.read.option("mergeSchema", "true").parquet("data/test_table")
    mergedDF.printSchema()
//    root
//    |-- a: integer (nullable = true)
//    |-- b: integer (nullable = true)
//    |-- key: integer (nullable = true)


    mergedDF.show()

//    +---+----+---+
//    |  a|   b|key|
//    +---+----+---+
//    | 10| 100|  2|
//    | 20| 400|  2|
//    | 30| 900|  2|
//    | 50|2500|  2|
//    |  1|   1|  1|
//    |  2|   4|  1|
//    |  3|   9|  1|
//    |  5|  25|  1|
//    +---+----+---+

mysql(jdbc)

  • 读mysql的数据
val connectionProperties = new Properties()
    connectionProperties.put("user","admin")
    connectionProperties.put("password","000000")

    val jdbcDF = spark.read.jdbc("jdbc:mysql://mysql.com:3306/test","test.test2",connectionProperties)

    jdbcDF.show()

  • 往mysql写数据

    val connectionProperties = new Properties()
    connectionProperties.put("user","admin")
    connectionProperties.put("password","000000")

    val jdbcDF = spark.read.jdbc("jdbc:mysql://macbookmysql.com:3306/test","test.test",connectionProperties)

    jdbcDF.show()


    jdbcDF.write.mode(SaveMode.Overwrite).jdbc("jdbc:mysql://macbookmysql.com:3306/test","test.test3",connectionProperties)

spark hive

  • 就是支持hive的语法,只不过是在spark中执行,把hive的数据转成dataFrame,供spark算子计算

     val warehouseLocation = new File("spark-warehouse").getAbsolutePath

    val spark = SparkSession
      .builder()
      .master("local")
     // .master("spark://standalone.com:7077")
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

    import spark.sql

    sql("CREATE database IF NOT EXISTS test_tmp")
    sql("use test_tmp")
    sql("CREATE TABLE IF NOT EXISTS student(name VARCHAR(64), age INT)")
    sql("INSERT INTO TABLE student  VALUES ('小王', 35), ('小李', 50)")

end

"C:\Program Files\Java\jdk1.8.0_281\bin\java.exe" "-javaagent:D:\新建文件夹 (2)\IDEA\idea\IntelliJ IDEA 2019.3.3\lib\idea_rt.jar=59342" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_281\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_281\jre\lib\rt.jar;D:\carspark\out\production\carspark;C:\Users\wyatt\.ivy2\cache\org.scala-lang\scala-library\jars\scala-library-2.12.10.jar;C:\Users\wyatt\.ivy2\cache\org.scala-lang\scala-reflect\jars\scala-reflect-2.12.10.jar;C:\Users\wyatt\.ivy2\cache\org.scala-lang\scala-library\srcs\scala-library-2.12.10-sources.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\accessors-smart-1.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\activation-1.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\aircompressor-0.10.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\algebra_2.12-2.0.0-M2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\antlr-runtime-3.5.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\antlr4-runtime-4.8-1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\aopalliance-1.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\aopalliance-repackaged-2.6.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\arpack_combined_all-0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\arrow-format-2.0.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\arrow-memory-core-2.0.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\arrow-memory-netty-2.0.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\audience-annotations-0.5.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\automaton-1.11-8.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\avro-1.8.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\avro-ipc-1.8.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\avro-mapred-1.8.2-hadoop2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\bonecp-0.8.0.RELEASE.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\breeze-macros_2.12-1.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\breeze_2.12-1.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\cats-kernel_2.12-2.0.0-M4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\chill-java-0.9.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\chill_2.12-0.9.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-beanutils-1.9.4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-cli-1.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-codec-1.10.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-collections-3.2.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-compiler-3.0.16.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-compress-1.20.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-configuration2-2.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-crypto-1.1.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-daemon-1.0.13.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-dbcp-1.4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-httpclient-3.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-io-2.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-lang-2.6.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-lang3-3.10.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-logging-1.1.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-math3-3.4.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-net-3.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-pool-1.5.4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\commons-text-1.6.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\compress-lzf-1.0.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\core-1.1.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\curator-client-2.13.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\curator-framework-2.13.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\curator-recipes-2.13.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\datanucleus-api-jdo-4.2.4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\datanucleus-core-4.1.17.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\datanucleus-rdbms-4.1.19.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\derby-10.12.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\dnsjava-2.1.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\ehcache-3.3.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\flatbuffers-java-1.9.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\generex-1.0.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\geronimo-jcache_1.0_spec-1.0-alpha-1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\gson-2.2.4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\guava-14.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\guice-4.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\guice-servlet-4.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-annotations-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-auth-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-common-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-hdfs-client-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-mapreduce-client-common-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-mapreduce-client-core-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-mapreduce-client-jobclient-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-yarn-api-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-yarn-client-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-yarn-common-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-yarn-registry-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-yarn-server-common-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hadoop-yarn-server-web-proxy-3.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\HikariCP-2.5.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-beeline-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-cli-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-common-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-exec-2.3.7-core.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-jdbc-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-llap-common-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-metastore-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-serde-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-service-rpc-3.1.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-shims-0.23-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-shims-common-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-shims-scheduler-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-storage-api-2.7.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hive-vector-code-gen-2.3.7.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hk2-api-2.6.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hk2-locator-2.6.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\hk2-utils-2.6.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\htrace-core4-4.1.0-incubating.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\httpclient-4.5.6.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\httpcore-4.4.12.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\istack-commons-runtime-3.0.8.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\ivy-2.4.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-annotations-2.10.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-core-2.10.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-core-asl-1.9.13.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-databind-2.10.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-dataformat-yaml-2.10.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-datatype-jsr310-2.11.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-jaxrs-base-2.9.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-jaxrs-json-provider-2.9.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-mapper-asl-1.9.13.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-module-jaxb-annotations-2.10.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-module-paranamer-2.10.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jackson-module-scala_2.12-2.10.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jakarta.activation-api-1.2.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jakarta.annotation-api-1.3.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jakarta.inject-2.6.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jakarta.servlet-api-4.0.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jakarta.validation-api-2.0.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jakarta.ws.rs-api-2.1.6.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jakarta.xml.bind-api-2.3.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\janino-3.0.16.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\javassist-3.25.0-GA.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\javax.inject-1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\javax.jdo-3.2.0-m3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\javolution-5.5.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jaxb-api-2.2.11.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jaxb-runtime-2.3.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jcip-annotations-1.0-1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jcl-over-slf4j-1.7.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jdo-api-3.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jersey-client-2.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jersey-common-2.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jersey-container-servlet-2.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jersey-container-servlet-core-2.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jersey-hk2-2.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jersey-media-jaxb-2.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jersey-server-2.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\JLargeArrays-1.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jline-2.14.6.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\joda-time-2.10.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jodd-core-3.5.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jpam-1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\json-1.8.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\json-smart-2.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\json4s-ast_2.12-3.7.0-M5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\json4s-core_2.12-3.7.0-M5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\json4s-jackson_2.12-3.7.0-M5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\json4s-scalap_2.12-3.7.0-M5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jsp-api-2.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jsr305-3.0.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jta-1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\JTransforms-3.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\jul-to-slf4j-1.7.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-admin-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-client-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-common-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-core-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-crypto-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-identity-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-server-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-simplekdc-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerb-util-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerby-asn1-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerby-config-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerby-pkix-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerby-util-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kerby-xdr-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kryo-shaded-4.0.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-client-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-admissionregistration-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-apiextensions-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-apps-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-autoscaling-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-batch-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-certificates-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-common-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-coordination-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-core-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-discovery-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-events-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-extensions-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-metrics-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-networking-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-policy-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-rbac-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-scheduling-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-settings-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\kubernetes-model-storageclass-4.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\leveldbjni-all-1.8.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\libfb303-0.9.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\libthrift-0.12.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\log4j-1.2.17.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\logging-interceptor-3.12.12.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\lz4-java-1.7.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\machinist_2.12-0.6.8.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\macro-compat_2.12-1.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\mesos-1.4.0-shaded-protobuf.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\metrics-core-4.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\metrics-graphite-4.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\metrics-jmx-4.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\metrics-json-4.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\metrics-jvm-4.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\minlog-1.3.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\netty-all-4.1.51.Final.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\nimbus-jose-jwt-4.41.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\objenesis-2.6.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\okhttp-2.7.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\okhttp-3.12.12.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\okio-1.14.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\opencsv-2.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\orc-core-1.5.12.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\orc-mapreduce-1.5.12.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\orc-shims-1.5.12.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\oro-2.0.8.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\osgi-resource-locator-1.0.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\paranamer-2.8.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\parquet-column-1.10.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\parquet-common-1.10.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\parquet-encoding-1.10.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\parquet-format-2.4.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\parquet-hadoop-1.10.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\parquet-jackson-1.10.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\protobuf-java-2.5.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\py4j-0.10.9.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\pyrolite-4.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\re2j-1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\RoaringBitmap-0.9.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\scala-collection-compat_2.12-2.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\scala-compiler-2.12.10.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\scala-library-2.12.10.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\scala-parser-combinators_2.12-1.1.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\scala-reflect-2.12.10.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\scala-xml_2.12-1.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\shapeless_2.12-2.3.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\shims-0.9.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\slf4j-api-1.7.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\slf4j-log4j12-1.7.30.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\snakeyaml-1.24.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\snappy-java-1.1.8.2.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-catalyst_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-core_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-graphx_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-hive-thriftserver_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-hive_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-kubernetes_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-kvstore_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-launcher_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-mesos_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-mllib-local_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-mllib_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-network-common_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-network-shuffle_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-repl_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-sketch_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-sql_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-streaming_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-tags_2.12-3.1.1-tests.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-tags_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-unsafe_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spark-yarn_2.12-3.1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spire-macros_2.12-0.17.0-M1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spire-platform_2.12-0.17.0-M1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spire-util_2.12-0.17.0-M1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\spire_2.12-0.17.0-M1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\ST4-4.0.4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\stax-api-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\stax2-api-3.1.4.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\stream-2.9.6.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\super-csv-2.2.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\threeten-extra-1.5.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\token-provider-1.0.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\transaction-api-1.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\univocity-parsers-2.9.1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\velocity-1.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\woodstox-core-5.0.3.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\xbean-asm7-shaded-4.15.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\xz-1.5.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\zjsonpatch-0.3.0.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\zookeeper-3.4.14.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\zstd-jni-1.4.8-1.jar;D:\spark\spark-3.1.1-bin-hadoop3.2\jars\arrow-vector-2.0.0.jar" car.LoadModelRideHailing Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 25/06/08 17:05:07 INFO SparkContext: Running Spark version 3.1.1 25/06/08 17:05:07 INFO ResourceUtils: ============================================================== 25/06/08 17:05:07 INFO ResourceUtils: No custom resources configured for spark.driver. 25/06/08 17:05:07 INFO ResourceUtils: ============================================================== 25/06/08 17:05:07 INFO SparkContext: Submitted application: LoadModelRideHailing 25/06/08 17:05:07 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 25/06/08 17:05:07 INFO ResourceProfile: Limiting resource is cpu 25/06/08 17:05:07 INFO ResourceProfileManager: Added ResourceProfile id: 0 25/06/08 17:05:07 INFO SecurityManager: Changing view acls to: wyatt 25/06/08 17:05:07 INFO SecurityManager: Changing modify acls to: wyatt 25/06/08 17:05:07 INFO SecurityManager: Changing view acls groups to: 25/06/08 17:05:07 INFO SecurityManager: Changing modify acls groups to: 25/06/08 17:05:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(wyatt); groups with view permissions: Set(); users with modify permissions: Set(wyatt); groups with modify permissions: Set() 25/06/08 17:05:07 INFO Utils: Successfully started service 'sparkDriver' on port 59361. 25/06/08 17:05:07 INFO SparkEnv: Registering MapOutputTracker 25/06/08 17:05:07 INFO SparkEnv: Registering BlockManagerMaster 25/06/08 17:05:08 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 25/06/08 17:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 25/06/08 17:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/06/08 17:05:08 INFO DiskBlockManager: Created local directory at C:\Users\wyatt\AppData\Local\Temp\blockmgr-8fe065e2-024c-4e2f-8662-45d2fe3de444 25/06/08 17:05:08 INFO MemoryStore: MemoryStore started with capacity 1899.0 MiB 25/06/08 17:05:08 INFO SparkEnv: Registering OutputCommitCoordinator 25/06/08 17:05:08 INFO Utils: Successfully started service 'SparkUI' on port 4040. 25/06/08 17:05:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://windows10.microdone.cn:4040 25/06/08 17:05:08 INFO Executor: Starting executor ID driver on host windows10.microdone.cn 25/06/08 17:05:08 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59392. 25/06/08 17:05:08 INFO NettyBlockTransferService: Server created on windows10.microdone.cn:59392 25/06/08 17:05:08 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 25/06/08 17:05:08 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, windows10.microdone.cn, 59392, None) 25/06/08 17:05:08 INFO BlockManagerMasterEndpoint: Registering block manager windows10.microdone.cn:59392 with 1899.0 MiB RAM, BlockManagerId(driver, windows10.microdone.cn, 59392, None) 25/06/08 17:05:08 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, windows10.microdone.cn, 59392, None) 25/06/08 17:05:08 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, windows10.microdone.cn, 59392, None) Exception in thread "main" java.lang.IllegalArgumentException: 测试数据中不包含 features 列,请检查数据! at car.LoadModelRideHailing$.main(LoadModelRideHailing.scala:23) at car.LoadModelRideHailing.main(LoadModelRideHailing.scala) 进程已结束,退出代码为 1 package car import org.apache.spark.ml.classification.{LogisticRegressionModel, RandomForestClassificationModel} import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator import org.apache.spark.sql.{SparkSession, functions => F} object LoadModelRideHailing { def main(args: Array[String]): Unit = { val spark = SparkSession.builder() .master("local[3]") .appName("LoadModelRideHailing") .getOrCreate() spark.sparkContext.setLogLevel("Error") // 使用经过特征工程处理后的测试数据 val TestData = spark.read.option("header", "true").csv("C:\\Users\\wyatt\\Documents\\ride_hailing_test_data.csv") // 将 label 列转换为数值类型 val testDataWithNumericLabel = TestData.withColumn("label", F.col("label").cast("double")) // 检查 features 列是否存在 if (!testDataWithNumericLabel.columns.contains("features")) { throw new IllegalArgumentException("测试数据中不包含 features 列,请检查数据!") } // 修正后的模型路径(确保文件夹存在且包含元数据) val LogisticModel = LogisticRegressionModel.load("C:\\Users\\wyatt\\Documents\\ride_hailing_logistic_model") // 示例路径 val LogisticPre = LogisticModel.transform(testDataWithNumericLabel) val LogisticAcc = new MulticlassClassificationEvaluator() .setLabelCol("label") .setPredictionCol("prediction") .setMetricName("accuracy") .evaluate(LogisticPre) println("逻辑回归模型后期数据准确率:" + LogisticAcc) // 随机森林模型路径同步修正 val RandomForest = RandomForestClassificationModel.load("C:\\Users\\wyatt\\Documents\\ride_hailing_random_forest_model") // 示例路径 val RandomForestPre = RandomForest.transform(testDataWithNumericLabel) val RandomForestAcc = new MulticlassClassificationEvaluator() .setLabelCol("label") .setPredictionCol("prediction") .setMetricName("accuracy") .evaluate(RandomForestPre) println("随机森林模型后期数据准确率:" + RandomForestAcc) spark.stop() } }
最新发布
06-09
评论 2
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值