Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
25/06/12 17:05:56 INFO SparkContext: Running Spark version 3.0.1
25/06/12 17:05:57 INFO ResourceUtils: ==============================================================
25/06/12 17:05:57 INFO ResourceUtils: Resources for spark.driver:
25/06/12 17:05:57 INFO ResourceUtils: ==============================================================
25/06/12 17:05:57 INFO SparkContext: Submitted application: Movie Rating Analysis
25/06/12 17:05:57 INFO SecurityManager: Changing view acls to: 86187
25/06/12 17:05:57 INFO SecurityManager: Changing modify acls to: 86187
25/06/12 17:05:57 INFO SecurityManager: Changing view acls groups to:
25/06/12 17:05:57 INFO SecurityManager: Changing modify acls groups to:
25/06/12 17:05:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(86187); groups with view permissions: Set(); users with modify permissions: Set(86187); groups with modify permissions: Set()
25/06/12 17:05:59 INFO Utils: Successfully started service 'sparkDriver' on port 61854.
25/06/12 17:05:59 INFO SparkEnv: Registering MapOutputTracker
25/06/12 17:05:59 INFO SparkEnv: Registering BlockManagerMaster
25/06/12 17:05:59 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/06/12 17:05:59 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/06/12 17:05:59 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/06/12 17:05:59 INFO DiskBlockManager: Created local directory at D:\temp\blockmgr-e2ad0ea3-a82e-4e75-ace9-632d74c7ac85
25/06/12 17:05:59 INFO MemoryStore: MemoryStore started with capacity 1969.5 MiB
25/06/12 17:05:59 INFO SparkEnv: Registering OutputCommitCoordinator
25/06/12 17:05:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
25/06/12 17:05:59 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://karida:4040
25/06/12 17:05:59 INFO Executor: Starting executor ID driver on host karida
25/06/12 17:05:59 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61865.
25/06/12 17:05:59 INFO NettyBlockTransferService: Server created on karida:61865
25/06/12 17:05:59 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/06/12 17:05:59 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, karida, 61865, None)
25/06/12 17:05:59 INFO BlockManagerMasterEndpoint: Registering block manager karida:61865 with 1969.5 MiB RAM, BlockManagerId(driver, karida, 61865, None)
25/06/12 17:05:59 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, karida, 61865, None)
25/06/12 17:05:59 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, karida, 61865, None)
25/06/12 17:06:00 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/Users/86187/IdeaProjects/testspark-5.19/spark-warehouse').
25/06/12 17:06:00 INFO SharedState: Warehouse path is 'file:/C:/Users/86187/IdeaProjects/testspark-5.19/spark-warehouse'.
25/06/12 17:06:01 INFO InMemoryFileIndex: It took 33 ms to list leaf files for 1 paths.
25/06/12 17:06:01 INFO InMemoryFileIndex: It took 3 ms to list leaf files for 1 paths.
25/06/12 17:06:03 INFO FileSourceStrategy: Pruning directories with:
25/06/12 17:06:03 INFO FileSourceStrategy: Pushed Filters:
25/06/12 17:06:03 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
25/06/12 17:06:03 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
25/06/12 17:06:03 INFO CodeGenerator: Code generated in 150.2378 ms
25/06/12 17:06:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 284.3 KiB, free 1969.2 MiB)
25/06/12 17:06:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.7 KiB, free 1969.2 MiB)
25/06/12 17:06:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on karida:61865 (size: 23.7 KiB, free: 1969.5 MiB)
25/06/12 17:06:03 INFO SparkContext: Created broadcast 0 from csv at movie1.scala:9
25/06/12 17:06:03 INFO FileSourceScanExec: Planning scan with bin packing, max size: 28788435 bytes, open cost is considered as scanning 4194304 bytes.
25/06/12 17:06:03 INFO SparkContext: Starting job: csv at movie1.scala:9
25/06/12 17:06:03 INFO DAGScheduler: Got job 0 (csv at movie1.scala:9) with 1 output partitions
25/06/12 17:06:03 INFO DAGScheduler: Final stage: ResultStage 0 (csv at movie1.scala:9)
25/06/12 17:06:03 INFO DAGScheduler: Parents of final stage: List()
25/06/12 17:06:03 INFO DAGScheduler: Missing parents: List()
25/06/12 17:06:03 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at csv at movie1.scala:9), which has no missing parents
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 10.7 KiB, free 1969.2 MiB)
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3 KiB, free 1969.2 MiB)
25/06/12 17:06:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on karida:61865 (size: 5.3 KiB, free: 1969.5 MiB)
25/06/12 17:06:04 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1223
25/06/12 17:06:04 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at csv at movie1.scala:9) (first 15 tasks are for partitions Vector(0))
25/06/12 17:06:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
25/06/12 17:06:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, karida, executor driver, partition 0, PROCESS_LOCAL, 7741 bytes)
25/06/12 17:06:04 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
25/06/12 17:06:04 INFO FileScanRDD: Reading File path: file:///F:/大数据技术/ml-1m/ratings.dat, range: 0-24594131, partition values: [empty row]
25/06/12 17:06:04 INFO CodeGenerator: Code generated in 13.2395 ms
25/06/12 17:06:04 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1540 bytes result sent to driver
25/06/12 17:06:04 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 367 ms on karida (executor driver) (1/1)
25/06/12 17:06:04 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
25/06/12 17:06:04 INFO DAGScheduler: ResultStage 0 (csv at movie1.scala:9) finished in 0.464 s
25/06/12 17:06:04 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
25/06/12 17:06:04 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
25/06/12 17:06:04 INFO DAGScheduler: Job 0 finished: csv at movie1.scala:9, took 0.514178 s
25/06/12 17:06:04 INFO CodeGenerator: Code generated in 9.5756 ms
25/06/12 17:06:04 INFO FileSourceStrategy: Pruning directories with:
25/06/12 17:06:04 INFO FileSourceStrategy: Pushed Filters:
25/06/12 17:06:04 INFO FileSourceStrategy: Post-Scan Filters:
25/06/12 17:06:04 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 284.3 KiB, free 1968.9 MiB)
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 23.7 KiB, free 1968.9 MiB)
25/06/12 17:06:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on karida:61865 (size: 23.7 KiB, free: 1969.4 MiB)
25/06/12 17:06:04 INFO SparkContext: Created broadcast 2 from csv at movie1.scala:9
25/06/12 17:06:04 INFO FileSourceScanExec: Planning scan with bin packing, max size: 28788435 bytes, open cost is considered as scanning 4194304 bytes.
25/06/12 17:06:04 INFO InMemoryFileIndex: It took 4 ms to list leaf files for 1 paths.
25/06/12 17:06:04 INFO InMemoryFileIndex: It took 4 ms to list leaf files for 1 paths.
25/06/12 17:06:04 INFO FileSourceStrategy: Pruning directories with:
25/06/12 17:06:04 INFO FileSourceStrategy: Pushed Filters:
25/06/12 17:06:04 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#32, None)) > 0)
25/06/12 17:06:04 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 284.3 KiB, free 1968.6 MiB)
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 23.7 KiB, free 1968.6 MiB)
25/06/12 17:06:04 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on karida:61865 (size: 23.7 KiB, free: 1969.4 MiB)
25/06/12 17:06:04 INFO SparkContext: Created broadcast 3 from csv at movie1.scala:11
25/06/12 17:06:04 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4365612 bytes, open cost is considered as scanning 4194304 bytes.
25/06/12 17:06:04 INFO SparkContext: Starting job: csv at movie1.scala:11
25/06/12 17:06:04 INFO DAGScheduler: Got job 1 (csv at movie1.scala:11) with 1 output partitions
25/06/12 17:06:04 INFO DAGScheduler: Final stage: ResultStage 1 (csv at movie1.scala:11)
25/06/12 17:06:04 INFO DAGScheduler: Parents of final stage: List()
25/06/12 17:06:04 INFO DAGScheduler: Missing parents: List()
25/06/12 17:06:04 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[13] at csv at movie1.scala:11), which has no missing parents
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 10.7 KiB, free 1968.6 MiB)
25/06/12 17:06:04 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 5.3 KiB, free 1968.6 MiB)
25/06/12 17:06:04 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on karida:61865 (size: 5.3 KiB, free: 1969.4 MiB)
25/06/12 17:06:04 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1223
25/06/12 17:06:04 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[13] at csv at movie1.scala:11) (first 15 tasks are for partitions Vector(0))
25/06/12 17:06:04 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
25/06/12 17:06:04 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, karida, executor driver, partition 0, PROCESS_LOCAL, 7740 bytes)
25/06/12 17:06:04 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
25/06/12 17:06:04 INFO FileScanRDD: Reading File path: file:///F:/大数据技术/ml-1m/movies.dat, range: 0-171308, partition values: [empty row]
25/06/12 17:06:05 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1564 bytes result sent to driver
25/06/12 17:06:05 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 156 ms on karida (executor driver) (1/1)
25/06/12 17:06:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
25/06/12 17:06:05 INFO BlockManagerInfo: Removed broadcast_2_piece0 on karida:61865 in memory (size: 23.7 KiB, free: 1969.4 MiB)
25/06/12 17:06:05 INFO DAGScheduler: ResultStage 1 (csv at movie1.scala:11) finished in 0.217 s
25/06/12 17:06:05 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
25/06/12 17:06:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
25/06/12 17:06:05 INFO DAGScheduler: Job 1 finished: csv at movie1.scala:11, took 0.222223 s
25/06/12 17:06:05 INFO BlockManagerInfo: Removed broadcast_0_piece0 on karida:61865 in memory (size: 23.7 KiB, free: 1969.5 MiB)
25/06/12 17:06:05 INFO BlockManagerInfo: Removed broadcast_1_piece0 on karida:61865 in memory (size: 5.3 KiB, free: 1969.5 MiB)
25/06/12 17:06:05 INFO FileSourceStrategy: Pruning directories with:
25/06/12 17:06:05 INFO FileSourceStrategy: Pushed Filters:
25/06/12 17:06:05 INFO FileSourceStrategy: Post-Scan Filters:
25/06/12 17:06:05 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
25/06/12 17:06:05 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 284.3 KiB, free 1968.9 MiB)
25/06/12 17:06:05 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 23.7 KiB, free 1968.9 MiB)
25/06/12 17:06:05 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on karida:61865 (size: 23.7 KiB, free: 1969.4 MiB)
25/06/12 17:06:05 INFO SparkContext: Created broadcast 5 from csv at movie1.scala:11
25/06/12 17:06:05 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4365612 bytes, open cost is considered as scanning 4194304 bytes.
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '|' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 2, pos 37)
== SQL ==
|SELECT r.MovieID, m.Title,SUM(r.Rating) AS TotalScore,AVG(r.Rating) AS AverageScore
-------------------------------------^^^
|FROM ratings r
|JOIN movies m ON r.MovieID = m.MovieID
|GROUP BY r.MovieID, m.Title
|ORDEY BY r.MovieID
|
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:133)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:605)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:605)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
at com.sql.movie1$.main(movie1.scala:17)
at com.sql.movie1.main(movie1.scala)
25/06/12 17:06:05 INFO SparkContext: Invoking stop() from shutdown hook
25/06/12 17:06:05 INFO SparkUI: Stopped Spark web UI at http://karida:4040
25/06/12 17:06:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/06/12 17:06:05 INFO MemoryStore: MemoryStore cleared
25/06/12 17:06:05 INFO BlockManager: BlockManager stopped
25/06/12 17:06:05 INFO BlockManagerMaster: BlockManagerMaster stopped
25/06/12 17:06:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/06/12 17:06:05 INFO SparkContext: Successfully stopped SparkContext
25/06/12 17:06:05 INFO ShutdownHookManager: Shutdown hook called
25/06/12 17:06:05 INFO ShutdownHookManager: Deleting directory D:\temp\spark-a40640bb-73ee-4a39-9b1e-a38d382805d2

How do I fix this?
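Two problems are visible in the error itself. First, the parser reports `extraneous input '|'`: every line of the SQL shown in the `== SQL ==` dump still begins with a literal `|`, which suggests the triple-quoted string in movie1.scala uses `|` margin markers but `.stripMargin` was never called, so the pipes reach Spark's SQL parser as text. Second, the last clause contains a typo: `ORDEY BY` should be `ORDER BY`. A minimal sketch of the corrected statement (a hypothetical reconstruction of movie1.scala, assuming the two DataFrames are registered as temp views named `ratings` and `movies`):

```scala
// Hypothetical reconstruction of the failing statement in movie1.scala.
// Fix 1: call .stripMargin so the leading '|' markers are removed before
//        the string reaches spark.sql (the "extraneous input '|'" error
//        means the pipes were still present in the query text).
// Fix 2: correct the typo ORDEY BY -> ORDER BY.
object QueryFix {
  val query: String =
    """SELECT r.MovieID, m.Title, SUM(r.Rating) AS TotalScore, AVG(r.Rating) AS AverageScore
      |FROM ratings r
      |JOIN movies m ON r.MovieID = m.MovieID
      |GROUP BY r.MovieID, m.Title
      |ORDER BY r.MovieID
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    // With a live SparkSession `spark` this should now parse and run:
    //   spark.sql(query).show()
    println(query)
  }
}
```

After `.stripMargin`, the string Spark receives starts directly with `SELECT`, so the parse error at `line 2, pos 37` goes away; fixing `ORDEY` avoids the next ParseException that would otherwise surface on the final clause.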