Uploading a Spark jar to Hadoop (Linux)

1. Add the packaging plugins to pom.xml in IDEA

        <build>
            <pluginManagement>
                <plugins>
                    <!-- plugin for compiling Scala -->
                    <plugin>
                        <groupId>net.alchim31.maven</groupId>
                        <artifactId>scala-maven-plugin</artifactId>
                        <version>3.2.2</version>
                    </plugin>
                    <!-- plugin for compiling Java -->
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-compiler-plugin</artifactId>
                        <version>3.5.1</version>
                    </plugin>
                </plugins>
            </pluginManagement>
        </build>

        If packaging fails, try changing 3.2.2 to 3.2.0. Note also that pluginManagement only pins plugin versions; for the Scala sources to actually be compiled, the plugins must additionally be declared under <build><plugins> with the scala-maven-plugin's compile goals bound.

2. Run the package phase of the Maven lifecycle to build the jar

3. Upload the jar to Linux, for example with Xshell

        rz    (the upload command provided by the lrzsz package)

4. Start Hadoop and Spark first

        start-all.sh

        Note that Hadoop and Spark each ship their own start-all.sh under sbin, so make sure both services actually get started.

5. You can first open Spark's web UI at ip:8080

        You can also find the master URL and ports in the startup output of the spark-shell command

6. Submit the jar to the cluster with spark-submit

[root@hadoop ~]# spark-submit --class Task.wordCount --master local /root/spark_study-1.0.jar

Here the job is run locally, hence --master local; to run on the standalone cluster, use spark://ip:7077 instead.

Full form of the submit command:

spark-submit --class <package>.<objectName> --master <master-url> --executor-memory 1g --total-executor-cores 4 <path-to-jar>
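
The --class value must be the fully qualified name of the object whose main method should run. The original Task.wordCount source is not shown in this post, so the following is only a minimal sketch of what such an object might look like; the input path is a hypothetical placeholder, and the master is deliberately left unset in code so that the --master flag on the command line takes effect:

        package Task

        import org.apache.spark.{SparkConf, SparkContext}

        object wordCount {
          def main(args: Array[String]): Unit = {
            // No .setMaster(...) here: a master hard-coded in SparkConf would
            // take precedence over the --master flag passed to spark-submit
            val conf = new SparkConf().setAppName("wordCount")
            val sc = new SparkContext(conf)

            // Hypothetical HDFS input path; replace with your own file
            val lines = sc.textFile("hdfs://192.168.60.130:9000/input/words.txt")

            // Classic word count: split each line, pair each word with 1, reduce by key
            lines.flatMap(_.split(" "))
              .map(word => (word, 1))
              .reduceByKey(_ + _)
              .collect()
              .foreach(println)

            sc.stop()
          }
        }

Building this with package (step 2) produces the jar that gets passed to spark-submit.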

7. Example run

[root@hadoop ~]# cd /usr/local/src/spark
[root@hadoop spark]# spark-submit --class Task.task01.Step01 --master loca /root/spark_study-1.0.jar
Error: Master must either be yarn or start with spark, mesos, local
Run with --help for usage help or --verbose for debug output
[root@hadoop spark]# cd bin
[root@hadoop bin]# spark-submit --class Task.task01.Step01 --master local /root/spark_study-1.0.jar
22/10/20 07:53:09 INFO spark.SparkContext: Running Spark version 2.1.1
22/10/20 07:53:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/20 07:53:10 INFO spark.SecurityManager: Changing view acls to: root
22/10/20 07:53:10 INFO spark.SecurityManager: Changing modify acls to: root
22/10/20 07:53:10 INFO spark.SecurityManager: Changing view acls groups to: 
22/10/20 07:53:10 INFO spark.SecurityManager: Changing modify acls groups to: 
22/10/20 07:53:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/10/20 07:53:10 INFO util.Utils: Successfully started service 'sparkDriver' on port 37223.
22/10/20 07:53:11 INFO spark.SparkEnv: Registering MapOutputTracker
22/10/20 07:53:11 INFO spark.SparkEnv: Registering BlockManagerMaster
22/10/20 07:53:11 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/10/20 07:53:11 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/10/20 07:53:11 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-bf622dee-70e2-4e23-9002-4ced1155e705
22/10/20 07:53:11 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MB
22/10/20 07:53:11 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/10/20 07:53:11 INFO util.log: Logging initialized @2282ms
22/10/20 07:53:11 INFO server.Server: jetty-9.2.z-SNAPSHOT
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5af28b27{/jobs,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71104a4{/jobs/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4985cbcb{/jobs/job,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@72f46e16{/jobs/job/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c9168dc{/stages,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@332a7fce{/stages/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@549621f3{/stages/stage,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54361a9{/stages/stage/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32232e55{/stages/pool,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5217f3d0{/stages/pool/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37ebc9d8{/storage,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@293bb8a5{/storage/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2416a51{/storage/rdd,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6fa590ba{/storage/rdd/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e9319f{/environment,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@72e34f77{/environment/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7bf9b098{/executors,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@389adf1d{/executors/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77307458{/executors/threadDump,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1fc0053e{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@290b1b2e{/static,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@47874b25{/,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33617539{/api,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2c177f9e{/jobs/job/kill,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5db4c359{/stages/stage/kill,null,AVAILABLE,@Spark}
22/10/20 07:53:11 INFO server.ServerConnector: Started Spark@d78795{HTTP/1.1}{0.0.0.0:4040}
22/10/20 07:53:11 INFO server.Server: Started @2482ms
22/10/20 07:53:11 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/10/20 07:53:11 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.60.130:4040
22/10/20 07:53:11 INFO spark.SparkContext: Added JAR file:/root/spark_study-1.0.jar at spark://192.168.60.130:37223/jars/spark_study-1.0.jar with timestamp 1666223591637
22/10/20 07:53:11 INFO executor.Executor: Starting executor ID driver on host localhost
22/10/20 07:53:11 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35308.
22/10/20 07:53:11 INFO netty.NettyBlockTransferService: Server created on 192.168.60.130:35308
22/10/20 07:53:11 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/10/20 07:53:11 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.60.130, 35308, None)
22/10/20 07:53:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.60.130:35308 with 413.9 MB RAM, BlockManagerId(driver, 192.168.60.130, 35308, None)
22/10/20 07:53:11 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.60.130, 35308, None)
22/10/20 07:53:11 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.60.130, 35308, None)
22/10/20 07:53:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@36a7abe1{/metrics/json,null,AVAILABLE,@Spark}
22/10/20 07:53:11 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
22/10/20 07:53:12 INFO internal.SharedState: Warehouse path is 'file:/usr/local/src/spark/bin/spark-warehouse'.
22/10/20 07:53:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@40620d8e{/SQL,null,AVAILABLE,@Spark}
22/10/20 07:53:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49b07ee3{/SQL/json,null,AVAILABLE,@Spark}
22/10/20 07:53:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28486680{/SQL/execution,null,AVAILABLE,@Spark}
22/10/20 07:53:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a1e3ac1{/SQL/execution/json,null,AVAILABLE,@Spark}
22/10/20 07:53:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5300f14a{/static/sql,null,AVAILABLE,@Spark}
22/10/20 07:53:13 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 243.7 KB, free 413.7 MB)
22/10/20 07:53:13 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.6 KB, free 413.7 MB)
22/10/20 07:53:13 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.60.130:35308 (size: 23.6 KB, free: 413.9 MB)
22/10/20 07:53:13 INFO spark.SparkContext: Created broadcast 0 from textFile at Step01.scala:13
22/10/20 07:53:14 INFO mapred.FileInputFormat: Total input paths to process : 1
22/10/20 07:53:14 INFO spark.SparkContext: Starting job: first at Step01.scala:14
22/10/20 07:53:15 INFO scheduler.DAGScheduler: Got job 0 (first at Step01.scala:14) with 1 output partitions
22/10/20 07:53:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (first at Step01.scala:14)
22/10/20 07:53:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/10/20 07:53:15 INFO scheduler.DAGScheduler: Missing parents: List()
22/10/20 07:53:15 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv MapPartitionsRDD[1] at textFile at Step01.scala:13), which has no missing parents
22/10/20 07:53:15 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 413.7 MB)
22/10/20 07:53:15 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1985.0 B, free 413.7 MB)
22/10/20 07:53:15 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.60.130:35308 (size: 1985.0 B, free: 413.9 MB)
22/10/20 07:53:15 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
22/10/20 07:53:15 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv MapPartitionsRDD[1] at textFile at Step01.scala:13)
22/10/20 07:53:15 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
22/10/20 07:53:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6065 bytes)
22/10/20 07:53:15 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/10/20 07:53:15 INFO executor.Executor: Fetching spark://192.168.60.130:37223/jars/spark_study-1.0.jar with timestamp 1666223591637
22/10/20 07:53:15 INFO client.TransportClientFactory: Successfully created connection to /192.168.60.130:37223 after 19 ms (0 ms spent in bootstraps)
22/10/20 07:53:15 INFO util.Utils: Fetching spark://192.168.60.130:37223/jars/spark_study-1.0.jar to /tmp/spark-867e26d9-0441-4b96-a8a3-2cc311b6b80d/userFiles-a67d8a2d-c17f-4b0c-b5cd-b969dc4b41ab/fetchFileTemp4283142326901178396.tmp
22/10/20 07:53:15 INFO executor.Executor: Adding file:/tmp/spark-867e26d9-0441-4b96-a8a3-2cc311b6b80d/userFiles-a67d8a2d-c17f-4b0c-b5cd-b969dc4b41ab/spark_study-1.0.jar to class loader
22/10/20 07:53:15 INFO rdd.HadoopRDD: Input split: hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv:0+124387
22/10/20 07:53:16 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
22/10/20 07:53:16 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/10/20 07:53:16 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
22/10/20 07:53:16 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
22/10/20 07:53:16 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
22/10/20 07:53:16 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1556 bytes result sent to driver
22/10/20 07:53:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1154 ms on localhost (executor driver) (1/1)
22/10/20 07:53:16 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
22/10/20 07:53:16 INFO scheduler.DAGScheduler: ResultStage 0 (first at Step01.scala:14) finished in 1.191 s
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Job 0 finished: first at Step01.scala:14, took 1.477301 s
22/10/20 07:53:16 INFO spark.SparkContext: Starting job: count at Step01.scala:17
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Got job 1 (count at Step01.scala:17) with 1 output partitions
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at Step01.scala:17)
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Missing parents: List()
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv MapPartitionsRDD[1] at textFile at Step01.scala:13), which has no missing parents
22/10/20 07:53:16 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.1 KB, free 413.7 MB)
22/10/20 07:53:16 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1915.0 B, free 413.7 MB)
22/10/20 07:53:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.60.130:35308 (size: 1915.0 B, free: 413.9 MB)
22/10/20 07:53:16 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv MapPartitionsRDD[1] at textFile at Step01.scala:13)
22/10/20 07:53:16 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
22/10/20 07:53:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, ANY, 5983 bytes)
22/10/20 07:53:16 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
22/10/20 07:53:16 INFO rdd.HadoopRDD: Input split: hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv:0+124387
22/10/20 07:53:16 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1210 bytes result sent to driver
22/10/20 07:53:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 92 ms on localhost (executor driver) (1/1)
22/10/20 07:53:16 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
22/10/20 07:53:16 INFO scheduler.DAGScheduler: ResultStage 1 (count at Step01.scala:17) finished in 0.093 s
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Job 1 finished: count at Step01.scala:17, took 0.113036 s
22/10/20 07:53:16 INFO spark.SparkContext: Starting job: count at Step01.scala:17
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Got job 2 (count at Step01.scala:17) with 1 output partitions
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (count at Step01.scala:17)
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Missing parents: List()
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[2] at filter at Step01.scala:15), which has no missing parents
22/10/20 07:53:16 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.3 KB, free 413.7 MB)
22/10/20 07:53:16 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2017.0 B, free 413.6 MB)
22/10/20 07:53:16 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.60.130:35308 (size: 2017.0 B, free: 413.9 MB)
22/10/20 07:53:16 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:996
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[2] at filter at Step01.scala:15)
22/10/20 07:53:16 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
22/10/20 07:53:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, ANY, 5983 bytes)
22/10/20 07:53:16 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
22/10/20 07:53:16 INFO rdd.HadoopRDD: Input split: hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv:0+124387
22/10/20 07:53:16 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 1123 bytes result sent to driver
22/10/20 07:53:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 67 ms on localhost (executor driver) (1/1)
22/10/20 07:53:16 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
22/10/20 07:53:16 INFO scheduler.DAGScheduler: ResultStage 2 (count at Step01.scala:17) finished in 0.074 s
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Job 2 finished: count at Step01.scala:17, took 0.090038 s
Number of records removed from the source data with more than 3 missing fields: 4
22/10/20 07:53:16 INFO spark.SparkContext: Starting job: count at Step01.scala:18
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Got job 3 (count at Step01.scala:18) with 1 output partitions
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (count at Step01.scala:18)
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Missing parents: List()
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[2] at filter at Step01.scala:15), which has no missing parents
22/10/20 07:53:16 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 3.3 KB, free 413.6 MB)
22/10/20 07:53:16 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2017.0 B, free 413.6 MB)
22/10/20 07:53:16 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.60.130:35308 (size: 2017.0 B, free: 413.9 MB)
22/10/20 07:53:16 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[2] at filter at Step01.scala:15)
22/10/20 07:53:16 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
22/10/20 07:53:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, executor driver, partition 0, ANY, 5983 bytes)
22/10/20 07:53:16 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 3)
22/10/20 07:53:16 INFO rdd.HadoopRDD: Input split: hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv:0+124387
22/10/20 07:53:16 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 3). 1123 bytes result sent to driver
22/10/20 07:53:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 46 ms on localhost (executor driver) (1/1)
22/10/20 07:53:16 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
22/10/20 07:53:16 INFO scheduler.DAGScheduler: ResultStage 3 (count at Step01.scala:18) finished in 0.047 s
22/10/20 07:53:16 INFO scheduler.DAGScheduler: Job 3 finished: count at Step01.scala:18, took 0.060488 s
Total number of lines in the cleaned output file: 1043
22/10/20 07:53:16 INFO server.ServerConnector: Stopped Spark@d78795{HTTP/1.1}{0.0.0.0:4040}
22/10/20 07:53:16 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.60.130:35308 in memory (size: 2017.0 B, free: 413.9 MB)
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5db4c359{/stages/stage/kill,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2c177f9e{/jobs/job/kill,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33617539{/api,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@47874b25{/,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@290b1b2e{/static,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1fc0053e{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@77307458{/executors/threadDump,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@389adf1d{/executors/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7bf9b098{/executors,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@72e34f77{/environment/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@6e9319f{/environment,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@6fa590ba{/storage/rdd/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2416a51{/storage/rdd,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@293bb8a5{/storage/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@37ebc9d8{/storage,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5217f3d0{/stages/pool/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@32232e55{/stages/pool,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@54361a9{/stages/stage/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@549621f3{/stages/stage,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@332a7fce{/stages/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3c9168dc{/stages,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@72f46e16{/jobs/job/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4985cbcb{/jobs/job,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@71104a4{/jobs/json,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5af28b27{/jobs,null,UNAVAILABLE,@Spark}
22/10/20 07:53:16 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.60.130:35308 in memory (size: 1915.0 B, free: 413.9 MB)
22/10/20 07:53:16 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.60.130:4040
22/10/20 07:53:16 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/10/20 07:53:16 INFO memory.MemoryStore: MemoryStore cleared
22/10/20 07:53:16 INFO storage.BlockManager: BlockManager stopped
22/10/20 07:53:16 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/10/20 07:53:16 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/10/20 07:53:16 INFO spark.SparkContext: Successfully stopped SparkContext
22/10/20 07:53:16 INFO spark.SparkContext: SparkContext already stopped.
22/10/20 07:53:16 INFO util.ShutdownHookManager: Shutdown hook called
22/10/20 07:53:16 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-867e26d9-0441-4b96-a8a3-2cc311b6b80d
[root@hadoop bin]# 
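
The Step01 source itself is not included in the post, but the log above records its shape (textFile at Step01.scala:13, first at line 14, filter at line 15, counts at lines 17 and 18) and its two printed results. The sketch below is a reconstruction under assumptions: the comma delimiter, the exact missing-field rule, and the final save step (whose path is taken from the hdfs command in step 8 but which does not appear as a job in this particular run's log) are all guesses:

        package Task.task01

        import org.apache.spark.{SparkConf, SparkContext}

        object Step01 {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf().setAppName("Step01")
            val sc = new SparkContext(conf)

            // Input path taken from the log above
            val data = sc.textFile("hdfs://192.168.60.130:9000/file3_1/accommodationdata.csv")
            data.first() // the 'first' job visible in the log

            // Assumed cleaning rule: keep rows with at most 3 empty fields
            val cleaned = data.filter(line => line.split(",", -1).count(_.trim.isEmpty) <= 3)

            // Each count() below launches its own job, which is why the log
            // shows four jobs in total (one first and three counts)
            println(s"Number of records removed from the source data with more than 3 missing fields: ${data.count() - cleaned.count()}")
            println(s"Total number of lines in the cleaned output file: ${cleaned.count()}")

            // Assumed save step; the path matches the hdfs command in step 8
            cleaned.saveAsTextFile("hdfs://192.168.60.130:9000/accommodationoutput1")
            sc.stop()
          }
        }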

8. To check the number of lines in the output file, you can use this command (it concatenates every part file under the output directory and pipes them through wc -l):

[root@hadoop bin]# hdfs dfs -cat /accommodationoutput1/* |wc -l
1043
