06/06/21

Today I took a look at qu's unpacking work and came away with a few insights. Not bad; a lot of it is fairly simple. I watched Chow Yun-fat's 《纵横天下》 and came away with a few insights. Not bad; a lot of it is quite good. I looked at some recent trojan samples and came away with a few insights. Not bad; a lot of it is pretty hard. I looked over the blog and came away with a few insights. Not bad; a lot of it I forgot to write down.
```
(/root/.conda/envs/untitled) [root@master untitled]# spark-submit /root/IdeaProjects/untitled/test.py
25/06/21 03:46:42 INFO spark.SparkContext: Running Spark version 3.2.4
25/06/21 03:46:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/06/21 03:46:42 INFO resource.ResourceUtils: ==============================================================
25/06/21 03:46:42 INFO resource.ResourceUtils: No custom resources configured for spark.driver.
25/06/21 03:46:42 INFO resource.ResourceUtils: ==============================================================
25/06/21 03:46:42 INFO spark.SparkContext: Submitted application: NBAPlayerStatsAnalysis
25/06/21 03:46:42 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
25/06/21 03:46:42 INFO resource.ResourceProfile: Limiting resource is cpu
25/06/21 03:46:42 INFO resource.ResourceProfileManager: Added ResourceProfile id: 0
25/06/21 03:46:42 INFO spark.SecurityManager: Changing view acls to: root
25/06/21 03:46:42 INFO spark.SecurityManager: Changing modify acls to: root
25/06/21 03:46:42 INFO spark.SecurityManager: Changing view acls groups to:
25/06/21 03:46:42 INFO spark.SecurityManager: Changing modify acls groups to:
25/06/21 03:46:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
25/06/21 03:46:42 INFO util.Utils: Successfully started service 'sparkDriver' on port 36933.
25/06/21 03:46:42 INFO spark.SparkEnv: Registering MapOutputTracker
25/06/21 03:46:42 INFO spark.SparkEnv: Registering BlockManagerMaster
25/06/21 03:46:42 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/06/21 03:46:42 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/06/21 03:46:42 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
25/06/21 03:46:42 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-f4bf76ca-4382-4352-8252-7091abcccdd2
25/06/21 03:46:42 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MiB
25/06/21 03:46:42 INFO spark.SparkEnv: Registering OutputCommitCoordinator
25/06/21 03:46:42 INFO util.log: Logging initialized @3442ms to org.sparkproject.jetty.util.log.Slf4jLog
25/06/21 03:46:43 INFO server.Server: jetty-9.4.44.v20210927; built: 2021-09-27T23:02:44.612Z; git: 8da83308eeca865e495e53ef315a249d63ba9332; jvm 1.8.0_241-b07
25/06/21 03:46:43 INFO server.Server: Started @3522ms
25/06/21 03:46:43 INFO server.AbstractConnector: Started ServerConnector@3802b02d{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
25/06/21 03:46:43 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33cb530d{/jobs,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@597073aa{/jobs/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@170a0963{/jobs/job,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d7aff6c{/jobs/job/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b3cd35f{/stages,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@749bd873{/stages/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@36f6ff64{/stages/stage,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53c4807{/stages/stage/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3fab6100{/stages/pool,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58d1569b{/stages/pool/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2510a308{/storage,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1ae892d4{/storage/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@40270ad1{/storage/rdd,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2deda930{/storage/rdd/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a1d8a12{/environment,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16f587b6{/environment/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@216349f5{/executors,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@455d5738{/executors/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@397514a9{/executors/threadDump,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14a9355f{/executors/threadDump/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1151479d{/static,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6966d957{/,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67db9e0c{/api,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54e22bed{/jobs/job/kill,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16ebb574{/stages/stage/kill,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://master:4040
25/06/21 03:46:43 INFO executor.Executor: Starting executor ID driver on host master
25/06/21 03:46:43 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41829.
25/06/21 03:46:43 INFO netty.NettyBlockTransferService: Server created on master:41829
25/06/21 03:46:43 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/06/21 03:46:43 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 41829, None)
25/06/21 03:46:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager master:41829 with 366.3 MiB RAM, BlockManagerId(driver, master, 41829, None)
25/06/21 03:46:43 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 41829, None)
25/06/21 03:46:43 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 41829, None)
25/06/21 03:46:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e7cc552{/metrics/json,null,AVAILABLE,@Spark}
25/06/21 03:46:43 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
25/06/21 03:46:43 INFO internal.SharedState: Warehouse path is 'file:/root/IdeaProjects/untitled/spark-warehouse'.
25/06/21 03:46:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4782cb47{/SQL,null,AVAILABLE,@Spark}
25/06/21 03:46:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@136cde03{/SQL/json,null,AVAILABLE,@Spark}
25/06/21 03:46:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@212a8549{/SQL/execution,null,AVAILABLE,@Spark}
25/06/21 03:46:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54bfaa8f{/SQL/execution/json,null,AVAILABLE,@Spark}
25/06/21 03:46:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29426cde{/static/sql,null,AVAILABLE,@Spark}
25/06/21 03:46:45 INFO datasources.InMemoryFileIndex: It took 115 ms to list leaf files for 1 paths.
25/06/21 03:46:45 INFO datasources.InMemoryFileIndex: It took 3 ms to list leaf files for 1 paths.
25/06/21 03:46:47 INFO datasources.FileSourceStrategy: Pushed Filters:
25/06/21 03:46:47 INFO datasources.FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
25/06/21 03:46:47 INFO datasources.FileSourceStrategy: Output Data Schema: struct<value: string>
25/06/21 03:46:47 INFO codegen.CodeGenerator: Code generated in 205.916922 ms
25/06/21 03:46:47 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 222.8 KiB, free 366.1 MiB)
25/06/21 03:46:47 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.9 KiB, free 366.1 MiB)
25/06/21 03:46:47 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on master:41829 (size: 20.9 KiB, free: 366.3 MiB)
25/06/21 03:46:47 INFO spark.SparkContext: Created broadcast 0 from csv at NativeMethodAccessorImpl.java:0
25/06/21 03:46:47 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
25/06/21 03:46:48 INFO spark.SparkContext: Starting job: csv at NativeMethodAccessorImpl.java:0
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Got job 0 (csv at NativeMethodAccessorImpl.java:0) with 1 output partitions
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (csv at NativeMethodAccessorImpl.java:0)
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Parents of final stage: List()
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Missing parents: List()
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at csv at NativeMethodAccessorImpl.java:0), which has no missing parents
25/06/21 03:46:48 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 11.6 KiB, free 366.1 MiB)
25/06/21 03:46:48 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.8 KiB, free 366.0 MiB)
25/06/21 03:46:48 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on master:41829 (size: 5.8 KiB, free: 366.3 MiB)
25/06/21 03:46:48 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1474
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at csv at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
25/06/21 03:46:48 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
25/06/21 03:46:48 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (master, executor driver, partition 0, NODE_LOCAL, 4881 bytes) taskResourceAssignments Map()
25/06/21 03:46:48 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
25/06/21 03:46:48 INFO datasources.FileScanRDD: Reading File path: hdfs://master:9000/usr/local/hadoop/clean_data_final.csv, range: 0-71444, partition values: [empty row]
25/06/21 03:46:48 INFO codegen.CodeGenerator: Code generated in 9.108527 ms
25/06/21 03:46:48 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1646 bytes result sent to driver
25/06/21 03:46:48 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 462 ms on master (executor driver) (1/1)
25/06/21 03:46:48 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
25/06/21 03:46:48 INFO scheduler.DAGScheduler: ResultStage 0 (csv at NativeMethodAccessorImpl.java:0) finished in 0.622 s
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
25/06/21 03:46:48 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Job 0 finished: csv at NativeMethodAccessorImpl.java:0, took 0.659738 s
25/06/21 03:46:48 INFO codegen.CodeGenerator: Code generated in 7.988203 ms
25/06/21 03:46:48 INFO datasources.FileSourceStrategy: Pushed Filters:
25/06/21 03:46:48 INFO datasources.FileSourceStrategy: Post-Scan Filters:
25/06/21 03:46:48 INFO datasources.FileSourceStrategy: Output Data Schema: struct<value: string>
25/06/21 03:46:48 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 222.8 KiB, free 365.8 MiB)
25/06/21 03:46:48 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 20.9 KiB, free 365.8 MiB)
25/06/21 03:46:48 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on master:41829 (size: 20.9 KiB, free: 366.3 MiB)
25/06/21 03:46:48 INFO spark.SparkContext: Created broadcast 2 from csv at NativeMethodAccessorImpl.java:0
25/06/21 03:46:48 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
25/06/21 03:46:48 INFO spark.SparkContext: Starting job: csv at NativeMethodAccessorImpl.java:0
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Got job 1 (csv at NativeMethodAccessorImpl.java:0) with 1 output partitions
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (csv at NativeMethodAccessorImpl.java:0)
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Parents of final stage: List()
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Missing parents: List()
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[9] at csv at NativeMethodAccessorImpl.java:0), which has no missing parents
25/06/21 03:46:48 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 16.7 KiB, free 365.8 MiB)
25/06/21 03:46:48 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.5 KiB, free 365.8 MiB)
25/06/21 03:46:48 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on master:41829 (size: 8.5 KiB, free: 366.2 MiB)
25/06/21 03:46:48 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1474
25/06/21 03:46:48 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[9] at csv at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
25/06/21 03:46:48 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
25/06/21 03:46:48 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (master, executor driver, partition 0, NODE_LOCAL, 4881 bytes) taskResourceAssignments Map()
25/06/21 03:46:48 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
25/06/21 03:46:49 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on master:41829 in memory (size: 5.8 KiB, free: 366.3 MiB)
25/06/21 03:46:49 INFO datasources.FileScanRDD: Reading File path: hdfs://master:9000/usr/local/hadoop/clean_data_final.csv, range: 0-71444, partition values: [empty row]
25/06/21 03:46:49 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1840 bytes result sent to driver
25/06/21 03:46:49 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 273 ms on master (executor driver) (1/1)
25/06/21 03:46:49 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
25/06/21 03:46:49 INFO scheduler.DAGScheduler: ResultStage 1 (csv at NativeMethodAccessorImpl.java:0) finished in 0.339 s
25/06/21 03:46:49 INFO scheduler.DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
25/06/21 03:46:49 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
25/06/21 03:46:49 INFO scheduler.DAGScheduler: Job 1 finished: csv at NativeMethodAccessorImpl.java:0, took 0.343278 s
25/06/21 03:46:51 INFO datasources.FileSourceStrategy: Pushed Filters:
25/06/21 03:46:51 INFO datasources.FileSourceStrategy: Post-Scan Filters:
25/06/21 03:46:51 INFO datasources.FileSourceStrategy: Output Data Schema: struct<pname: string, pos: string, team: string, age: int, gp: int ... 29 more fields>
25/06/21 03:46:51 WARN util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
25/06/21 03:46:51 INFO codegen.CodeGenerator: Code generated in 39.64368 ms
25/06/21 03:46:51 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 222.6 KiB, free 365.6 MiB)
25/06/21 03:46:51 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 20.9 KiB, free 365.6 MiB)
25/06/21 03:46:51 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on master:41829 (size: 20.9 KiB, free: 366.2 MiB)
25/06/21 03:46:51 INFO spark.SparkContext: Created broadcast 4 from toPandas at /root/IdeaProjects/untitled/test.py:18
25/06/21 03:46:51 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
25/06/21 03:46:51 INFO spark.SparkContext: Starting job: toPandas at /root/IdeaProjects/untitled/test.py:18
25/06/21 03:46:51 INFO scheduler.DAGScheduler: Got job 2 (toPandas at /root/IdeaProjects/untitled/test.py:18) with 1 output partitions
25/06/21 03:46:51 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (toPandas at /root/IdeaProjects/untitled/test.py:18)
25/06/21 03:46:51 INFO scheduler.DAGScheduler: Parents of final stage: List()
25/06/21 03:46:51 INFO scheduler.DAGScheduler: Missing parents: List()
25/06/21 03:46:51 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[13] at toPandas at /root/IdeaProjects/untitled/test.py:18), which has no missing parents
25/06/21 03:46:51 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 28.7 KiB, free 365.5 MiB)
25/06/21 03:46:51 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 10.1 KiB, free 365.5 MiB)
25/06/21 03:46:51 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on master:41829 (size: 10.1 KiB, free: 366.2 MiB)
25/06/21 03:46:51 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1474
25/06/21 03:46:51 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[13] at toPandas at /root/IdeaProjects/untitled/test.py:18) (first 15 tasks are for partitions Vector(0))
25/06/21 03:46:51 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0
25/06/21 03:46:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2) (master, executor driver, partition 0, NODE_LOCAL, 4881 bytes) taskResourceAssignments Map()
25/06/21 03:46:51 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
25/06/21 03:46:51 INFO datasources.FileScanRDD: Reading File path: hdfs://master:9000/usr/local/hadoop/clean_data_final.csv, range: 0-71444, partition values: [empty row]
25/06/21 03:46:52 INFO codegen.CodeGenerator: Code generated in 63.838881 ms
25/06/21 03:46:52 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 94292 bytes result sent to driver
25/06/21 03:46:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 130 ms on master (executor driver) (1/1)
25/06/21 03:46:52 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
25/06/21 03:46:52 INFO scheduler.DAGScheduler: ResultStage 2 (toPandas at /root/IdeaProjects/untitled/test.py:18) finished in 0.137 s
25/06/21 03:46:52 INFO scheduler.DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
25/06/21 03:46:52 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
25/06/21 03:46:52 INFO scheduler.DAGScheduler: Job 2 finished: toPandas at /root/IdeaProjects/untitled/test.py:18, took 0.140503 s
25/06/21 03:46:52 INFO spark.SparkContext: Invoking stop() from shutdown hook
25/06/21 03:46:52 INFO server.AbstractConnector: Stopped Spark@3802b02d{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
25/06/21 03:46:52 INFO ui.SparkUI: Stopped Spark web UI at http://master:4040
25/06/21 03:46:52 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/06/21 03:46:52 INFO memory.MemoryStore: MemoryStore cleared
25/06/21 03:46:52 INFO storage.BlockManager: BlockManager stopped
25/06/21 03:46:52 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
25/06/21 03:46:52 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/06/21 03:46:52 INFO spark.SparkContext: Successfully stopped SparkContext
25/06/21 03:46:52 INFO util.ShutdownHookManager: Shutdown hook called
25/06/21 03:46:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-880bca98-c6bd-42a0-84a9-e243c4052d6f
25/06/21 03:46:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-184bfddf-35ba-4ae4-aa65-df8be18332be
25/06/21 03:46:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-184bfddf-35ba-4ae4-aa65-df8be18332be/pyspark-66ea83af-4111-4f7c-9ec6-1a6ffae82e4a
```
06-22
Judging from the log output, the Spark job ran successfully and finished reading and processing the HDFS file. A detailed breakdown follows.

---

### **Answer**

#### **1. Key information in the log**

- **Spark version**: `Running Spark version 3.2.4`
- **HDFS file path**: `hdfs://master:9000/usr/local/hadoop/clean_data_final.csv`
- **Job status**:
  - Spark successfully read the file from HDFS.
  - The data was parsed into a DataFrame and processed further.
  - The job completed and the SparkContext was stopped.

#### **2. Key steps in the log**

- **Job submission**:

  ```bash
  spark-submit /root/IdeaProjects/untitled/test.py
  ```

- **Execution**:
  - Spark created several `ResultStage`s and jobs to process the data.
  - The tasks in every stage completed successfully.
- **Shutdown**:
  - The SparkContext was stopped cleanly.
  - Temporary directories were deleted.

#### **3. Example code**

The script likely looks something like this:

```python
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("NBAPlayerStatsAnalysis") \
    .getOrCreate()

# Read the CSV file from HDFS
df = spark.read.option("header", True) \
    .option("inferSchema", True) \
    .csv("hdfs://master:9000/usr/local/hadoop/clean_data_final.csv")

# Show the first few rows of the DataFrame
df.show()

# Convert the Spark DataFrame to a pandas DataFrame
pandas_df = df.toPandas()

# Print the first few rows of the pandas DataFrame
print(pandas_df.head())

# Stop the SparkSession
spark.stop()
```

---

### **Explanation**

1. **Spark job flow**:
   - **SparkSession creation**: a SparkSession is built via `SparkSession.builder`.
   - **File read**: `spark.read.csv` loads the CSV file from HDFS with the `header` and `inferSchema` options enabled.
   - **Display**: `df.show()` prints the first rows of the DataFrame.
   - **Conversion to pandas**: `toPandas()` collects the Spark DataFrame into a pandas DataFrame on the driver.
   - **Shutdown**: `spark.stop()` stops the SparkSession.

2. **Log analysis**:
   - **Startup**: the log reports Spark version `3.2.4` and the application name `NBAPlayerStatsAnalysis`.
   - **File read**: `Reading File path: hdfs://master:9000/usr/local/hadoop/clean_data_final.csv` appears several times, confirming the file was read successfully.
   - **Completion**: all stages finished successfully and the SparkContext was stopped.

3. **Performance tuning**:
   - For larger datasets, consider adjusting `spark.sql.shuffle.partitions` to tune shuffle behaviour (see the sketch after this list).
   - Use `cache()` or `persist()` to cache intermediate results and avoid recomputation.
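To make those tuning suggestions concrete, here is a minimal sketch of how the same script could set the shuffle partition count and cache the DataFrame before reusing it. The partition value of 8 and the `df.count()` call used to materialize the cache are illustrative assumptions, not taken from the original `test.py`.

```python
from pyspark.sql import SparkSession

# Minimal tuning sketch; configuration values are illustrative, not from the original script.
spark = (
    SparkSession.builder
    .appName("NBAPlayerStatsAnalysis")
    # Assumption: a small single-node setup, so far fewer shuffle partitions than the default 200.
    .config("spark.sql.shuffle.partitions", "8")
    .getOrCreate()
)

df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("hdfs://master:9000/usr/local/hadoop/clean_data_final.csv")
)

# Cache so that show(), aggregations, and toPandas() reuse one scan instead of re-reading HDFS.
df.cache()
df.count()   # an action that materializes the cache

df.show(5)
pandas_df = df.toPandas()

df.unpersist()
spark.stop()
```

Note that `toPandas()` collects the entire dataset onto the driver, so this pattern only makes sense while the CSV stays small (around 70 KB here, judging from the `range: 0-71444` scan in the log).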