Hadoop MapReduce:
On every run, MapReduce has to read its input data from disk, and once the computation finishes it has to write the results back to disk.
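The disk round trip described above can be illustrated with a toy model. The following is a minimal sketch in plain Python, not Hadoop code: `run_job` and `word_count_chain` are hypothetical names made up for this illustration, and the JSON-lines files merely stand in for HDFS. It chains two jobs, and the only way the second job can see the first job's result is by reading it back from a file.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_job(input_path, output_path, mapper, reducer):
    """One toy MapReduce job: read from disk, compute, write back to disk."""
    # Every job starts by reading its whole input from disk.
    with open(input_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    # Map phase: each record yields zero or more (key, value) pairs,
    # which are grouped by key (a stand-in for the shuffle).
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    # Reduce phase, then persist the result to disk for the next job.
    with open(output_path, "w", encoding="utf-8") as f:
        for key in sorted(groups):
            f.write(json.dumps(reducer(key, groups[key])) + "\n")


def word_count_chain(lines):
    """Chain two jobs; the intermediate result travels only via a file."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "input.jsonl")
        counts = os.path.join(tmp, "counts.jsonl")
        histogram = os.path.join(tmp, "histogram.jsonl")

        with open(raw, "w", encoding="utf-8") as f:
            for line in lines:
                f.write(json.dumps(line) + "\n")

        # Job 1: word -> number of occurrences.
        run_job(raw, counts,
                mapper=lambda line: [(w, 1) for w in line.split()],
                reducer=lambda word, ones: [word, sum(ones)])

        # Job 2: occurrence count -> number of words with that count.
        run_job(counts, histogram,
                mapper=lambda pair: [(pair[1], 1)],
                reducer=lambda count, ones: [count, sum(ones)])

        with open(histogram, encoding="utf-8") as f:
            return [tuple(json.loads(line)) for line in f]
```

Calling `word_count_chain(["a b a", "b c"])` runs both jobs and returns the final histogram. In this sketch the `counts.jsonl` file plays the role of the intermediate output that a multi-stage MapReduce pipeline writes and then re-reads between jobs; Spark, by contrast, can keep such an intermediate dataset in memory as an RDD.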
Spark MapReduce:
RDD is everything for dev:
Basic Concepts:
Graph RDD:
Spark Runtime:
Schedule:
Dependency Type:
Scheduler Optimizations:
Event Flow:
Submit Job:
New Job Instance:
Job In Detail:
executor.launchTask:
Standalone:
Work Flow:
Standalone Detail:
Driver Application to Cluster:
Worker Exception:
Executor Exception:
Master Exception:
Master HA:
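This article compares Hadoop MapReduce and Spark MapReduce in terms of performance, efficiency, and use cases for big data processing. It takes a close look at the RDD (Resilient Distributed Dataset) concept and its role in the Spark runtime, and walks through the key components: scheduling, dependency types, scheduler optimizations, and the event flow. It also describes the execution path from the driver application to the cluster, along with the failures that can occur along the way, such as Worker, Executor, and Master exceptions.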





