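Reading the Spark Source Code: Environment Setup (Part 1)
Understanding Spark on YARN through the source code, based on version 1.6.0
Environment setup
Download the code: https://github.com/juntaozhang/spark/tree/my.v1.6.0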
git clone https://github.com/juntaozhang/spark.git
cd spark
git checkout my.v1.6.0
mvn generate-resources generate-test-sources
1. Open spark/pom.xml with IDEA.
2. Generate the Avro sources for spark/external/flume-sink/src/main/avro.
3. Mark "target/scala-2.10/src_managed/main/compiled_avro" as a source path.
4. Run "Build -> Rebuild Project" in IDEA.
5. Run org.apache.spark.examples.sql.JsonDemo.
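Before rebuilding in IDEA, it can help to confirm that step 2 actually produced generated sources. The following is a minimal sketch, not part of the Spark build: the function name `check_avro_sources` is hypothetical, and it assumes that the path from step 3 is relative to the module directory (for example external/flume-sink) and that the Avro plugin emits `.java` files there.

```shell
# Hypothetical helper: report whether Avro-generated sources exist for a module.
# Assumption: the output directory from step 3 lives under the module directory.
check_avro_sources() {
  module_dir="$1"
  out_dir="$module_dir/target/scala-2.10/src_managed/main/compiled_avro"

  # The directory is missing if source generation never ran.
  if [ ! -d "$out_dir" ]; then
    echo "missing: $out_dir"
    return 1
  fi

  # Count generated .java files anywhere below the output directory.
  count=$(find "$out_dir" -type f -name '*.java' | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "empty: $out_dir"
    return 1
  fi

  echo "ok: $count generated file(s) in $out_dir"
  return 0
}
```

From the root of the checkout, `check_avro_sources external/flume-sink` would then print either the number of generated files or the directory that is missing or empty.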
Spark components
- Spark Core, the core engine of Spark
- Spark Streaming, stream processing (batch-based)
- Spark SQL
- MLlib, the machine learning library
- GraphX, graph computation
- SparkR, integration with the R language