(1) Spark installation, deployment, and development environment

1. Spark pseudo-distributed and fully distributed installation guide
   http://my.oschina.net/leejun2005/blog/394928
2. Exploring Apache Spark: a comparison of three distributed deployment modes
   http://dongxicheng.org/framework-on-yarn/apache-spark-comparing-three-deploying-ways/
3. Running Spark SQL with Hive locally in IDEA
   http://dataknocker.github.io/2014/10/11/idea%E4%B8%8A%E8%BF%90%E8%A1%8Clocal%E7%9A%84spark-sql-hive/
4. Learning Apache Spark: developing Spark applications in Scala
   http://dongxicheng.org/framework-on-yarn/spark-scala-writing-application/
5. How to run Spark applications on CDH5 (Scala, Java, Python)
   http://blog.javachen.com/2015/02/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
6. Spark cluster installation and usage
   http://blog.javachen.com/2014/07/01/spark-install-and-usage/#

(2) Spark architecture, internals, and coding

1. Understanding RDD, the core of Spark
   http://www.infoq.com/cn/articles/spark-core-rdd
2. How-to: Translate from MapReduce to Apache Spark
   http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
3. Spark SQL source code analysis: In-Memory Columnar Storage, cache table
   http://blog.youkuaiyun.com/oopsoom/article/details/39525483
4. Databricks Spark Knowledge Base
   http://aiyanbo.gitbooks.io/databricks-spark-knowledge-base-zh-cn/content/
5. Spark 1.0.0 programming model
   http://blog.youkuaiyun.com/book_mmicky/article/details/32096871
6. Spark internals: source code analysis of Client, Master, and Worker communication
   http://blog.youkuaiyun.com/anzhsoft/article/details/30802603
7. Spark Streaming programming guide
   http://yangqijun.com/archives/200
8. Spark's distributed computing execution model
   http://www.flickering.cn/%E5%88%86%E5%B8%83%E5%BC%8F%E8%AE%A1%E7%AE%97/2014/07/spark%E5%88%86%E5%B8%83%E5%BC%8F%E8%AE%A1%E7%AE%97%E6%89%A7%E8%A1%8C%E6%A8%A1%E5%9E%8B/
9. Top 3 Troubleshooting Tips To Keep You Sparking
   http://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/
10. Apache Spark design and implementation (focuses on design ideas, execution mechanics, implementation architecture, and performance tuning, with a discussion of how it differs from MapReduce in design and implementation)
   https://github.com/JerryLead/SparkInternals/tree/master/markdown
11. Spark Examples
   http://spark.apache.org/examples.html
12. RDD operations explained in detail
   http://dataknocker.github.io/2014/07/20/RDD%E5%90%84%E6%93%8D%E4%BD%9C%E8%AF%A6%E8%A7%A3/
13. Notes on the Spark programming guide
   http://blog.javachen.com/2015/02/03/spark-programming-guide/#
14. Spark Core runtime analysis: DAGScheduler, TaskScheduler, SchedulerBackend
   http://blog.youkuaiyun.com/pelick/article/details/44495611
15. Getting Started with Spark (in Python)
   https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
16. DataFrame in Spark SQL
   http://blog.javachen.com/2015/03/26/spark-sql-dataframe/#
17. Spark RDD API explained (part 1): Map and Reduce
   https://www.zybuluo.com/jewes/note/35032

(3) Spark monitoring and management

1. Common Spark Troubleshooting
   http://www.datastax.com/dev/blog/common-spark-troubleshooting
2.

(4) YARN & Spark

1. Exploring Apache Spark: multi-process or multi-threaded model?
   http://dongxicheng.org/framework-on-yarn/apache-spark-multi-threads-model/

(5) Spark data platform architecture

(6) Spark applications and practice

1. How-to: Do Near-Real Time Sessionization with Spark Streaming and Apache Hadoop
   http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
2. Integrating Kafka and Spark Streaming: Code Examples and State of the Game
   http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
3. Reading nginx website log messages from Kafka with Spark and writing them to HDFS
   http://yangqijun.com/archives/227
4. Flafka: Apache Flume Meets Apache Kafka for Event Processing
   http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
5. Log Analysis with Spark
   http://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/README.html
6. Writing Spark computation results to MySQL
   http://www.iteblog.com/archives/1275
7. Improvements to Kafka integration in Spark Streaming 1.3, explained
   http://www.iteblog.com/archives/1307
8. Data sources in Spark SQL
   http://blog.javachen.com/2015/04/03/spark-sql-datasource/#

(7) Spark machine learning in practice

1. ML Pipelines: A New High-Level API for MLlib
   http://databricks.com/blog/2015/01/07/ml-pipelines-a-new-high-level-api-for-mllib.html
2. Introduction to the Spark 0.9.1 MLlib machine learning library
   http://rdc.taobao.org/?p=2163

(8) Scala learning guide

1. Spark programming guide (0.8.1, Chinese edition)
   http://rdc.taobao.org/?p=2024
2. The many syntactic similarities between Swift and Scala
   http://segmentfault.com/a/1190000000575561
3. Awesome Scala
   https://github.com/lauris/awesome-scala
4. Scala (a blog by an Alibaba engineer on the JVM, Scala, and backend architecture; quite good)
   http://hongjiang.info/scala/
5. Scala crash course
   http://my.oschina.net/mup/blog/363436?from=20150111
6. An-Overview-of-the-Scala-Programming-Language
   https://github.com/wecite/papers/tree/master/An-Overview-of-the-Scala-Programming-Language
7. A concise Scala tutorial
   http://colobu.com/2015/01/14/Scala-Quick-Start-for-Java-Programmers/

(9) Spark books

1. Spark Cookbook
   http://www.infoobjects.com/spark-cookbook/
2. Fast Data Processing with Spark
   http://it-ebooks.info/book/3185/
3. An overview of the Scala language
   http://wecite.github.io/docs/ScalaOverview-20150226.pdf
4. Effective Scala
   http://twitter.github.io/effectivescala/index-cn.html
5. The fun Scala language: concise Scala syntax
   http://www.ibm.com/developerworks/cn/java/j-lo-funinscala2/