Spark(8)Non Fat Jar/Cassandra Cluster Issue and Spark Version 1.3.1

This post covers problems hit while deploying Spark 1.3.1, including Java 8 compatibility fixes, setting up the BouncyCastle provider, and removing the JCE strength limits, and records the Spark cluster setup steps and configuration details.

1. Can We Upgrade to Java 8?
Fix the BouncyCastleProvider Problem
Visit https://www.bouncycastle.org/latest_releases.html, download the file bcprov-jdk15on-152.jar
Place the file in this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/ext

And then go to this directory
/usr/lib/jvm/java-8-oracle/jre/lib/security

Edit this file:
sudo vi java.security

Add this line
security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider

I should download this file
http://repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk15%2b/1.46/bcprov-jdk15%2b-1.46.jar
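The steps above can be collected into a few commands (paths assume the Oracle JDK 8 layout used here; adjust for other JDK installs):

```shell
# copy the provider jar into the JRE extension directory
sudo cp bcprov-jdk15on-152.jar /usr/lib/jvm/java-8-oracle/jre/lib/ext/

# register the provider; the index (10 here) must be one past the last
# security.provider.N entry already present in java.security
echo 'security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider' | \
  sudo tee -a /usr/lib/jvm/java-8-oracle/jre/lib/security/java.security
```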

Fix the JCE Problem
Download the file from here
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html

Unzip the file and place the policy jars (local_policy.jar and US_export_policy.jar) in this directory, overwriting the defaults
/usr/lib/jvm/java-8-oracle/jre/lib/security
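Concretely, something like this (the zip name and extracted folder are from the JCE 8 download; the JDK path matches the one above):

```shell
# replace the default strength-limited policy jars with the unlimited ones
unzip jce_policy-8.zip
sudo cp UnlimitedJCEPolicyJDK8/local_policy.jar \
        UnlimitedJCEPolicyJDK8/US_export_policy.jar \
        /usr/lib/jvm/java-8-oracle/jre/lib/security/
```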


2. Fat Jar?
https://github.com/apache/spark/pull/288?
https://issues.apache.org/jira/browse/SPARK-1154
http://apache-spark-user-list.1001560.n3.nabble.com/Clean-up-app-folders-in-worker-nodes-td20889.html
https://spark.apache.org/docs/1.0.1/spark-standalone.html

Based on my understanding, we should keep using an assembly (fat) jar in Scala and submit the job to the master, which distributes the work across the Spark standalone cluster or the YARN cluster. The clients should not require any extra setup or jar dependencies.
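A sketch of that workflow, assuming the sbt-assembly plugin is configured; the class and jar names are placeholders, not the real ones from sillycat-spark:

```shell
# build one self-contained (assembly) jar containing all dependencies
sbt clean assembly

# submit it to the standalone master; the workers fetch the jar from the
# driver, so client machines need no extra jars on their classpath
bin/spark-submit \
  --class com.sillycat.spark.ExecutorApp \
  --master spark://ubuntu-master:7077 \
  target/scala-2.10/sillycat-spark-assembly-1.0.jar
```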

3. Cluster Sync Issue in Cassandra 1.2.13
http://stackoverflow.com/questions/23345045/cassandra-cas-delete-does-not-work
http://wiki.apache.org/cassandra/DistributedDeletes

We need to use ntpd to sync the clocks across the cluster
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
https://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/
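On Ubuntu the clock sync boils down to something like this (standard ntp tooling, run on every Cassandra node):

```shell
# install and start the NTP daemon
sudo apt-get install -y ntp
# verify that the node is talking to its time peers and check the offsets
ntpq -p
```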

In a Cassandra cluster, every node stamps its writes with a timestamp. If the system clocks differ across the nodes, Cassandra can end up in a weird state: deletes and updates sometimes appear not to work, because conflict resolution keeps whichever write carries the highest timestamp.
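Why skewed clocks break deletes can be shown with a toy last-write-wins pipeline (timestamps and values are made up): each write is "&lt;timestamp&gt; &lt;value&gt;" and the highest timestamp survives.

```shell
# writes in real-time order; the delete (tombstone) happened last in real
# time, but it was stamped by a node whose clock runs behind, so the update
# stamped 1007 "wins" and the delete appears not to work
printf '1000 v1\n1007 v2\n1003 tombstone\n' | sort -n | tail -1
# prints: 1007 v2
```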

4. Upgrade to 1.3.1 Version
https://spark.apache.org/docs/latest/

Download the Spark source file
>wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz

Unzip it, place the Spark directory under /opt, and create a symlink
> sudo ln -s /opt/spark-1.3.1 /opt/spark
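The unzip-and-place step spelled out (directory names assumed from the symlink above):

```shell
tar zxvf spark-1.3.1.tgz
sudo mv spark-1.3.1 /opt/
sudo ln -s /opt/spark-1.3.1 /opt/spark
```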

My Java version and Scala version are as follows:
> java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

> scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

Build the binary
> build/sbt clean
> build/sbt compile
The compile fails because of missing dependencies. I will not spend time on that; I will download the binary directly.
>wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
Unzip it and add it to the classpath.

Then my project sillycat-spark can easily run.
Simple Spark Cluster
Download the source file
>wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz

Build the source
> build/sbt clean
> build/sbt compile
It does not build on Ubuntu either, so I use the binary instead.
> wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz

Prepare Configuration
Go to the conf directory.
> cp spark-env.sh.template spark-env.sh
> cp slaves.template slaves

> cat slaves
# A Spark Worker will be started on each of the machines listed below.
ubuntu-dev1
ubuntu-dev2

>cat spark-env.sh
export SPARK_WORKER_MEMORY=768m
export SPARK_JAVA_OPTS="-Dbuild.env=lmm.sparkvm"
export USER=carl

Copy the same settings to all the slaves (run this on each slave, pulling from the master)
> scp -r ubuntu-master:/home/carl/tool/spark-1.3.1-hadoop2.6 ./

Run the script on the master to start the standalone cluster
> sbin/start-all.sh
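A quick way to confirm the cluster came up (default master web UI port 8080 assumed; hostnames from the slaves file above):

```shell
# jps on the master should show a Master process, and on each slave a Worker;
# the master web UI lists every registered worker
jps
curl -s http://ubuntu-master:8080 | grep -c 'ubuntu-dev'
```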

How to build
https://spark.apache.org/docs/1.1.0/building-with-maven.html
> mvn -DskipTests clean package
The build succeeds.

Build with YARN, Hive, and JDBC support
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package

Install the artifacts into the local Maven repository
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install

Error Message:
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-assembly_2.10: Failed during scalastyle execution: Unable to find configuration file at location scalastyle-config.xml -> [Help 1]

Solution:
Copying [spark_root]/scalastyle-config.xml to [spark_root]/examples/scalastyle-config.xml solves the problem.
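In command form (SPARK_ROOT here is a placeholder for the Spark source checkout):

```shell
# give the examples module its own copy of the style config so the
# scalastyle plugin can find it during the assembly build
SPARK_ROOT="$HOME/spark-1.3.1"
cp "$SPARK_ROOT/scalastyle-config.xml" "$SPARK_ROOT/examples/scalastyle-config.xml"
```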

> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Pbigtop-dist -DskipTests clean package
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install

Changes in Resolver.scala
var mavenLocal = Resolver.mavenLocal

I set it up and have it running in batch mode on both the Spark standalone cluster and the YARN cluster. I will keep working on streaming mode and dynamic SQL.
All the core code is in the sillycat-spark project now.

References:
Spark
http://sillycat.iteye.com/blog/1871204
http://sillycat.iteye.com/blog/1872478
http://sillycat.iteye.com/blog/2083193
http://sillycat.iteye.com/blog/2083194
http://sillycat.iteye.com/blog/2103288
http://sillycat.iteye.com/blog/2103457
http://sillycat.iteye.com/blog/2105430

Spark deployment
http://sillycat.iteye.com/blog/2166583
http://sillycat.iteye.com/blog/2167216
http://sillycat.iteye.com/blog/2183932

spark test
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
http://stackoverflow.com/questions/26170957/using-funsuite-to-test-spark-throws-nullpointerexception
http://blog.quantifind.com/posts/spark-unit-test/

spark docs
http://www.sparkexpert.com/
https://github.com/sujee81/SparkApps
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
http://dataunion.org/category/tech/spark-tech
http://dataunion.org/6308.html
http://endymecy.gitbooks.io/spark-programming-guide-zh-cn/content/spark-sql/README.html
http://zhangyi.farbox.com/post/access-postgresql-based-on-spark-sql

https://github.com/mkuthan/example-spark.git