Spark(8) Non Fat Jar/Cassandra Cluster Issue and Spark Version 1.3.1
1. Can We Upgrade to Java 8?
Fix the BouncyCastleProvider Problem
Visit https://www.bouncycastle.org/latest_releases.html and download the file bcprov-jdk15on-152.jar.
Place the file in this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/ext
Then go to this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/security
and edit this file:
sudo vi java.security
Add this line:
security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider
I should download this file
http://repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk15%2b/1.46/bcprov-jdk15%2b-1.46.jar
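To verify the provider was registered, a quick check in the scala REPL should list BC among the installed security providers (my own sanity check, not part of the original steps):
import java.security.Security
// Print every registered JCE provider; "BC" should appear after editing java.security
Security.getProviders.foreach(p => println(p.getName + " " + p.getVersion))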
Fix the JCE Problem
Download the file from here:
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
Unzip the file and place the jars in this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/security
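To confirm the unlimited-strength policy jars are picked up, this one-liner in the scala REPL should print 2147483647 rather than 128 (again just a sanity check, using AES as the example algorithm):
import javax.crypto.Cipher
// 128 with the default policy files, Int.MaxValue once the JCE unlimited policy jars are installed
println(Cipher.getMaxAllowedKeyLength("AES"))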
2. Fat Jar?
https://github.com/apache/spark/pull/288?
https://issues.apache.org/jira/browse/SPARK-1154
http://apache-spark-user-list.1001560.n3.nabble.com/Clean-up-app-folders-in-worker-nodes-td20889.html
https://spark.apache.org/docs/1.0.1/spark-standalone.html
Based on my understanding, we should keep using an assembly (fat) jar in Scala and submit the job to the master, which will distribute the work to the Spark standalone cluster or the YARN cluster. The client machines should not require any extra setup or jar dependencies.
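As a sketch of how such an assembly jar can be produced with sbt-assembly (plugin version and build settings here are illustrative, not copied from sillycat-spark):
In project/plugins.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
In build.sbt:
name := "sillycat-spark"
scalaVersion := "2.10.4"
// Spark itself is provided by the cluster at runtime, so keep it out of the fat jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"
Running sbt assembly then produces a single jar that can be handed to spark-submit, so the client machine needs nothing else.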
3. Cluster Sync Issue in Cassandra 1.2.13
http://stackoverflow.com/questions/23345045/cassandra-cas-delete-does-not-work
http://wiki.apache.org/cassandra/DistributedDeletes
We need to use ntpd to keep the clocks in sync across the nodes.
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
https://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/
In a Cassandra cluster every node writes with a timestamp, so if the system clocks differ across the nodes, Cassandra can get into a weird state where deletes and updates sometimes do not take effect.
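On Ubuntu the quick version is to install the ntp package on every node and confirm it is syncing against its peers (package name and time servers may differ per environment):
> sudo apt-get install ntp
> ntpq -p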
4. Upgrade to Version 1.3.1
https://spark.apache.org/docs/latest/
Download the Spark source file:
> wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz
Unzip it, place the Spark directory in the working directory, and create a symlink:
> sudo ln -s /opt/spark-1.3.1 /opt/spark
My Java version and Scala version are as follows:
> java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
Try to build from source:
> build/sbt clean
> build/sbt compile
The compile fails because of missing dependencies. I will not spend time on that; I will download the pre-built binary directly.
> wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
Unzip it and add it to the classpath.
Then my project sillycat-spark can easily run.
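As a quick smoke test of the pre-built binary, spark-shell can be started against a local master (just my own check, not a required step):
> bin/spark-shell --master local[2]
scala> sc.parallelize(1 to 100).sum()
The result should come back as 5050.0.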
Simple Spark Cluster
Download the source file:
> wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz
Build the source:
> build/sbt clean
> build/sbt compile
It does not build on Ubuntu either, so I use the pre-built binary instead.
> wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
Prepare the Configuration
Go to the conf directory.
> cp spark-env.sh.template spark-env.sh
> cp slaves.template slaves
> cat slaves
# A Spark Worker will be started on each of the machines listed below.
ubuntu-dev1
ubuntu-dev2
> cat spark-env.sh
export SPARK_WORKER_MEMORY=768m
export SPARK_JAVA_OPTS="-Dbuild.env=lmm.sparkvm"
export USER=carl
Copy the same settings to all the slaves:
> scp -r ubuntu-master:/home/carl/tool/spark-1.3.1-hadoop2.6 ./
Run the script on the master to start the standalone cluster:
> sbin/start-all.sh
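Once the master and workers are up (the master web UI is on port 8080 by default), a job can be submitted to the standalone cluster along these lines; the class name and jar path are placeholders for whatever the application provides, and 7077 is the default master port:
> bin/spark-submit --class com.example.MyApp --master spark://ubuntu-master:7077 /home/carl/work/myapp-assembly.jar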
How to build
https://spark.apache.org/docs/1.1.0/building-with-maven.html
> mvn -DskipTests clean package
Build successfully.
Build with YARN, Hive, and JDBC support:
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package
Then install it into the local Maven repository:
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install
Error Message:
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-assembly_2.10: Failed during scalastyle execution: Unable to find configuration file at location scalastyle-config.xml -> [Help 1]
Solution:
Copying [spark_root]/scalastyle-config.xml to [spark_root]/examples/scalastyle-config.xml solves the problem.
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Pbigtop-dist -DskipTests clean package
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install
Changes in Resolver.scala
var mavenLocal = Resolver.mavenLocal
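The same idea in a downstream sbt build looks like this, so that the Spark artifacts installed into the local Maven repository above can be resolved (a sketch of the intent, not the original Resolver.scala change):
// In build.sbt: resolve dependencies from ~/.m2/repository as well
resolvers += Resolver.mavenLocal
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"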
I have it set up and running in batch mode on the Spark standalone cluster and the YARN cluster. I will keep working on streaming mode and dynamic SQL.
All the core code is in the sillycat-spark project now.
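For reference, a minimal batch job against the 1.3.1 API looks roughly like this; it is a generic sketch rather than the actual sillycat-spark code, and the input path is a placeholder:
import org.apache.spark.{SparkConf, SparkContext}

object BatchExample {
  def main(args: Array[String]): Unit = {
    // The master URL (standalone or YARN) is supplied by spark-submit, so it is not hard-coded here
    val conf = new SparkConf().setAppName("batch-example")
    val sc = new SparkContext(conf)
    // Placeholder input path; count the lines and print the result
    val lines = sc.textFile("hdfs:///tmp/input.txt")
    println("line count: " + lines.count())
    sc.stop()
  }
}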
References:
Spark
http://sillycat.iteye.com/blog/1871204
http://sillycat.iteye.com/blog/1872478
http://sillycat.iteye.com/blog/2083193
http://sillycat.iteye.com/blog/2083194
http://sillycat.iteye.com/blog/2103288
http://sillycat.iteye.com/blog/2103457
http://sillycat.iteye.com/blog/2105430
Spark deployment
http://sillycat.iteye.com/blog/2166583
http://sillycat.iteye.com/blog/2167216
http://sillycat.iteye.com/blog/2183932
Spark test
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
http://stackoverflow.com/questions/26170957/using-funsuite-to-test-spark-throws-nullpointerexception
http://blog.quantifind.com/posts/spark-unit-test/
Spark docs
http://www.sparkexpert.com/
https://github.com/sujee81/SparkApps
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
http://dataunion.org/category/tech/spark-tech
http://dataunion.org/6308.html
http://endymecy.gitbooks.io/spark-programming-guide-zh-cn/content/spark-sql/README.html
http://zhangyi.farbox.com/post/access-postgresql-based-on-spark-sql
https://github.com/mkuthan/example-spark.git