1. SparkPi.scala source code (the example from the official Spark website)
import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Throw n random darts at the square [-1, 1] x [-1, 1] and count
    // how many land inside the unit circle.
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
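Why this estimates pi: each sample (x, y) is uniform on the square [-1, 1] × [-1, 1], which has area 4, while the unit circle inscribed in it has area π, so a sample lands inside the circle with probability π/4. The expression 4.0 * count / n therefore converges to π as n grows.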
2. Running this directly from the IntelliJ IDEA development environment fails, because the driver is launched without a master URL (normally supplied by spark-submit). Modify the code to set the master explicitly:
val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
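If no standalone cluster is available, a minimal alternative sketch for in-IDE debugging is local mode (an assumption on my part; it exercises the code without a cluster, so steps 3 and 4 below are not needed):

// Run Spark inside the IDE process with 2 worker threads;
// no cluster and no jar distribution required.
val conf = new SparkConf().setAppName("Spark Pi").setMaster("local[2]")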
3. Use the IDE to package the code into a jar; only the application classes are needed (leave the referenced Spark and Hadoop jars out, since the cluster already provides them).
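If you would rather build the jar outside the IDE, a minimal sbt build for the same setup might look like this (assumptions: sbt, Scala 2.10, and the Spark 1.2.0 artifacts; the "provided" scope keeps Spark itself out of the jar, matching the step above):

name := "helloworld"

version := "1.0"

scalaVersion := "2.10.4"

// "provided": compile against spark-core but do not bundle it into the jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"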
4. Then add the following line to the code in the IDE:
spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")
This ships helloworld.jar to every worker; without it the executors cannot load the SparkPi class and the tasks fail.
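An equivalent sketch, if you prefer declaring the jar up front on the configuration rather than after creating the context, is SparkConf.setJars:

// Same effect as addJar, but declared on the configuration itself
val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("spark://master:7077")
  .setJars(Seq("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar"))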
5. Run it. The job completes and prints the estimate:
14/12/31 15:28:57 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:21) finished in 4.500 s
14/12/31 15:28:58 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:21, took 8.608873 s
Pi is roughly 3.14468
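As a sanity check on this output: the job was run without arguments, so slices = 2 and n = 200000; solving 3.14468 = 4.0 * count / n gives count = 157234 samples inside the circle.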
The complete modified code is as follows:
import scala.math.random

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by cec on 12/31/14.
 */
object SparkPi {
  def main(args: Array[String]) {
    // Point the driver at the standalone master explicitly, since the job
    // is launched from the IDE rather than through spark-submit.
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
    val spark = new SparkContext(conf)
    // Ship the application jar to every worker so the executors can load SparkPi.
    spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
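For completeness, the same jar can also be launched through spark-submit, which supplies the master URL and ships the jar itself, so neither setMaster nor addJar is needed in the source. A sketch, assuming the paths used above:

# The optional trailing argument sets the number of slices
/home/cec/spark-1.2.0-bin-hadoop2.4/bin/spark-submit \
  --class SparkPi \
  --master spark://master:7077 \
  /home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar 10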