Running Spark's Built-in Demo (Calculating Pi)

This post shows how to run Spark's bundled Pi-estimation demo on Ubuntu. With Spark installed under /usr/local/Spark, the launcher scripts live in the bin directory and the example program is packaged under the examples directory; a single spark-submit command runs the job, which finishes in roughly 0.7 seconds and illustrates how quickly Spark can process data.


My Spark installation path: /usr/local/Spark

(1) Change into the Spark installation directory.
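With the installation path given above, that is:

```bash
cd /usr/local/Spark
```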

(2) Note that this directory contains both a bin directory and an examples directory: the spark-submit launcher lives in bin, and the Pi-calculation example is packaged under examples.

From the installation directory, enter the following command:

```bash
bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.2.1.jar
```
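The jar's file name encodes the Scala and Spark versions (2.11 and 2.2.1 here), so adjust the path to match your own installation. The SparkPi example also accepts an optional trailing argument giving the number of partitions (slices) to spread the sampling over; in the variant below, the four local cores and 100 slices are illustrative values, not requirements:

```bash
bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master "local[4]" \
  examples/jars/spark-examples_2.11-2.2.1.jar 100
```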

(3) Press Enter. On my machine the job completed in roughly 0.7 seconds.

The output, truncated toward the end, is shown below:

```

18/05/05 16:08:41 INFO spark.SparkContext: Running Spark version 2.2.1
18/05/05 16:08:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/05 16:08:42 INFO spark.SparkContext: Submitted application: Spark Pi
18/05/05 16:08:42 INFO spark.SecurityManager: Changing view acls to: cims
18/05/05 16:08:42 INFO spark.SecurityManager: Changing modify acls to: cims
18/05/05 16:08:42 INFO spark.SecurityManager: Changing view acls groups to: 
18/05/05 16:08:42 INFO spark.SecurityManager: Changing modify acls groups to: 
18/05/05 16:08:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(cims); groups with view permissions: Set(); users  with modify permissions: Set(cims); groups with modify permissions: Set()
18/05/05 16:08:42 INFO util.Utils: Successfully started service 'sparkDriver' on port 36623.
18/05/05 16:08:42 INFO spark.SparkEnv: Registering MapOutputTracker
18/05/05 16:08:42 INFO spark.SparkEnv: Registering BlockManagerMaster
18/05/05 16:08:42 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/05/05 16:08:42 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/05/05 16:08:42 INFO storage.DiskBlockManager: Created local directory at /usr/local/spark/blockmgr-d2b44b19-3fd0-4f27-ade8-6de1fc445951
18/05/05 16:08:42 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
18/05/05 16:08:42 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/05/05 16:08:43 INFO util.log: Logging initialized @2001ms
18/05/05 16:08:43 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/05/05 16:08:43 INFO server.Server: Started @2091ms
18/05/05 16:08:43 INFO server.AbstractConnector: Started ServerConnector@596df867{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/05/05 16:08:43 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@425357dd{/jobs,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@52eacb4b{/jobs/json,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a551a63{/jobs/job,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ec2bf82{/jobs/job/json,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6cc0bcf6{/stages,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32f61a31{/stages/json,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@669253b7{/stages/stage,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49a64d82{/stages/stage/json,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66d23e4a{/stages/pool,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d9d1b69{/stages/pool/json,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@251f7d26{/storage,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@52d10fb8{/storage/json,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1fe8d51b{/storage/rdd,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@22680f52{/storage/rdd/json,null,AVAILABLE,@Spark}
18/05/05 16:08:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandle
...
```
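The log is cut off above. In a complete run, the truncated portion includes a scheduler line reporting the job's duration (which is where the roughly 0.7-second figure comes from), and the example finishes by printing its estimate on a line of the form below; the exact digits vary from run to run because the sampling is random:

```
Pi is roughly 3.14...
```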
### Estimating Pi with Spark Using the Monte Carlo Method

Spark offers an efficient way to parallelize work over large data sets, which makes it a natural fit for a Monte Carlo simulation that estimates π. Below is an explanation of the approach along with example programs in Python and Scala.

#### The Monte Carlo method in brief

The Monte Carlo method is a technique for numerical estimation based on random sampling. To estimate π, generate random points inside the unit square and count how many fall inside the quarter unit circle. Since the quarter circle has area π/4 while the square has area 1, the fraction of points that land inside the circle approximates π/4, so multiplying it by 4 approximates π.

#### Parallelizing with Spark

To exploit Spark's distributed nature and speed up the computation, the generation and testing of the random points can be spread across many nodes. Concretely:

- Each partition independently generates a share of the random points.
- Each point is tested for whether it lies inside the unit circle.
- The per-partition counts are summed to produce the final estimate of π.

---

#### Python example

```python
from pyspark import SparkConf, SparkContext
import random

def inside(_):
    # Draw a random point in the unit square and report whether it
    # falls inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0

conf = SparkConf().setAppName("Estimate Pi")
sc = SparkContext(conf=conf)

num_samples = 100000000  # total number of sample points

count = sc.parallelize(range(0, num_samples)) \
          .map(inside) \
          .reduce(lambda a, b: a + b)

pi_estimate = 4 * count / num_samples
print(f"Estimated value of Pi is {pi_estimate}")

sc.stop()
```

This script defines an `inside` function that tests whether a single random point lies within the unit circle, builds an RDD representing all of the sample points, and counts the hits with a map-reduce pass.

---

#### Scala example

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MonteCarloPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Monte Carlo Pi Estimation")
    val sc = new SparkContext(conf)

    val numSamples = 100000000  // total number of sample points

    // Count the sample points that land inside the quarter circle.
    val nInside = sc.parallelize(1 to numSamples).map { _ =>
      val x = Math.random()
      val y = Math.random()
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)

    val piEstimate = 4.0 * nInside / numSamples
    println(s"Estimated value of Pi is $piEstimate")

    sc.stop()
  }
}
```

The Scala program implements the same logic: it generates the requested number of random coordinate pairs `(x, y)` and counts how many of them fall within the unit circle.

---

#### How to run the program

Assuming a working Spark environment, the job can be submitted in local mode or, with adjusted options, to a YARN cluster. Note that options such as `--master` must come before the application file; anything placed after it is passed to the application itself:

```bash
spark-submit --master "local[*]" monte_carlo_pi.py
# For a cluster, adjust the options, e.g. --master yarn --deploy-mode client
```
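For the Scala version, the object must first be compiled and packaged into a jar (for example with `sbt package`). The jar path below is hypothetical and depends entirely on your build configuration:

```bash
# Hypothetical jar path; adjust to whatever your build actually produces.
spark-submit --class MonteCarloPi --master "local[*]" \
  target/scala-2.11/monte-carlo-pi_2.11-0.1.jar
```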