Spark Parallel Computation Log

License: CC 4.0 BY-SA

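This article presents an algorithm-simulation program written in Java that uses nested loops to imitate a running algorithm, and shows how to use Scala for a large-scale data processing job, including creating, transforming, and storing a dataset.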

1. Build a method that simulates an algorithm run (do not use threads to simulate it)

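public class TimerSimulate {
    /**
     * Simulates a CPU-bound algorithm with 1000 * 10000 * 10 = 100 million
     * loop iterations. The result is always a + " 888"; the loops exist
     * only to consume CPU time.
     */
    public String test(String a) {
        String ret = "";
        for (int i = 0; i < 1000; i++) {
            for (int j = 0; j < 10000; j++) {
                for (int x = 0; x < 10; x++) {
                    long k = i * j * x; // throwaway arithmetic
                    k = 888;            // fixed value, so every call returns the same suffix
                    ret = a + " " + k;
                }
            }
        }
        return ret;
    }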

    public static void main(String[] args) {
        TimerSimulate t = new TimerSimulate();
        long begin = System.currentTimeMillis();
        // Run the simulated algorithm 10 times in sequence as a single-machine baseline.
        for (int i = 0; i < 10; i++) {
            String str = t.test("a");
            System.out.println(str);
        }
        long end = System.currentTimeMillis();
        System.out.println("Total time in seconds: " + (end - begin) / 1000);
    }
}

2. Build the Scala program. Pay attention to the number of cluster nodes, the number of cores, and the number of splits (partitions).

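import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import java.util.Random

object NTest {
  // Held in a singleton object, so each executor JVM creates its own instance
  // instead of receiving a serialized copy from the driver.
  val t: TimerSimulate = new TimerSimulate()

  // Runs the simulated algorithm on one element and tags the result with its input.
  def nf(str: String): String = {
    val ret = t.test(str)
    str + "--" + ret
  }

  // args(0): number of elements to generate; args(1): number of splits (partitions)
  def main(args: Array[String]): Unit = {
    val r: Random = new Random()
    // "ip" is a placeholder for the Spark master address.
    val sc = new SparkContext("spark://ip:7077", "ntest", System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass).toSeq)
    val arr = new Array[String](args(0).toInt)
    for (i <- arr.indices) {
      arr(i) = r.nextInt(1000).toString
    }
    // One task per split; each task calls nf on every element of its split.
    val dataset1: RDD[String] = sc.parallelize(arr, args(1).toInt).map(nf)
    // Merge into a single partition so that one output file is written.
    dataset1.coalesce(1, shuffle = true).saveAsTextFile("hdfs://ip:9000/test/ntest")
    sc.stop()
  }
}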
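
The advice in step 2 about cores and splits can be made concrete with a rough back-of-the-envelope model. The sketch below is not part of the original program: the class `PartitionPlan`, its method names, and the per-element cost of 2 seconds are all hypothetical. It assumes that each core runs one task at a time, that the elements are cut into near-equal contiguous slices, and that scheduling and I/O overhead can be ignored, so treat the numbers as an estimate only.

```java
public class PartitionPlan {
    // Number of sequential "waves" of tasks when each core runs one task
    // at a time: ceil(partitions / totalCores).
    static int waves(int partitions, int totalCores) {
        return (partitions + totalCores - 1) / totalCores;
    }

    // Size of slice i when n elements are cut into `partitions`
    // near-equal contiguous ranges (assumed even split).
    static int sliceSize(int n, int partitions, int i) {
        int start = (int) ((long) i * n / partitions);
        int end = (int) ((long) (i + 1) * n / partitions);
        return end - start;
    }

    // Rough upper estimate of wall-clock time: every wave is assumed to
    // take as long as the largest slice. Overheads are ignored.
    static double estimateSeconds(int n, int partitions, int totalCores,
                                  double secondsPerElement) {
        int largest = 0;
        for (int i = 0; i < partitions; i++) {
            largest = Math.max(largest, sliceSize(n, partitions, i));
        }
        return waves(partitions, totalCores) * largest * secondsPerElement;
    }

    public static void main(String[] args) {
        int n = 100;             // elements, i.e. args(0) in NTest
        int cores = 8;           // total cores in the cluster (hypothetical)
        double perElement = 2.0; // hypothetical cost of one test() call, in seconds
        for (int p : new int[] {1, 8, 12, 16}) {
            System.out.printf("%d splits -> %d wave(s), ~%.0f s%n",
                    p, waves(p, cores), estimateSeconds(n, p, cores, perElement));
        }
    }
}
```

Under this model, a single split leaves all but one core idle, while a split count that is not a multiple of the core count leaves some cores idle during the last wave. Measuring one `test()` call on the target machine and substituting it for the assumed per-element cost would give a more realistic figure.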