Submitting Spark jobs with SparkLauncher

This post walks through running Spark programs with SparkLauncher on Linux and Windows: setting the environment variables, the main class and the application resource, plus the problems I hit along the way and how I resolved them.


I recently needed to build a UI with a feature for submitting Spark programs. Two options came up:

1 - Zeppelin is one such tool; its internals are fairly involved. Worth a look if you are interested.

2 - SparkLauncher, a class that ships with Spark.

Basic usage on Linux:

    import java.util.HashMap;

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class SparkLauncherDemo {
        public static void main(String[] args) throws Exception {
            // Environment variables for the spark-submit child process
            HashMap<String, String> envParams = new HashMap<>();
            envParams.put("YARN_CONF_DIR", "/home/hadoop/cluster/hadoop-release/etc/hadoop");
            envParams.put("HADOOP_CONF_DIR", "/home/hadoop/cluster/hadoop-release/etc/hadoop");
            envParams.put("SPARK_HOME", "/home/hadoop/cluster/spark-new");
            envParams.put("SPARK_PRINT_LAUNCH_COMMAND", "1");

            // Launch the bundled SparkPi example on YARN
            SparkAppHandle spark = new SparkLauncher(envParams)
                    .setAppResource("/home/hadoop/cluster/spark-new/examples/jars/spark-examples_2.11-2.2.1.jar")
                    .setMainClass("org.apache.spark.examples.SparkPi")
                    .setMaster("yarn")
                    .startApplication();

            // Crude wait so the JVM stays alive while the application runs (a state-based alternative is sketched below)
            Thread.sleep(100000);
        }
    }

Run result:

INFO: 18/12/03 18:12:12 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.462 s
Dec 03, 2018 6:12:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 18/12/03 18:12:12 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 3.395705 s
Dec 03, 2018 6:12:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: Pi is roughly 3.1461157305786527
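
The Thread.sleep(100000) in the code above is just a blunt way of keeping the JVM alive while the child application runs. A tidier option is to watch the SparkAppHandle state, roughly like the sketch below (the listener, getState() and isFinal() calls are part of Spark's launcher API; the class name, polling interval and log messages are my own choices):

```java
import java.util.HashMap;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SparkLauncherWaitDemo {
    public static void main(String[] args) throws Exception {
        // Same environment variables as in the example above
        HashMap<String, String> envParams = new HashMap<>();
        envParams.put("YARN_CONF_DIR", "/home/hadoop/cluster/hadoop-release/etc/hadoop");
        envParams.put("HADOOP_CONF_DIR", "/home/hadoop/cluster/hadoop-release/etc/hadoop");
        envParams.put("SPARK_HOME", "/home/hadoop/cluster/spark-new");

        SparkAppHandle handle = new SparkLauncher(envParams)
                .setAppResource("/home/hadoop/cluster/spark-new/examples/jars/spark-examples_2.11-2.2.1.jar")
                .setMainClass("org.apache.spark.examples.SparkPi")
                .setMaster("yarn")
                // The listener fires on every state transition of the submitted application
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        System.out.println("state: " + h.getState() + ", appId: " + h.getAppId());
                    }

                    @Override
                    public void infoChanged(SparkAppHandle h) {
                        // Called when other info, e.g. the application id, becomes available
                    }
                });

        // Poll until the application reaches a terminal state instead of sleeping blindly
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        System.out.println("final state: " + handle.getState());
    }
}
```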

 

Running on Windows:

If it already runs on Linux, install the packages Windows depends on: JDK, Hadoop, Scala and Spark.

See https://blog.youkuaiyun.com/u011513853/article/details/52865076 for reference.

The code:

import java.util.HashMap;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SparkLauncherTest {
    private static String YARN_CONF_DIR = null;
    private static String HADOOP_CONF_DIR = null;
    private static String SPARK_HOME = null;
    private static String SPARK_PRINT_LAUNCH_COMMAND = "1";
    private static String master = null;
    private static String appResource = null;
    private static String mainClass = null;

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: SparkLauncherTest <local|yarn>");
            System.exit(1);
        }

        // TrackerConfig trackerConfig = TrackerConfig.loadConfig(); // project-specific config, unused in this example

        // "local" uses the Windows paths and a local master; anything else targets the Linux cluster on YARN
        if ("local".equals(args[0])) {
            YARN_CONF_DIR = "D:\\software\\hadoop-2.4.1\\etc\\hadoop";
            HADOOP_CONF_DIR = "D:\\software\\hadoop-2.4.1\\etc\\hadoop";
            SPARK_HOME = "D:\\spark-new";
            master = "local";
            appResource = "D:\\spark-new\\examples\\jars\\spark-examples_2.11-2.2.1.jar";
        } else {
            YARN_CONF_DIR = "/home/hadoop/cluster/hadoop-release/etc/hadoop";
            HADOOP_CONF_DIR = "/home/hadoop/cluster/hadoop-release/etc/hadoop";
            SPARK_HOME = "/home/hadoop/cluster/spark-new";
            master = "yarn";
            appResource = "/home/hadoop/cluster/spark-new/examples/jars/spark-examples_2.11-2.2.1.jar";
        }

        // Environment variables for the spark-submit child process
        HashMap<String, String> envParams = new HashMap<>();
        envParams.put("YARN_CONF_DIR", YARN_CONF_DIR);
        envParams.put("HADOOP_CONF_DIR", HADOOP_CONF_DIR);
        envParams.put("SPARK_HOME", SPARK_HOME);
        envParams.put("SPARK_PRINT_LAUNCH_COMMAND", SPARK_PRINT_LAUNCH_COMMAND);

        mainClass = "org.apache.spark.examples.SparkPi";
        SparkAppHandle spark = new SparkLauncher(envParams)
                .setAppResource(appResource)
                .setMainClass(mainClass)
                .setMaster(master)
                .startApplication();

        // Crude wait so the JVM stays alive while the application runs
        Thread.sleep(100000);
    }
}

Run result:

INFO: 18/12/04 17:01:11 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.808691 s
Dec 04, 2018 5:01:11 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: Pi is roughly 3.1455757278786396

Problem encountered: SparkLauncher would not run at all on Windows.

Hadoop and the JDK had been in use for a long time, so I ruled them out as the cause.

Scala could be written and run locally, so that did not seem to be the problem either.

It finally turned out that running spark-submit under spark\bin from cmd failed, so I copied the Spark package over from the Linux machine again.

After that, spark-shell ran normally; before, it had failed with the Windows error: not recognized as an internal or external command, operable program or batch file.

Remaining issue:

When packaging the jar, some classes do not make it into the jar, and the error reports that the class cannot be found.

Once the UI is finished, I will post the full workflow.

Reposted from: https://www.cnblogs.com/parent-absent-son/p/10060364.html

The steps to submit a job from Java with SparkLauncher are as follows:

1. Add the dependencies

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.7</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.11</artifactId>
    <version>2.4.7</version>
</dependency>
```

2. Create a SparkLauncher instance

```java
SparkLauncher launcher = new SparkLauncher()
        .setAppName("MyApp")
        .setMaster("local")
        .setSparkHome("/path/to/spark/home")
        .setAppResource("/path/to/my/app.jar")
        .setMainClass("com.mycompany.MyApp")
        .addAppArgs("arg1", "arg2")
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g");
```

3. Launch the job

```java
Process process = launcher.launch();
```

4. Monitor the job

```java
InputStream stdout = process.getInputStream();
InputStream stderr = process.getErrorStream();

// Read stdout and stderr on separate threads so the child process does not block
new Thread() {
    public void run() {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(stdout))) {
            String line = null;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();

new Thread() {
    public void run() {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(stderr))) {
            String line = null;
            while ((line = reader.readLine()) != null) {
                System.err.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();

// Wait for the job to finish and get its exit code
int exitCode = process.waitFor();
System.out.println("Task completed with exit code: " + exitCode);
```

Here stdout and stderr are the job's standard output and standard error. Reading them on separate threads keeps the main thread from blocking; waitFor blocks until the job finishes and returns its exit code.
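
A small addition of my own: instead of reading the child's stdout/stderr by hand as in step 4, the launcher can also redirect the output for you. A minimal sketch, reusing the placeholder jar and main class from step 2 (redirectOutput(File) and redirectError(File) are part of Spark's launcher API; the output file names are arbitrary):

```java
import java.io.File;

import org.apache.spark.launcher.SparkLauncher;

public class RedirectExample {
    public static void main(String[] args) throws Exception {
        Process process = new SparkLauncher()
                .setAppResource("/path/to/my/app.jar")   // placeholder path, as in step 2
                .setMainClass("com.mycompany.MyApp")     // placeholder main class, as in step 2
                .setMaster("local")
                // Let the launcher write the child's output to files instead of hand-rolled reader threads
                .redirectOutput(new File("spark-app.out"))
                .redirectError(new File("spark-app.err"))
                .launch();

        int exitCode = process.waitFor();
        System.out.println("Task completed with exit code: " + exitCode);
    }
}
```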