Help request: MojoExecutionException, InvocationTargetException, and OutOfMemoryError when developing a Spark Standalone program in Java

This post describes an OutOfMemoryError encountered while developing a Spark program with the Java API, asks why setting the executor memory parameter does not resolve it, and asks how building the project with Maven compares with packaging a plain Java project into a jar.


I have recently been learning Spark development, writing a Standalone program against the Java API. The Spark version is 0.9.1, Scala is 2.10.3, the JDK is 1.7, and the cluster consists of one Master and three Workers. The project is built with Maven 3.2.1. The pom.xml is as follows:

<groupId>spark</groupId>
<artifactId>testspark</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>testspark</name>
<url>http://maven.apache.org</url>

<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<repositories>
  <repository>
    <id>Akka repository</id>
    <url>http://repo.akka.io/releases</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency> 
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.1</version>
  </dependency>
</dependencies>

The Spark program is shown below:

package spark.testspark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        String sparkHome = System.getenv("SPARK_HOME");
        System.out.println(sparkHome);
        String logFile = "/usr/Java/spark-0.9.1-bin-hadoop1/README.md";
        SparkConf conf = new SparkConf();
        conf.setMaster("spark://192.168.23.123:7077")
            .setAppName("Simple App")
            .setSparkHome(sparkHome)
            .setJars(new String[] { "target/testspark-0.0.1-SNAPSHOT.jar" })
            .set("spark.executor.memory", "1g");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count lines containing "a".
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("a");
            }
        }).count();

        // Count lines containing "b".
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("b");
            }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        sc.stop();
    }
}

The commands used to run it are:

mvn package

mvn -X exec:java -Dexec.mainClass="spark.testspark.SimpleApp"

Running this produces an error; the debug output is shown below:

Caused by: org.apache.maven.plugin.MojoExecutionException: An exception occured while executing the Java class. null
at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:345)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:133)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
... 19 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 4 times (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java heap space)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[ERROR] 
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Regarding the OutOfMemoryError: I already set the runtime memory in the program with set("spark.executor.memory", "1g"), so why does it still have no effect?
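One thing worth checking (this is a guess, not verified against this cluster): `spark.executor.memory` only sizes the executor JVMs on the Workers, while `mvn exec:java` runs the driver inside the Maven JVM itself, whose heap is governed by `MAVEN_OPTS`. A hedged sketch of raising the driver heap (the 2g value is an assumption):

```shell
# Hedged sketch: exec:java runs the main class in the same JVM as Maven,
# so the driver heap comes from MAVEN_OPTS, not spark.executor.memory
# (which only sizes the executor JVMs on the workers). 2g is an assumption.
export MAVEN_OPTS="-Xmx2g"
mvn package
mvn exec:java -Dexec.mainClass="spark.testspark.SimpleApp"
```

It may also be worth confirming that each Worker actually advertises at least as much memory as the requested executor size.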

Also, when developing Spark in Java, which approach is more common and more convenient: building the project with Maven, or creating a plain Java project and then packaging it into a jar to run?
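For reference, one common Maven-based approach is to build a single runnable "fat" jar with the maven-shade-plugin instead of relying on exec:java; a minimal sketch of the plugin configuration (the version number is an assumption):

```xml
<!-- Hedged sketch: maven-shade-plugin bundles the application classes
     into one runnable jar at the package phase (version is an assumption). -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.2</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

The resulting jar can then be run directly with `java -cp`, which also makes the driver's heap settings explicit.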

