Setting Up a Spark Development Environment with IntelliJ IDEA and Maven

This article walks through setting up a Spark development environment with IntelliJ IDEA and Maven: installing and configuring the JDK, Maven, and IDEA; creating a Spark project; resolving common errors; and packaging and deploying the result.





I. Install the JDK

Use JDK 1.7 or later and set the environment variables. The installation itself is omitted here.

II. Install Maven

I chose Maven 3.3.3; the installation itself is omitted.

Edit the conf/settings.xml file under the Maven installation directory:

<!-- Change where the local Maven repository is stored -->
<localRepository>D:\maven-repository\repository</localRepository>
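If settings.xml does not yet contain this element, it goes directly under the root <settings> element; the repository path here is just an example and should point wherever you want downloaded artifacts cached:

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <!-- Local repository location; adjust the path for your machine -->
  <localRepository>D:\maven-repository\repository</localRepository>
</settings>
```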

III. Install IDEA

The installation itself is omitted.

IV. Create the Spark Project

1. Create a new project.

2. Choose Maven and create the project from an archetype.

3. Fill in the GroupId, ArtifactId, and so on.

4. Select the locally installed Maven and its settings file.

5. Click Next.

6. Once creation finishes, review the new project structure.

7. Let IDEA auto-import and update the Maven pom file.

8. Build the project.

If you hit an error at this point, it is caused by the JUnit version that the archetype pulls in. Remove the generated tests and the JUnit dependency from pom.xml: delete the two generated Scala test classes, and delete this dependency in pom.xml:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>

9. Refresh the Maven dependencies.

10. Add the JDK and the Scala SDK to the project libraries.

11. Add the required dependencies to pom.xml, including Hadoop, Spark, and so on:

<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <version>1.1.1</version>
    <type>jar</type>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.1</version>
</dependency>
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.9</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

Then refresh the Maven dependencies.

12. Create a new Scala object.

The test code is:

import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]) {
    println("Hello World!")
    val sparkConf = new SparkConf().setMaster("local").setAppName("test")
    val sparkContext = new SparkContext(sparkConf)
    sparkContext.stop()
  }
}

Run it. If you see the following error:

java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package
    at java.lang.ClassLoader.checkCerts(ClassLoader.java:952)
    at java.lang.ClassLoader.preDefineClass(ClassLoader.java:666)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:794)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.spark-project.jetty.servlet.ServletContextHandler.<init>(ServletContextHandler.java:136)
    at org.spark-project.jetty.servlet.ServletContextHandler.<init>(ServletContextHandler.java:129)
    at org.spark-project.jetty.servlet.ServletContextHandler.<init>(ServletContextHandler.java:98)
    at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:110)
    at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:101)
    at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:78)
    at org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:62)
    at org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:62)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:62)
    at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:61)
    at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:74)
    at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:190)
    at org.apache.spark.ui.SparkUI$.createLiveUI(SparkUI.scala:141)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:466)
    at com.test.Test$.main(Test.scala:13)
    at com.test.Test.main(Test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

you can simply delete the servlet-api 2.5 jar from the module dependencies. The better fix, however, is to remove the offending dependency from pom.xml, i.e.:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>

The final dependency section of pom.xml is:

<dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.7.1</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.5.2</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.10</artifactId>
      <version>1.5.2</version>
    </dependency>

    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-avro_2.10</artifactId>
      <version>2.0.1</version>
    </dependency>
  </dependencies>

  

If you instead see the following error, it is harmless and the program still runs:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:272)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:790)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:760)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:633)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2084)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:311)
    at com.test.Test$.main(Test.scala:13)
    at com.test.Test.main(Test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
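If you would rather silence this warning than ignore it, a common workaround is to point hadoop.home.dir at a directory containing bin\winutils.exe before the SparkContext is constructed. The sketch below assumes a hypothetical D:\hadoop directory; the property only needs to be set once, early in main:

```scala
object HadoopHomeFix {
  def main(args: Array[String]): Unit = {
    // Hypothetical path: the directory must contain bin\winutils.exe.
    System.setProperty("hadoop.home.dir", "D:\\hadoop")
    // ...then create the SparkConf / SparkContext as usual.
    println(System.getProperty("hadoop.home.dir"))
  }
}
```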

The normal output at the end looks like this:

Hello World!
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/09/19 11:21:29 INFO SparkContext: Running Spark version 1.5.1
16/09/19 11:21:29 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:272)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:790)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:760)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:633)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2084)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:311)
    at com.test.Test$.main(Test.scala:13)
    at com.test.Test.main(Test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
16/09/19 11:21:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/19 11:21:30 INFO SecurityManager: Changing view acls to: pc
16/09/19 11:21:30 INFO SecurityManager: Changing modify acls to: pc
16/09/19 11:21:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pc); users with modify permissions: Set(pc)
16/09/19 11:21:30 INFO Slf4jLogger: Slf4jLogger started
16/09/19 11:21:31 INFO Remoting: Starting remoting
16/09/19 11:21:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.51.143:52500]
16/09/19 11:21:31 INFO Utils: Successfully started service 'sparkDriver' on port 52500.
16/09/19 11:21:31 INFO SparkEnv: Registering MapOutputTracker
16/09/19 11:21:31 INFO SparkEnv: Registering BlockManagerMaster
16/09/19 11:21:31 INFO DiskBlockManager: Created local directory at C:\Users\pc\AppData\Local\Temp\blockmgr-f9ea7f8c-68f9-4f9b-a31e-b87ec2e702a4
16/09/19 11:21:31 INFO MemoryStore: MemoryStore started with capacity 966.9 MB
16/09/19 11:21:31 INFO HttpFileServer: HTTP File server directory is C:\Users\pc\AppData\Local\Temp\spark-64cccfb4-46c8-4266-92c1-14cfc6aa2cb3\httpd-5993f955-0d92-4233-b366-c9a94f7122bc
16/09/19 11:21:31 INFO HttpServer: Starting HTTP Server
16/09/19 11:21:31 INFO Utils: Successfully started service 'HTTP file server' on port 52501.
16/09/19 11:21:31 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/19 11:21:31 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/09/19 11:21:31 INFO SparkUI: Started SparkUI at http://192.168.51.143:4040
16/09/19 11:21:31 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/09/19 11:21:31 INFO Executor: Starting executor ID driver on host localhost
16/09/19 11:21:31 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52520.
16/09/19 11:21:31 INFO NettyBlockTransferService: Server created on 52520
16/09/19 11:21:31 INFO BlockManagerMaster: Trying to register BlockManager
16/09/19 11:21:31 INFO BlockManagerMasterEndpoint: Registering block manager localhost:52520 with 966.9 MB RAM, BlockManagerId(driver, localhost, 52520)
16/09/19 11:21:31 INFO BlockManagerMaster: Registered BlockManager
16/09/19 11:21:31 INFO SparkContext: Invoking stop() from shutdown hook
16/09/19 11:21:32 INFO SparkUI: Stopped Spark web UI at http://192.168.51.143:4040
16/09/19 11:21:32 INFO DAGScheduler: Stopping DAGScheduler
16/09/19 11:21:32 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/09/19 11:21:32 INFO MemoryStore: MemoryStore cleared
16/09/19 11:21:32 INFO BlockManager: BlockManager stopped
16/09/19 11:21:32 INFO BlockManagerMaster: BlockManagerMaster stopped
16/09/19 11:21:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/09/19 11:21:32 INFO SparkContext: Successfully stopped SparkContext
16/09/19 11:21:32 INFO ShutdownHookManager: Shutdown hook called
16/09/19 11:21:32 INFO ShutdownHookManager: Deleting directory C:\Users\pc\AppData\Local\Temp\spark-64cccfb4-46c8-4266-92c1-14cfc6aa2cb3

Process finished with exit code 0

That completes the development environment setup.

V. Building the Jar

1. Create a new Scala object with the following code:

package com.test

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by pc on 2016/9/20.
  */
object WorldCount {

  def main(args: Array[String]) {
    val dataFile = args(0)
    val output = args(1)
    val sparkConf = new SparkConf().setAppName("WorldCount")
    val sparkContext = new SparkContext(sparkConf)
    val lines = sparkContext.textFile(dataFile)
    val counts = lines.flatMap(_.split(",")).map(s => (s, 1)).reduceByKey((a, b) => a + b)
    counts.saveAsTextFile(output)
    sparkContext.stop()
  }
}
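Before packaging, the transformation chain can be sanity-checked without any cluster by running the same split-and-count logic on an in-memory collection. This is only a sketch: plain Scala collections stand in for RDDs, and groupBy plus a sum stands in for reduceByKey.

```scala
object WorldCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("spark,hadoop,spark", "hadoop,hdfs")
    // Same pipeline as the Spark job: split each line on commas,
    // pair every word with 1, then sum the ones per word.
    val counts = lines
      .flatMap(_.split(","))
      .map(w => (w, 1))
      .groupBy(_._1)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Running it prints (hadoop,2), (hdfs,1), and (spark,2), which is what the Spark job would write to its output directory for the same input, modulo ordering.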

 

2. Open File -> Project Structure and add a jar artifact.

3. Click OK. You can also set the jar output directory here.

4. Build the artifact.

5. Run it. Put the test file under the /test/ directory on HDFS and submit the job:

spark-submit --class com.test.WorldCount --master spark://192.168.18.151:7077 sparktest.jar /test/data.txt /test/test-01

6. If you hit the following error:

Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
        at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:240)
        at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:193)
        at java.util.jar.JarVerifier.processEntry(JarVerifier.java:305)
        at java.util.jar.JarVerifier.update(JarVerifier.java:216)
        at java.util.jar.JarFile.initializeVerifier(JarFile.java:345)
        at java.util.jar.JarFile.getInputStream(JarFile.java:412)
        at sun.misc.JarIndex.getJarIndex(JarIndex.java:137)
        at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:674)
        at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:666)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:665)
        at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:638)
        at sun.misc.URLClassPath$3.run(URLClassPath.java:366)
        at sun.misc.URLClassPath$3.run(URLClassPath.java:356)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.misc.URLClassPath.getLoader(URLClassPath.java:355)
        at sun.misc.URLClassPath.getLoader(URLClassPath.java:332)
        at sun.misc.URLClassPath.getResource(URLClassPath.java:198)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:641)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

open the jar with WinRAR and delete everything under the META-INF directory except MANIFEST.MF, the .RSA files, and the maven directory.

 
