The Road to Advanced Big Data — Basic Spark SQL Configuration

This post covers the basic configuration and use of Spark SQL: fixing build failures, setting up the environment, Standalone mode, running from a local IDE, using HiveContext, and starting the Spark shell. In Standalone mode, the master and workers are started after configuring spark-env.sh and the slaves file. If the build fails, check your environment variables and dependencies. In an IDE, set -Dspark.master=local to run locally. HiveContext does not require a full Hive installation, but hive-site.xml must be provided.


[hadoop@hadoop001 spark-2.1.0]$ cat pom.xml 
[hadoop@hadoop001 spark-2.1.0]$ pwd
/home/hadoop/source/spark-2.1.0

<properties>
    <hadoop.version>2.2.0</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
    <yarn.version>${hadoop.version}</yarn.version>
......
</properties>

Further down, pom.xml defines per-Hadoop-version profiles. Passing -Phadoop-2.6 to Maven activates this one, and -Dhadoop.version on the command line overrides the exact Hadoop version it sets:
<profile>
  <id>hadoop-2.6</id>
  <properties>
    <hadoop.version>2.6.4</hadoop.version>
    <jets3t.version>0.9.3</jets3t.version>
    <zookeeper.version>3.4.6</zookeeper.version>
    <curator.version>2.6.0</curator.version>
  </properties>
</profile>

Run the build from the source root:

[hadoop@hadoop001 spark-2.1.0]$ pwd
/home/hadoop/source/spark-2.1.0

./build/mvn -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package

To produce a deployable binary package instead, use make-distribution.sh with the same profiles:

./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0

make-distribution.sh names its output spark-$VERSION-bin-$NAME.tgz, so the command above produces spark-2.1.0-bin-2.6.0-cdh5.7.0.tgz.

If the build fails with a dependency-resolution error such as:
Failed to execute goal on project ...: Could not resolve dependencies for project ...

add the Cloudera repository to the <repositories> section of pom.xml:

<repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

If you need a Scala 2.10 build, switch the Scala version first:

./dev/change-scala-version.sh 2.10

Environment Setup

Local mode

  • tar -zxvf spark-2.1.0-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
  • Set the SPARK_HOME environment variable (see the sketch below)
  • source ~/.bash_profile
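
A minimal sketch of the relevant ~/.bash_profile lines; the install path assumes the tarball was extracted to ~/app as above:

export SPARK_HOME=/home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH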

Start the shell in local mode with two worker threads:
spark-shell --master local[2]
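
If startup succeeds, a quick sanity check at the scala> prompt (Spark 2.1's shell pre-creates spark, a SparkSession, and sc, a SparkContext) should look roughly like:

scala> spark.version
res0: String = 2.1.0

scala> spark.range(10).count()
res1: Long = 10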

On this machine, however, the shell failed while initializing the Hive metastore:

	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
	...
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259)
	...
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
	...
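
The root cause is that hive-site.xml points the metastore at MySQL, but the MySQL JDBC driver is not on the classpath. A common fix is to pass the connector jar when starting the shell; the jar path below is an assumption, so adjust it to wherever your mysql-connector jar actually lives:

spark-shell --master local[2] --jars ~/software/mysql-connector-java-5.1.27-bin.jar

Once the driver is available and hive-site.xml has been copied into $SPARK_HOME/conf, Hive tables can be queried from code as well. A minimal HiveContext sketch, assuming a Hive table named emp exists (the table name is hypothetical):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextApp {
  def main(args: Array[String]): Unit = {
    // Master and app name can be supplied at submit time,
    // e.g. -Dspark.master=local when running from an IDE
    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)

    // HiveContext is deprecated in Spark 2.x in favor of
    // SparkSession.enableHiveSupport(), but still works in 2.1
    val hiveContext = new HiveContext(sc)
    hiveContext.table("emp").show()  // read the hypothetical Hive table emp and print a few rows

    sc.stop()
  }
}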