1. Create a Maven project; I won't go over that step again here. The project structure looks like the following:
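A typical layout for such a project (the package and file names here are illustrative assumptions):

demo-spark-hive
├── pom.xml
└── src
    └── main
        ├── resources
        │   ├── hive-site.xml
        │   └── log4j.properties
        └── scala
            └── com
                └── example
                    └── HiveDemo.scala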

2. Add the Spark dependencies to the pom.xml file.
<properties>
    <spark.version>2.2.0</spark.version>
    <scala.version>2.11</scala.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-8_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.0.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.38</version>
    </dependency>
    <!--
    <dependency>
        <groupId>commons-dbutils</groupId>
        <artifactId>commons-dbutils</artifactId>
        <version>1.6</version>
    </dependency>
    -->
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>
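Since the demo code is written in Scala, the pom also needs a Scala compiler plugin in its build section. A minimal sketch (the plugin version here is an assumption, not taken from this setup):

<!-- Compiles Scala sources under src/main/scala; the version is an assumption. -->
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>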
3. My Hive version is 1.2.2 and Hadoop is 2.7.1. Copy Hive's configuration file hive-site.xml into src/main/resources/ of the Maven project. In the same directory, create a log4j.properties file so we can see the log output while the program runs. The log4j configuration below can be copied and used as-is:
### root logger settings ###
log4j.rootLogger = info,stdout
### print messages to the console ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
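With this file on the classpath, log4j 1.2 picks it up automatically. A minimal usage sketch (the object name is an illustrative assumption):

import org.apache.log4j.Logger

object LoggingCheck {
  // Each class usually gets its own logger, named after the class.
  private val log = Logger.getLogger(LoggingCheck.getClass)

  def main(args: Array[String]): Unit = {
    // If the configuration was found, this prints in the pattern defined above.
    log.info("log4j.properties was loaded from the classpath")
  }
}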
The contents of hive-site.xml are given below.
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <!-- Change this address to your own Hadoop NameNode address. -->
        <value>hdfs://172.16.0.37:9000/user/hive/warehouse</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!-- Address of the MySQL host where the Hive metadata is stored. -->
        <value>jdbc:mysql://172.16.0.37:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>
Below is the demo test program.
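A minimal sketch of such a demo, assuming a Hive table named src already exists (the object name, master setting, and table name are illustrative assumptions):

import org.apache.spark.sql.SparkSession

// A minimal Hive-on-Spark demo. Assumes hive-site.xml is on the classpath
// (src/main/resources) so the session can reach the Hive metastore.
object HiveDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveDemo")
      .master("local[2]")   // local mode for testing; drop this when submitting to a cluster
      .enableHiveSupport()  // wires the session to the Hive metastore
      .getOrCreate()

    // Listing databases works against any metastore; the table query below
    // assumes a table named src exists -- replace it with one of your own tables.
    spark.sql("show databases").show()
    spark.sql("select * from src limit 10").show()

    spark.stop()
  }
}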

Below is the output of the run:
