Integrating SparkSQL with Hive for Reads and Writes

Latest recommended article published on 2025-11-01 13:29:54

License: CC 4.0 BY-SA

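This post walks through integrating Spark with Hive: editing the Hive configuration, configuring Spark, and starting Hive, including an exception raised at startup and its fix. It then creates a Hive table, loads data into it, and switches to Spark to start the shell and query the table with SQL statements.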

Modify Hive's `hive-site.xml`

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://node01:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.34.25:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

Start the metastore service: nohup /export/servers/hive1.2.2/bin/hive --service metastore >> /var/log.log 2>&1 &

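Configure Spark

1. Copy hive-site.xml
Command (run from the directory that contains the file): cp hive-site.xml /export/servers/spark/conf/

2. Copy core-site.xml and hdfs-site.xml
Command (run from the directory that contains the files): cp core-site.xml hdfs-site.xml /export/servers/spark/conf/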
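
Copying `hive-site.xml` into Spark's `conf` directory is what lets `spark-shell` locate the metastore. For a standalone application, the same settings can be passed in code instead. The following is a minimal sketch, assuming the `spark-hive` module is on the classpath; it reuses the metastore URI and warehouse path from the configuration above, while the object name and application name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; only the URI and warehouse path come from hive-site.xml above.
object HiveSessionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-integration-sketch")                        // placeholder name
      .master("local[6]")
      .config("hive.metastore.uris", "thrift://node01:9083")     // same value as in hive-site.xml
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // same value as hive.metastore.warehouse.dir
      .enableHiveSupport()                                       // use the Hive metastore as the catalog
      .getOrCreate()

    // Listing databases is a quick check that the metastore connection works.
    spark.sql("show databases").show()

    spark.stop()
  }
}
```

If the databases created in Hive appear in the output, Spark is talking to the same metastore as Hive.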

Start `bin/hive`

Exception:

Logging initialized using configuration in jar:file:/export/servers/hive1.2.2/lib/hive-common-1.2.2.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        at org.apache.hadoop.fs.Path.initialize(Path.java:205)
        at org.apache.hadoop.fs.Path.<init>(Path.java:171)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:563)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
        ... 8 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        at java.net.URI.checkPath(URI.java:1823)
        at java.net.URI.<init>(URI.java:745)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 11 more

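Fix: in `hive-site.xml`, set the scratch and resource directories to concrete paths instead of values containing `${system:...}` placeholders: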

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive/local</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/hive/resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
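
Why a placeholder value fails while a concrete path works can be reproduced with `java.net.URI` alone. The sketch below is a simplified imitation of how Hadoop's `Path` appears to split a string into scheme and path (an assumption based on the stack trace, not a copy of Hadoop's code): the text before the first `:` is treated as a scheme, which leaves a path that does not start with `/`.

```scala
import java.net.{URI, URISyntaxException}

object ScratchDirUriSketch {
  // If a ':' appears before the first '/', treat the text before it as the scheme.
  def split(raw: String): (Option[String], String) = {
    val colon = raw.indexOf(':')
    val slash = raw.indexOf('/')
    if (colon != -1 && (slash == -1 || colon < slash))
      (Some(raw.substring(0, colon)), raw.substring(colon + 1))
    else
      (None, raw)
  }

  // Returns the parsed URI, or the reason java.net.URI rejected the value.
  def check(raw: String): Either[String, URI] = {
    val (scheme, path) = split(raw)
    try Right(new URI(scheme.orNull, null, path, null, null))
    catch { case e: URISyntaxException => Left(e.getReason) }
  }
}
```

With the unresolved value `${system:java.io.tmpdir}/${system:user.name}`, `check` returns the same "Relative path in absolute URI" reason that appears in the trace, whereas `/tmp/hive/local` has no scheme and parses cleanly.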

Create a table with `Hive`

create external table account
(
  name STRING,
  app_name STRING,
  integral_val INT
)
row format delimited fields terminated by '\t' lines terminated by '\n' stored as TEXTFILE location '/dataset/hive';

Load the data

load data inpath '/dataset/t_new_account.txt' overwrite into table account;
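
The table definition expects each line of the data file to hold three tab-separated fields. As an illustration of that layout, here is a small sketch that parses one such line into a case class mirroring the table's columns. The `Account` class, the `AccountLineSketch` object, and the sample values are hypothetical; the parser is also stricter than Hive, which would yield NULL for a field it cannot read rather than reject the row.

```scala
// Mirrors the columns of the `account` table: name, app_name, integral_val.
final case class Account(name: String, appName: String, integralVal: Int)

object AccountLineSketch {
  // Split on tabs (limit -1 keeps trailing empty fields) and accept only
  // rows with exactly three fields whose last field is an integer.
  def parse(line: String): Option[Account] =
    line.split("\t", -1) match {
      case Array(name, app, value) =>
        scala.util.Try(value.trim.toInt).toOption.map(v => Account(name, app, v))
      case _ => None
    }
}
```

A line such as `alice<TAB>shop<TAB>120` parses into `Account("alice", "shop", 120)`, while a line without tabs is rejected.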

Switch to `Spark`

1. Start the shell: ./bin/spark-shell --master local[6]

2. Run the statements directly
spark.sql("use spark01")
val accounts = spark.sql("select * from account limit 10")
accounts.show()
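
The statements above only read from Hive. Writing a result back as a Hive table can be sketched as follows, to be run inside the same `spark-shell` session where `spark` is already defined. The target table name `account_high_integral` and the threshold of 100 are hypothetical, chosen only for illustration.

```scala
// Run inside spark-shell; `spark` is the SparkSession the shell provides.
import org.apache.spark.sql.SaveMode

spark.sql("use spark01")

// Select a subset of the Hive table (the threshold is an arbitrary example value).
val highIntegral = spark.sql(
  "select name, app_name, integral_val from account where integral_val > 100")

// Persist the result as a table registered in the Hive metastore.
// Overwrite replaces the table if it already exists.
highIntegral.write.mode(SaveMode.Overwrite).saveAsTable("account_high_integral")

// Read it back through the catalog to confirm the write.
spark.table("account_high_integral").show(10)
```

Note that `saveAsTable` stores data in Spark's default data source format (Parquet unless `spark.sql.sources.default` is changed), so the new table is not a text file like `account`.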
