6. Hive Basics

I. Verifying Hive Metadata

1. From the Hive CLI

show databases;

[root@hadoop sbin]# hive
...
hive> show databases;
OK
default
school
Time taken: 3.881 seconds, Fetched: 2 row(s)
hive> 

2. From MySQL

mysql -uroot -p

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| AUX_TABLE                 |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_COMPACTIONS     |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| KEY_CONSTRAINTS           |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
| WRITE_SET                 |
+---------------------------+
57 rows in set (0.00 sec)
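These tables form the metastore schema: for example, DBS records database definitions and TBLS records table definitions. As a hedged illustration (column names follow the standard Hive metastore schema and may differ across versions), the two can be joined to list each table alongside its owning database:

-- Inspection query against the metastore database (not the warehouse itself).
-- Assumes the standard schema, where TBLS.DB_ID references DBS.DB_ID.
select d.NAME as db_name, t.TBL_NAME, t.TBL_TYPE
from TBLS t join DBS d on t.DB_ID = d.DB_ID;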

II. Using Hive's Built-in Commands

1. Basic Operations

  • You can run bash shell commands from inside the Hive CLI without exiting: prefix the command with ! and terminate it with ;. Each such command runs in its own subshell, which is why the !cd below has no effect on the following !pwd.
hive> !pwd;
/opt/hadoop/sbin
hive> !cd /root/;
hive> !pwd;
/opt/hadoop/sbin
hive> 
  • Viewing the databases' locations in the HDFS distributed file system
    From MySQL:
mysql> select DB_LOCATION_URI, NAME, OWNER_NAME from DBS;
+--------------------------------------------------+---------+------------+
| DB_LOCATION_URI                                  | NAME    | OWNER_NAME |
+--------------------------------------------------+---------+------------+
| hdfs://hadoop:8020/user/hive/warehouse           | default | public     |
| hdfs://hadoop:8020/user/hive/warehouse/school.db | school  | root       |
+--------------------------------------------------+---------+------------+
2 rows in set (0.00 sec)

From Hive, using the dfs command:

hive> dfs -ls hdfs://hadoop:8020/;
Found 2 items
drwx-wx-wx   - root supergroup          0 2023-11-20 21:16 hdfs://hadoop:8020/tmp
drwxr-xr-x   - root supergroup          0 2023-11-20 22:28 hdfs://hadoop:8020/user
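The dfs command accepts the same options as hadoop fs, so the warehouse directory itself can be inspected without leaving the CLI. A minimal sketch (paths match the DB_LOCATION_URI values shown above; output omitted):

hive> dfs -ls /user/hive/warehouse;
hive> dfs -ls /user/hive/warehouse/school.db;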

2. Basic Hive Data Types

  • In Hive, create a school database and a students table

create database if not exists school;
create table school.students(id bigint, name string, score double, age int) row format delimited fields terminated by ',';

hive> create database if not exists school;
OK
Time taken: 0.024 seconds
hive> use school;
OK
Time taken: 0.027 seconds
hive> create table school.students(id bigint, name string, score double, age int) row format delimited fields terminated by ',';
OK
Time taken: 0.092 seconds

hive> show tables;
OK
students
test
Time taken: 0.027 seconds, Fetched: 2 row(s)
  • Prepare sample data: create a file with the following contents
[root@hadoop ~]# vim student.txt
[root@hadoop ~]# cat student.txt 
10010,li,100,23
10087,zgs,90,22
10086,ming,98,21

load data local inpath '/root/student.txt' into table students;

hive> load data local inpath '/root/student.txt' into table students;
Loading data to table school.students
OK
Time taken: 1.552 seconds
hive> select * from students;
OK
10010   li      100.0   23
10087   zgs     90.0    22
10086   ming    98.0    21
NULL    NULL    NULL    NULL
Time taken: 1.39 seconds, Fetched: 4 row(s)
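The trailing all-NULL row most likely comes from a blank last line in student.txt: Hive does not reject unparseable rows, it substitutes NULL for every field instead. A small sketch of filtering such rows out at query time:

-- Skip rows whose fields failed to parse (the all-NULL trailing row).
select name, score from school.students where id is not null order by score desc;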

3. Hive Collection Data Types

  • 1. The STRUCT data type

create database enterprise;

use enterprise;

create table enterprise.employee(id int, info struct<name:string, age:int>) row format delimited fields terminated by ',' collection items terminated by ':';

load data local inpath '/root/e.txt' into table employee;

select * from employee;

  • Prepare the data file
[root@hadoop ~]# vim e.txt
[root@hadoop ~]# cat e.txt 
10001,zhangsan:22
10010,lilili:24
10011,wuyi:21
  • Create the table in Hive and load the data
hive> create database enterprise;
OK
Time taken: 0.132 seconds

hive> show databases;
OK
default
enterprise
school
Time taken: 0.055 seconds, Fetched: 3 row(s)
hive> use enterprise;
OK
Time taken: 0.023 seconds
hive> create table enterprise.employee(id int, info struct<name:string, age:int>) row format delimited fields terminated by ',' collection items terminated by ':';
OK
Time taken: 0.116 seconds
hive> load data local inpath '/root/e.txt' into table employee;
Loading data to table enterprise.employee
OK
Time taken: 0.464 seconds
hive> select * from employee;
OK
10001   {"name":"zhangsan","age":22}
10010   {"name":"lilili","age":24}
10011   {"name":"wuyi","age":21}
Time taken: 0.106 seconds, Fetched: 3 row(s)
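Individual STRUCT fields are addressed with dot notation. A minimal follow-up sketch against the table above:

-- Access struct fields by name.
select id, info.name, info.age from enterprise.employee where info.age > 21;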
  • 2. The MAP data type
  • Prepare the data
[root@hadoop ~]# vim e1.txt
[root@hadoop ~]# cat e1.txt 
12345   job:80,team:90,person:50
12344   job:60,team:70
12343   job:90,team:80,person:100
  • Create the Hive table and load the data
hive> use enterprise;
OK
Time taken: 3.407 seconds
hive> create table enterprise.employee1(id int, perf map<string, int>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';
OK
Time taken: 0.607 seconds
hive> show tables;
OK
employee
employee1
Time taken: 0.09 seconds, Fetched: 2 row(s)
hive> load data local inpath '/root/e1.txt' into table employee1;
Loading data to table enterprise.employee1
OK
Time taken: 1.099 seconds
hive> select * from employee1;
OK
12345   {"job":80,"team":90,"person":50}
NULL    NULL
12343   {"job":90,"team":80,"person":100}
Time taken: 1.161 seconds, Fetched: 3 row(s)
hive> select perf['team'] from employee1;
OK
90
NULL
80
Time taken: 0.357 seconds, Fetched: 3 row(s)
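The all-NULL row for id 12344 suggests that its line in e1.txt was separated by spaces rather than a real tab, so neither field matched the '\t' delimiter. Besides indexing by key as above, Hive's built-in map functions can inspect a map's keys and size; a small sketch:

-- map_keys() returns the keys as an array; size() counts the entries.
select id, map_keys(perf), size(perf) from enterprise.employee1 where id is not null;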
  • 3. The ARRAY data type
  • Prepare the data
[root@hadoop ~]# cat e2.txt 
zhangsan,123321:231312:312123
lisi,978798:798987:798798
wanger,210120:120012:120210
  • Create the table and load the data

create table enterprise.employee2(name string, emp_id_list array<int>) row format delimited fields terminated by ',' collection items terminated by ':' ;

load data local inpath '/root/e2.txt' into table employee2;

hive> use enterprise;
OK
Time taken: 0.035 seconds
hive> create table enterprise.employee2(name string, emp_id_list array<int>) row format delimited fields terminated by ',' collection items terminated by ':' ;
OK
Time taken: 0.05 seconds
hive> load data local inpath '/root/e2.txt' into table employee2;
Loading data to table enterprise.employee2
OK
Time taken: 0.321 seconds
hive> select * from employee2;
OK
zhangsan        [123321,231312,312123]
lisi    [978798,798987,798798]
wanger  [210120,120012,120210]
Time taken: 0.13 seconds, Fetched: 3 row(s)
hive> select emp_id_list[0] from employee2;
OK
123321
978798
210120
Time taken: 0.359 seconds, Fetched: 3 row(s)
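Beyond positional indexing, an array can be flattened into rows with LATERAL VIEW explode(). A minimal sketch against the table above:

-- Emit one row per array element.
select name, emp_id
from enterprise.employee2
lateral view explode(emp_id_list) t as emp_id;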

III. Putting It All Together

1. Create a table combining all three collection types

create table enterprise.employee3(name string, salary double, subordinates array<string>, deductions map<string,float>, address struct<street:string,city:string, state:string, zip:int>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';
hive> create table enterprise.employee3(name string, salary double, subordinates array<string>, deductions map<string,float>, address struct<street:string,city:string, state:string, zip:int>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';
OK
Time taken: 0.126 seconds
hive> show tables;
OK
employee
employee1
employee2
employee3
Time taken: 0.074 seconds, Fetched: 4 row(s)
hive> desc employee3;
OK
name                    string                                      
salary                  double                                      
subordinates            array<string>                               
deductions              map<string,float>                           
address                 struct<street:string,city:string,state:string,zip:int>                      
Time taken: 0.064 seconds, Fetched: 5 row(s)

2. Load the data

[root@hadoop ~]# vim e3.txt 
[root@hadoop ~]# cat e3.txt 
zs      8000    li1,li2,li3     cd:30,zt:50,sw:100      nanjingdajie,NJ,ziyou,10012
lis     9000    w1,w2,w3        cd:10,zt:40,sw:33       anhualu,BJ,ziiyou2,10021
hive> load data local inpath '/root/e3.txt' into table employee3;
Loading data to table enterprise.employee3
OK
Time taken: 0.281 seconds
hive> select * from employee3;
OK
zs      8000.0  ["li1","li2","li3"]     {"cd":30.0,"zt":50.0,"sw":100.0}        {"street":"nanjingdajie","city":"NJ","state":"ziyou","zip":10012}
lis     9000.0  ["w1","w2","w3"]        {"cd":10.0,"zt":40.0,"sw":33.0} {"street":"anhualu","city":"BJ","state":"ziiyou2","zip":10021}
Time taken: 0.162 seconds, Fetched: 2 row(s)
hive> select subordinates[0], deductions['cd'], address.city from enterprise.employee3;
OK
li1     30.0    NJ
w1      10.0    BJ
Time taken: 0.421 seconds, Fetched: 2 row(s)
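As a closing hedged sketch (the deduction keys cd/zt/sw come from the sample data above; their business meaning is assumed here), all three collection types can be combined in one query, for example to count subordinates and compute pay after deductions:

-- Combine array, map, and struct access in a single query.
select name,
       address.city,
       size(subordinates) as num_subordinates,
       salary - (deductions['cd'] + deductions['zt'] + deductions['sw']) as net_salary
from enterprise.employee3;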