Hive Day 01: Getting to Know Hive

buzhidaoyaa

Article link: https://blog.youkuaiyun.com/buzhidaoyaa/article/details/101224635

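Hive is a tool that converts SQL into MapReduce jobs for big data analysis. It lets users query files on HDFS as if they were tables. Hive supports interactive queries, batch processing, and serving queries through HiveServer. The main ways to use it are the command line, script execution, and the Beeline client. Hive can be run in two modes: direct execution and script execution. Hive also distinguishes internal tables from external tables: dropping an internal table deletes its data as well, whereas dropping an external table deletes only the metadata and keeps the data. Creating tables, loading data, and querying are its core functions.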

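1. What is Hive?

Hive is a tool that translates SQL into MapReduce (MR) programs.
Hive lets users map files on HDFS to table structures, after which they can write
SQL to query and analyze those tables (that is, the files on HDFS).
Hive stores the user-defined databases, table structures, and similar information in its metastore database (either a local Derby database or a remote MySQL database).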

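2. What is Hive used for?

It frees big data analysts from writing large numbers of MR programs by hand; writing SQL scripts is enough.
Hive can be used to build a data warehouse within a big data stack.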

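3. How is Hive used?

Method 1: interactive queries:

** bin/hive -----> hive> select * from t_test;
** Start Hive as a service with bin/hiveserver2 (beeline connects to HiveServer2), then connect from any machine with the beeline client and run interactive queries

Method 2: run hive as a one-off command:
** bin/hive -e "sql1;sql2;sql3;sql4"
** Write the SQL statements to a file beforehand, for example q.hql, then execute it with the hive command: bin/hive -f q.hql

Method 3: put the commands from method 2 into an xxx.sh script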
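
Method 3 can be sketched roughly as follows. This is a minimal illustration rather than a definitive script: the working directory, the file name q.hql, and the query against t_test are assumptions made for the example, and the hive command is only invoked if it is found on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper script (for example run_query.sh); paths are illustrative.
set -eu

workdir="${TMPDIR:-/tmp}/hive_demo"
mkdir -p "$workdir"
hql_file="$workdir/q.hql"

# Write the statements to a file (the second variant of method 2).
cat > "$hql_file" <<'EOF'
use default;
select count(*) from t_test;
EOF

# Run the file with hive -f only if the hive command is available.
if command -v hive >/dev/null 2>&1; then
    hive -f "$hql_file"
else
    echo "hive not found on PATH; would run: hive -f $hql_file"
fi
```

Keeping the statements in a separate .hql file means the same script can be scheduled or rerun without retyping the SQL.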

4. Two ways to execute Hive statements

[root@node01 ~]# hive -e "use default;create table tset_1(id int,name string);"
This creates a Hive table.
Alternatively, we can write the statements to a file and execute that file.

[root@node01 home]# vi /home/userinfo.txt
1,xiaoming,20
2,xiaowang,22
[root@node01 home]# vi a.hql
use default;
create table test_2(id int,name string,age int)
row format delimited
fields terminated by ',';
load data local inpath '/home/userinfo.txt' into table test_2;
select count(*) from test_2;
Then execute it:
[root@node01 home]# hive -f a.hql

Note: Hive looks and feels like one large database.

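5. Creating tables

1. The table definition is recorded in Hive's metadata (the hive database in MySQL).
2. A folder with the same name as the table is created under the Hive database directory on HDFS.
3. Once files are placed in the table directory, the table has data.

At this point there is no data under test_1. Enter the hive shell:
hive> desc test_1;
This shows the description of the test_1 table.

It has three fields: id, name, and age.
We create a file and then upload it to the directory on HDFS that corresponds to the test_1 table: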
[root@node01 home]# vi test_1.txt

[root@node01 home]# hadoop fs -put test_1.txt /user/hive/warehouse/test_1
hive> select * from test_1;

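The rows are not split into the expected columns. This is because the table was created with: create table test_1(id string,name string,age int); which does not specify "," as the field delimiter.
So we drop the table:
hive> drop table test_1;
and recreate it with an explicit delimiter:
hive> create table test_1(id string,name string,age int)
row format delimited
fields terminated by ',';

This creates the table test_1 with three fields, id, name, and age, using "," as the field delimiter.
Dropping the table also removed the file uploaded earlier, so we upload it again to the test_1 table directory on HDFS: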
[root@node01 home]# hadoop fs -put test_1.txt /user/hive/warehouse/test_1
hive> select * from test_1;
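
To see why the delimiter matters, here is a small illustration in which awk stands in for Hive's field splitting. It is a sketch, not Hive itself: the sample line is borrowed from userinfo.txt as an assumption about what test_1.txt contains. When no delimiter is declared, Hive splits fields on the Ctrl-A character (\001), so a comma-separated line is treated as a single field and the remaining columns come back as NULL.

```shell
#!/bin/sh
# Illustration only: awk stands in for Hive's field splitting.
# The sample line is an assumption about the contents of test_1.txt.
line='1,xiaoming,20'

# Hive's default field delimiter is the Ctrl-A character (\001).
ctrl_a=$(printf '\001')

# Splitting on Ctrl-A: the line contains none, so it stays one field.
fields_default=$(printf '%s\n' "$line" | awk -F "$ctrl_a" '{ print NF }')

# Splitting on a comma: three fields, matching id, name, age.
fields_comma=$(printf '%s\n' "$line" | awk -F ',' '{ print NF }')

echo "default delimiter: $fields_default field(s)"
echo "comma delimiter:   $fields_comma field(s)"
```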

Next, we create another file:
[root@node01 home]# vi test_1.txt.1

[root@node01 home]# hadoop fs -put test_1.txt.1 /user/hive/warehouse/test_1
hive> select * from test_1;
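
After the second upload, the query returns rows from both files, because a table's data is whatever files sit in its directory. The idea can be mimicked locally as below. This is only an analogy: a temporary local directory stands in for the HDFS table directory, and the file contents are made up for the example.

```shell
#!/bin/sh
# Illustration only: a local temp directory stands in for the HDFS table directory.
set -eu

table_dir=$(mktemp -d)

# Two data files, as with test_1.txt and test_1.txt.1 (contents are made up).
printf '1,xiaoming,20\n2,xiaowang,22\n' > "$table_dir/test_1.txt"
printf '3,xiaohong,25\n' > "$table_dir/test_1.txt.1"

# A scan of the table reads every file in the directory.
row_count=$(cat "$table_dir"/* | wc -l | tr -d ' ')

echo "rows visible to the table: $row_count"
```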

6. Internal and external tables (external)

Internal table
hive> create table t_2(id int,name string,salary bigint,add string)
row format delimited
fields terminated by ','
location '/aa/bb';

[root@node01 home]# vi /home/salary.txt
hive> load data local inpath '/home/salary.txt' into table t_2;
hive> select * from t_2;

Then we drop the table:
hive> drop table t_2;
The files on HDFS are deleted as well.

External table

hive> create external table t_3(id int,name string,salary bigint,add string)
row format delimited
fields terminated by ','
location '/aa/bb';
[root@node01 home]# hadoop fs -put salary.txt /aa/bb/
[root@node01 home]# hadoop fs -cat /aa/bb/salary.txt
hive> select * from t_3;
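
The contrast with the internal table can be checked along these lines. This is a sketch under assumptions: it presumes the external table t_3 and the /aa/bb directory from the steps above exist, and it does nothing unless both hive and hadoop are on the PATH. Dropping an external table removes only its metadata, so the file under /aa/bb should still be listed afterwards.

```shell
#!/bin/sh
# Sketch only: assumes the external table t_3 and /aa/bb from the steps above.
if command -v hive >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
    # The "Table Type" row shows MANAGED_TABLE or EXTERNAL_TABLE.
    hive -e "describe formatted t_3;" | grep -i "Table Type"
    # Dropping an external table removes only its metadata.
    hive -e "drop table t_3;"
    # The data file should still be listed here.
    hadoop fs -ls /aa/bb
    status="executed"
else
    status="skipped"
    echo "hive or hadoop not found on PATH; nothing was run"
fi
```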