Notes on Common Hive Commands

madman1990

License: CC 4.0 BY-SA

Category: Big Data Learning

Article link: https://blog.youkuaiyun.com/u010316188/article/details/79560723

Common Hive commands
Hive DDL syntax reference
Use bin/hive -help to view the help information

[hadoop@hadoop apache-hive-0.13.1-bin]$ bin/hive -help
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)

Start Hive with the hadoop14 database already selected
[hadoop@hadoop apache-hive-0.13.1-bin]$ bin/hive --database hadoop14

Run an HQL statement directly after -e
bin/hive -e 'show databases'

Write the result to a file (> overwrites the file; use >> to append to it)
bin/hive -e 'show databases' > hivetest.txt
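
The difference between the two redirection operators can be checked without Hive. The following is a minimal sketch in which the show_databases function is a hypothetical stand-in for bin/hive -e 'show databases', and the database names it prints are made up for illustration.

```shell
# Hypothetical stand-in for: bin/hive -e 'show databases'
# (the two database names are invented so the sketch runs without Hive)
show_databases() {
  printf 'default\nhadoop14\n'
}

out=$(mktemp)

show_databases > "$out"     # first run writes 2 lines
show_databases > "$out"     # '>' truncates the file first, so still 2 lines
overwrite_lines=$(wc -l < "$out" | tr -d ' ')

show_databases >> "$out"    # '>>' appends, so the file now holds 4 lines
append_lines=$(wc -l < "$out" | tr -d ' ')

echo "after '>': $overwrite_lines lines; after '>>': $append_lines lines"
rm -f "$out"
```

With the real CLI the same rule applies: use >> hivetest.txt when results from several runs should accumulate in one file.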

Run a file containing SQL statements from the Linux shell
bin/hive -f /opt/datas/hive.sql
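
As a rough sketch of the whole workflow, the script below writes a small statement file and passes it to -f, combined with the -S (silent) flag from the help output above. It assumes hive is on the PATH, which the original commands (run as bin/hive from the install directory) do not require; the temporary file name and the two statements are placeholders.

```shell
# Write a small HQL script to a temporary file (contents are placeholders).
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
show databases;
show tables;
EOF

# Count the statements, just to confirm what will be submitted.
statement_count=$(grep -c ';$' "$sql_file")
echo "submitting $statement_count statements from $sql_file"

# Run the file only when a hive executable is available (assumption: on PATH).
if command -v hive >/dev/null 2>&1; then
  hive -S -f "$sql_file"
else
  echo "hive not found on PATH; would run: hive -S -f $sql_file"
fi

rm -f "$sql_file"
```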

Override a configuration property for the current session only
bin/hive --hiveconf hive.cli.print.current.db=false

Show the current value of a property
set hive.cli.print.current.db;
Change the value of a property from inside the Hive shell; this also applies to the current session only
set hive.cli.print.current.db=false;

Create a database
hive (default)> create database if not exists db_01 location '/locate';
Create a table
create table db_01.tb_01(
name string
)row format delimited fields terminated by '\t';
Drop an empty database:
drop database db02;
Syntax for dropping a database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
Drop a non-empty database
drop database db01_loc CASCADE;
First way to create a table: a plain CREATE TABLE
create table if not exists stu_info(
num int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile;
load data local inpath '/opt/datas/student.txt' into table stu_info;
Empty the table's contents while keeping its structure
truncate table student;

Drop a table:
drop table if exists student;
Create a table
create table if not exists student(
num int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile;
Load from the local file system
load data local inpath '/opt/datas/student.txt' into table student;
Load from HDFS
load data inpath '/student.txt' into table student;
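The difference between the two: a local load copies the file into the table's directory, while an HDFS load moves the file from its original HDFS location into the table's directory.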
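
The copy-versus-move behaviour can be pictured with ordinary files. This is only an analogy on the local file system, not what Hive runs internally: the directory names, file names, and sample rows are all invented, with cp standing in for load data local inpath and mv standing in for load data inpath.

```shell
# Toy layout: a "local" source, an "HDFS" staging area, and a table directory.
work=$(mktemp -d)
mkdir -p "$work/local" "$work/hdfs_staging" "$work/warehouse/student"
printf '1\talice\n' > "$work/local/student.txt"
printf '2\tbob\n'   > "$work/hdfs_staging/student.txt"

# Analogy for LOAD DATA LOCAL INPATH: the source file is copied.
cp "$work/local/student.txt" "$work/warehouse/student/student_local.txt"

# Analogy for LOAD DATA INPATH: the source file is moved.
mv "$work/hdfs_staging/student.txt" "$work/warehouse/student/student_hdfs.txt"

# Record what is left at each source location.
if [ -f "$work/local/student.txt" ]; then local_source="still there"; else local_source="gone"; fi
if [ -f "$work/hdfs_staging/student.txt" ]; then hdfs_source="still there"; else hdfs_source="gone"; fi
table_files=$(ls "$work/warehouse/student" | wc -l | tr -d ' ')

echo "local source: $local_source; HDFS source: $hdfs_source; files in table dir: $table_files"
rm -rf "$work"
```

The practical consequence is that a file loaded from HDFS no longer exists at its original path afterwards, so keep a copy if other jobs still read it from there.
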
Second way to create a table: from a subquery (CREATE TABLE ... AS SELECT)
create table stu_as as select name from student;
Characteristic: the new table takes both the structure and the data of the subquery result
Third way to create a table: LIKE
create table stu_like like student;
Characteristic: copies only the table structure, not the data
Create a database:
create database if not exists db_emp;
Employee table:
create table emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
row format delimited fields terminated by '\t';
load data local inpath '/opt/moduels/apache-hive-0.13.1-bin/emp.txt' into table emp;
With the overwrite keyword, the table's existing data is deleted first and the new data is then loaded
load data local inpath '/opt/moduels/apache-hive-0.13.1-bin/emp.txt' overwrite into table emp;
Department table:
create table dept(
deptno int,
dname string,
loc string
)
row format delimited fields terminated by '\t';
load data local inpath '/opt/moduels/apache-hive-0.13.1-bin/dept.txt' into table dept;
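When creating a table you can also point it at an existing directory with the LOCATION keyword. Tables created this way share the same data files (not the metadata): if any one of these managed tables is dropped, the shared data is deleted with it and the remaining tables are left empty. Creating an external table avoids this problem.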
create table emp1(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
row format delimited fields terminated by '\t'
LOCATION '/user/hive/warehouse/emp';
Create an external table with the EXTERNAL keyword
create EXTERNAL table dept_ext(
deptno int,
dname string,
loc string
)
row format delimited fields terminated by '\t'
LOCATION '/user/hive/warehouse/db_emp.db/dept';
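Dropping a managed table deletes both the metadata and the table's directory
Dropping an external table deletes only the metadata
A typical pattern is to create the managed table first and then create any number of external tables over the same data
Purpose: to keep the data safe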
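
The drop rules above can be modelled with a toy script. Everything in it is hypothetical and exists only for illustration: the metastore.tsv file, the drop_table function, and the sample row are not how Hive stores metadata, they just mimic the rule that a managed table owns its directory while an external table does not.

```shell
work=$(mktemp -d)
meta="$work/metastore.tsv"                 # toy metastore: name<TAB>kind<TAB>location
data="$work/warehouse/db_emp.db/dept"
mkdir -p "$data"
printf '10\tACCOUNTING\tNEW YORK\n' > "$data/dept.txt"

# Two tables registered over the same directory.
printf 'dept\tMANAGED\t%s\n'      "$data" >  "$meta"
printf 'dept_ext\tEXTERNAL\t%s\n' "$data" >> "$meta"

# Hypothetical helper: remove the metadata row, and the data only if MANAGED.
drop_table() {
  entry=$(awk -F'\t' -v t="$1" '$1 == t' "$meta")
  kind=$(printf '%s\n' "$entry" | cut -f2)
  loc=$(printf '%s\n' "$entry" | cut -f3)
  awk -F'\t' -v t="$1" '$1 != t' "$meta" > "$meta.tmp"
  mv "$meta.tmp" "$meta"
  if [ "$kind" = "MANAGED" ]; then
    rm -rf "$loc"
  fi
}

drop_table dept_ext
if [ -f "$data/dept.txt" ]; then after_external="kept"; else after_external="removed"; fi

drop_table dept
if [ -d "$data" ]; then after_managed="kept"; else after_managed="removed"; fi

echo "data after dropping external table: $after_external"
echo "data after dropping managed table: $after_managed"
rm -rf "$work"
```
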

Partitioned tables

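A partitioned table adds one more level of directories beneath the table's directory. Each subdirectory corresponds to one partition and is named after the partition column and its value (for example date=20170513), so the data files are organized by partition value.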

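Create a partitioned table. The partition column is a virtual, logical column: its value comes from the directory name rather than from the data files.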
create table emp_part(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
partitioned by (date string)
row format delimited fields terminated by '\t';
Load data into a specific partition
load data local inpath '/opt/moduels/apache-hive-0.13.1-bin/emp.txt' into table emp_part partition (date='20170513');
Create a table with multiple partition columns; the example below uses two, date and hour
create table emp_part3(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
partitioned by (date string,hour string)
row format delimited fields terminated by '\t';
Load data into a partition identified by both columns
load data local inpath '/opt/moduels/apache-hive-0.13.1-bin/emp.txt' into table emp_part3 partition (date='20170513',hour='11');
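
To make the directory layout concrete, the sketch below rebuilds it on the local file system. It only mimics the column=value naming that Hive uses for partition directories; the partition_dir helper and the temporary warehouse root are hypothetical, and no Hive or HDFS command is involved.

```shell
warehouse=$(mktemp -d)      # stand-in for the warehouse root directory

# Hypothetical helper: table directory followed by one column=value level per partition column.
partition_dir() {
  dir=$1
  shift
  for kv in "$@"; do
    dir="$dir/$kv"
  done
  printf '%s\n' "$dir"
}

# One partition column (emp_part) and two partition columns (emp_part3).
p1=$(partition_dir "$warehouse/emp_part" date=20170513)
p2=$(partition_dir "$warehouse/emp_part3" date=20170513 hour=11)
mkdir -p "$p1" "$p2"
touch "$p1/emp.txt" "$p2/emp.txt"

# Paths relative to the warehouse root.
rel1=${p1#"$warehouse"/}
rel2=${p2#"$warehouse"/}
echo "$rel1/emp.txt"
echo "$rel2/emp.txt"

rm -rf "$warehouse"
```

Each additional partition column adds one nested directory level, in the order the columns are declared in partitioned by.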