HiveQL Data Definition

hive> alter database hive set dbproperties ('name' = 'zjk');
hive> describe database extended hive;
hive hdfs://localhost:9000/usr/local/hive/warehouse/hive.db hive USER {name=zjk}

Switching databases

First enable this setting so the prompt shows the current database, then switch:

hive> set hive.cli.print.current.db=true;
hive (default)> use zjk;
hive (zjk)>

Dropping a database

hive (zjk)> drop database zjk;
To avoid an error when the database does not exist, add IF EXISTS:

hive (zjk)> drop database if exists zjk;

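By default, Hive refuses to drop a database that still contains tables (this default is the RESTRICT behavior). Either drop the tables first, or append the CASCADE keyword so Hive drops the tables in the database before dropping the database itself: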

hive (zjk)> drop database if exists zjk cascade;
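To make the IF EXISTS / RESTRICT / CASCADE rules concrete, here is a toy Python model. `ToyMetastore` and `drop_database` are hypothetical illustrations, not a Hive API. They only mimic the behavior described above.

```python
class ToyMetastore:
    """Hypothetical in-memory stand-in for the Hive metastore (not a real API)."""

    def __init__(self):
        # database name -> set of table names
        self.databases = {}

    def drop_database(self, name, if_exists=False, cascade=False):
        if name not in self.databases:
            if if_exists:
                return  # IF EXISTS: silently do nothing
            raise LookupError(f"Database does not exist: {name}")
        if self.databases[name] and not cascade:
            # RESTRICT (the default): refuse while tables remain
            raise RuntimeError(f"Database {name} is not empty")
        # CASCADE, or an empty database: the tables go away with the database
        del self.databases[name]


ms = ToyMetastore()
ms.databases["zjk"] = {"employees"}
ms.drop_database("zjk", if_exists=True, cascade=True)
```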

Tables

Creating a table

hive> create table if not exists hive.employees(
> name string comment 'employee name',
> salary float comment 'employee salary',
> subordinates array<string> comment 'names of subordinates',
> deductions map<string,float>
> comment 'keys are deduction names, values are percentages'
> address struct<street:string,city:string,state:string,zip:int>
> comment 'home')
> comment 'employee table'
> tblproperties ('creator'='zjk');
ARRAY is an array type, MAP holds key/value pairs, STRUCT is a record with named fields, and TBLPROPERTIES attaches key/value metadata to the table.
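How do these nested types look in a data file? The Python sketch below reads one `employees` row in Hive's default text layout: ^A (`\x01`) between columns, ^B (`\x02`) between ARRAY elements, STRUCT fields and MAP entries, and ^C (`\x03`) between a MAP key and its value. It is a simplification that ignores NULL markers and escaping. The function name and sample values are made up for illustration.

```python
# Hive's default text-format delimiters
FIELD_SEP = "\x01"  # ^A: between top-level columns
ITEM_SEP = "\x02"   # ^B: between ARRAY elements, STRUCT fields and MAP entries
KEY_SEP = "\x03"    # ^C: between a MAP key and its value


def parse_employee(line):
    """Simplified parse of one employees row (hypothetical helper)."""
    name, salary, subs, deds, addr = line.split(FIELD_SEP)
    subordinates = subs.split(ITEM_SEP) if subs else []
    deductions = {}
    if deds:
        for entry in deds.split(ITEM_SEP):
            key, value = entry.split(KEY_SEP)
            deductions[key] = float(value)
    street, city, state, zip_code = addr.split(ITEM_SEP)
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subordinates,
        "deductions": deductions,
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


# Hypothetical sample row built with the same delimiters
row = FIELD_SEP.join([
    "John Doe",
    "100000.0",
    ITEM_SEP.join(["Mary Smith", "Todd Jones"]),
    ITEM_SEP.join(["Federal Taxes" + KEY_SEP + "0.2", "State Taxes" + KEY_SEP + "0.05"]),
    ITEM_SEP.join(["1 Michigan Ave.", "Chicago", "IL", "60600"]),
])
print(parse_employee(row)["deductions"])
```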

Listing tables

Tables in the current database:

hive> show tables;

You can list the tables of another database without switching to it:

hive> use default;
hive> show tables in hive;
employees

You can also filter table names with a regular expression:

hive> use hive;
Time taken: 0.057 seconds
hive> show tables 'emp.*';
employees

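Show detailed table information. DESCRIBE EXTENDED and DESCRIBE FORMATTED give essentially the same details; FORMATTED lays them out in a more readable form: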

hive> describe extended hive.employees;
name            string                                                      employee name
salary          float                                                       employee salary
subordinates    array<string>                                               names of subordinates
deductions      map<string,float>                                           keys are deduction names, values are percentages
address         struct<street:string,city:string,state:string,zip:int>     home

...

To see a single column, append its name:

hive> describe employees.salary;
salary float from deserializer

Managed tables and external tables

Internal (managed) tables:

A managed table's data is stored in a subdirectory of the directory set by hive.metastore.warehouse.dir. When you drop the table, Hive deletes its data as well.
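The directory layout can be sketched in Python. Tables in the `default` database sit directly under the warehouse directory, and tables in any other database go under `<db>.db/`, which matches the `hive.db` location shown earlier. `managed_table_location` is a hypothetical helper, not a Hive function. It lowercases names because Hive stores identifiers in lowercase.

```python
def managed_table_location(warehouse_dir, database, table):
    """Hypothetical helper: default directory of a managed table."""
    base = warehouse_dir.rstrip("/")
    db = database.lower()
    if db == "default":
        # Tables of the default database live directly under the warehouse dir
        return f"{base}/{table.lower()}"
    # Other databases get their own <name>.db subdirectory
    return f"{base}/{db}.db/{table.lower()}"


print(managed_table_location("hdfs://localhost:9000/usr/local/hive/warehouse", "hive", "employees"))
```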

External tables:

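Suppose stock data lives under /data/stocks in the distributed file system. The statement below creates an external table that reads every comma-delimited file in /data/stocks. EXTERNAL tells Hive the table is external, and LOCATION tells it where the data lives. Dropping this table does not delete the data.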

-- exchange is a reserved word in newer Hive releases, so it is quoted with backticks
create external table if not exists stocks(
`exchange` string,
symbol string,
price int)
row format delimited fields terminated by ','
location '/data/stocks';
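The difference in DROP TABLE behavior can be modeled in a few lines of Python. This is a toy sketch: `drop_table`, the dictionaries standing in for the metastore and the file system, and the file names are all hypothetical.

```python
def drop_table(metastore, filesystem, name):
    """Toy model: metastore maps table name -> {'type', 'location'};
    filesystem maps a directory path -> the files in it."""
    table = metastore.pop(name)  # the table definition is always removed
    if table["type"] == "MANAGED_TABLE":
        filesystem.pop(table["location"], None)  # managed: the data goes too
    # EXTERNAL_TABLE: the files under LOCATION are left in place


metastore = {
    "stocks": {"type": "EXTERNAL_TABLE", "location": "/data/stocks"},
    "employees": {"type": "MANAGED_TABLE", "location": "/user/hive/warehouse/hive.db/employees"},
}
filesystem = {
    "/data/stocks": ["part-00000"],
    "/user/hive/warehouse/hive.db/employees": ["000000_0"],
}
drop_table(metastore, filesystem, "stocks")
drop_table(metastore, filesystem, "employees")
```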

To check whether a table is managed or external:

describe extended tablename;

The detailed table information at the end of the output shows which kind it is:

......tableType:MANAGED_TABLE (managed table)

......tableType:EXTERNAL_TABLE (external table)
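If you script this check, you can pull the `tableType:` token out of the captured DESCRIBE EXTENDED text. Here is a minimal Python sketch. It assumes the output contains the `tableType:VALUE` token shown above. The helper name and the abbreviated sample string are hypothetical.

```python
import re


def table_type(describe_output):
    """Return the tableType value from DESCRIBE EXTENDED text, or None if absent."""
    match = re.search(r"tableType:(\w+)", describe_output)
    return match.group(1) if match else None


# Abbreviated, hypothetical sample of the detailed table information
sample = "Detailed Table Information  Table(tableName:stocks, ..., tableType:EXTERNAL_TABLE)"
print(table_type(sample))
```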

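Copying a table definition: LIKE copies the schema of an existing table but none of its data. In this example the copy is declared EXTERNAL and given its own location: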

create external table if not exists mydb.tablename2
like mydb.tablename1
location '/path/to/data';

Partitioned tables

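Suppose a table holds information on every employee worldwide, partitioned by country and state. A query that does not filter on the partition columns forces Hive to read every partition directory. If the table has a lot of data and many partitions, such a query launches a huge MapReduce job. One safeguard is to put Hive in strict mode. Hive then refuses to run a query against a partitioned table unless its WHERE clause filters on a partition column: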

hive> set hive.mapred.mode=strict;

To switch back to non-strict mode:

hive> set hive.mapred.mode=nonstrict;
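The strict-mode rule for partitioned tables boils down to one check, shown here as a simplified Python sketch. `check_strict_mode` is hypothetical. Hive's real check inspects the parsed query and also enforces other strict-mode restrictions that are not modeled here.

```python
def check_strict_mode(mode, partition_columns, where_columns):
    """Hypothetical simplification: reject a query on a partitioned table
    whose WHERE clause references no partition column."""
    if mode == "strict" and partition_columns:
        if not set(partition_columns) & set(where_columns):
            raise ValueError("strict mode: no partition filter on a partitioned table")


check_strict_mode("strict", ["country", "state"], ["country"])  # allowed
check_strict_mode("nonstrict", ["country", "state"], [])       # allowed, full scan
```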

List the partitions of a table:

hive> show partitions employees;

If there are many partitions and you only want those with a particular partition key value, add a PARTITION clause:

hive> show partitions employees partition(country='us');
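SHOW PARTITIONS prints each partition as `key=value/key=value`, for example `country=us/state=ca`. The filtering done by the PARTITION clause can be sketched in Python. Both helper names and the sample partition list are hypothetical.

```python
def parse_partition(spec):
    """'country=us/state=ca' -> {'country': 'us', 'state': 'ca'}"""
    return dict(part.split("=", 1) for part in spec.split("/"))


def show_partitions(partitions, **filters):
    """Keep only partitions whose keys match every given filter value."""
    return [p for p in partitions
            if all(parse_partition(p).get(k) == v for k, v in filters.items())]


parts = ["country=ca/state=ab", "country=us/state=ca", "country=us/state=il"]
print(show_partitions(parts, country="us"))
```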

DESCRIBE EXTENDED also shows the partition keys (listed under partitionKeys in the detailed table information):

hive> describe extended employees;

Adding partitions

Create an external partitioned table:

create external table if not exists log(
hms int,
server string)
partitioned by (year int,month int,day int)
row format delimited fields terminated by '\t';

Add a partition for January 1, 2013:

alter table log add partition(year=2013,month=1,day=1)
location 'hdfs://master_server/log/2013/1/1';
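The statement above puts the partition at an explicit LOCATION. Without one, Hive places a partition under the table directory as nested `key=value` subdirectories. The Python sketch below shows that naming. `default_partition_path` and the table directory are hypothetical.

```python
def default_partition_path(table_dir, spec):
    """Hypothetical helper: the directory used for a partition when no LOCATION is given."""
    segments = "/".join(f"{key}={value}" for key, value in spec.items())
    return f"{table_dir.rstrip('/')}/{segments}"


print(default_partition_path("/user/hive/warehouse/log", {"year": 2013, "month": 1, "day": 1}))
```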

Dropping a table

hive> drop table if exists employees;

Renaming a table

hive> alter table log rename to los;

This renames table log to los.

Modifying and dropping partitions

Change the location of a partition. This neither moves nor deletes the data at the old location:

alter table log partition(year=2011,month=1,day=1)
set location 'xxxx';

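Drop a partition. For a managed table this also deletes the partition's data; for an external table only the metadata is removed: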

alter table log drop if exists partition(year=2011,month=1,day=1);

Changing a column

alter table log
change column hms hours_mins int
comment 'xxxx'
after server;

This changes column hms of table log. It is renamed to hours_mins, its type is set to int, its comment is updated, and it is moved after the server column.

To move the column to the first position instead, replace AFTER server with FIRST.
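CHANGE COLUMN edits only the column list in the metastore. The Python sketch below models that metadata change on a list of `(name, type)` pairs. `change_column` is a hypothetical helper, not Hive code.

```python
def change_column(columns, old, new, new_type, after=None, first=False):
    """Hypothetical model of CHANGE COLUMN on a list of (name, type) pairs.
    Only this metadata list changes; no data files are rewritten."""
    remaining = [c for c in columns if c[0] != old]
    changed = (new, new_type)
    if first:
        return [changed] + remaining
    if after is not None:
        idx = [c[0] for c in remaining].index(after)
        return remaining[:idx + 1] + [changed] + remaining[idx + 1:]
    # No FIRST/AFTER: keep the column in its original position
    pos = [c[0] for c in columns].index(old)
    return remaining[:pos] + [changed] + remaining[pos:]


cols = [("hms", "int"), ("server", "string")]
print(change_column(cols, "hms", "hours_mins", "int", after="server"))
```

Because the data files are untouched, moving a column only makes sense if the data already matches the new order. Otherwise Hive reads each row's fields into the wrong columns.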

Adding columns

New columns are appended after the existing columns (but before any partition columns):

alter table log add columns(
app string comment '');

Dropping or replacing columns

REPLACE COLUMNS removes all existing columns and replaces them with the new list:

alter table log replace columns(
hour int);

This changes only the metadata. The data files are not modified.
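Because only metadata changes, the same text line is simply read against the new column list by position. The Python sketch below models this for a tab-delimited row. Extra trailing fields are ignored and missing ones become NULL (None here). `read_row` and the sample row are hypothetical.

```python
def read_row(line, columns, sep="\t"):
    """Map the fields of one text line onto a column list, by position."""
    fields = line.split(sep)
    # Extra trailing fields are ignored; missing fields come back as None (NULL)
    return {name: (fields[i] if i < len(fields) else None)
            for i, name in enumerate(columns)}


line = "1230\tweb01"                         # hypothetical row written as (hms, server)
before = read_row(line, ["hms", "server"])
after = read_row(line, ["hour"])              # after REPLACE COLUMNS (hour int)
```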

Changing table properties

alter table log set tblproperties ('creator'='zjk');

SET TBLPROPERTIES can only add or modify properties; it cannot remove them.
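In other words, SET TBLPROPERTIES behaves like a dictionary merge. Here is a minimal Python sketch of that semantics. `set_tblproperties` and the sample keys are hypothetical.

```python
def set_tblproperties(current, updates):
    """Hypothetical model: add new keys and overwrite existing ones; nothing is removed."""
    merged = dict(current)
    merged.update(updates)
    return merged


props = {"creator": "zjk", "comment": "log table"}
print(set_tblproperties(props, {"creator": "admin", "owner_team": "ops"}))
```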

Multi-byte delimiters

Suppose the data looks like this:

1::F::1::10::48067

Hive's default SerDe cannot parse multi-character delimiters such as ::. Use a SerDe that can, such as RegexSerDe, when creating the table:

create table t_user(
userid bigint,
sex string,
age int,
occupation string,
zipcode string) 
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe' 
with serdeproperties('input.regex'='(.*)::(.*)::(.*)::(.*)::(.*)','output.format.string'='%1$s %2$s %3$s %4$s %5$s')
stored as textfile;

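STORED AS TEXTFILE means the data is stored as plain text. Every field is encoded as letters, digits and other characters, and each line is treated as a separate record.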

ROW FORMAT SERDE specifies which SerDe to use.
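You can test the `input.regex` pattern outside Hive before creating the table. The Python sketch below applies the same pattern to the sample line with a whole-line match (RegexSerDe matches the entire line) and converts the fields to the column types. `parse_user` is hypothetical. Returning `None` for a non-matching line is this sketch's choice, not a description of RegexSerDe's exact behavior.

```python
import re

# Same pattern as input.regex above; it behaves the same in Python's re as in Java here
USER_LINE = re.compile(r"(.*)::(.*)::(.*)::(.*)::(.*)")


def parse_user(line):
    """Hypothetical helper: split one t_user line into typed fields."""
    m = USER_LINE.fullmatch(line)
    if m is None:
        return None  # this sketch's choice for lines that do not match
    userid, sex, age, occupation, zipcode = m.groups()
    return {"userid": int(userid), "sex": sex, "age": int(age),
            "occupation": occupation, "zipcode": zipcode}


print(parse_user("1::F::1::10::48067"))
```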