I. DDL (Data Definition Language)
1. The structure of Hive's data abstraction
DDL describes the structure of Hive table data: create, alter, drop, ...
See the official Hive docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
Hive is built on top of Hadoop
Hive data is stored on HDFS
Hive metadata can be stored in an RDBMS
2. Database
Hive can contain multiple databases. The default one is named default; it is special in that it always exists whether or not you create any database, and its tables live directly under the warehouse directory rather than in a .db folder. It corresponds to the HDFS directory /user/hive/warehouse, configurable via the hive.metastore.warehouse.dir parameter (set in hive-site.xml)
hive> show databases;
OK
default
Time taken: 0.461 seconds, Fetched: 1 row(s)
hive>
####Database syntax:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
IF NOT EXISTS: with this clause Hive checks whether the database exists and creates it only if it does not; without it, creating an existing database raises an error (always add IF NOT EXISTS in production). E.g.: create database if not exists dbname;
COMMENT: a description of the database
LOCATION: the HDFS path at which the database is created; defaults to /user/hive/warehouse/ when omitted
WITH DBPROPERTIES: key-value properties attached to the database
####Example:
hive>
> create database hive;
OK
Time taken: 2.073 seconds
hive> show databases;
OK
default
hive
Time taken: 0.124 seconds, Fetched: 2 row(s)
hive>
####Check which HDFS path it is stored under
[hadoop@hadoop05 ~]$ hadoop fs -ls /user/hive/warehouse
18/06/14 20:43:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2018-06-14 20:38 /user/hive/warehouse/hive.db
drwxr-xr-x - hadoop supergroup 0 2018-06-13 23:03 /user/hive/warehouse/xx
/user/hive/warehouse/<databasename>.db
Note: in production, always add the IF NOT EXISTS option.
####Create a database at a specified location, then create a table inside it
hive> create database hive2 LOCATION '/gordon_01';
OK
Time taken: 0.435 seconds
[hadoop@hadoop05 ~]$ hadoop fs -ls /
18/06/14 20:45:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2018-06-14 20:45 /gordon_01
drwx-wx-wx - hadoop supergroup 0 2018-06-13 22:44 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-06-13 23:03 /user
> use hive2;
OK
Time taken: 3.164 seconds
hive> create table b(id int);
OK
Time taken: 2.25 seconds
hive>
[hadoop@hadoop05 ~]$ hadoop fs -ls /gordon_01
18/06/14 20:53:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2018-06-14 20:53 /gordon_01/b
####Create a database with a comment, adding descriptive information as key-value pairs
create database hive2_gordon comment 'this is gordon 03 test database ' with dbproperties('create'='gordon','date'='2018-08-08');
####View the information of each database just created
hive> show databases;
OK
default
hive
hive2
hive2_gordon
Time taken: 0.48 seconds, Fetched: 4 row(s)
hive> desc database hive2_gordon;
OK
hive2_gordon this is gordon 03 test database hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db hadoop USER
Time taken: 0.176 seconds, Fetched: 1 row(s)
hive> desc database default;
OK
default Default Hive database hdfs://192.168.199.105:9000/user/hive/warehouse public ROLE
Time taken: 0.101 seconds, Fetched: 1 row(s)
####The properties of the newly created hive2_gordon are not displayed above; one command shows them
hive> desc database extended hive2_gordon;
OK
hive2_gordon this is gordon 03 test database hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db hadoop USER {date=2018-08-08, create=gordon}
Time taken: 0.094 seconds, Fetched: 1 row(s)
####View the information via MySQL
mysql> use gordon ;
mysql> show tables;
mysql> select * from dbs\G;
*************************** 1. row ***************************
DB_ID: 1
DESC: Default Hive database
DB_LOCATION_URI: hdfs://192.168.199.105:9000/user/hive/warehouse
NAME: default
OWNER_NAME: public
OWNER_TYPE: ROLE
*************************** 2. row ***************************
DB_ID: 6
DESC: NULL
DB_LOCATION_URI: hdfs://192.168.199.105:9000/user/hive/warehouse/hive.db
NAME: hive
OWNER_NAME: hadoop
OWNER_TYPE: USER
*************************** 3. row ***************************
DB_ID: 7
DESC: NULL
DB_LOCATION_URI: hdfs://192.168.199.105:9000/gordon_01
NAME: hive2
OWNER_NAME: hadoop
OWNER_TYPE: USER
*************************** 4. row ***************************
DB_ID: 11
DESC: this is gordon 03 test database
DB_LOCATION_URI: hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db
NAME: hive2_gordon
OWNER_NAME: hadoop
OWNER_TYPE: USER
4 rows in set (0.03 sec)
ERROR:
No query specified
####Be sure to check the official docs:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
RESTRICT: the default; the drop fails with an error if the database still contains tables;
CASCADE: cascading delete (if the database still contains tables, the tables are dropped first and then the database itself)
Dropping the newly created hive2 database directly raises an error: hive2 still contains table b, so it cannot be deleted
####Drop table b first, then drop database hive2, and check whether hive2 still exists
hive> drop database hive2;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database hive2 is not empty. One or more tables exist.)
hive> use hive2;
OK
Time taken: 0.171 seconds
hive> show tables;
OK
b
Time taken: 1.828 seconds, Fetched: 1 row(s)
hive> drop table b;
OK
Time taken: 0.862 seconds
hive> show tables;
OK
Time taken: 0.085 seconds
####Drop a database with CASCADE
drop database hive CASCADE;
Common operations: create, alter, drop, show, desc, use
####Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
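A couple of hedged examples against the hive2_gordon database created above (the property key is made up):
ALTER DATABASE hive2_gordon SET DBPROPERTIES ('edited-by'='gordon');
ALTER DATABASE hive2_gordon SET OWNER USER hadoop;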
####Use Database
USE database_name;
USE DEFAULT;
####Show Databases
SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
" | ": choose one of the alternatives
"[ ]": optional
LIKE 'identifier_with_wildcards': fuzzy-match database names with wildcards
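For example:
SHOW DATABASES LIKE 'hive*';   -- lists every database whose name starts with hive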
####Describe Database
DESCRIBE DATABASE [EXTENDED] db_name;
DESCRIBE DATABASE db_name: shows the database's description and its directory location;
EXTENDED: additionally shows the database's key-value properties.
II. Hive's basic data types & delimiters
Official docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
The official page lists so many types it can drive you crazy!
1. Commonly used basic data types
Data types commonly used in production: int, bigint, float, double, decimal (pick a numeric type where one is needed; use string for everything else), string (functions work on it too)
Data types not recommended in production: date/timestamp and boolean -> store them all as string
Time values stored as strings can still be compared and converted with UDFs.
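For example, hiredate in the gordon_emp table created later in this section is stored as a string like 1981-2-20, yet built-in UDFs can still parse and compare it (the cutoff date below is arbitrary):
SELECT ename, unix_timestamp(hiredate, 'yyyy-M-d') AS hire_ts
FROM gordon_emp
WHERE unix_timestamp(hiredate, 'yyyy-M-d') > unix_timestamp('1985-1-1', 'yyyy-M-d');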
####Delimiters
Row: \n
Column: \001 (displays as ^A) ==>
1 zhangsan 30
1$$$zhangsan$$$30
Collection types (map/struct/array) have their own item delimiters
Usually the delimiter is specified directly when the table is created: \t
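A sketch of a table that sets the collection delimiters explicitly (the table and column names are made up):
CREATE TABLE t_complex (
  id INT,
  hobbies ARRAY<STRING>,
  scores MAP<STRING, INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n';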
2. Table
####Example:
hive> use hive2_gordon;
OK
Time taken: 1.503 seconds
hive> CREATE TABLE gordon_person
> (id int comment 'this is id', name string comment 'this id name' )
> comment 'this is gordon_person'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t' ;
OK
Time taken: 4.471 seconds
hive>
> desc gordon_person;
OK
id int this is id
name string this id name
Time taken: 1.458 seconds, Fetched: 2 row(s)
hive> desc formatted gordon_person;
OK
# col_name data_type comment
id int this is id
name string this id name
# Detailed Table Information
Database: hive2_gordon
Owner: hadoop
CreateTime: Thu Jun 14 21:27:27 CST 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db/gordon_person
Table Type: MANAGED_TABLE
Table Parameters:
comment this is gordon_person
transient_lastDdlTime 1528982847
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.635 seconds, Fetched: 29 row(s)
The path where databases are stored can be changed in hive-site.xml (usually left unchanged):
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
####Example:
create table gordon_emp
(empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' ;
####Load data into the table
LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE gordon_emp;
LOCAL: load data from the local file system into the Hive table
Without LOCAL: load data from HDFS into the Hive table
OVERWRITE: overwrite the table's existing data while loading
Without OVERWRITE: append
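The two options combine into four variants; a sketch (the HDFS path below is hypothetical; note that the HDFS variant moves the source file into the table directory):
-- local file system, replace existing data
LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE gordon_emp;
-- HDFS, append to existing data
LOAD DATA INPATH '/data/emp.txt' INTO TABLE gordon_emp;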
hive> select * from gordon_emp;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
Time taken: 3.296 seconds, Fetched: 15 row(s)
####Copy data from another table (CTAS)
hive> CREATE table gordon_emp2 as select * from gordon_emp; -- runs a MapReduce job
Query ID = hadoop_20180614211111_2ccb7b43-6e66-4de4-8bb7-52aa4a7669f2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2018-06-14 21:41:22,958 Stage-1 map = 0%, reduce = 0%
2018-06-14 21:41:24,044 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local1375787547_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db/.hive-staging_hive_2018-06-14_21-41-14_000_4839247582555161403-1/-ext-10001
Moving data to: hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db/gordon_emp2
Table hive2_gordon.gordon_emp2 stats: [numFiles=1, numRows=15, totalSize=708, rawDataSize=693]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 2100 HDFS Write: 1489 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 11.628 seconds
hive>
####Copy only the structure of another table
CREATE table gordon_emp3 like gordon_emp;
####Rename a table
ALTER TABLE gordon_emp3 RENAME TO gordon_emp3_new;
####Common operations: create, alter, drop, show, desc
Show create table aa;
####Hive tables are further divided into managed (internal) tables and external tables. Each Hive table corresponds to a directory on HDFS: /user/hive/warehouse/[databasename.db]/table
3. Tables are created as MANAGED_TABLE (managed/internal tables) by default
e.g. gordon_emp2
drop: deletes both the HDFS data and the metadata
####Drop table gordon_emp2
hive> drop table gordon_emp2;
hive> show tables;
OK
gordon_emp
gordon_emp3_new
gordon_person
####The data is gone from HDFS
[hadoop@hadoop05 data]$ hadoop fs -text hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db/gordon_emp2
18/06/14 21:51:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
text: `hdfs://192.168.199.105:9000/user/hive/warehouse/hive2_gordon.db/gordon_emp2': No such file or directory
####The table can no longer be found in the metadata either
use gordon;
select * from tbls \G;
gordon_emp2 no longer shows up
Managed table summary: as soon as the table is dropped, both the HDFS data and the metadata are deleted
4. EXTERNAL: external tables
create EXTERNAL table gordon_emp_external
(empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION "/gordon/external/emp" ;
drop: only the metadata is dropped
hive> select * from gordon_emp_external;
OK
Time taken: 0.963 seconds
hive> desc formatted fordon_emp_external;
FAILED: SemanticException [Error 10001]: Table not found fordon_emp_external
hive> desc formatted gordon_emp_external;
OK
# col_name data_type comment
empno int
ename string
job string
mgr int
hiredate string
salary double
comm double
deptno int
# Detailed Table Information
Database: hive2_gordon
Owner: hadoop
CreateTime: Thu Jun 14 22:30:06 CST 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://192.168.199.105:9000/gordon/external/emp
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
transient_lastDdlTime 1528986606
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.429 seconds, Fetched: 35 row(s)
[hadoop@hadoop05 data]$ hadoop fs -ls hdfs://192.168.199.105:9000/gordon/external/emp
18/06/14 22:32:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop05 data]$ hadoop fs -put emp.txt /gordon/external/emp
18/06/14 22:34:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop05 data]$ hadoop fs -ls hdfs://192.168.199.105:9000/gordon/external/emp
18/06/14 22:34:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 hadoop supergroup 700 2018-06-14 22:34 hdfs://192.168.199.105:9000/gordon/external/emp/emp.txt
[hadoop@hadoop05 data]$
The data is now visible. Operations like these are extremely common at work: as soon as data is moved into the table's directory, it can be queried immediately.
hive> select * from gordon_emp_external;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
Time taken: 1.094 seconds, Fetched: 15 row(s)
hive>
####View the table type information in MySQL
mysql> select * from tbls\G;
*************************** 6. row ***************************
TBL_ID: 21
CREATE_TIME: 1528986606
DB_ID: 11
LAST_ACCESS_TIME: 0
OWNER: hadoop
RETENTION: 0
SD_ID: 21
TBL_NAME: gordon_emp_external
TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
####Drop external table gordon_emp_external
hive> drop table gordon_emp_external;
The data is still there on HDFS
The MySQL metadata no longer contains the table
External table summary: dropping the table deletes only the metadata; the data on HDFS is retained.
Summary of the two table types:
Hive has two kinds of tables: the Managed Table (the default) and the External Table (created with the EXTERNAL keyword). Managed table data is managed by Hive itself; external table data is managed on HDFS.
Their main difference is what happens on drop: dropping a Managed Table deletes both the data (stored on HDFS) and the metadata (stored in MySQL), while dropping an External Table deletes only the metadata. Managed table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse), whereas the storage location of external table data is specified by you. Modifications to a managed table are synced to the metadata directly, but after modifying an external table's structure or partitions you need to repair the table (MSCK REPAIR TABLE table_name;), as sketched below.
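A hedged sketch of that repair step, assuming a partitioned external table (the table name and partition layout are hypothetical):
-- a new partition directory was created on HDFS by hand, e.g.
--   hadoop fs -mkdir -p /gordon/external/logs/dt=2018-06-14
MSCK REPAIR TABLE gordon_logs_external;   -- registers the new partitions in the metastore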
In production, external tables are used 99% of the time, because the files act as a backup: if the table is dropped, simply load another copy with LOAD DATA LOCAL INPATH and you are done.
Managed tables cover the remaining 1%: when data is delivered from outside, you can still build an external table over it, and if the table is gone you just import another copy.
The table LOCATION can point at different filesystems, e.g.:
hdfs://hadoop000:8020/ddddd
s3a://xxx/yyyy
s3n
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
Side note: the Hive CLI shows up as a RunJar process in jps:
[hadoop@hadoop05 data]$ jps -m
2210 SecondaryNameNode
1987 NameNode
2507 NodeManager
6075 Jps -m
2076 DataNode
2412 ResourceManager
4348 RunJar /home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/hive-cli-1.1.0-cdh5.7.0.jar org.apache.hadoop.hive.cli.CliDriver
5. TEMPORARY (temporary tables)
Hive supports creating temporary tables as of 0.14.0. A temporary table is visible only to the current session and is dropped automatically when the session exits.
Syntax: CREATE TEMPORARY TABLE …
Notes:
1. If a temporary table shares its name with an existing permanent table, any reference to that name within the session resolves to the temporary table; the original table becomes usable only after the temporary one is dropped or renamed (see the sketch after this list)
2. Limitations: temporary tables support neither partition columns nor indexes
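A minimal sketch of the shadowing behaviour in note 1:
CREATE TEMPORARY TABLE gordon_emp (id int);  -- shadows the permanent gordon_emp for this session
SELECT * FROM gordon_emp;                    -- resolves to the temporary table
DROP TABLE gordon_emp;                       -- removes the temporary table; the permanent one is visible again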
For INSERT, the target table must already exist
hive> INSERT OVERWRITE TABLE gordon_emp4
select * FROM gordon_emp;
FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 'gordon_emp4'
hive> create table gordon_emp4 like gordon_emp;
####Inserting a different set of columns
hive> INSERT OVERWRITE TABLE gordon_emp4
select empno, deptno FROM gordon_emp;
FAILED: SemanticException [Error 10044]: Line 1:24 Cannot insert into target table because column number/types are different 'gordon_emp4': Table insclause-0 has 8 columns, but query has 2 columns.
If a column is written in the wrong position relative to the source columns, the insert either fails or the data lands scrambled in the wrong columns.
So be careful when inserting: spell out the number of columns, their types, and their order.
hive>INSERT INTO TABLE gordon_emp4
SELECT empno,job, ename,mgr, hiredate, salary, comm, deptno from gordon_emp;
The INSERT above swaps ename and job; since both are strings it succeeds without any error, but the data lands in the wrong columns.
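The correct form lists every column in the target table's order:
INSERT OVERWRITE TABLE gordon_emp4
SELECT empno, ename, job, mgr, hiredate, salary, comm, deptno FROM gordon_emp;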
If something does go wrong, big data has no concept of rollback; the only remedy is to re-run the job.
Re-running relies on idempotence ***** running a job 100 times must give the same result, so every production job needs to support idempotence.
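One common way to keep a re-run idempotent is INSERT OVERWRITE into a fixed target, so repeating the job replaces the previous output instead of appending duplicates (the partitioned table below is hypothetical):
-- re-running this always rebuilds the same partition, never duplicates it
INSERT OVERWRITE TABLE emp_daily PARTITION (dt='2018-06-14')
SELECT empno, ename, salary FROM gordon_emp;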
A log-processing example (each value maps to its first two digits):
2055 ==> 20
2054 ==> 20
....
1959 ==> 19
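If that is the intent, a hedged sketch over a hypothetical logs table with a string column year_str:
SELECT substr(year_str, 1, 2) FROM logs;   -- 2055 -> 20, 1959 -> 19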
See the official docs: