数据库操作
1:创建数据库
语法:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
e g:
hive (default)> create database IF NOT EXISTS db_rachel location '/user/rachel/rachelwarehouse/db_rachel' ;
OK
2:删除数据库
PS:由于第一步创建的是内部表,所以此处删除表将同步删掉对应的HDFS目录上的数据
语法:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
CASCADE是会级联删除数据库中的所有表和该数据库。
e g:
hive (default)> drop database db_rachel IF EXISTS;
Time taken: 1.395 seconds
3:使用数据库
USE database_name;
表操作
1:创建表
语法:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement];
注:在创建表的时候,语法中的顺序性是强顺序的,不能在创建的时候将顺序打乱
(以下的创建方式均创建出来的为内部表)
方式1:create
语法:
CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type)
COMMENT table_comment#表的注释
ROW FORMAT DELIMITED FIELDS TERMINATED BY char
STORED AS TEXTFILE
e g :
CREATE TABLE student_ext(
userid STRING,
username STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
方式2:create table like
hive (rachel_db_hive)> CREATE TABLE IF NOT EXISTS student_from_like
> LIKE rachel_db_hive.student;
OK
Time taken: 2.838 seconds
hive (rachel_db_hive)> show tables;
OK
tab_name
student
student_from_like
u_data
方式3:create table as select(CTAS)
需要跑MR程序的
CREATE TABLE student_from_AsSelect
AS
SELECT *
FROM rachel_db_hive.student
2:Hive表的操作以及表创建的场景
hive数据分析完成之后的结果保存到两个地方
存储到HDFS上
存储在hive的表中(临时表)
因为在前端和RDBMS交互比较多,当我们将数据放在hive的临时表中
可以通过临时表和RDBMS之间的交互。
3:删除表
删除表结构和表中的数据
DROP TABLE [IF EXISTS] table_name [PURGE];
只删除表中的数据
TRUNCATE TABLE table_name [PARTITION partition_spec];
4:导入数据
将外部数据内容追加到表中
load data local inpath '/opt/datas/student.txt' into table student;
用导入的数据,覆盖之前表内的数据
load data local inpath '/opt/datas/student.txt' overwrite into table student;
5:insert into table stu select * from student;
也会追加数据到表中,但是会跑MR程序
6:外部表的创建
>>将文件上传到HDFS
bin/hdfs dfs -put /opt/datas/student.txt /user/rachel/datas/
>>导入数据
load data inpath '/user/rachel/datas/student.txt' overwrite into table student
这个过程其实是将HDFS中'/user/rachel/datas/student.txt'下的文件移动到student表所在的目录下。
个人理解:因为在创建数据库的时候,已经指定了该库对应的HDFS目录,
所以该库下的表的目录是固定的,如果我们采用上述方式创建,其实等同于移动操作