Exercise 1: Bucket
Script: create_rating_table_b.sql (creates the bucketed table)
create external table rating_table_b
(userId INT,
movieId STRING,
rating STRING
)
clustered by (userId) into 32 buckets;
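This defines three columns: userId, movieId and rating.
clustered by (userId) into 32 buckets: the rows are split into 32 buckets by userId. Each row's bucket is the hash of userId modulo 32. In Hive's default (pre-3.0) bucketing, the hash of an INT is the value itself, so for non-negative IDs this is simply userId mod 32. All rows for a given user therefore land in the same bucket, and when data is inserted each bucket is written by its own reducer.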
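The bucket assignment described above can be sketched in a few lines. This assumes the legacy scheme used by Hive 1.x/2.x (bucketing version 1). In that scheme an INT hashes to its own value, read as a signed 32-bit Java int, and the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`. Hive 3 switched to Murmur3 hashing for newly created bucketed tables, so the numbers would differ there. The function names are hypothetical and for illustration only.

```python
NUM_BUCKETS = 32          # matches "into 32 buckets" in the DDL
INT_MAX = 0x7FFFFFFF      # Java Integer.MAX_VALUE


def hive_int_hash(value: int) -> int:
    # Assumption: legacy Hive hashes an INT to the value itself,
    # wrapped to a signed 32-bit integer like a Java int.
    return ((value + 2**31) % 2**32) - 2**31


def bucket_id(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Clear the sign bit, then take the remainder: (hash & Integer.MAX_VALUE) % n
    return (hive_int_hash(user_id) & INT_MAX) % num_buckets


def group_by_bucket(user_ids, num_buckets: int = NUM_BUCKETS):
    # Collect user IDs per bucket, keeping first-seen bucket order
    buckets = {}
    for uid in user_ids:
        buckets.setdefault(bucket_id(uid, num_buckets), []).append(uid)
    return buckets


if __name__ == "__main__":
    print(group_by_bucket([1, 33, 65, 2, 32]))  # {1: [1, 33, 65], 2: [2], 0: [32]}
```

Users 1, 33 and 65 differ by multiples of 32, so they share bucket 1. This is why a query or join on a single userId only needs to read one of the 32 bucket files.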
Create the table:
hive -f create_rating_table_b.sql
Inspect the tables:
hive> show tables;
OK
movie_table
rating_table
rating_table_b
rating_table_p
Time taken: 0.042 seconds, Fetched: 4 row(s)
hive> desc rating_table_b;
OK
userid int
movieid string
rating string
Time taken: 0.133 seconds, Fetched: 3 row(s)
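The `desc` output shows that Hive stores the column names in lower case (`userId` became `userid`), since Hive column names are case-insensitive. To check the schema from a script rather than by eye, the CLI text can be parsed. The sketch below uses a hypothetical helper, `parse_desc`. It assumes the whitespace-separated layout shown above: a name, a type and an optional comment per row.

```python
def parse_desc(lines):
    """Return (column_name, data_type) pairs from Hive CLI `desc <table>` output lines."""
    columns = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, the "OK" status line and the timing footer
        if not line or line == "OK" or line.startswith("Time taken"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            # Keep name and type; any trailing comment column is ignored
            columns.append((parts[0], parts[1]))
    return columns


output = """OK
userid int
movieid string
rating string
Time taken: 0.133 seconds, Fetched: 3 row(s)"""

print(parse_desc(output.splitlines()))
# [('userid', 'int'), ('movieid', 'string'), ('rating', 'string')]
```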
hive> desc formatted rating_table_b;
OK
# col_name data_type comment
userid int
movieid string
rating string
# Detailed Table Information
Database: default
Owner: root
CreateTime: Sun May 26 15:29:30 CST 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://master:9000/user/hive/warehouse/rating_table_b
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
transient_lastDdlTime 1558855770
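`desc formatted` confirms two things: the table is external (`Table Type: EXTERNAL_TABLE` plus the `EXTERNAL TRUE` table parameter), and its data lives under the HDFS `Location` shown. The sketch below pulls these fields out of the CLI text so they can be checked from a script. The helper `parse_desc_formatted` is hypothetical. It assumes the layout above: `#` section headers, `Key: value` rows, and whitespace-separated `name value` rows after `Table Parameters:`.

```python
def parse_desc_formatted(lines):
    """Split Hive CLI `desc formatted` output into (table_info, table_params) dicts."""
    info, params = {}, {}
    section = None  # None: column list, "info": "Key: value" rows, "params": table parameters
    for raw in lines:
        line = raw.strip()
        if not line or line == "OK" or line.startswith("Time taken"):
            continue
        if line.startswith("#"):
            # "# col_name ..." opens the column list; "# ... Information" opens key/value rows
            section = "info" if line.endswith("Information") else None
            continue
        if section == "params":
            # Parameter rows are "name value", separated by whitespace
            parts = line.split(None, 1)
            params[parts[0]] = parts[1] if len(parts) > 1 else ""
        elif section == "info":
            # Split on the first colon only, so values like hdfs://host:port/... stay intact
            key, _, value = line.partition(":")
            if key.strip() == "Table Parameters":
                section = "params"
            else:
                info[key.strip()] = value.strip()
    return info, params


sample = """# col_name data_type comment
userid int
# Detailed Table Information
Location: hdfs://master:9000/user/hive/warehouse/rating_table_b
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
transient_lastDdlTime 1558855770"""

info, params = parse_desc_formatted(sample.splitlines())
print(info["Table Type"], params["EXTERNAL"])  # EXTERNAL_TABLE TRUE
```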