Background
By default, Hive does not support transactions; the typical Hive workload is write-once, read-many. With transaction support enabled, Hive also supports UPDATE, INSERT, and DELETE operations. The prerequisites are:
- The table must be stored as ORC files
- Transaction support must be enabled in the configuration
- The table must be bucketed
Open the hive-site.xml file in the conf directory of your Hive installation and set the following properties:
These configuration parameters must be set appropriately to turn on transaction support in Hive:
hive.support.concurrency – true
hive.enforce.bucketing – true (not required as of Hive 2.0, so it can be left unchanged there)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on – true (for exactly one instance of the Thrift metastore service)
hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
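In hive-site.xml these become property entries; a minimal sketch with the values from the list above (hive.compactor.worker.threads is set to 1 here as an illustrative choice):

```xml
<!-- hive-site.xml: enable ACID transaction support -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```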
Create a transactional table
create table tx
(id int,name String,age int)
clustered by (id) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true');
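To confirm the table was actually created as transactional, its properties can be inspected (the exact output layout varies by Hive version):

```sql
-- check that the 'transactional' property is set on the new table
SHOW TBLPROPERTIES tx;

-- or view full details, including the bucketing columns and bucket count
DESCRIBE FORMATTED tx;
```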
Because Hive's LOAD DATA command only moves files into the table's directory on HDFS and does not bucket the data, we instead load the transactional table tx with an INSERT ... SELECT from table t2. The data in t2 is as follows:
hive> select * from t2;
OK
1 tom1 14
2 tom2 16
3 tom3 18
4 tom4 19
5 jenny 22
1 tom1 14
2 tom2 16
3 tom3 18
4 tom4 19
5 jenny 22
Time taken: 2.535 seconds, Fetched: 10 row(s)
Insert the data into tx:
hive> insert into tx select * from t2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions.
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180622103250_b82eb770-deeb-4ddd-b501-c8c00e6578ed
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2018-06-22 10:32:55,488 Stage-1 map = 100%, reduce = 0%
2018-06-22 10:32:56,499 Stage-1 map = 100%, reduce = 67%
2018-06-22 10:32:57,524 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1300705513_0001
Loading data to table default.tx
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 408 HDFS Write: 4741 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 6.86 seconds
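Each ACID write produces delta directories under the table's warehouse path, which can be listed from the Hive CLI. A sketch, assuming the default warehouse location (adjust the path to your hive.metastore.warehouse.dir):

```sql
-- list the table directory; ACID writes appear as delta_<txid>_<txid> subdirectories
dfs -ls /user/hive/warehouse/tx;
```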
Run an UPDATE, then query tx again; the matching rows have been updated. (Note that the rows come back grouped by bucket, so the order differs from t2.)
update tx set name='tom3new' where id = 3;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions.
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180622103525_a8545f32-982d-4f53-8d3f-cd142806a48d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2018-06-22 10:35:26,809 Stage-1 map = 100%, reduce = 0%
2018-06-22 10:35:27,825 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local433693055_0002
Loading data to table default.tx
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 30256 HDFS Write: 16836 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.933 seconds
hive> select * from tx;
OK
3 tom3new 18
3 tom3new 18
4 tom4 19
1 tom1 14
4 tom4 19
1 tom1 14
5 jenny 22
2 tom2 16
5 jenny 22
2 tom2 16
Time taken: 0.101 seconds, Fetched: 10 row(s)
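DELETE works the same way on a transactional table, and the delta files that accumulate from updates and deletes can be merged with a manually triggered compaction. A sketch:

```sql
-- remove rows by predicate (only valid on transactional tables)
delete from tx where id = 5;

-- optionally trigger a major compaction to merge delta files into a base file
alter table tx compact 'major';

-- monitor compaction progress
show compactions;
```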