Background
By default, Hive does not support transactions; the typical Hive workload is write-once, read-many. With transaction support enabled, Hive also supports UPDATE, INSERT, and DELETE operations. The prerequisites are:
- The table must be stored as ORC files
- Transaction support must be enabled in the configuration
- The table must be bucketed
Open the hive-site.xml file in the conf directory of your Hive installation and set the following properties:
These configuration parameters must be set appropriately to turn on transaction support in Hive:
hive.support.concurrency – true
hive.enforce.bucketing – true (not required as of Hive 2.0, so it can be left unchanged there)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on – true (for exactly one instance of the Thrift metastore service)
hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
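In hive-site.xml these become property entries; a minimal sketch with the values from the list above (hive.compactor.worker.threads is set to 1 here as an illustrative choice):

```xml
<!-- hive-site.xml: enable ACID transaction support -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```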
Create a transactional table
create table tx
(id int,name String,age int)
clustered by (id) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true');
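To confirm the table was actually created as transactional, its properties can be inspected (the exact output layout varies by Hive version):

```sql
-- check that the 'transactional' property is set on the new table
SHOW TBLPROPERTIES tx;

-- or view full details, including the bucketing columns and bucket count
DESCRIBE FORMATTED tx;
```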
Because Hive's LOAD DATA command only moves files into the table's directory on HDFS and does not bucket the data, we instead load the transactional table tx with an INSERT ... SELECT from table t2. The data in t2 is as follows:
hive> select * from t2;
OK
1 tom1 14
2 tom2 16
3 tom3 18
4 tom4 19
5 jenny 22
1 tom1 14
2 tom2 16
3 tom3 18
4 tom4 19
5 jenny 22
Time taken: 2.535 seconds, Fetched: 10 row(s)
Insert the data into tx:
hive> insert into tx select * from t2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions.
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180622103250_b82eb770-deeb-4ddd-b501-c8c00e6578ed
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2018-06-22 10:32:55,488 Stage-1 map = 100%, reduce = 0%
2018-06-22 10:32:56,499 Stage-1 map = 100%, reduce = 67%
2018-06-22 10:32:57,524 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1300705513_0001
Loading data to table default.tx
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 408 HDFS Write: 4741 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 6.86 seconds
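Each ACID write produces delta directories under the table's warehouse path, which can be listed from the Hive CLI. A sketch, assuming the default warehouse location (adjust the path to your hive.metastore.warehouse.dir):

```sql
-- list the table directory; ACID writes appear as delta_<txid>_<txid> subdirectories
dfs -ls /user/hive/warehouse/tx;
```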
Run an UPDATE, then query tx again; the matching rows have been updated. (Note that the rows come back grouped by bucket, so the order differs from t2.)
update tx set name='tom3new' where id = 3;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions.
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180622103525_a8545f32-982d-4f53-8d3f-cd142806a48d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2018-06-22 10:35:26,809 Stage-1 map = 100%, reduce = 0%
2018-06-22 10:35:27,825 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local433693055_0002
Loading data to table default.tx
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 30256 HDFS Write: 16836 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.933 seconds
hive> select * from tx;
OK
3 tom3new 18
3 tom3new 18
4 tom4 19
1 tom1 14
4 tom4 19
1 tom1 14
5 jenny 22
2 tom2 16
5 jenny 22
2 tom2 16
Time taken: 0.101 seconds, Fetched: 10 row(s)
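DELETE works the same way on a transactional table, and the delta files that accumulate from updates and deletes can be merged with a manually triggered compaction. A sketch:

```sql
-- remove rows by predicate (only valid on transactional tables)
delete from tx where id = 5;

-- optionally trigger a major compaction to merge delta files into a base file
alter table tx compact 'major';

-- monitor compaction progress
show compactions;
```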