hive递归尝试失败,使用python脚本解决的尝试

本文介绍了如何在Hive中处理表中涉及自身关联的场景,通过Python脚本结合HSQL,利用递归查询和层级字段解决了新旧ID之间的对应问题。详细步骤包括创建临时表、SQL查询递归和最终结果整理。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

在一张hive表中 存储了自身关联自身两键值 即: 当前id(newid) 和 原先id (oldid);
每一个 newid 和 oldid 两id有可能 关联本表中 其他的id

现有需求找到 最后newid 和 最早oldid 使其一一对应
网上有sqlserver 等关系型数据库的递归说明大抵就是利用with as 建立子表 自身调用自身实现递归,但是hive中不支持这种方式,
后来想使用datax 提交代码到sqlserver运行,运行即结果直接传输到建立的hive表中, 测试后发现,这种递归方式只能找到最初的 oldid;
发现表中存有 oolevel 第几个关联层级字段,这样就简单很多
最后选用python脚本+hsql的模式:

首先在hive中创建表

create  table zltest.cycle_table(
ooOutputorderid string comment 'newid',
listid  string comment '中间关联的id',
ooOperateorderid  string comment 'oldid'
)row format delimited FIELDS TERMINATED BY '\u0001' 
NULL DEFINED AS '' STORED AS Parquet;

这里后面面要到impala执行 最好使用 Parquet 格式


```python


# -*- coding:utf-8 -*-
from impala.dbapi import connect
import impala
from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo


sql1="""
insert overwrite table zltest.cycle_table
select ooOutputorderid,
concat_ws(',',ooOperateorderid )listid,ooOperateorderid 
from ods.oc_ocorderoperation
where oolevel =%s
group by ooOutputorderid,concat_ws(',',ooOperateorderid ),ooOperateorderid ;
"""


sql2="""
insert into table zltest.cycle_table
select d.ooOutputorderid, concat_ws(',',d.listid,b.ooOperateorderid)listid,b.ooOperateorderid 
from zltest.cycle_table as d
inner join (select * from ods.oc_ocorderoperation where oolevel=%s) as b
on d.ooOperateorderid = b.ooOutputorderid
group by d.ooOutputorderid, concat_ws(',',d.listid,b.ooOperateorderid),b.ooOperateorderid


"""



sql3="""
insert into table zltest.cycle_table
select a.ooOutputorderid,
concat_ws(',',ooOperateorderid )listid,ooOperateorderid
from ods.oc_ocorderoperation a
inner join (
        select a.ooOutputorderid
        from ods.oc_ocorderoperation as a 
        left join ods.oc_ocorderoperation as b on a.ooOutputorderid = b.ooOperateorderid 
        where b.ooOperateorderid is null and a.oolevel=%s
) b on a.ooOutputorderid = b.ooOutputorderid
where a.oolevel =%s and b.ooOutputorderid is not null
group by a.ooOutputorderid,concat_ws(',',ooOperateorderid ),ooOperateorderid ;
"""




sql4="""
insert overwrite table zltest.cycle_table
select ooOutputorderid,listid, ooOperateorderid 
from(
        select *,
        row_number()over(partition by ooOutputorderid  order by length(listid) desc) row_num
        from zltest.cycle_table
) as d where row_num =1;
"""




##先查出最大的层级 select max(oolevel)num from ods.oc_ocorderoperation;
def connImpala():
    conn = connect(host='XX.XX.XX.XX', port=21050)
    cur = conn.cursor()
    cur.execute("select max(oolevel)num from ods.oc_ocorderoperation;")
    data_list=cur.fetchall()
    for i in  data_list:
        data=int(i)
    t=data
    while(t>=0):
        if(t==data):
            print('首次插入数据')
            cur.execute(sql1%(t))
        else:
            print('%s次插入有上层跳转数据'%(t))
            cur.execute(sql2%(t))
            print('%s次插入无上层跳转数据'%(t))
            cur.execute(sql3%(t,t))
        t-=1
    cur.execute(sql4)
    conn.close()

if __name__ == '__main__':
    connImpala()

运行完成

当然,部分有时侯不有层级数
也可以这样处理
先执行一次:

insert overwrite table zltest.cycle_table
select t1.ooOutputorderid,concat_ws(',',t1.ooOperateorderid )listid,t2.ooOperateorderid 
from(
        select a.ooOutputorderid,a.ooOperateorderid 
        from ods.oc_ocorderoperation as a 
        left join ods.oc_ocorderoperation as b on a.ooOutputorderid = b.ooOperateorderid 
        where b.ooOperateorderid is null 
) as t1 
left join ods.oc_ocorderoperation t2 on t1.ooOperateorderid = t2.ooOutputorderid;

python循环
查看还有没有 有链式未关联完的

select count(t1.ooOperateorderid)num
from zltest.cycle_table t1 
left join ods.oc_ocorderoperation t2 on t1.ooOperateorderid = t2.ooOutputorderid
where t2.ooOutputorderid is null;

python 脚本中判断在num值为0以前都要循环

insert into table zltest.cycle_table
select a.ooOutputorderid,concat_ws(',',a.listid,a.ooOperateorderid )listid,b.ooOperateorderid 
from zltest.cycle_table a
inner join (
    select t3.ooOperateorderid ,t3.ooOutputorderid
    from ods.oc_ocorderoperation t3
    left join (
        select t1.ooOperateorderid
        from zltest.cycle_table t1 
        left join ods.oc_ocorderoperation t2 on t1.ooOperateorderid = t2.ooOutputorderid
        where t2.ooOutputorderid is null            ---找出已经完全找到所有id的
    ) as t4 on t3.ooOperateorderid = t4.ooOperateorderid
    where t4.ooOperateorderid is null               ---去除已经完全找到所有id的
) as d on a.ooOperateorderid = d.ooOutputorderid;
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值