trinosql_prestosql问题

这篇博客记录了使用Trino进行数据库操作时遇到的各种问题及解决办法,包括连接配置、SQL语法错误、数据类型不匹配、Python接口使用、批量插入、日期转换等。同时提到了Trino与Hive、MySQL的集成,以及与Spark、sqlalchemy的版本兼容性问题。还分享了如何通过pandas和sqlalchemy高效读写数据。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

20221115

trino可以配置连接所有的数据库源,比如hive和mysql,hive,mysql作为一级类目,
hive,mysql本身的库名作为第二级类目
在这里插入图片描述

20220819

{'message': 'Insert query has mismatched column types: Table: [varchar, varchar, varchar, varchar, decimal(10,2), decimal(10,2), timestamp(6) with time zone, varchar, varchar], Query: [varchar(11), varchar(5), varchar(14), varchar(6), decimal(20,16), decimal(20,15), varchar(15), timestamp(3), varchar(10)]',

table:是数据库中的表,query是要插入的表
INSERT into iceberg.onekey.odc_hz_onekey_vendor_ct_test SELECT *FROM iceberg.onekey.odc_hz_onekey_vendor_ct

trino不能在iceberg上面直接建表,只能用spark建表再insert into

20220725

ModuleNotFoundError: No module named 'trino.sqlalchemy'
trino和sqlalchemy的版本比兼容
trino用0.313.0
sqlalchemy用 1.4.39
python 3.7.6 

20220718

SQL 错误 [65544]: Query failed (#20220718_081753_00078_3cm32): Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (192.168.1.54:8881)
资源不够

http://192.168.1.52:8881/ui/
trino控制台
账号密码随便输入

20220705
在这里插入图片描述
重启trino,打开表半天无反应

20220623

trino很容易挂,不稳定

20220308

select 
	format_datetime(create_time ,'yyyy-MM-dd')
	from
	iceberg.ice_ods.ods_o_hz_b2b_tb_order_item

timestamp转日期

https://blog.youkuaiyun.com/u010711495/article/details/112195655
timestamp日期转换重点

20220308

 date(order_day) > date_add('day',-30,current_date) 
 字符转日期,日期加减

20220215

SQL 错误: Error executing query
服务器连不上
堡垒机出问题了

20220207

ModuleNotFoundError: No module named ‘trino.sqlalchemy’
trino和sqlalchemy的版本兼容问题
trino需要0.306.0
sqlalchemy 1.4.23

sqlalchemy.exc.NoSuchModuleError: Can‘t load plugin: sqlalchemy.dialects:presto
sqlalchemy.exc.NoSuchModuleError: Can‘t load plugin: sqlalchemy.dialects:trino
需要安装PyHive 0.6.4

20220126

用python直连Trino不能delete表里面的数据只能删表
presto也可以直接使用mysql语法?
presto也是数据仓库

20220119

TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 20:49: mismatched input '#'. Expecting: '(', ')',

sql代码里面还有井号,去掉井号换成 --

在这里插入图片描述
元组直接转换成字符,如果只有一个元素的话,逗号需要去掉

20220118

HTTPConnectionPool(host='192.168.1.55', port=8881): Max retries exceeded with url: /v1/statement (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001E783328278>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。'))
服务挂了,重启
降低连接数,连接实例

20220114

TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 9:8: mismatched input '100'. Expecting: '%', '*', '+', '-', '.', '/', 'AND', 'AT', 'EXCEPT', 'FETCH', 'GROUP', 'HAVING', 'INTERSECT', 'LIMIT', 'OFFSET', 'OR', 'ORDER', 'UNION', 'WINDOW', '[', '||', <EOF>, <predicate>", query_id=20220114_081216_00157_decgk)

应该是sql语法错了   limit之前不能加and 

20220107

date(‘2021-10-01’)
cast(‘2021-10-01’ as date)
字符转日期

https://www.cnblogs.com/lixiaozhi/p/11752483.html
窗口函数 lead 和 lag的理解
partition by 分区之后 lead 取当前行的前几行,lag取当前行的后几行

20220106

date_diff('day',LEAD(uo.create_time,1,uo.create_time) over (partition by ui.user_id order by uo.create_time desc) ,uo.create_time)

20211223

第三方模块trino读写数据(不推荐)
参考:https://github.com/trinodb/trino-python-client

#########
# 20220301更新 批量插入,全部插入要报错
def insert_trino(df):

    http.client._MAXLINE = 655360
    trino_engine = create_engine('trino://root@192.168.1.55:8881/iceberg/ice_ods')

    times = int(np.ceil(df.shape[0] / 10000))
    for i in tqdm(range(times)):
        df.iloc[i * 10000: (i + 1) * 10000, :].to_sql(
            con=trino_engine, schema="ice_ods",
            name="ods_o_hz_onekey_jkzj_goods_washed_da", method="multi", if_exists='append',
            index=False
        )
    logger.debug("存入数据库成功")
#######


import trino
from trino import transaction
with trino.dbapi.connect(
    host='192.168.1.55',
    port=8881,
    user='root',
    catalog='iceberg',
    schema='ice_dwt',
	# isolation_level=transaction.IsolationLevel.REPEATABLE_READ,
) as conn:
  cur = conn.cursor()
    cur.execute('SELECT * FROM iceberg.ice_dwt.dwt_dm_bi_b2b_company_name_wide')
    rows = cur.fetchall()(不推荐,批量写入速度太慢)
import trino
from trino import transaction
with trino.dbapi.connect(
    host='192.168.1.55',
    port=8881,
    user='root',
    catalog='iceberg',
    schema='ice_dwt',
	# isolation_level=transaction.IsolationLevel.REPEATABLE_READ,
) as conn:
  cur = conn.cursor()
  # 写入
  cur.execute("insert into iceberg.ice_dwt.dwt_dm_bi_b2b_company_name_wide (original_name,standard_name,code,create_date)   values ('邛崃本地','无效',-2,date('2021-12-22'))")
  cur.fetchall()
pandas模块的 to_sql,read_sql方法读取写入数据(推荐)

参考:https://github.com/dungdm93/sqlalchemy-trino

SQLAlchemy 连接trino 的url格式:

trino://<username>:<password>@<host>:<port>/catalog/[schema]
import pandas as pd
from pandas import DataFrame

from sqlalchemy.engine import Engine, Connection
from sqlalchemy.engine import create_engine

trino_engine = create_engine('trino://root@192.168.1.55:8881/iceberg/ice_dwt')


def trino_pandas_write(engine: Engine):
    df: DataFrame = pd.read_csv('***.csv')
    df['create_date'] = pd.to_datetime(df['create_date'])
    df.to_sql(con=trino_engine, schema="ice_dwt",
    name="dwt_dm_bi_b2b_company_name_wide", method="multi", if_exists='append',
    index=False)


def trino_pandas_read(engine: Engine):
    connection: Connection = engine.connect()
    df = pd.read_sql("SELECT original_name, standard_name, code, create_date, update_date, note FROM iceberg.ice_dwt.dwt_dm_bi_b2b_company_name_wide",
                     connection)

    print(df.shape)

trino_pandas_write(trino_engine)
trino_pandas_read(trino_engine)

写入数据报错requests.exceptions.ConnectionError

详情:requests.exceptions.ConnectionError: ('Connection aborted.', LineTooLong('got more than 65536 bytes when reading header line'))

原因分析:http.client模块限制了传输数据量大小

解决方案:
import http.client
http.client._MAXLINE = 655360
参考:https://stackoverflow.com/questions/63157046/python-http-request-exception-linetoolong

在这里插入图片描述
trino也可以批量直接写入数据库
https://github.com/trinodb/trino-python-client
engine从sqlchemey换到trino就行了
trino可以配合to_sql 参数 传入 mulit

20211102

date_format(rq,'%Y-%m-%d')='2021-10-22'

日期转字符

https://blog.youkuaiyun.com/u010711495/article/details/112290966
字符转日期trino

20211102

select 常量
直接可以形成一列

20211015

CAST(order_nums AS decimal(10,4)
int转浮点数

https://trino.io/docs/current/sql/create-table-as.html
建表 官方文档

-- 建表测试
create table iceberg.ice_dwd.t2  (user_id) as  select user_id from iceberg.ice_dwd.dwd_dm_hz_b2b_new_user_register ;
连接数据库
import trino
import pandas as pd
'''
python连接trino
'''
conn = trino.dbapi.connect(
    host='192.168.1.55',
    port=8881,
    user='root',
    catalog='iceberg',
    schema='ice_ods',
)
cur = conn.cursor()
cur.execute('SELECT * FROM ods_o_hz_b2b_tb_order_item limit 100' )
rows = cur.fetchall()

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值