20221115
trino可以配置连接所有的数据库源,比如hive和mysql,hive,mysql作为一级类目,
hive,mysql本身的库名作为第二级类目
20220819
{'message': 'Insert query has mismatched column types: Table: [varchar, varchar, varchar, varchar, decimal(10,2), decimal(10,2), timestamp(6) with time zone, varchar, varchar], Query: [varchar(11), varchar(5), varchar(14), varchar(6), decimal(20,16), decimal(20,15), varchar(15), timestamp(3), varchar(10)]',
table:是数据库中的表,query是要插入的表
INSERT into iceberg.onekey.odc_hz_onekey_vendor_ct_test SELECT *FROM iceberg.onekey.odc_hz_onekey_vendor_ct
trino不能在iceberg上面直接建表,只能用spark建表再insert into
20220725
ModuleNotFoundError: No module named 'trino.sqlalchemy'
trino和sqlalchemy的版本比兼容
trino用0.313.0
sqlalchemy用 1.4.39
python 3.7.6
20220718
SQL 错误 [65544]: Query failed (#20220718_081753_00078_3cm32): Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (192.168.1.54:8881)
资源不够
http://192.168.1.52:8881/ui/
trino控制台
账号密码随便输入
20220705
重启trino,打开表半天无反应
20220623
trino很容易挂,不稳定
20220308
select
format_datetime(create_time ,'yyyy-MM-dd')
from
iceberg.ice_ods.ods_o_hz_b2b_tb_order_item
timestamp转日期
https://blog.youkuaiyun.com/u010711495/article/details/112195655
timestamp日期转换重点
20220308
date(order_day) > date_add('day',-30,current_date)
字符转日期,日期加减
20220215
SQL 错误: Error executing query
服务器连不上
堡垒机出问题了
20220207
ModuleNotFoundError: No module named ‘trino.sqlalchemy’
trino和sqlalchemy的版本兼容问题
trino需要0.306.0
sqlalchemy 1.4.23
sqlalchemy.exc.NoSuchModuleError: Can‘t load plugin: sqlalchemy.dialects:presto
sqlalchemy.exc.NoSuchModuleError: Can‘t load plugin: sqlalchemy.dialects:trino
需要安装PyHive 0.6.4
20220126
用python直连Trino不能delete表里面的数据只能删表
presto也可以直接使用mysql语法?
presto也是数据仓库
20220119
TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 20:49: mismatched input '#'. Expecting: '(', ')',
sql代码里面还有井号,去掉井号换成 --
元组直接转换成字符,如果只有一个元素的话,逗号需要去掉
20220118
HTTPConnectionPool(host='192.168.1.55', port=8881): Max retries exceeded with url: /v1/statement (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001E783328278>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。'))
服务挂了,重启
降低连接数,连接实例
20220114
TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 9:8: mismatched input '100'. Expecting: '%', '*', '+', '-', '.', '/', 'AND', 'AT', 'EXCEPT', 'FETCH', 'GROUP', 'HAVING', 'INTERSECT', 'LIMIT', 'OFFSET', 'OR', 'ORDER', 'UNION', 'WINDOW', '[', '||', <EOF>, <predicate>", query_id=20220114_081216_00157_decgk)
应该是sql语法错了 limit之前不能加and
20220107
date(‘2021-10-01’)
cast(‘2021-10-01’ as date)
字符转日期
https://www.cnblogs.com/lixiaozhi/p/11752483.html
窗口函数 lead 和 lag的理解
partition by 分区之后 lead 取当前行的前几行,lag取当前行的后几行
20220106
date_diff('day',LEAD(uo.create_time,1,uo.create_time) over (partition by ui.user_id order by uo.create_time desc) ,uo.create_time)
20211223
第三方模块trino读写数据(不推荐)
参考:https://github.com/trinodb/trino-python-client
#########
# 20220301更新 批量插入,全部插入要报错
def insert_trino(df):
http.client._MAXLINE = 655360
trino_engine = create_engine('trino://root@192.168.1.55:8881/iceberg/ice_ods')
times = int(np.ceil(df.shape[0] / 10000))
for i in tqdm(range(times)):
df.iloc[i * 10000: (i + 1) * 10000, :].to_sql(
con=trino_engine, schema="ice_ods",
name="ods_o_hz_onekey_jkzj_goods_washed_da", method="multi", if_exists='append',
index=False
)
logger.debug("存入数据库成功")
#######
import trino
from trino import transaction
with trino.dbapi.connect(
host='192.168.1.55',
port=8881,
user='root',
catalog='iceberg',
schema='ice_dwt',
# isolation_level=transaction.IsolationLevel.REPEATABLE_READ,
) as conn:
cur = conn.cursor()
cur.execute('SELECT * FROM iceberg.ice_dwt.dwt_dm_bi_b2b_company_name_wide')
rows = cur.fetchall()
写(不推荐,批量写入速度太慢)
import trino
from trino import transaction
with trino.dbapi.connect(
host='192.168.1.55',
port=8881,
user='root',
catalog='iceberg',
schema='ice_dwt',
# isolation_level=transaction.IsolationLevel.REPEATABLE_READ,
) as conn:
cur = conn.cursor()
# 写入
cur.execute("insert into iceberg.ice_dwt.dwt_dm_bi_b2b_company_name_wide (original_name,standard_name,code,create_date) values ('邛崃本地','无效',-2,date('2021-12-22'))")
cur.fetchall()
pandas模块的 to_sql,read_sql方法读取写入数据(推荐)
参考:https://github.com/dungdm93/sqlalchemy-trino
SQLAlchemy 连接trino 的url格式:
trino://<username>:<password>@<host>:<port>/catalog/[schema]
import pandas as pd
from pandas import DataFrame
from sqlalchemy.engine import Engine, Connection
from sqlalchemy.engine import create_engine
trino_engine = create_engine('trino://root@192.168.1.55:8881/iceberg/ice_dwt')
def trino_pandas_write(engine: Engine):
df: DataFrame = pd.read_csv('***.csv')
df['create_date'] = pd.to_datetime(df['create_date'])
df.to_sql(con=trino_engine, schema="ice_dwt",
name="dwt_dm_bi_b2b_company_name_wide", method="multi", if_exists='append',
index=False)
def trino_pandas_read(engine: Engine):
connection: Connection = engine.connect()
df = pd.read_sql("SELECT original_name, standard_name, code, create_date, update_date, note FROM iceberg.ice_dwt.dwt_dm_bi_b2b_company_name_wide",
connection)
print(df.shape)
trino_pandas_write(trino_engine)
trino_pandas_read(trino_engine)
写入数据报错requests.exceptions.ConnectionError
详情:requests.exceptions.ConnectionError: ('Connection aborted.', LineTooLong('got more than 65536 bytes when reading header line'))
原因分析:http.client模块限制了传输数据量大小
解决方案:
import http.client
http.client._MAXLINE = 655360
参考:https://stackoverflow.com/questions/63157046/python-http-request-exception-linetoolong
trino也可以批量直接写入数据库
https://github.com/trinodb/trino-python-client
engine从sqlchemey换到trino就行了
trino可以配合to_sql 参数 传入 mulit
20211102
date_format(rq,'%Y-%m-%d')='2021-10-22'
日期转字符
https://blog.youkuaiyun.com/u010711495/article/details/112290966
字符转日期trino
20211102
select 常量
直接可以形成一列
20211015
CAST(order_nums AS decimal(10,4)
int转浮点数
https://trino.io/docs/current/sql/create-table-as.html
建表 官方文档
-- 建表测试
create table iceberg.ice_dwd.t2 (user_id) as select user_id from iceberg.ice_dwd.dwd_dm_hz_b2b_new_user_register ;
连接数据库
import trino
import pandas as pd
'''
python连接trino
'''
conn = trino.dbapi.connect(
host='192.168.1.55',
port=8881,
user='root',
catalog='iceberg',
schema='ice_ods',
)
cur = conn.cursor()
cur.execute('SELECT * FROM ods_o_hz_b2b_tb_order_item limit 100' )
rows = cur.fetchall()