mysql/jdbc: set useInformationSchema=true to read table comment information (table_comment)

Fixing MySQL table comments that come back as null
This article explains how to fix the problem of table comments returning null when read from Java. The cause is that the Connector/J parameter useInformationSchema defaults to false, so the comment cannot be obtained correctly through DatabaseMetaData. Setting useInformationSchema=true solves the problem.

#Problem description
Today, while reading table comment information (COMMENT), I found that the REMARKS field unexpectedly came back as null.
Here is the code:

DatabaseMetaData meta = this.pConnection.getMetaData();
// Get metadata for all tables
ResultSet resultSet = meta.getTables(this.catalog, tableSchema, pattern, this.tableTypes);
while (resultSet.next()) {
	Table table = new Table();
	// REMARKS should hold the table comment, but it is returned as null
	String comment = resultSet.getString("REMARKS");
}
resultSet.close();

#Cause analysis
After quite a bit of googling, I finally found the cause:
Connector/J 5.0.0 and later has a connection parameter named useInformationSchema.
With the default connection parameters, useInformationSchema=false, so Connection.getMetaData() returns a com.mysql.jdbc.DatabaseMetaData object instead of com.mysql.jdbc.DatabaseMetaDataUsingInfoSchema.
DatabaseMetaDataUsingInfoSchema is a subclass of DatabaseMetaData; as the name suggests, it derives the database metadata from the INFORMATION_SCHEMA database and can return the table_comment field correctly.

Here is the official description of useInformationSchema:

useInformationSchema

When connected to MySQL-5.0.7 or newer, should the driver use the INFORMATION_SCHEMA to derive information used by DatabaseMetaData?

Default: false
Since version: 5.0.0
Quoted from 《5.1 Driver/Datasource Class Names, URL Syntax and Configuration Properties for Connector/J》

The parent class DatabaseMetaData, on the other hand, does not reliably return the table_comment field.
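If in doubt, you can check at runtime which implementation the driver actually handed back. This is just a diagnostic sketch using the standard getClass() call:

DatabaseMetaData meta = this.pConnection.getMetaData();
// With useInformationSchema=true the class name should contain "UsingInfoSchema"
// (e.g. com.mysql.jdbc.DatabaseMetaDataUsingInfoSchema); otherwise it is the plain DatabaseMetaData variant.
System.out.println(meta.getClass().getName());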

INFORMATION_SCHEMA itself is not covered in depth here; see 《Chapter 24 INFORMATION_SCHEMA Tables》.
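For reference, the table comment can also be read from INFORMATION_SCHEMA directly, without going through DatabaseMetaData. The sketch below is only an illustration; the schema and table names are placeholders:

// Query INFORMATION_SCHEMA.TABLES directly for the table comment
String sql = "SELECT TABLE_COMMENT FROM INFORMATION_SCHEMA.TABLES "
        + "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?";
try (PreparedStatement ps = this.pConnection.prepareStatement(sql)) {
    ps.setString(1, "test");      // placeholder database name
    ps.setString(2, "my_table");  // placeholder table name
    try (ResultSet rs = ps.executeQuery()) {
        if (rs.next()) {
            System.out.println(rs.getString("TABLE_COMMENT"));
        }
    }
}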
#Solution
The fix is simple:
set useInformationSchema=true when opening the database connection.
How do you set this connection parameter? There are two ways.
##Method 1: set it in Java code

// Put all connection parameters into a java.util.Properties object
Properties props = new Properties();
props.setProperty("user", this.username);       // note: the standard JDBC property key is "user"
props.setProperty("password", this.password);
props.setProperty("useInformationSchema", "true");
// Create the connection via getConnection(String, Properties)
this.pConnection = java.sql.DriverManager.getConnection(this.url, props);

##Method 2: URL parameters
Append the parameters directly to the JDBC connection URL. The code below adds two parameters to the URL: characterEncoding=utf8 and useInformationSchema=true.

String url = "jdbc:mysql://localhost:3306/test?characterEncoding=utf8&useInformationSchema=true";
this.pConnection = DriverManager.getConnection(url, this.username, this.password);
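Putting the pieces together, here is a minimal self-contained sketch (URL, database, user and password are placeholders) that connects with useInformationSchema=true and prints each table's REMARKS:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class TableCommentDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection settings; useInformationSchema=true is the key parameter
        String url = "jdbc:mysql://localhost:3306/test?characterEncoding=utf8&useInformationSchema=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "password")) {
            DatabaseMetaData meta = conn.getMetaData();
            try (ResultSet rs = meta.getTables(conn.getCatalog(), null, "%", new String[]{"TABLE"})) {
                while (rs.next()) {
                    // With useInformationSchema=true, REMARKS holds the table comment
                    System.out.println(rs.getString("TABLE_NAME") + " -> " + rs.getString("REMARKS"));
                }
            }
        }
    }
}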

For the MySQL connection URL syntax, see:
《5.1 Driver/Datasource Class Names, URL Syntax and Configuration Properties for Connector/J》

#References
《Connector/J does not retrieve the table comment in a InnoDB table》
《Retrieve mysql table comment using DatabaseMetaData》
《Chapter 24 INFORMATION_SCHEMA Tables》
《5.1 Driver/Datasource Class Names, URL Syntax and Configuration Properties for Connector/J》
