Hive: Garbled Chinese Partition Names, Query Output, and Comments

This article walks through a detailed fix for three Hive problems: garbled Chinese column comments, failed inserts of Chinese partition names, and garbled query output. Converting the metastore database's character set to UTF-8 and adjusting the Hive configuration resolves all three.


A legendary client, a legendary problem, and a legendary script to solve it.

I. The Legendary Problems:

1. Garbled Chinese column comments
For example:
hive> desc table;
OK
content                 string
hour                    string                  ??

# Partition Information
# col_name              data_type               comment

hour                    string                  ??
2. Partition names are required to be Chinese, e.g. dt=今天/city=杭州 (dt=today/city=Hangzhou)

Without changing the encoding, the insert simply fails. See the error log below.
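A minimal reproduction of the failing case (the table name `demo` and its columns are hypothetical; the partition values are the Chinese ones from the requirement above):

```sql
-- Hypothetical partitioned table. With a latin1 metastore, the
-- ADD PARTITION below fails with the MetaException shown in the
-- error log, because the Chinese PART_NAME cannot be stored.
CREATE TABLE demo (content STRING)
PARTITIONED BY (dt STRING, city STRING);

ALTER TABLE demo ADD PARTITION (dt='今天', city='杭州');
```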

3. Garbled output when running `select * from table`

For example:

hive>
    > select * from table;
OK
工作流目录   ?????

Both values here should be Chinese, yet the partition field comes back garbled,
even though the HDFS path and Hive's MySQL metastore both look fine!
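One way to confirm what the metastore actually stores is to dump the raw bytes from MySQL. Common CJK characters encoded as UTF-8 typically start with bytes in the E4–E9 range, while `?` replacement or latin1 mojibake shows up immediately in the hex. The table names follow the default Hive metastore schema; the `TBL_NAME` filter is a placeholder:

```sql
-- Run against the MySQL metastore database, not inside Hive.
SELECT p.PART_NAME, HEX(p.PART_NAME) AS raw_bytes
FROM PARTITIONS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
WHERE t.TBL_NAME = 'table';  -- placeholder table name from the example
```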

II. Error Log:

1:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Transaction failed to commit)
2:

18/07/04 16:56:46 ERROR Hive: MetaException(message:Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS `NUCLEUS_TYPE`,`A0`.`CREATE_TIME`,`A0`.`LAST_ACCESS_TIME`,`A0`.`PART_NAME`,`A0`.`PART_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID` LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID` WHERE `B0`.`TBL_NAME` = ? AND `C0`.`NAME` = ? AND `A0`.`PART_NAME` = ?)
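Both errors are consistent with a latin1-encoded metastore rejecting or mangling Chinese `PART_NAME` values. Before running the script below, you can confirm the current column character sets via `information_schema` (assuming the metastore database is named `hive_meta`, as in the script):

```sql
-- Shows which metastore columns are still latin1.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'hive_meta'
  AND COLUMN_NAME IN ('PART_NAME', 'COMMENT', 'PKEY_COMMENT');
```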

III. The Legendary Script:

-- First, USE the metastore database. If a statement fails, skip it and keep going.

alter database hive_meta default character set utf8;
alter table BUCKETING_COLS default character set utf8;
alter table CDS default character set utf8;
alter table COLUMNS_V2 default character set utf8;
alter table DATABASE_PARAMS default character set utf8;
alter table DBS default character set utf8;
alter table FUNCS default character set utf8;
alter table FUNC_RU default character set utf8;
alter table GLOBAL_PRIVS default character set utf8;
alter table PARTITIONS default character set utf8;
alter table PARTITION_KEYS default character set utf8;
alter table PARTITION_KEY_VALS default character set utf8;
alter table PARTITION_PARAMS default character set utf8;
-- alter table PART_COL_STATS default character set utf8;
alter table ROLES default character set utf8;
alter table SDS default character set utf8;
alter table SD_PARAMS default character set utf8;
alter table SEQUENCE_TABLE default character set utf8;
alter table SERDES default character set utf8;
alter table SERDE_PARAMS default character set utf8;
alter table SKEWED_COL_NAMES default character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP default character set utf8;
alter table SKEWED_STRING_LIST default character set utf8;
alter table SKEWED_STRING_LIST_VALUES default character set utf8;
alter table SKEWED_VALUES default character set utf8;
alter table SORT_COLS default character set utf8;
alter table TABLE_PARAMS default character set utf8;
alter table TAB_COL_STATS default character set utf8;
alter table TBLS default character set utf8;
alter table VERSION default character set utf8;
alter table BUCKETING_COLS convert to character set utf8;
alter table CDS convert to character set utf8;
alter table COLUMNS_V2 convert to character set utf8;
alter table DATABASE_PARAMS convert to character set utf8;
alter table DBS convert to character set utf8;
alter table FUNCS convert to character set utf8;
alter table FUNC_RU convert to character set utf8;
alter table GLOBAL_PRIVS convert to character set utf8;
alter table PARTITIONS convert to character set utf8;
alter table PARTITION_KEYS convert to character set utf8;
alter table PARTITION_KEY_VALS convert to character set utf8;
alter table PARTITION_PARAMS convert to character set utf8;
-- alter table PART_COL_STATS convert to character set utf8;
alter table ROLES convert to character set utf8;
alter table SDS convert to character set utf8;
alter table SD_PARAMS convert to character set utf8;
alter table SEQUENCE_TABLE convert to character set utf8;
alter table SERDES convert to character set utf8;
alter table SERDE_PARAMS convert to character set utf8;
alter table SKEWED_COL_NAMES convert to character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP convert to character set utf8;
alter table SKEWED_STRING_LIST convert to character set utf8;
alter table SKEWED_STRING_LIST_VALUES convert to character set utf8;
alter table SKEWED_VALUES convert to character set utf8;
alter table SORT_COLS convert to character set utf8;
alter table TABLE_PARAMS convert to character set utf8;
alter table TAB_COL_STATS convert to character set utf8;
alter table TBLS convert to character set utf8;
alter table VERSION convert to character set utf8;
SET character_set_client = utf8 ;
-- SET character_set_connection = utf8 ;

SET character_set_database = utf8 ;
SET character_set_results = utf8 ;
SET character_set_server = utf8 ;
-- SET collation_connection = utf8 ;
-- SET collation_database = utf8 ;
-- SET collation_server = utf8 ;
SET NAMES 'utf8';
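Note that the `SET` statements above only affect the current session (or, with `SET GLOBAL`, last until the next server restart); to make the server defaults permanent they must also go into `my.cnf`. After running the script, you can verify the result:

```sql
-- Verify the session/server character sets are now utf8.
SHOW VARIABLES LIKE 'character_set_%';

-- Spot-check that a converted metastore table now reports utf8.
SHOW CREATE TABLE PARTITION_KEY_VALS;
```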

IV. After the Fix:

Comments:
hive> desc test_table_xuehuan15;
OK
content                 string
hour                    string                  时间

# Partition Information
# col_name              data_type               comment

hour                    string                  时间
Data:

hive>
    > select * from table;
OK
工作流目录   工作流目录
### Hive Chinese Garbled-Text Fixes and Character-Encoding Configuration

Garbled Chinese in Hive usually traces back to the encoding of the metastore, Hive's own configuration files, and the character-set settings of the underlying database (e.g. MySQL). The steps below address each layer.

#### 1. Configure the Hive metastore connection

In `hive-site.xml`, make sure Hive reads metadata over a UTF-8 JDBC connection (note the `&amp;` escaping required inside XML):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop777:3306/metastore?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
```

This ensures Hive uses the UTF-8 character set when accessing MySQL.

#### 2. Modify the metastore column character sets

If the metastore database's default character set is not UTF-8, convert the relevant columns so they can hold Chinese characters:

- **Column comments**:

  ```sql
  ALTER TABLE COLUMNS_V2 MODIFY COLUMN COMMENT VARCHAR(256) CHARACTER SET utf8;
  ```

- **Table comments**:

  ```sql
  ALTER TABLE TABLE_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
  ```

- **Partition parameters and partition-key comments**:

  ```sql
  ALTER TABLE PARTITION_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
  ALTER TABLE PARTITION_KEYS MODIFY COLUMN PKEY_COMMENT VARCHAR(4000) CHARACTER SET utf8;
  ```

- **Index parameters**:

  ```sql
  ALTER TABLE INDEX_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
  ```

- **View definitions**:

  ```sql
  ALTER TABLE TBLS MODIFY COLUMN VIEW_EXPANDED_TEXT MEDIUMTEXT CHARACTER SET utf8;
  ALTER TABLE TBLS MODIFY COLUMN VIEW_ORIGINAL_TEXT MEDIUMTEXT CHARACTER SET utf8;
  ```

These statements put every key metastore column on UTF-8, so Chinese characters survive both storage and retrieval.

#### 3. Set the metastore database's default character set

Beyond individual columns, set the database-wide default to UTF-8:

```sql
ALTER DATABASE hive CHARACTER SET utf8;
```

#### 4. Verify the configuration

1. Create a table with Chinese comments:

   ```sql
   CREATE TABLE test_table (
     id INT,
     name STRING COMMENT '姓名'
   ) COMMENT '测试表';
   ```

2. Inspect the table structure and comments:

   ```sql
   DESCRIBE FORMATTED test_table;
   ```

3. Confirm the Chinese comments display without garbling.

#### 5. Optional: adjust the Hive client encoding

In some cases the Hive client (Beeline or the Hive CLI) does not pick up the terminal's encoding, and Chinese displays incorrectly. Force the client JVM to use UTF-8:

- For the Hive CLI, pass `file.encoding` through the Hadoop client options (it is a JVM property, so it cannot be given to `hive` directly as a flag):

  ```bash
  export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dfile.encoding=UTF-8"
  hive
  ```

- For Beeline, add the encoding parameter to the JDBC connection string:

  ```bash
  beeline -u "jdbc:hive2://localhost:10000/default?characterEncoding=UTF-8"
  ```