AL32UTF8的varchar2,AL16UTF16的nclob

本文探讨了在Oracle数据库中NCLOB字段出现乱码的原因及解决思路。具体分析了由于字符集不匹配导致的数据存储与读取过程中的乱码现象,并深入到PreparedStatement.setString方法的应用细节。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

环境:

操作系统:中文版windowsXP

数据库:Oracle9iR2

java文件编码UTF-8,jsp页面编码UTF-8

原库字符集为AL32UTF8,国家字符集为UTF8,导出时客户端字符集也为UTF8,新创建数据库字符集为AL32UTF8

,国家字符集为AL16UTF16,导入数据时,客户端字符集为UTF8,导入数据后一切正常。

当从表A中读取一条信息a,将其nclob列内容插入表B的nclob列,同时向表B中varchar2列插入带中文内容,操作完成后

,生成一条信息b,查询表B此条信息,varchar2列内容无问题,nclob内容乱码。

首先查看A表此条a信息,nclob列的部分内容如下:

-----------------------------------------------------------------------------------------------------------

<table cellspacing="0" cellpadding="5" width="100%" align="center" border="0">
    <tbody>
        <tr>
            <td align="center" height="48"><strong><font face="黑体"

-----------------------------------------------------------------------------------------------------------

dump得到:dump1

-----------------------------------------------------------------------------------------------------------

Typ=1 Len=2000: 0,60,0,116,0,97,0,98,0,108,0,101,0,32,0,99,

0,101,0,108,0,108,0,115,0,112,0,97,0,99,0,105,0,110,0,103,0,

61,0,34,0,48,0,34,0,32,0,99,0,101,0,108,0,108,0,112,0,97,0,

100,0,100,0,105,0,110,0,103,0,61,0,34,0,53,0,34,0,32,0,119,

0,105,0,100,0,116,0,104,0,61,0,34,0,49,0,48,0,48,0,37,0,34,

0,32,0,97,0,108,0,105,0,103,0,110,0,61,0,34,0,99,0,101,0,110,

0,116,0,101,0,114,0,34,0,32,0,98,0,111,0,114,0,100,0,101,0,

114,0,61,0,34,0,48,0,34,0,62,0,13,0,10,0,32,0,32,0,32,0,32,0,

60,0,116,0,98,0,111,0,100,0,121,0,62,0,13,0,10,0,32,0,32,0,

32,0,32,0,32,0,32,0,32,0,32,0,60,0,116,0,114,0,62,0,13,0,10,

0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,

0,60,0,116,0,100,0,32,0,97,0,108,0,105,0,103,0,110,0,61,0,

34,0,99,0,101,0,110,0,116,0,101,0,114,0,34,0,32,0,104,0,101,

0,105,0,103,0,104,0,116,0,61,0,34,0,52,0,56,0,34,0,62,0,60,

0,115,0,116,0,114,0,111,0,110,0,103,0,62,0,60,0,102,0,111,

0,110,0,116,0,32,0,102,0,97,0,99,0,101,0,61,0,34,158,209      ==>黑的UTF-16的2个十进制数

-----------------------------------------------------------------------------------------------------------

其中"<"符号Unicode编码如下:

UTF-8:3C

UTF-16:003C

其中3C为16进制数,转换为10进制则为3*16+12=60

此外,黑体的“黑”字UTF-8编码为E9 BB 91=233 187 145,UTF-16编码为9E D1=158 209。

我们可以看到,实际内容确实是以UTF-16编码方式存储。

 

再查询B表的b,部分内容如下:

-----------------------------------------------------------------------------------------------------------

㱴慢汥⁣敬汳灡捩湧㴢〢⁣敬汰慤摩湧㴢㔢⁷楤瑨㴢㄰〥∠慬楧渽≣敮瑥爢⁢潲摥爽∰∾††㱴扯摹㸠†††‼瑲㸠†††††‼瑤⁡汩杮㴢捥湴敲∠桥楧桴㴢㐸∾㱳瑲潮朾㱦潮琠晡捥㴢釤붓∠獩穥㴢㘢㻧뺎꿥뢸蓨꾾駥궦뻨꺡ꇦ鶿ㄼ⽦潮琾㰯獴牯湧㸼⽴搾††††㰯瑲㸠†††‼瑲㸠†††††‼瑤⁡汩杮㴢捥湴敲∠桥楧桴㴢ㄵ∾駥뢈㨼甾♮扳瀻♮扳瀻♮扳瀻♮扳瀻♮

-----------------------------------------------------------------------------------------------------------

dump得到:dump2

-----------------------------------------------------------------------------------------------------------

Typ=1 Len=2000: 60,116,97,98,108,101,32,99,101,108,108,115,112,97,

99,105,110,103,61,34,48,34,32,99,101,108,108,112,97,100,100,105,110,

103,61,34,53,34,32,119,105,100,116,104,61,34,49,48,48,37,34,32,97,108,

105,103,110,61,34,99,101,110,116,101,114,34,32,98,111,114,100,101,

114,61,34,48,34,62,32,32,32,32,60,116,98,111,100,121,62,32,32,32,32,

32,32,32,32,60,116,114,62,32,32,32,32,32,32,32,32,32,32,32,32,60,116,

100,32,97,108,105,103,110,61,34,99,101,110,116,101,114,34,32,104,101,

105,103,104,116,61,34,52,56,34,62,60,115,116,114,111,110,103,62,60,

102,111,110,116,32,102,97,99,101,61,34,233,187,145         ==>黑的UTF-8编码的三个十进制数

-----------------------------------------------------------------------------------------------------------

我们仔细区分就会发现,dump2其实就是dump1将0丢掉后的内容,这里的nclob列的编码方式为UTF-16,

那么nclob列在查询显示时,第一个字符即为60,116=3C,74

,但是,这两个字节的内容在UTF-16编码表当中不存在,结果即以㱴来显示。

究其原因,其实表B的这条信息b是以UTF-8编码方式保存的。

 

从表A信息a内容中的首个汉字“黑”来说,我们知道,dump1中,其对应的值是158 209,而再从dump2中分析

可以确定“黑”字其实是以UTF-8的编码E9 BB 91保存的。

 

那么问题的症结就在于,数据是以UTF-8方式存入,以UTF-16的方式读取,自然会乱码了。

 

为什么数据会以UTF-8方式存入

查看保存调用的代码,注意到PreparedStatement.setString方法的API注释写道:

void java.sql.PreparedStatement.setString(int parameterIndex, String x) throws SQLException

Sets the designated parameter to the given Java String value. The driver converts this to an SQL VARCHAR or LONGVARCHAR

value (depending on the argument's size relative to the driver's limits on VARCHAR values) when it sends it to the database.

原来当参数String X超过限定值时,其值会被驱动转换为一个SQL VARCHAR型值,因为这里数据库字符集是AL32UTF8,而国家字符集是

AL16UTF16,最终,数据因为过长而以varchar2类型即,以AL32UTF8字符集存入数据库,也就因此产生了乱码。

 

 

 

 

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值