Character Set and Globalization Support Considerations

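This article examines the character set conversions involved when migrating data with Oracle's EXPDP/IMPDP and EXP/IMP utilities, including how conversion between different character sets works and how it affects the data.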

Expdp & impdp

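The character set of an expdp dump file is the character set of the source database; NLS_LANG has no effect on it.

On import, if the dump file character set differs from the target database character set, the data is converted.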

Exp & imp

Exp splits the exported data into two parts (user data and DDL), and the character set for each part is determined differently:

User Data

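User data is exported in the character set of the source database and is not affected by NLS_LANG. If that character set differs from the character set of the target database, the data is converted during import.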

The Export utility always exports user data, including Unicode data, in the character sets of the Export server. (Character sets are specified at database creation.) If the character sets of the source database are different than the character sets of the import database, then a single conversion is performed to automatically convert the data to the character sets of the Import server.

If the two character sets sort differently, the import may produce different results:

If the export character set has a different sorting order than the import character set, then tables that are partitioned on character columns may yield unpredictable results. For example, consider the following table definition, which is produced on a database having an ASCII character set:

CREATE TABLE partlist (part VARCHAR2(10), partno NUMBER(2))
PARTITION BY RANGE (part)
  (PARTITION part_low  VALUES LESS THAN ('Z')      TABLESPACE tbs_1,
   PARTITION part_mid  VALUES LESS THAN ('z')      TABLESPACE tbs_2,
   PARTITION part_high VALUES LESS THAN (MAXVALUE) TABLESPACE tbs_3);

This partitioning scheme makes sense because z comes after Z in ASCII character sets.

When this table is imported into a database based upon an EBCDIC character set, all of the rows in the part_mid partition will migrate to the part_low partition because z comes before Z in EBCDIC character sets. To obtain the desired results, the owner of partlist must repartition the table following the import.
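The reordering can be reproduced outside the database. The sketch below is an illustration only: it assumes byte-wise comparison of the encoded strings, uses Python's `cp037` codec as a stand-in for an EBCDIC database character set, and the helper `partition_for` is hypothetical, not an Oracle API.

```python
# Hypothetical helper: mimics the RANGE bounds of partlist using
# byte-wise comparison in a given encoding. This is an assumption;
# real placement depends on the database character set and sort settings.

def partition_for(value: str, encoding: str) -> str:
    key = value.encode(encoding)
    if key < "Z".encode(encoding):
        return "part_low"
    if key < "z".encode(encoding):
        return "part_mid"
    return "part_high"


if __name__ == "__main__":
    for enc in ("ascii", "cp037"):  # cp037 is one EBCDIC code page
        placement = {v: partition_for(v, enc) for v in ("Apple", "apple")}
        print(enc, placement)
```

Under these assumptions, a lowercase value such as 'apple' falls in part_mid with ASCII but in part_low with cp037, which is the migration the quoted passage describes.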

Data Definition Language (DDL)

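The character set used for DDL is determined by the NLS_LANG setting of the exp client:

On export, if the source database character set differs from the exp client's NLS_LANG: the DDL is converted from the database character set to the NLS_LANG character set.
On import, if the dump file character set differs from the imp client's NLS_LANG: a conversion takes place, but only single-byte character sets can be converted, so the imp client's NLS_LANG should be set to the same value as the exp client's NLS_LANG.
On import, if the imp client's NLS_LANG differs from the target database character set: the DDL is converted from the NLS_LANG character set to the database character set.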

Up to three character set conversions may be required for data definition language (DDL) during an export/import operation:

Export writes export files using the character set specified in the NLS_LANG environment variable for the user session. A character set conversion is performed if the value of NLS_LANG differs from the database character set.
If the export file's character set is different than the import user session character set, then Import converts the character set to its user session character set. Import can only perform this conversion for single-byte character sets. This means that for multibyte character sets, the import file's character set must be identical to the export file's character set.
A final character set conversion may be performed if the target database's character set is different from the character set used by the import user session.

The simplest approach is to set all four of the character set settings above to the same value:

To minimize data loss due to character set conversions, ensure that the export database, the export user session, the import user session, and the import database all use the same character set.
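To see why a mismatch anywhere in the chain matters, the conversions can be simulated as a sequence of encode/decode round trips. This is a rough sketch, not how exp/imp is implemented: the function `convert_chain` is hypothetical, the Python codecs `latin-1` and `ascii` merely stand in for an 8-bit and a 7-bit Oracle character set, and the `?` replacement is Python's behavior (Oracle's replacement character depends on the target character set).

```python
from typing import Iterable


def convert_chain(text: str, charsets: Iterable[str]) -> str:
    """Pass text through each character set in turn.

    Characters that a character set cannot represent are replaced
    with '?', and the loss carries forward to every later step.
    """
    for cs in charsets:
        text = text.encode(cs, errors="replace").decode(cs)
    return text


if __name__ == "__main__":
    # source database, export session, import session, target database
    mismatched = ["latin-1", "ascii", "ascii", "latin-1"]
    aligned = ["latin-1"] * 4
    print(convert_chain("café", mismatched))
    print(convert_chain("café", aligned))
```

Even though the first and last character sets are identical in the mismatched case, the accented character is already gone after the narrower intermediate step, so the final conversion cannot restore it.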

Note: in the second case, conversion between single-byte character sets can also lose data:

Single-Byte Character Sets and Export and Import

Some 8-bit characters can be lost (that is, converted to 7-bit equivalents) when you import an 8-bit character set export file. This occurs if the system on which the import occurs has a native 7-bit character set, or the NLS_LANG operating system environment variable is set to a 7-bit character set. Most often, this is apparent when accented characters lose the accent mark.

To avoid this unwanted conversion, you can set the NLS_LANG operating system environment variable to be that of the export file character set.
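The effect of falling back to 7-bit equivalents can be approximated as follows. This is only an analogy: the hypothetical function `to_7bit` uses Unicode decomposition to strip accent marks, whereas Oracle relies on its own conversion tables, so the exact output for a given character may differ.

```python
import unicodedata


def to_7bit(text: str) -> str:
    # Split each accented letter into base letter + combining mark,
    # then drop everything that is not 7-bit ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    print(to_7bit("résumé"))
```

The text stays readable, but the original characters cannot be recovered afterwards, which is why keeping NLS_LANG aligned with the export file character set is the safer choice.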

Multibyte Character Sets and Export and Import

During character set conversion, any characters in the export file that have no equivalent in the target character set are replaced with a default character. (The default character is defined by the target character set.) To guarantee 100% conversion, the target character set must be a superset (or equivalent) of the source character set.
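One way to gauge the risk before a migration is to check a sample of the data for characters the target character set cannot represent. The sketch below assumes the sample is available as Python strings and uses Python codec names as stand-ins for Oracle character sets; `unconvertible` is a hypothetical helper.

```python
def unconvertible(text: str, target: str) -> list:
    """Return the characters in text that have no equivalent in target."""
    missing = []
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "日本語 abc"
    print(unconvertible(sample, "latin-1"))  # target is not a superset
    print(unconvertible(sample, "utf-8"))    # target covers every character
```

An empty result for every sampled value suggests, for that sample at least, that the target behaves as a superset of the source.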

Note: When the character set width differs between the Export server and the Import server, truncation of data can occur if conversion causes expansion of data. If truncation occurs, then Import displays a warning message.
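The expansion is easy to quantify by comparing byte lengths. The sketch below assumes a column limited to 10 bytes (such as a VARCHAR2(10) column with byte length semantics) and again uses the Python codecs `latin-1` and `utf-8` as stand-ins for a single-byte and a multibyte database character set; `fits` is a hypothetical helper.

```python
def fits(text: str, charset: str, max_bytes: int) -> bool:
    """Check whether text fits in max_bytes once encoded in charset."""
    return len(text.encode(charset)) <= max_bytes


if __name__ == "__main__":
    value = "café crème"  # 10 characters, two of them accented
    for cs in ("latin-1", "utf-8"):
        size = len(value.encode(cs))
        print(cs, size, "bytes, fits in 10:", fits(value, cs, 10))
```

Each accented letter takes one byte in the single-byte encoding and two in UTF-8, so a value that exactly fills the column on the source side no longer fits after conversion.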