Character Set and Globalization Support Considerations

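This article examines the character set conversions that occur when data is migrated with Oracle's EXPDP/IMPDP and EXP/IMP utilities, including how conversion between different character sets works and how it affects the data.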


  • Expdp & impdp

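The character set of an expdp dump file is the database character set of the source database; NLS_LANG has no effect on it.

On import, if the dump file's character set differs from the target database character set, the data is converted.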

  • Exp & imp

Data exported by exp falls into two parts (user data and DDL), and the character set of each is determined differently:

  1. User Data

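User data is written in the source database character set and is not affected by NLS_LANG. If that character set differs from the import database's, a conversion takes place during import.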

The Export utility always exports user data, including Unicode data, in the character sets of the Export server. (Character sets are specified at database creation.) If the character sets of the source database are different than the character sets of the import database, then a single conversion is performed to automatically convert the data to the character sets of the Import server.

If the two character sets sort differently, the import may produce different results:

If the export character set has a different sorting order than the import character set, then tables that are partitioned on character columns may yield unpredictable results. For example, consider the following table definition, which is produced on a database having an ASCII character set:

CREATE TABLE partlist (part VARCHAR2(10), partno NUMBER(2))
PARTITION BY RANGE (part)
  (PARTITION part_low VALUES LESS THAN ('Z') TABLESPACE tbs_1,
  PARTITION part_mid VALUES LESS THAN ('z') TABLESPACE tbs_2,
  PARTITION part_high VALUES LESS THAN (MAXVALUE) TABLESPACE tbs_3);

This partitioning scheme makes sense because z comes after Z in ASCII character sets.

When this table is imported into a database based upon an EBCDIC character set, all of the rows in the part_mid partition will migrate to the part_low partition because z comes before Z in EBCDIC character sets. To obtain the desired results, the owner of partlist must repartition the table following the import.
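The ordering difference is easy to check outside the database. The sketch below is a simplified model, not Oracle's partitioning logic: it assumes values are compared byte by byte in the database character set, and it uses Python's `ascii` and `cp037` (an EBCDIC code page) codecs as stand-ins. The helper `partition_for` is a hypothetical name introduced only for illustration.

```python
# Simplified model of RANGE partitioning on a character column.
# Assumption: values are compared byte by byte in the database character set.
BOUNDS = [("part_low", "Z"), ("part_mid", "z")]  # part_high is MAXVALUE


def partition_for(value: str, codec: str) -> str:
    """Return the first partition whose upper bound is greater than value."""
    encoded = value.encode(codec)
    for name, bound in BOUNDS:
        if encoded < bound.encode(codec):
            return name
    return "part_high"


for codec in ("ascii", "cp037"):  # cp037 is an EBCDIC code page
    print(codec, partition_for("apple", codec))
# ascii part_mid
# cp037 part_low
```

Under ASCII, lowercase letters fall between 'Z' and 'z', so "apple" lands in part_mid. Under EBCDIC, lowercase letters sort before uppercase ones, so the same row satisfies the first bound and ends up in part_low.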

  2. Data Definition Language (DDL)

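The character set used for DDL is governed by the exp client's NLS_LANG setting:

  1. At export, if the source database character set differs from the exp client's NLS_LANG: the DDL is converted from the database character set to the NLS_LANG character set.
  2. At import, if the dump file's character set differs from the imp client's NLS_LANG: a conversion takes place, but only single-byte character sets can be converted, so the imp client's NLS_LANG should be set to the same value as the exp client's NLS_LANG.
  3. At import, if the imp client's NLS_LANG differs from the target database character set: the DDL is converted from the NLS_LANG character set to the database character set.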

Up to three character set conversions may be required for data definition language (DDL) during an export/import operation:

  1. Export writes export files using the character set specified in the NLS_LANG environment variable for the user session. A character set conversion is performed if the value of NLS_LANG differs from the database character set.
  2. If the export file's character set is different than the import user session character set, then Import converts the character set to its user session character set. Import can only perform this conversion for single-byte character sets. This means that for multibyte character sets, the import file's character set must be identical to the export file's character set.
  3. A final character set conversion may be performed if the target database's character set is different from the character set used by the import user session.

The simplest approach is to set all four of the character sets involved to the same value:

To minimize data loss due to character set conversions, ensure that the export database, the export user session, the import user session, and the import database all use the same character set.
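As a planning aid, the three possible DDL conversions can be listed from the four settings. This is only a sketch of the rule quoted above: `ddl_conversions` is a hypothetical helper, and the character set names passed to it are plain strings that you would have to look up yourself (for example from each database and from each client's NLS_LANG).

```python
def ddl_conversions(source_db, exp_session, imp_session, target_db):
    """List the (from, to) conversions DDL would undergo, in order."""
    hops = [
        (source_db, exp_session),    # 1: export writes the file in the NLS_LANG charset
        (exp_session, imp_session),  # 2: import converts the file to its session charset
        (imp_session, target_db),    # 3: session charset to target database charset
    ]
    # A hop is a conversion only when the two character sets differ.
    return [(src, dst) for src, dst in hops if src.upper() != dst.upper()]


# All four settings equal: no conversion at all.
print(ddl_conversions("AL32UTF8", "AL32UTF8", "AL32UTF8", "AL32UTF8"))

# Worst case: three conversions, one at every hop.
print(ddl_conversions("WE8ISO8859P1", "US7ASCII", "WE8MSWIN1252", "AL32UTF8"))
```

An empty list means the DDL passes through unchanged. Any entry for the second hop deserves a closer look, because that is the conversion Import can only perform for single-byte character sets.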

Note: even the single-byte conversion in case 2 can lose data:

Single-Byte Character Sets and Export and Import

Some 8-bit characters can be lost (that is, converted to 7-bit equivalents) when you import an 8-bit character set export file. This occurs if the system on which the import occurs has a native 7-bit character set, or the NLS_LANG operating system environment variable is set to a 7-bit character set. Most often, this is apparent when accented characters lose the accent mark.

To avoid this unwanted conversion, you can set the NLS_LANG operating system environment variable to be that of the export file character set.
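The effect of falling back to a 7-bit character set can be approximated in a few lines. This is an illustration only: it uses Unicode decomposition to strip accent marks, which is not the mapping table Oracle actually applies, and `to_7bit` is a hypothetical helper name.

```python
import unicodedata


def to_7bit(text: str) -> str:
    """Approximate a conversion to 7-bit equivalents by dropping accent marks."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" drops everything outside the 7-bit range.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_7bit("café résumé"))  # cafe resume
```

The text stays readable, but the original characters cannot be recovered afterwards, which is why the loss often goes unnoticed until the data is compared with the source.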

Multibyte Character Sets and Export and Import

During character set conversion, any characters in the export file that have no equivalent in the target character set are replaced with a default character. (The default character is defined by the target character set.) To guarantee 100% conversion, the target character set must be a superset (or equivalent) of the source character set.

Note: When the character set width differs between the Export server and the Import server, truncation of data can occur if conversion causes expansion of data. If truncation occurs, then Import displays a warning message.
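Both effects, replacement and expansion, can be seen with ordinary codecs. In the sketch below, Python's `latin-1` and `utf-8` codecs stand in for a single-byte and a multibyte database character set; the helper `fits_column` and the 5-byte column limit are assumptions made up for the example, and "?" is Python's replacement character, not necessarily the default character of any Oracle character set.

```python
def fits_column(text: str, target_codec: str, max_bytes: int) -> bool:
    """Check whether text still fits a byte-limited column after conversion."""
    return len(text.encode(target_codec)) <= max_bytes


word = "naïve"

# Expansion: one byte per character in latin-1, but "ï" needs two bytes in UTF-8.
print(len(word.encode("latin-1")), len(word.encode("utf-8")))  # 5 6
print(fits_column(word, "latin-1", 5), fits_column(word, "utf-8", 5))  # True False

# Replacement: the euro sign has no equivalent in latin-1, so a default
# character is substituted.
print("€".encode("latin-1", errors="replace"))  # b'?'
```

A value that exactly filled a 5-byte column in the single-byte source no longer fits once it is converted, which is the situation in which Import reports truncation.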
