Collation can be based on the Windows Collation or SQL Collation

本文详细介绍了SQL Server中Collation的使用方式及其不同层级的设置方法。通过具体实例展示了如何查看可用的Collation、确定服务器级默认Collation、创建具有不同Collation的数据库及表,并演示了在查询中强制指定Collation进行比较。
/*
        Background : SQL2000 supports collation at different levels.
        Collation can be based on the Windows Collation or SQL Collation.
        Collation can be specified at the server, database & column level.
        For more details, see the BOL topics on COLLATION. So this small
        article should give a fairly detailed view of the various options
        available & gotchas! COLLATION is very powerful but requires good
        understanding to use them efficiently.
*/
-- To determine all the available collations use the system function
-- fn_helpcollations. The example below gets all the SQL Latin collations.
SELECT * FROM ::fn_helpcollations()
WHERE [name] LIKE 'SQL_Latin%';
/*
name                            description
------------------------------- ---------------------------------------------------------------
SQL_Latin1_General_CP437_BIN Latin1-General, binary sort for Unicode Data,
                                SQL Server Sort Order 30 on Code Page 437 for non-Unicode Data
SQL_Latin1_General_CP437_CI_AI Latin1-General, case-insensitive, accent-insensitive,
                                kanatype-insensitive, width-insensitive for Unicode Data,
                                SQL Server Sort Order 34 on Code Page 437 for non-Unicode Data
SQL_Latin1_General_CP437_CI_AS Latin1-General, case-insensitive, accent-sensitive,
                                kanatype-insensitive, width-insensitive for Unicode Data,
                                SQL Server Sort Order 32 on Code Page 437 for non-Unicode Data
SQL_Latin1_General_CP437_CS_AS Latin1-General, case-sensitive, accent-sensitive,
                                kanatype-insensitive, width-insensitive for Unicode Data,
                                SQL Server Sort Order 31 on Code Page 437 for non-Unicode Data
SQL_Latin1_General_CP850_BIN Latin1-General, binary sort for Unicode Data,
                                SQL Server Sort Order 40 on Code Page 850 for non-Unicode Data
SQL_Latin1_General_CP850_CI_AI Latin1-General, case-insensitive, accent-insensitive,
                                kanatype-insensitive, width-insensitive for Unicode Data,
                                SQL Server Sort Order 44 on Code Page 850 for non-Unicode Data
SQL_Latin1_General_CP850_CI_AS Latin1-General, case-insensitive, accent-sensitive,
                                kanatype-insensitive, width-insensitive for Unicode Data,
                                SQL Server Sort Order 42 on Code Page 850 for non-Unicode Data
*/
GO

-- To determine the server-wide collation established during SETUP do:
SELECT SERVERPROPERTY( 'Collation' ) AS Server_Default_Collation;
/*
Server_Default_Collation
------------------------------
SQL_Latin1_General_CP437_CI_AS
*/
GO

-- Now let's create a database with a different collation
-- than the server's i.e., the SQL Latin with binary sort order
CREATE DATABASE CollateTestDB COLLATE SQL_Latin1_General_CP850_BIN;
GO
USE CollateTestDB;
GO
-- To check the database collation do:
SELECT DATABASEPROPERTYEX( 'CollateTestDB' , 'Collation' ) AS Database_Default_Collation
/*
Database_Default_Collation
----------------------------
SQL_Latin1_General_CP850_BIN
*/
GO

-- Default table creation behavior
CREATE TABLE Tbl1 (
Col1 char( 10 ) NOT NULL ,
Col2 varchar( 10 ) NULL
);
INSERT INTO Tbl1 VALUES( 'a' , 'b' );
INSERT INTO Tbl1 VALUES( 'A' , 'B' );
GO
/* Column properties from "EXEC sp_help 'Tbl1';" output is shown below:

Column_name Type    Length Collation
----------- ------- ------ ----------------------------
Col1        char        10 SQL_Latin1_General_CP850_BIN
Col2        varchar     10 SQL_Latin1_General_CP850_BIN
*/
GO
-- To verify binary sort order / collation
SELECT * FROM Tbl1 WHERE Col1 = 'a' And Col2 = 'b';
-- Expected output:
/*
Col1       Col2     
---------- ----------
a          b
*/
GO
-- To force a case-sensitive check
SELECT * FROM Tbl1
WHERE Col1 COLLATE SQL_Latin1_General_CP850_CI_AI = 'a' And
      Col2 COLLATE SQL_Latin1_General_CP850_CI_AI = 'b';
-- Expected output:
/*
Col1       Col2     
---------- ----------
a          b
A          B
*/

-- Create a table with columns having case-sensitive collation
CREATE TABLE Tbl2 (
Col1 char( 10 ) COLLATE SQL_Latin1_General_CP850_CS_AS NOT NULL ,
Col2 varchar( 10 ) COLLATE SQL_Latin1_General_CP850_CI_AI NULL
);
INSERT INTO Tbl2 VALUES( 'a' , 'b' );
INSERT INTO Tbl2 VALUES( 'A' , 'B' );
GO
/* Column properties from "EXEC sp_help 'Tbl2';" output is shown below:

Column_name Type    Length Collation
----------- ------- ------ ----------------------------
Col1        char        10 SQL_Latin1_General_CP850_CS_AS
Col2        varchar     10 SQL_Latin1_General_CP850_CI_AI
*/
GO

-- Some sample SELECT statements:
SELECT * FROM Tbl2 WHERE Col1 = 'a';
/*
Col1       Col2      
---------- ----------
a          b
*/
SELECT * FROM Tbl2 WHERE Col2 = 'b';
/*
Col1       Col2      
---------- ----------
a          b
A          B
*/

-- Now let's take this problem a step further & create a temporary table
-- using Tbl1 by performing a SELECT INTO. Notice that the columns are
-- created with the collation of the columns of Tbl1.
SELECT * INTO #Tbl1 FROM Tbl1;
EXEC( 'Use Tempdb EXEC sp_help ''#Tbl1''');
DROP TABLE #Tbl1;
/* Column properties from output is shown below:

Column_name Type    Length Collation
----------- ------- ------ ----------------------------
Col1        char        10 SQL_Latin1_General_CP850_BIN
Col2        varchar     10 SQL_Latin1_General_CP850_BIN
*/
Go


-- Similarly , to create a table using SELECT INTO with a different
-- collation for the columns, do:
SELECT Col1 COLLATE SQL_Latin1_General_CP437_CI_AS AS Col1,
       Col2 COLLATE SQL_Latin1_General_CP437_CI_AS AS Col2
INTO #Tbl1 FROM Tbl1;
EXEC( 'Use Tempdb EXEC sp_help ''#Tbl1''');
DROP TABLE #Tbl1;
/* Column properties from sp_help output is shown below:

Column_name Type    Length Collation
----------- ------- ------ ----------------------------
Col1        char        10 SQL_Latin1_General_CP437_CI_AS
Col2        varchar     10 SQL_Latin1_General_CP437_CI_AS
*/
Go

-- Now do the same by creating a temporary table using CREATE TABLE statement.
-- The created table will use the default of tempdb & not the current user db.
CREATE TABLE #Tbl1 (
Col1 char( 10 ) NOT NULL ,
Col2 varchar( 10 ) NULL
);
EXEC( 'Use Tempdb EXEC sp_help ''#Tbl1''');
DROP TABLE #Tbl1;
/* Column properties from output is shown below:

Column_name Type    Length Collation
----------- ------- ------ ----------------------------
Col1        char        10 SQL_Latin1_General_CP437_CI_AS
Col2        varchar     10 SQL_Latin1_General_CP437_CI_AS
*/
Go

/*
        To force the creation of the temporary table with the current user database
        collation default, use the DATABASE_DEFAULT clause with COLLATION like:
*/
CREATE TABLE #Tbl1 (
Col1 char( 10 ) COLLATE database_default NOT NULL ,
Col2 varchar( 10 ) COLLATE database_default NULL
);
EXEC( 'Use Tempdb EXEC sp_help ''#Tbl1''');
DROP TABLE #Tbl1;
/* Column properties from output is shown below:

Column_name Type    Length Collation
----------- ------- ------ ----------------------------
Col1        char        10 SQL_Latin1_General_CP850_BIN
Col2        varchar     10 SQL_Latin1_General_CP850_BIN
*/
Go

-- Now we are all set, drop the test database
USE master;
DROP DATABASE CollateTestDB;
 
### 三级标题:MySQL UTF8 Deprecation Warning and Migration Guide MySQL has been evolving over the years, and one of the significant changes in recent versions is the deprecation of the `utf8` character set, which historically only supported a maximum of three bytes per codepoint. This limitation meant that characters outside the Basic Multilingual Plane (BMP), such as emojis or certain rare characters, could not be stored properly. Starting from MySQL 5.5.2 and onwards, the `utf8mb4` character set was introduced to support four-byte characters, aligning MySQL with the full UTF-8 standard. The deprecation warning related to the `utf8` character set indicates that future versions of MySQL will change the default behavior of `utf8` from `utf8mb3` (three-byte characters) to `utf8mb4` (four-byte characters). This transition is part of MySQL's effort to provide full support for the UTF-8 standard. Users currently using the `utf8` character set should consider upgrading their databases to `utf8mb4` to avoid potential issues when the default behavior changes. To migrate from `utf8mb3` to `utf8mb4`, follow these steps: 1. **Check Current Character Set and Collation Settings**: Before proceeding with the migration, verify the current character set and collation settings for your database, tables, and columns. You can use the following SQL queries to check these settings: ```sql -- Check database character set and collation SELECT schema_name, default_character_set_name, default_collation_name FROM information_schema.schemata; -- Check table character set and collation SELECT table_schema, table_name, table_collation FROM information_schema.tables; -- Check column character set and collation SELECT table_schema, table_name, column_name, character_set_name, collation_name FROM information_schema.columns; ``` 2. **Upgrade MySQL Server**: Ensure that your MySQL server is upgraded to version 5.5.2 or later, as `utf8mb4` support is only available from this version onwards. 3. **Modify Database, Tables, and Columns**: Alter the database, tables, and columns to use `utf8mb4` and a compatible collation, such as `utf8mb4_unicode_ci`. Here are some example SQL statements: ```sql -- Alter database ALTER DATABASE your_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; -- Alter table ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; -- Alter specific column ALTER TABLE your_table MODIFY your_column VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ``` 4. **Update Connection Settings**: Make sure that all connections to the MySQL server are configured to use `utf8mb4`. For example, in applications connecting via JDBC, ODBC, or other drivers, update the connection strings to specify `utf8mb4`. 5. **Test the Migration**: After completing the migration, thoroughly test the application to ensure that it works correctly with the new character set. Pay special attention to any parts of the application that handle user input, especially those that might include emojis or other four-byte characters. 6. **Monitor for Issues**: After deployment, monitor the application and database for any issues related to character encoding. If any problems arise, you may need to revisit the migration steps or adjust the collation settings. By following these steps, you can ensure that your MySQL database is prepared for the future changes in the `utf8` character set and continues to support a wide range of characters without issues. ### 三级标题:Related Considerations When migrating from `utf8mb3` to `utf8mb4`, keep in mind that the storage requirements for `utf8mb4` are slightly higher than for `utf8mb3`, as it allows for four-byte characters. This increase in storage size may affect performance, especially for large tables with many text-based columns. To mitigate this, consider using appropriate indexing strategies and optimizing your database schema. Additionally, if you are using tools like the MySQL Migration Toolkit, ensure that they are up to date and capable of handling migrations involving `utf8mb4`. The MySQL Migration Toolkit supports migrations from various databases, including Oracle, Microsoft SQL Server, Microsoft Access, Sybase, and MaxDB to MySQL, and it should be able to handle character set conversions during the migration process [^2]. Finally, if you encounter errors such as "Illegal mix of collations" or "Unknown collation" during the migration, these are typically related to mismatched character sets or collations between different parts of the database or between the application and the database. Resolving these issues often involves ensuring that all components are consistently using `utf8mb4` and a compatible collation [^3]. ### 三级标题:Code Example for Migration Here is an example of how you might modify a database and table to use `utf8mb4`: ```sql -- Change the default character set and collation for the database ALTER DATABASE your_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; -- Convert an existing table to use utf8mb4 and a compatible collation ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ``` ### 三级标题:Entity Framework Core Example If you are using Entity Framework Core and need to apply migrations after changing the character set, you can use the following commands in the Package Manager Console: ```powershell Add-Migration v1_0_0 -Context DefaultDbContext Update-Database v1_0_0 -Context DefaultDbContext ``` These commands will create and apply a migration to update the database schema to use `utf8mb4` [^4]. ### 三级标题:Best Practices - Always back up your database before performing any migration or alteration of character sets. - Test the migration in a development or staging environment before applying it to production. - Monitor performance after the migration to ensure that there are no adverse effects due to the increased storage requirements of `utf8mb4`. ### 三级标题:Conclusion Migrating from `utf8mb3` to `utf8mb4` is essential for ensuring full UTF-8 support in MySQL, especially for applications that need to handle modern character sets, including emojis and other four-byte characters. By following the steps outlined above, you can smoothly transition your database to use `utf8mb4` and avoid potential issues when the default behavior of `utf8` changes in future MySQL versions.
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值