处理:SqlServer FOR XML 无法对节点 'NoName' 的数据进行序列化,因为它包含在 XML 中不允许出现的字符(0x0000)。

DECLARE @table TABLE(
ID INT
,[DESC] VARCHAR(1000)
)


INSERT INTO @table(ID, [DESC])
VALUES(1,'this is test <中文'+CHAR(0)+'测试>')


SELECT * FROM @table


--for xml这种写法可以将表中的一列数据串成一个字符串
--并且在遇到xml保留字符时也不会出错,如:0x0000
--但是有一个确定就是,如果列值中含有xml的保留字,如:<,>这些不会直接输出,而是转换为代码
SELECT CAST(a.ID AS VARCHAR(10))+a.[Desc]
FROM @table a
FOR XML PATH('')


--for xml这种写法,是利用type指令,告诉sqlserver,形成的是一个xml对象
--这时,利用xml的value方法就可以获取到当前元素的值,并且一些xml的保留字符这里不会被转换为代码
--但是这个写法有一个不好的地方就是,如果在列值中含有一些xml的控制符,则运行时会报错
--错误提示如下:
--FOR XML 无法对节点 'NoName' 的数据进行序列化,因为它包含在 XML 中不允许出现的字符(0x0000)。
--若要使用 FOR XML 检索此数据,请将它转换为 binary、varbinary 或 image 数据类型,然后使用 BINARY BASE64 指令。
--所以我们这里自己写了一个函数Udf_RemoveInvalidXmlChar用于将xml控制字符从列值中删除。
--SELECT 
--(
-- SELECT  cAST(a.ID AS VARCHAR(10))+a.[Desc]
-- FROM @table a
-- FOR XML PATH(''),TYPE 
--).value('.','varchar(max)')


SELECT 
(
SELECT  artdb.dbo.Udf_RemoveInvalidXmlChar( CAST(a.ID AS VARCHAR(10))+a.[Desc])
FROM @table a
FOR XML PATH(''),TYPE 
).value('.','varchar(max)')


--删除列值中xml控制符的函数实现如下:
/*
CREATE FUNCTION [dbo].[Udf_RemoveInvalidXmlChar]
(
@aString VARCHAR(1000)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @result VARCHAR(1000)=''
DECLARE @i INT
SET @i = 1


WHILE @i <= LEN(@aString)
BEGIN
IF UNICODE(SUBSTRING(@aString, @i, 1)) <> 0 --去掉0x0000
BEGIN
SET @result=@result+SUBSTRING(@aString,@i,1)
END
SET @i = @i + 1
END

RETURN @result
END
 */


综上所述:

如果出现标题所示错误时:

1、不要在for xml时加上type指令,当然一些xml关键字可能会被以代码形式显示;

2、for xml使用type指令,但是在for xml之前,将列值中的xml控制字符删除;


参考文章:

1、http://dba.stackexchange.com/questions/82094/cannot-remove-char-0x0000-from-nvarchar-column

A bug in the REPLACE function? Or in SQL Server in general? I'm not so sure about that. The only "issue" here is merely not fully understanding how string comparisons are handled.

Collations define how certain characters will compare against other characters. Sometimes there are rules about certain combinations of characters equating to one or more other characters. And there are rules about characters such as 0x00 (null) and 0x20 (space) equating to each other or other characters. And, to make life more interesting, there are a few nuances specific to VARCHAR data using a SQL Server Collation (i.e. one starting with SQL_), as the following example shows:

SELECT REPLACE('VARCHAR with SQL_Latin1_General_CP1_CI_AS'+CHAR(0)+'Matches', CHAR(0),
               ': ' COLLATE SQL_Latin1_General_CP1_CI_AS);
SELECT REPLACE(N'NVARCHAR with SQL_Latin1_General_CP1_CI_AS'+NCHAR(0)+N'Matches', NCHAR(0),
               N': ' COLLATE SQL_Latin1_General_CP1_CI_AS);
SELECT REPLACE('VARCHAR with Latin1_General_100_CI_AS'+CHAR(0)+'Matches', CHAR(0),
               ': ' COLLATE Latin1_General_100_CI_AS);
SELECT REPLACE(N'NVARCHAR with Latin1_General_100_CI_AS'+NCHAR(0)+N'Matches', NCHAR(0),
               N': ' COLLATE Latin1_General_100_CI_AS);

Returns:

VARCHAR with SQL_Latin1_General_CP1_CI_AS: Matches

NVARCHAR with SQL_Latin1_General_CP1_CI_AS

VARCHAR with Latin1_General_100_CI_AS

NVARCHAR with Latin1_General_100_CI_AS

So let's take a look at this behavior using queries based on the ones found in @Max's answer. I added the N prefix to the string literals and to the CHAR(0), and I also added an extra NCHAR(0) just to make the next part easier to see. And I added queries to show the actual Code Points being used (to prove that the 0x0000 values are really in there, and a call to REPLACE() to see if that really has a bug in it).

DECLARE @Data NVARCHAR(255);
SELECT @Data = N'this is' + NCHAR(0) + N'a test' + NCHAR(0) + N'of null';

SELECT @Data;
SELECT CONVERT(VARBINARY(50), @Data);
SELECT REPLACE(@Data, NCHAR(0), N'~');

this is

0x7400680069007300200069007300000061002000740065007300740000006F00660020006E0075006C006C00

this is

The first result shows that the string ends just after the "is" due to (null) termination. The second result shows the underlying codes, and I emphasized the two instances of the 0x0000 character. The third result shows that the REPLACE function doesn't seem to match the 0x0000 character to the NCHAR(0) that is passed in.

But should we be expecting that NCHAR(0) would be matching here? We can effectively disable all of the equivalence rules that are usually applied to string comparisons by forcing a binary collation. We will use a _BIN2 collation since the _BIN collations are deprecated and shouldn't be used unless you have a specific need for them.

Add the following query to the set above and re-run the batch.

SELECT REPLACE(@Data, NCHAR(0) COLLATE Latin1_General_100_BIN2, N'~');

You should get the following additional result:

this is~a test~of null

So the REPLACE function does in fact work, and this was tested on both SQL Server 2008 R2, SP3 as well as SQL Server 2012 SP2.


Ok, so that only addressed the issue of REPLACE not working with NCHAR(0), but did not address NCHAR(0) equating to a space (i.e. NCHAR(32) or NCHAR(0x20)).

Now we will use an adaptation of the main query from @Max's answer. I again added an extra NCHAR(0) to the test string (really just replaced the space at position 8 with it), and I added the Code Point of the matching character to the RAISERROR message.

SET NOCOUNT ON;
GO
DECLARE @Data NVARCHAR(255);
SELECT @Data = N'this is' + NCHAR(0) + N'a test' + NCHAR(0) + N'of null';

DECLARE @i INT,
        @CodePoint INT;
SET @i = 1;

WHILE @i < LEN(@Data)
BEGIN
    IF SUBSTRING(@Data, @i, 1) = NCHAR(0) --COLLATE Latin1_General_100_BIN2
    BEGIN
        SET @CodePoint = UNICODE(SUBSTRING(@Data, @i, 1));
        RAISERROR (N'Found a NULL char (Code Point = %d) at position: %d',
                   10, 1, @CodePoint, @i) WITH NOWAIT;
    END;
    SET @i = @i + 1;
END;

This query (with the COLLATE clause still commented out) will return:

Found a NULL char (Code Point = 32) at position: 5
Found a NULL char (Code Point = 0) at position: 8
Found a NULL char (Code Point = 32) at position: 10
Found a NULL char (Code Point = 0) at position: 15
Found a NULL char (Code Point = 32) at position: 18

These are the same positions reported in @Max's test, but now it shows what Code Point it is matching in each case. And yes, it is equating to both 32 and 0.

Now, uncomment the COLLATE clause and re-run it. It will return:

Found a NULL char (Code Point = 0) at position: 8
Found a NULL char (Code Point = 0) at position: 15

Another way to accomplish this, and without using the COLLATE clause, is to change the IFstatement to be:

IF ( UNICODE(SUBSTRING(@Data, @i, 1)) = 0 )

Of course, neither of these two fixes -- the WHILE loop using either the COLLATE clause or the UNICODE() function -- are needed to address the original problem of removing the 0x0000 characters from the input data since the simple REPLACE (using the COLLATE clause) handles that.


Summary:

  • If you are wanting to replace / remove characters from a string, there is no need to use a loop. The REPLACE function works just fine as long as you specify a _BIN2 collation for at least one of the 3 input parameters via the COLLATE keyword (and technically it doesn't matter which binary collation since binary collations only compare the numeric Code Point values).
  • If needing to test for a particular Code Point (such as in a loop as shown above), then it is probably fastest to use the UNICODE() function since that just reports what values are actually there. This should be faster than using the COLLATE keyword since that has to do more work.
  • If needing to test for a series of Code Points / characters, then use the COLLATE keyword, specifying a _BIN2 binary collation.
2、 http://sqlblog.com/blogs/rob_farley/archive/2010/04/15/handling-special-characters-with-for-xml-path.aspx

Rob Farley - Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft Certified Master, Microsoft MVP (SQL Server), APS/PDW trainer and leader of the SQL User Group in Adelaide, Australia. Rob is a former director of PASS, and provides consulting and training courses around the world in SQL Server and BI topics.

Handling special characters with FOR XML PATH('')

Because I hate seeing &gt; or &amp; in my results…

Since SQL Server 2005, we’ve been able to use FOR XML PATH('') to do string concatenation. I’ve blogged about it before several times. But I don’t think I’ve blogged about the fact that it all goes a bit wrong if you have special characters in the strings you’re concatenating.

Generally, I don’t even worry about this. I should, but I don’t, particularly when the solution is so easy.

Suppose I want to concatenate the list of user databases on my system, in alphabetical order. Easy: Note the lack of

select 
  stuff( 
     (select ', ' + name -- Note the lack of column name 
     from sys.databases 
     where database_id > 4 
     order by name 
     for xml path('') 
     ) 
   , 1, 2, '') as namelist;

This is easy, and it just works. The STUFF command lets me strip the leading comma and space, and my list of database names is done.

But if I wanted to list the names of database with triangular brackets around them… then that’s bit more complicated.

select 
  stuff( 
     (select ', <' + name + '>' 
     from sys.databases 
     where database_id > 4 
     order by name 
     for xml path('') 
     ) 
   , 1, 2, '') as namelist;

It still runs, but I my results don’t show the triangular brackets, it shows &lt;databasename&gt;, &lt;databasename2&gt;. It’s not what I want to see. I effectively need to HTMLDecode the results – but T-SQL doesn’t have a function like that.

However, if I actually make my FOR XML call return actual well-formed XML, then I can extract the contents out, and it returns the data to me in its original (correct) form.

select 
  stuff( 
     (select ', <' + name + '>' 
     from sys.databases 
     where database_id > 4 
     order by name 
     for xml path(''), root('MyString'), type 
     ).value('/MyString[1]','varchar(max)') 
   , 1, 2, '') as namelist;

To be valid XML, it needs a ROOT element. I also need to tell it to be XML, using the TYPE directive.

But this lets me hook into the value of /MyString[1], and return that as varchar(max).

And it works, my data comes back as <databasename>, <databasename2>, etc.

It’s a habit I need to use more often.

Edit: I can also skip the ROOT element (but keep TYPE) and use .value('.','varchar(max)') - but for some reason it's always felt strange to me to be returning something which I declare to be XML (using TYPE) without it being well-formed.


评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值