SQL NOT IN constraint and NULL values

探讨了 SQL 中 NOT IN 语句在包含 NULL 值时的行为差异,解析了不同情况下查询返回结果的原因。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

This issue came up when I got different records counts for what I thought were identical queries one using a not in where constraint and the other a left join. The table in the not in constraint had one null value (bad data) which caused that query to return a count of 0 records. I sort of understand why but I could use some help fully grasping the concept.

To state it simply, why does query A return a result but B doesn't?

A: select 'true' where 3 in (1, 2, 3, null) 
B: select 'true' where 3 not in (1, 2, null) 

This was on SQL Server 2005. I also found that calling set ansi_nulls off causes B to return a result.

link | improve this question

 
 
feedback

11 Answers

up vote 50 down vote accepted

Query A is the same as:

select 'true' where 3 = 1 or 3 = 2 or 3 = 3 or 3 = null 

Since 3 = 3 is true, you get a result.

Query B is the same as:

select 'true' where 3 <> 1 and 3 <> 2 and 3 <> null 

When ansi_nulls is on, 3 <> null is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don't get any rows.

When ansi_nulls is off, 3 <> null is true, so the predicate evaluates to true, and you get a row.

link | improve this answer
 
7 
one correction: it is actually 3 value logic, so 3 <> NULL is not FALSE but UNKNOWN if it was false the query NOT (3 <> null) would evaluate to True but it is not - it is still UNKNOWN. You can test it by calling select 'true' where 3 in (null) select 'true' where 3 not in(null) -both give no result –  kristof Sep 25 '08 at 9:25
 
@kristof: Yes, you're right. I will correct my answer. –  Brannon Sep 25 '08 at 17:05
1 
Has anybody ever pointed out that converting NOT IN to a series of <> and changes the semantic behavior of not in this set to something else? –  Ian Boyd Oct 13 '10 at 14:47
1 
@Ian - It looks like "A NOT IN ( 'X', 'Y' )" actually is an alias for A <> 'X' AND A <> 'Y' in SQL. (I see that you discovered this yourself in stackoverflow.com/questions/3924694/…, but wanted to make sure your objection was addressed in this question.) –  Ryan Olson Dec 27 '10 at 19:21
feedback

Whenever you use NULL you are really dealing with a Three-Valued logic.

Your first query returns results as the WHERE clause evaluates to:

    3 = 1 or 3 = 2 or 3 = 3 or 3 = null 
which is: 
    FALSE or FALSE or TRUE or UNKNOWN 
which evaluates to  
    TRUE 

The second one:

    3 <> 1 and 3 <> 2 and 3 <> null 
which evaluates to: 
    TRUE and TRUE and UNKNOWN 
which evaluates to: 
    UNKNOWN 

The UNKNOWN is not the same as FALSE you can easily test it by calling:

select 'true' where 3 <> null 
select 'true' where not (3 <> null) 

Both queries will give you no results

If the UNKNOWN was the same as FALSE then assuming that the first query would give you FALSE the second would have to evaluate to TRUE as it would have been the same as NOT(FALSE).
That is not the case.

There is a very good article on this subject on SqlServerCentral

The whole issue of NULLs and Three-Valued Logic can be a bit confusing at first but it is essential to understand in order to write correct queries in TSQL

Some other read I would recommend is SQL Aggregate Functions and NULL

Hope that helps,

link | improve this answer
 
 
feedback

In A, 3 is tested for equality against each member of the set, yielding (FALSE, FALSE, TRUE, UNKNOWN). Since one of the elements is TRUE, the condition is TRUE. (It's also possible that some short-circuiting takes place here, so it actually stops as soon as it hits the first TRUE and never evaluates 3=NULL.)

In B, I think it is evaluating the condition as NOT (3 in (1,2,null)). Testing 3 for equality against the set yields (FALSE, FALSE, UNKNOWN), which is aggregated to UNKNOWN. NOT ( UNKNOWN ) yields UNKNOWN. So overall the truth of the condition is unknown, which at the end is essentially treated as FALSE.

link | improve this answer
 
 
feedback

Compare to null is undefined, unless you use IS NULL.

So, when comparing 3 to NULL (query A), it returns undefined.

I.e. SELECT 'true' where 3 in (1,2,null) and SELECT 'true' where 3 not in (1,2,null)

will produce the same result, as NOT (UNDEFINED) is still undefined, but not TRUE

link | improve this answer
 
 
feedback

Null signifies and absence of data, that is it is unknown, not a data value of nothing. It's very easy for people from a programming background to confuse this because in C type languages when using pointers null is indeed nothing.

Hence in the first case 3 is indeed in the set of (1,2,3,null) so true is returned

In the second however you can reduce it to

select 'true' where 3 not in (null)

So nothing is returned because the parser knows nothing about the set to which you are comparing it - it's not an empty set but an unknown set. Using (1, 2, null) doesn't help because the (1,2) set is obviously false, but then you're and'ing that against unknown, which is unknown.

link | improve this answer
 
 
feedback

since null is an unknown a not in query containing a null in the list of possible values will always return 0 records since there is no way to be sure that the null value is not the value being tested.

link | improve this answer
 
 
feedback

The title of this question at the time of writing is

SQL NOT IN constraint and NULL values

From the text of the question it appears that the problem was occurring in a SQL DML SELECT query, rather than a SQL DDL CONSTRAINT.

However, especially given the wording of the title, I want to point out that some statements made here are potentially misleading statements, those along the lines of (paraphrasing)

When the predicate evaluates to UNKNOWN you don't get any rows.

Although this is the case for SQL DML, when considering constraints the effect is different.

Consider this very simple table with two constraints taken directly from the predicates in the question (and addressed in an excellent answer by @Brannon):

DECLARE @T TABLE  
( 
 true CHAR(4) DEFAULT 'true' NOT NULL,  
 CHECK (3 NOT IN (1, 2, NULL)),  
 CHECK (3 IN (1, 2, NULL)) 
); 
 
INSERT INTO @T VALUES ('true'); 
 
SELECT COUNT(*) AS tally FROM @T; 

As per @Brannon's answer, the first constraint (using IN) evaluates to TRUE and the second constraint (using NOT IN) evaluates to UNKNOWN. However, the insert succeeds! Therefore, in this case it is not strictly correct to say, "you don't get any rows" because we have indeed got a row inserted as a result!

The above effect is indeed the correct one as regards the SQL-92 Standard. Compare and contrast the following section from the SQL-92 spec

7.6 where clause

The result of the is a table of those rows of T for which the result of the search condition is true.

4.10 Integrity constraints

A table check constraint is satisfied if and only if the specified search condition is not false for any row of a table.

In other words:

In SQL DML, rows are removed from the result when the WHERE evaluates to UNKNOWN because it does not satisfy the condition "is true".

In SQL DDL (i.e. constraints), rows are not removed from the result when they evaluate to UNKNOWN because it does satisfy the condition "is not false".

Although the effects in SQL DML and SQL DDL respectively may seem contradictory, there is practical reason for giving UNKNOWN results the 'benefit of the doubt' by allowing them to satisfy a constraint (more correctly, allowing them to not fail to satisfy a constraint): without this behaviour, every constraints would have to explicitly handle nulls and that would be very unsatisfactory from a language design perspective (not to mention, a right pain for coders!)

p.s. if you are finding it as challenging to follow such logic as "unknown does not fail to satisfy a constraint" as I am to write it, then consider you can dispense with all this simply by avoiding nullable columns in SQL DDL and anything in SQL DML that produces nulls (e.g. outer joins)!

link | improve this answer
 
I honestly did not think there was anything left to say on this subject. Interesting. –  Jamie Ide Sep 24 '11 at 19:54
 
@Jamie Ide: Actually, I have another answer on the subject: because NOT IN (subquery) involving nulls can give unexpected results, it's tempting to avoid IN (subquery) completely and always use NOT EXISTS (subquery) (as I once did!) because it seems that it always handles nulls correctly. However, there are cases where NOT IN (subquery) gives the expected result whereas NOT EXISTS (subquery) gives unexpected results! I may get around to writing this up yet if I can find my notes on the subject (need notes because it's non-intuitive!) The conclusion is the same, though: avoid nulls! –  onedaywhen Sep 26 '11 at 7:18
feedback

also this might be of use to know the logical difference between join, exists and in http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

link | improve this answer
 
 
feedback

this is for Boy:

select party_code from abc as a where party_code not in ( select party_code from xyz where party_code = a.party_code );

this works regardless of ansi settings

link | improve this answer
 
for the original question: B: select 'true' where 3 not in (1, 2, null) a way to remove nulls must be done e.g. select 'true' where 3 not in (1, 2, isnull(null,0)) the overall logic is, if NULL is the cause, then find a way to remove NULL values at some step in the query. –  Rostand Abear Jun 23 '09 at 11:20
 
select party_code from abc as a where party_code not in ( select party_code from xyz where party_code is not null) but good luck if you forgot the field allows nulls, which is often the case –  Rostand Abear Jun 24 '09 at 6:42
feedback

It may be concluded from answers here that NOT IN (subquery) doesn't handle nulls correctly and should be avoided in favour of NOT EXISTS. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, Semptember 1989), it is NOT IN that handles nulls correctly and returns the correct result, rather than NOT EXISTS.

Consider a table sp to represent suppliers (sno) who are known to supply parts (pno) in quantity (qty). The table currently holds the following values:

      VALUES ('S1', 'P1', NULL),  
             ('S2', 'P1', 200), 
             ('S3', 'P1', 1000) 

Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.

The task is to find the supplier numbers of suppliers who are known supply part number 'P1' but not in quantities of 1000.

The following uses NOT IN to correctly identify supplier 'S2' only:

WITH sp 
     AS  
     ( 
      SELECT *  
        FROM ( 
              VALUES ('S1', 'P1', NULL),  
                     ('S2', 'P1', 200), 
                     ('S3', 'P1', 1000) 
             ) AS T (sno, pno, qty) 
     ) 
SELECT DISTINCT spx.sno 
  FROM sp spx 
 WHERE spy.pno = 'P1' 
       AND 1000 NOT IN ( 
                        SELECT spy.qty 
                          FROM sp spy 
                         WHERE spy.sno = spx.sno 
                               AND spy.pno = 'P1' 
                       ); 

However, the below query uses the same general structure with NOT EXISTS but incorrectly includes supplier 'S1' in the result (i.e. for which the quantity is null):

WITH sp 
     AS  
     ( 
      SELECT *  
        FROM ( 
              VALUES ('S1', 'P1', NULL),  
                     ('S2', 'P1', 200), 
                     ('S3', 'P1', 1000) 
             ) AS T (sno, pno, qty) 
     ) 
SELECT DISTINCT spx.sno 
  FROM sp spx 
 WHERE spx.pno = 'P1' 
       AND NOT EXISTS ( 
                       SELECT * 
                         FROM sp spy 
                        WHERE spy.sno = spx.sno 
                              AND spy.pno = 'P1' 
                              AND spy.qty = 1000 
                      ); 

So NOT EXISTS is not the silver bullet it may have appeared!

Of course, source of the problem is the presence of nulls, therefore the 'real' solution is to eliminated those nulls.

This can be achieved using two tables: one sp to model suppliers known to supply parts and another spq to model suppliers known to supply parts in known quantites (noting there could be a foreign key constraint for the second table to reference the first). The result can then be obtained using the minus relational operator (the EXCEPT keyword in Standard SQL) e.g.

WITH sp 
     AS  
     ( 
      SELECT *  
        FROM ( 
              VALUES ('S1', 'P1'),  
                     ('S2', 'P1'), 
                     ('S3', 'P1') 
             ) AS T (sno, pno) 
     ),  
     spq 
     AS  
     ( 
      SELECT *  
        FROM ( 
              VALUES ('S2', 'P1', 200), 
                     ('S3', 'P1', 1000) 
             ) AS T (sno, pno, qty) 
     ) 
SELECT sno 
  FROM spq 
 WHERE pno = 'P1' 
EXCEPT  
SELECT sno 
  FROM spq 
 WHERE pno = 'P1' 
       AND qty = 1000; 
link | improve this answer
 
 
feedback

Any compare against NULL is always FALSE. You can use IS NULL or IS NOT NULL only.

link | improve this answer
 
1 
Not true. Any compare against null is null. Here's some fun reading. en.wikipedia.org/wiki/Null_(SQL)#Boolean_datatype_inconsistency –  David B Sep 25 '08 at 2:55
 
@David B, Digi: you are both wrong. Comparison operators like =, <>, <, > and IN return UNKNOWN when values are compared to null. They don't return null. UNKNOWN is a truth value and is not the same thing as null, which is a marker for the absence of a value. Null does not mean the same as UNKNOWN. –  sqlvogel Aug 28 '11 at 18:13
feedback

Your Answer

 
draft saved
draft discarded
or
required, but never shown
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.

lang-sql
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值