question

Peter-Jones-BIDA avatar image
0 Votes"
Peter-Jones-BIDA asked Peter-Jones-BIDA action

Strange Behaviour in Null field = Not Null field evaluation

Hi All,

Today I saw something that I really didn't believe until I tested it and proved it to myself on both SQL Server 2019 SE and SQL Server 2016 developer edition.

Consider the SQL at the bottom of the post.

The datetime field is set on the zxt_dnn_sitelog_src view but it is null in the zxt_dnn_sitelog_m1 view.

This statement evaluates to 'notequal'.

 ,( case when (zxt_dnn_sitelog_src.datetime = zxt_dnn_sitelog_m1.datetime) then 'equal' else 'notequal' end ) testval

However, the following constraint does not catch this condition.


 not ((zxt_dnn_sitelog_src.datetime = zxt_dnn_sitelog_m1.datetime) or ( zxt_dnn_sitelog_src.datetime is null and zxt_dnn_sitelog_m1.datetime is null))

I had to include this extra condition to detect this situation.

 or     ((zxt_dnn_sitelog_src.datetime is null and zxt_dnn_sitelog_m1.datetime is not null) or ( zxt_dnn_sitelog_src.datetime is not null and zxt_dnn_sitelog_m1.datetime is null))


This was tested on 3 machines with two running SQL Server 2019 SE and one running SQL Server 2016 developer edition.

I am very surprised that this is the case. Can anyone please explain why the first condition does not expose the record on SQL Server?

Clearly the two columns are NOT equal. One has a date in it and one is null.

This code is generated delta detection code and the tool that generates it was developed back in 2008 on Netezza.

I am really interested in understanding how the two fields which are not equal, and SQL server knows they are not equal, do not evaluate properly in the where clause.

Thanks

Peter




 select
  zxt_dnn_sitelog_src.pk_ss_number
 ,zxt_dnn_sitelog_src.pk_sitelogid
 ,zxt_dnn_sitelog_src.datetime
 ,zxt_dnn_sitelog_m1.datetime
 ,( case when (zxt_dnn_sitelog_src.datetime = zxt_dnn_sitelog_m1.datetime) then 'equal' else 'notequal' end ) testval
 ,zxt_dnn_sitelog_src.dk_portalid
 ,zxt_dnn_sitelog_src.dk_userid
 ,zxt_dnn_sitelog_src.referrer
 --,zxt_dnn_sitelog_src.url
 ,zxt_dnn_sitelog_src.useragent
 ,zxt_dnn_sitelog_src.userhostaddress
 ,zxt_dnn_sitelog_src.userhostname
 ,zxt_dnn_sitelog_src.dk_tabid
 ,zxt_dnn_sitelog_src.dk_affiliateid
 ,2
     
 from ibihs_c003_stg.dbo.zxt_dnn_sitelog                                         zxt_dnn_sitelog_src
      ,ibihs_c003_stg.dbo.zxt_dnn_sitelog_m9                                     zxt_dnn_sitelog_m1
     
 where 1=1
 and zxt_dnn_sitelog_src.pk_ss_number                                            = zxt_dnn_sitelog_m1.pk_ss_number
 and zxt_dnn_sitelog_src.pk_sitelogid                                            = zxt_dnn_sitelog_m1.pk_sitelogid
 and zxt_dnn_sitelog_src.pk_sitelogid = 1172189 
 and 
 ( 
    not ((zxt_dnn_sitelog_src.datetime = zxt_dnn_sitelog_m1.datetime) or ( zxt_dnn_sitelog_src.datetime is null and zxt_dnn_sitelog_m1.datetime is null))
 or     ((zxt_dnn_sitelog_src.datetime is null and zxt_dnn_sitelog_m1.datetime is not null) or ( zxt_dnn_sitelog_src.datetime is not null and zxt_dnn_sitelog_m1.datetime is null))
 ) 
 ; 





sql-server-transact-sql
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GuoxiongYuan-7218 avatar image
1 Vote"
GuoxiongYuan-7218 answered Peter-Nolan-IBI published

You can not use a equal sign (=) to compare values between two fields if one or two fields may have NULL value. Try this:

 DECLARE @s1 varchar(10) = NULL;
 DECLARE @s2 varchar(10) = NULL;
    
 -- Incorrect: Return nothing
 IF @s1 = @s2 
 BEGIN
  SELECT 1;
 END
    
 -- Correct: Return 1
 IF ISNULL(@s1, '') = ISNULL(@s2, '') 
 BEGIN
  SELECT 1;
 END
    
 -- Correct: Return 1
 IF @s1 IS NULL AND @s2 IS NULL
 BEGIN
  SELECT 1;
 END

From w3schools, It is not possible to test for NULL values with comparison operators, such as =, <, or <>.


· 6
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi Guoxiong ,

Strange. I have never heard of such a restriction and I have been dealing with relational databases nearly 40 years.

I tested this statement and it works.

So the equality test works when done as a case statement. It is likely defaulting to 0 for the else.

I am going to test on some other databases and see what they do.

Best Regards

Peter

  or not ((1 = ( case when (zxt_dnn_sitelog_src.datetime = zxt_dnn_sitelog_m1.datetime) then 1 else 0 end )) or ( zxt_dnn_sitelog_src.datetime is null and zxt_dnn_sitelog_m1.datetime is null))


0 Votes 0 ·

Odd that you have not discovered in 40 years that NULL means unknown and you cannot logically compare any unknown value to anything. In your example, the first comparison returns UNKNOWN if either column is null so the ELSE part of the case expression is returned. FWIW it is WAY past time to stop using old-style joins.


0 Votes 0 ·

Hi SM,

it's because other databases resolve the query correctly.

I had a friend test on DB2 UDB and it resolves the query properly.

So I am testing on some other databases to see what it does on those databases.

With respect to NULLS?

Yes I am well aware of null arithmetic.

Most databases do not allow a comparison operator to resolve to "unknown".

They resolve to true or false.

When in doubt they resolve to false.

On the old join style?

Sure, there is a new join style. So?

The generated code comes from an ETL tool a pal of mine wrote.

The ETL software is free and open source so anyone can update it.

http://www.instantbi.com/freebies/downloads/

In other areas of the generated code it uses the new join functions for such things as left joins.

The much more important feature is this ETL tool cuts ETL development time by 50% or more.

Even if you go into production with Informatica/SSIS etc.

I wish you a good day!

0 Votes 0 ·
Show more comments

Hi SM,
I am the guy Peter Jones is talking about.

The software he is using is a version of my software. There are many people using versions of my software now.

And yes, he is correct. I had not seen that behaviour in databases either and I WROTE a relational database back in 1985! LOL!

Back then the people writing RDBMSs had the discussion how (null field = not null field) should evaluate since we could not call it "null" which is what relational theory says it should be. There is a difference between "theory" and "the real world".

So yes, I was also surprised to see this behaviour. Ok?

Best Regards

Peter

0 Votes 0 ·
Viorel-1 avatar image
0 Votes"
Viorel-1 answered Peter-Jones-BIDA commented

Try one more known method:

 . . .
 and not exists ( select zxt_dnn_sitelog_src.datetime intersect select zxt_dnn_sitelog_m1.datetime )
 ;


· 8
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi Viorel,
this sort of thing is not suitable because this code is for delta detection large tables with millions of rows and sometimes up to 50-60 columns.

That is a lot of fields to test for changes so it needs to be efficient.

Best Regards

Peter

0 Votes 0 ·

The solution with INTERSECT is much more compact than the alternatives you had where you compared explicitly for IS NULL. Wrapping with isnull is precarious, because you need to find some magic value to map the NUJLL to. Furthermore, isnull kills any index. That does not happen with INTERSECT - the optimizer is able to handle this well, it has a special operator which is not exposed in the languages.

(The ANSI standard has the operator IS [NOT] DISTINCT FROM for comparisons which covers NULL. But INTERSECT works well in SQL Server.)

0 Votes 0 ·

Hi Erland,
the code that is being generated is to perform delta detection on sizable tables.

So there could be 100+ fields being checked to see if any of them changed.

So the way most databases will resolve that is to select the smaller table and then compare row for row, field for field, to see if any field has changed.

Mostly what is being done is that the updated / inserted records from the operational system are sent into the delta detection processing and compared with the prior version to detect inserts and updates.

Those inserts and updates are then forwarded to the staging area.

Where we allow for delete detection we usually have a time restriction on the period over which deletes will be detected so that we do not scan years of data where no deletes will be done.

The code has to be able to run on any data warehouse database, not just sql server. We happen to be using SQL server for our work on this client.

These responses are short so I will do another one....

0 Votes 0 ·

Hi Erland,

in our case we have built a staging area for a Navision client and we maintain their data warehouse on SQL Server.

We used this tool build the delta detection for the staging area for Navision in just 4 days. (quote below)

Navision has a timestamp field we used for comparison for delta detection so we never saw this problem until we worked on another set of data that had nulls in fields to be checked. It was an "end date" that went from null to not null that we found the problem with.

I mean, how often does it happen that just one field on a row changes from null to not null and no other field changes?

We are actually developing a new BI solution for Navision. It's an interesting project.

"We were able to build a completely new instance of the staging area in just 4 working days. This was 1,867 tables and 35,591 fields from a Navision 2009 instance."

www.businessintelligence-da.com/2019/07/28/bida0007-navision-staging-area-construction-details/

0 Votes 0 ·

this sort of thing is not suitable because this code is for delta detection large tables with millions of rows and sometimes up to 50-60 columns.

If you must compare 50-60 columns, not just one datetime, then you can try a single expression like this:

 not exists (select a1, a2, a3, … a50 intersect select b1, b2, b3, … b50)

which seems simpler than writing 50-60 expressions using your current approach, based on ‘or’ which is often not recommended.

0 Votes 0 ·

Using INTERSECT is absolutely the most efficient way to make these comparisons, both in terms of performance and conciseness of code, on SQL Server.

But since Peter is targeting many different platforms, I am not sure how good bet this is. Not all RBDMSs may implement the INTERSECT operator. And how it is actually implemented can be different from product to product.

Then again, Peter needs to handle NULL correctly with the three-valued logic of SQL. And, yes, if you compare many columns enough, there will be changes from NULL to non-NULL or vice versa.

I don't have access to DB2, but I ran this on Postgres:

CREATE TABLE duo (a int NULL, b int NULL);
INSERT INTO duo(a, b) VALUES(1, NULL);
SELECT * FROM duo WHERE a = b;
SELECT * FROM duo WHERE NOT (a = b);
DROP TABLE duo;


None of the SELECT returned any rows. Nor should they.

0 Votes 0 ·
Show more comments

Hi Viorel, I will take a closer look at how that statement will run on SQL server and other databases.

I am not sure how databases resolve such an intersect statement internally.

By that I mean I do not know the sequence of how the optimiser will optimise the statements.

In the end this is code generated by a tool supplied to me so it is not up to me to change it.

Thanks for the tip. I will pass it along to the author of the tool to take a look at.

Best Regards

Peter

0 Votes 0 ·
TomPhillips-1744 avatar image
0 Votes"
TomPhillips-1744 answered TomPhillips-1744 edited

Please see:

https://docs.microsoft.com/en-us/sql/t-sql/language-elements/null-and-unknown-transact-sql?view=sql-server-ver15

A null value is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a null value and any other value, return unknown because the value of each NULL is unknown.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Peter-Jones-BIDA avatar image
0 Votes"
Peter-Jones-BIDA answered Peter-Jones-BIDA published

Hi Tom,

yes, I am well aware of these discussions. We were having them in the 80s.

Set Theory says ( Not Null Field = Null Field ) evaluates to NULL.

However, as RDBMSs were evolving there was no support for "UNKNOWN" as the resolution of a comparison operator.

For example what did "or NULL" mean as one of the conditions in a where clause?

So way back then a NULL result for a comparison operator as defined by set theory was defaulted to false in the emerging RDBMSs.

And that has been the way of it since the 80s.

This is why I am surprised to see that SQL Server did not behave like all the other databases have behaved over the years.

But it's good to know this is how SQL Server acts.

The changes to the code generator was just a few lines of code and the author got back to me with an update yesterday.

He is looking in to providing more options around the delta detection piece of the tool around nulls since this problem came up, so that's good!

Best Regards

Peter

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

JoeCelko-6699 avatar image
0 Votes"
JoeCelko-6699 Suspended answered Peter-Jones-BIDA commented

The datetime field [sic] is set on the zxt_dnn_sitelog_src view but it is NULL in the zxt_dnn_sitelog_m1 view. <<

You might want to actually read a book on SQL. In this language, the term "field" refers to part of the column. The most common example given in a textbook is {year, month, day} fields in a date column

This statement evaluates to 'notequal'.

,( CASE WHEN (zxt_dnn_sitelog_src.datetime = zxt_dnn_sitelog_m1.datetime) THEN 'equal' ELSE 'not equal' END) AS test_val <<

The CASE expression is not a statement. An expression has to return a single scalar value. Then you don't seem to know that "datetime" is a reserved keyword in T-SQL so it should never be used as a column name. Finally, you don't know how NULLs work! You cannot use equality to test for nulls. You have to use the "<expression> IS [NOT] NULL" predicate

Clearly the two columns are NOT equal. One has a DATE in it and one is NULL. <<

Please get a book on SQL and read the chapter on three-valued logic. In this language. A search condition can return true, false, or unknown as logical values. All comparisons to NULL return unknown. This is not the same as true or full. It says in the values missing. We can't make any determination about it whatsoever

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi Joe,

"Then you don't seem to know that "datetime" is a reserved keyword in T-SQL"

You do realise that is the name of the field in DNN, right?

DNN was highly touted by MSFT in the early 00s because of it's use of dot net.

If I were you?

I would not be so enthusiastic about being picky with other people.

This culture of endless criticism of other people and righteousness about trivial things is leading no where good.

Best Regards

Peter

0 Votes 0 ·
Peter-Jones-BIDA avatar image
0 Votes"
Peter-Jones-BIDA answered Peter-Jones-BIDA commented

@JoeCelko-6699 Hi Joe,

"You might want to actually read a book on SQL. In this language, the term "field" refers to part of the column. "

Those of us who were building large scale computer systems before relational databases were available called the fields in files..."fields".

And we still call fields in tables, fields. Sure, we could call them "columns".

And what difference would that make exactly?

"Please get a book on SQL and read the chapter on three-valued logic."

Again. As I mentioned.

I was around when RDBMSs were in their infancy. And back then (not null field = null field) evaluated to false.

We all knew set theory said it should evaluate to "null" but the RDBMSs could not handle that evaluation in the 80s so false it was.

Indeed, I was surprised to see this had changed.

Perhaps, Joe, you might show a little more respect to men who have forgotten more about relational theory than you have learned.

I mean. How old are you to post a comment like this?

Best Regards

Peter

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Those of us who were building large scale computer systems before relational databases were available called the fields in files..."fields".

And we still call fields in tables, fields. Sure, we could call them "columns".

Since Joe is quite an old man that may explain why he has been heard saying "field" in a presentation.


0 Votes 0 ·

Hi Erland,

yes. Could be.

Those of us who work in BI are still pulling fields from files and loading them in to columns in tables. Heck, a lot of those "files" are excel spreadsheets.

And so we tend to use the more generic term "field" when talking about data.

It's very common in the BI space to call columns in tables "fields".

It is sad to see a man at 75 spend his time on such a thing rather than the more important issues in the world.

For example? The epidemic of male suicide in western countries might be something that Joe might like to spend his sunset years working to correct.

There are many more important issues in the world for old men to spend their time on than definitions of words being used on a tech forum.

But it has been interesting to me to find out the evaluation of nulls changed. Good to learn that!

Best Regards

Peter

0 Votes 0 ·
JoeCelko-6699 avatar image
0 Votes"
JoeCelko-6699 Suspended answered Peter-Jones-BIDA commented

I mean. How old are you to post a comment like this <<

I turned 75 this year. My first full-time job was in 1965 as a GS-1 Fortran programmer for the Pittman-Dunn research labs. Ten of those years were spent on the ANSI X3H2 Database Standards Committee. Most of my work after that was on commercial systems, and then in later years, teaching correct SQL and repairing systems that were designed by people that have your attitude for the last 30+ years. And you don't you

You And we still call fields in tables, fields. Sure, we could call them "columns". And what difference would that make exactly? <<

My experience people is that get terms, confused, especially such fundamental terms, don't have the mindset to ever do a good job. People who still think in terms of fields are still in a file oriented mindset. They almost never write CHECK() constraints or use REFERENCES. Such a concept as DRI doesn't exist in their previous systems. They also have problems with concurrency control.

No I was around when RDBMSs were in their infancy. And back then (not NULL field [sic] = NULL field [sic]) evaluated to false.<<

No, it didn't. The field cannot be NULL; only a column can be NULL. A search condition evaluated to UNKNOWN, the third logical value. Remember I voted on things like this in the language standard when it was on the committee! I think I probably have more authority on this topic than you do. :-)

Before SQL, the main databases were IMS, IDMS, TOTAL, Image-3000, etc. which are based on the work of Mr. Bachmann. He got an ACM award for it. They were essentially navigational. You followed pointer chains (which is where you get the "parent" and "child" terminology that you also see bad SQL programmers using) , and picked up records as you went down the chains. Later, the ability to hash into a file and get a record was added to some of the products. But at no time did the concept of NULL exist. The best you could do was blanks, zeros, or special codes to handle missing data.

We all knew set theory said it should evaluate to "NULL" but the RDBMSs could not handle that evaluation in the so false it was. <<

My first Masters degree was in math, with a specialization in discrete systems. Please trust me when I tell you that set theory has no concept of a NULL. The closest thing is the empty set and it doesn't behave anything like the relational database NULL.

In full ANSI/ISO standard SQL we have some predicates you might want to look at:

<exp> IS [NOT] NULL
<exp> IS {TRUE | FALSE | UNKNOWN}
<exp 1> IS [NOT] DISTINCT FROM <exp 2>

pay attention to the last predicate.

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

<exp 1> IS [NOT] DISTINCT FROM <exp 2>

And while SQL Server does not have this syntax, it is implemented in the optimizer, and this is what EXISTS (... INTERSECT ..) that I discussed earlier results in. This is why this is efficient.

1 Vote 1 ·
Peter-Jones-BIDA avatar image
0 Votes"
Peter-Jones-BIDA answered Peter-Jones-BIDA commented

Hi @JoeCelko-6699 ,
Hi Joe,

"My first Masters degree was in math, with a specialization in discrete systems. Please trust me when I tell you that set theory has no concept of a NULL. The closest thing is the empty set and it doesn't behave anything like the relational database NULL."

We all know that Codd and Date based the idea of a relational database on set theory. Yes, I have a math degree too. Whoopie.

And those of us who were around when the very early RDBMSs were in experimental stage (things like System R at IBM) knew and understood that there was a question as to how to represent the "unknown" value which most considered an extension of the "empty set" idea.

A null being the idea that the set was not empty, but rather, "unknown".

If you were around at that time and writing systems in the likes of IMS then you would know about the discussions that were had about default and unknown fields.

As I said. I would suggest you have more important things to attend to at your age.

Ok?

Best Regards

Peter

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

As I said. I would suggest you have more important things to attend to at your age.

<<

I have chosen educating the young and ignorant. Actually my retirement, since I can live (well, exist) book royalties and Social Security, I'm trying to get jobs teaching middle school and high school. I been doing the local science fast for six years now and I think I'm ready to get back in the classroom. Do you have a better idea of how to give back to the community?
0 Votes 0 ·

I have chosen educating the young and ignorant.

No, you have chosen bullying them. We all know that is your favourite pastime.

0 Votes 0 ·