question

NaomiNNN avatar image
0 Votes"
NaomiNNN asked NaomiNNN commented

Good index when need to join on 1 of 2 columns

Hi,

I'm wondering about best index strategy and the way to express a query for the following situation: I have two columns in the source and the target tables. When one column is populated with the value, the other is NULL. Columns are of bigint type. There is also a tinyint column with 1 for the first column populated and 2 for the second column populated.

So I expressed the JOIN condition in the MERGE command this way:

ON ISNULL(srce.FirstColumn, srce.SecondColumn) = ISNULL(trgt.FirstColumn, trgt.SecondColumn) and trgt.Indicator = srce.Indicator

I also created two filtered indexes for the source table (which is a temp table #Source)

index on FirstColumn where Indicator = 1 and similar for the second column.

Do you think it's the correct approach or what would you suggest for the problem in question (we're not going to change the table design).

Thanks a lot in advance.

sql-server-generalsql-server-transact-sql
· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.


Theoretically you can probably write ON srce.FirstColumn = trgt.FirstColumn OR srce.SecondColumn = trgt.SecondColumn or maybe ON CHOOSE(srce.Indicator, srce.FirstColumn, srce.SecondColumn) = CHOOSE(srce.Indicator, trgt.FirstColumn, trgt.SecondColumn), but it is not clear if there are advantages.


0 Votes 0 ·

HI NaomiNNN,

Welcome back to Microsoft Q&A!

We recommend that you post CREATE TABLE statements for your tables together with INSERT statements with sample data so that we could have a better understanding of your requirement.

You could follow other experts' suggestions and check the execution plans and find out which one is better or faster.

Thank you for understanding!

Best regards
Melissa

0 Votes 0 ·
pituach avatar image
1 Vote"
pituach answered

Hi Naomi

I have two columns in the source and the target tables.

It can be much simpler to have the discussion on a sample scenario which we all implement and can discuss. DDL+DML to demonstrate the scenario is all that is needed.

I have two columns in the source and the target tables. When one column is populated with the value, the other is NULL. Columns are of bigint type.

I assume that by "two columns in the source and the target ", you mean two column in the source and two columns in the target.

In this case, it seems like a very expensive design.

Why not having a single column instead of the two BIGINT. This single column will include the non null value and in addition have a second small column like BIN to say which "column" this value presented?

This will provide you a smaller table + better flexble design (today you have two columns which one is not null and maybe tomorrow you will need 3 columns which one is not null - in your design this means adding columns and more rules. Using my desin you only need to change the BIT column into TINYINT use the value 1,2,3 for three columns).

Moreover! having this rule: "When one column is populated with the value, the other is NULL." means that you need to manae this rule in the client side or the server side. If you will not manage it well and cover all options then you might have a case where both column are not null or both are null. I see not reason for two separet columns if only one should store value.

Note! A DDL+DML could explain what is your exact scenario :-(

There is also a tinyint column with 1 for the first column populated and 2 for the second column populated.

So, why do you need these two columns?!? For the sake of the discussion let's assume that these two columns presents to groups or two types so this column provides the information if the value relayted to forst type/group or the second, and there is no need to have two columns.

we're not going to change the table design

OK, this is a limitation which mean above will not fit, but I would re-think about this limit probably.

I also created two filtered indexes for the source table... index on FirstColumn where Indicator = 1 and similar for the second column.

We need a DDL+DML to examine the execution plan but my first guess is that these INDEXes will be used well since by using calculation on the columns ISNULL(srce.FirstColumn, srce.SecondColumn) the server first need to calculate the value before he can filter the rows.

You can use something like the first option @Viorel-1 suggested (using CHOOSE does not solve this point) which is compare each column separately which might solve this point - again we need to examine it and not guess using DDL+DML and examine your real Execution Plan.

Do you think it's the correct approach

Need more information to answer this

what would you suggest...

(1) Try to use a simple INDEX on both columns (FrstColumn, SecondColumn) using DESC. This will allow the server to sort all the non-null in the FrstColumn first and the null at the end. Next try to use something like

ON srce.FirstColumn = trgt.FirstColumn OR (srce.FirstColumn is NULL and srce.SecondColumn = trgt.SecondColumn)

Note! This discussion is almost useless without a specific scenarion which we can test and examine. Highly recommended to provide DDL+DML + expected result + explanation of all different options





5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered NaomiNNN commented

ON ISNULL(srce.FirstColumn, srce.SecondColumn) = ISNULL(trgt.FirstColumn, trgt.SecondColumn) and trgt.Indicator = srce.Indicator

With this condition, you can forget about indexes.

Did I understand it correctly that Indicator is 1 if FirstColumn is defined and 2 if Second Column is defined?

In the that case, I would consider running two MERGE statements, on for FirstColumn and one for SecondColumn, and then you can index the individually.



· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Instead of using ISNULL I tried using OR condition, but the performance was terrible. With the original ISNULL code the performance was fine, so I was thinking - perhaps I'm trying premature optimization instead.

Also tried 2 MERGE statements but was getting some weird errors, so was unable to test that.

The procedure was deployed today with the ISNULL code in place, We'll see in the next few days how would it perform (as in my tests I only run the query itself against temp table and not the real table)

0 Votes 0 ·