question

RadiSoufan-5557 avatar image
1 Vote"
RadiSoufan-5557 asked MelissaMa-msft commented

best way to cleanse huge data

if I have a huge table in data base.. around 40 million records that I have to check that all data are correct inside fields based on some business rules.
let us assume this is my huge table... and the rules are:

  1. US only can have [national_id_type]=NID.. others must be VISA

  2. US only with NA can own house.. other nationality with VISA cant own a house

enter image description here
gcdRL.png

what I think

first to land the table inside a landing table to preserve the orginal data
rewrite error in the column to display the errors and the number of them inside a power bi dashboard so the quality department can fix them
enter image description here
https://i.stack.imgur.com/gcdRL.png

what I will do is

case when nationality <> 'US' and NATINAL_ID_TYPE = 'VISA' then '99\US only can have [national_id_type]=NID' ELSE NATINAL_ID_TYPE END as NATINAL_ID_TYPE ,
case when nationality <> 'US' and NATINAL_ID_TYPE <> 'NA' AND own_house='YES' then '99\US only with NA can own house' ELSE NATINAL_ID_TYPE END as own_house
is this way efficient to detect errors in 40 million Record X 70 Columns I think this will be very slow...

can you help me with some tips I can use to provide the errors for dashboarding and cleansing that can be used for performance

thank you


sql-server-generalsql-server-transact-sql
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @RadiSoufan-5557,

Could you please validate all the answers so far and provide any update?

Thanks!

Best regards,
Melissa

0 Votes 0 ·
MelissaMa-msft avatar image
0 Votes"
MelissaMa-msft answered

Hi @RadiSoufan-5557,

Welcome to Microsoft Q&A!

I found that the image descriptions you provided were the same, please help check and provide the correct ones.

What environment is your table or database in? If it is in production, you have to be very careful and make more tests before updating.

You could try to add indexes on the columns like nationality, NATINAL_ID_TYPE and own_house.

Besides, you could have a try to update in batches of 1000 or 10000.

What is your recovery mode of your database? If it is full, you also have to pay attention to the size of the transaction log.

Please refer below links which may be helpful to you.
How to update large table with millions of rows in SQL Server?
Best way to update huge table

Best regards,
Melissa


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered

Well, you will have to scan that big table, there is no way around that.

It is not clear to me, if you are only going to look for errors, or also correct them.

Sometimes it can be a good idea to break up operations like this in batches, but I'm not sure that it is worth here. That depends on how many rows you will need to update - if you are updating at all.

Although, if you need to do this when then table is being accessed by other users, it's slightly different.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.