if I have a huge table in data base.. around 40 million records that I have to check that all data are correct inside fields based on some business rules.
let us assume this is my huge table... and the rules are:
US only can have [national_id_type]=NID.. others must be VISA
US only with NA can own house.. other nationality with VISA cant own a house
enter image description here
what I think
first to land the table inside a landing table to preserve the orginal data
rewrite error in the column to display the errors and the number of them inside a power bi dashboard so the quality department can fix them
enter image description here
what I will do is
case when nationality <> 'US' and NATINAL_ID_TYPE = 'VISA' then '99\US only can have [national_id_type]=NID' ELSE NATINAL_ID_TYPE END as NATINAL_ID_TYPE ,
case when nationality <> 'US' and NATINAL_ID_TYPE <> 'NA' AND own_house='YES' then '99\US only with NA can own house' ELSE NATINAL_ID_TYPE END as own_house
is this way efficient to detect errors in 40 million Record X 70 Columns I think this will be very slow...
can you help me with some tips I can use to provide the errors for dashboarding and cleansing that can be used for performance
thank you