Hi
I am doing a cleanup of a customer table. The customer table receives data from 4 different sources and there are many duplicate records. I am using a combination of rules to identify similar customer records. The rule I have are:
1) they have same firstname,lastname, email and mobile
2) they have similar lastname, mobile and email
3) they have similar firstname, lastname and email.
I've so far been unable to write a query or use SSIS to identify the records based on the above rules. Can anyone help me.
Thanks
Expected output:
Code:
CREATE TABLE dbo.customer(
customer_id int,
first_name varchar(100),
last_name varchar(100),
email varchar(100),
mobile varchar(20),
unique_id int)
GO
INSERT INTO dbo.customer(customer_id, first_name, last_name, email, mobile)
values (5475872,'','SMITH','john.smith@fakeemail.com','123456'),
(623073,'JOHN','SMITH','john.smith@fakeemail.com','123456'),
(1286681,'JOHN','SMITH','john.smith@fakeemail.com','123456'),
(1303103,'JOHN','SMITH','john.smith@fakeemail.com','123456'),
(7063902,'JOHN','SMITH','john.smith@fakeemail.com','987654'),
(1808155,'JOHN','SMITH','john.smith@fakeemail.com','123456'),
(2151985,'','SMITH','john.smith@fakeemail.com','123456'),
(7221764,'JOHN','SMITH','john.smith@fakeemail.com','987654'),
(1870704,'JOHN','','john.smith@fakeemail.com','123456'),
(887842,'BILL','GATES','bill@microsoft.com','555555'),
(736363,'','GATES','bill@microsoft.com','555555')
GO