Remove duplicates before unifying data
This optional step in unification enables you to set up rules for eliminating duplicate records within an entity. Deduplication identifies multiple records for a customer and selects the best record to keep (based on basic merge preferences) or merges the records into one (based on advanced merge preferences). Source records get linked to the merged record with alternate IDs. If rules are not configured, system-defined rules are applied.
Default deduplication
The system-defined rules apply if no deduplication rules are added.
- The primary key is deduplicated. For any records with the same primary key, the Most filled record (the one with the fewest null values) is the winner.
- Any cross-entity matching rules are applied to the entity. For example: In the match step, if entity A is matched against entity B on FullName and DateofBirth, then entity A is also deduplicated by FullName and DateofBirth. Because FullName and DateofBirth are valid keys for identifying a customer in entity A, these keys are also valid for identifying duplicate customers in entity A.
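The default primary-key rule can be pictured with a minimal Python sketch. It only illustrates the "Most filled" idea described above and is not the product's implementation. The field names (CustomerId, FullName, Email) and the tie handling (keep the first record seen) are assumptions made for this example.

```python
from collections import defaultdict

def null_count(record):
    """Count fields that are None or empty strings."""
    return sum(1 for v in record.values() if v is None or v == "")

def dedupe_by_key(records, key="CustomerId"):
    """Keep one record per key value: the one with the fewest null fields."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    # min() returns the first record among equals, so ties keep input order
    # (an assumption for this sketch, not documented product behavior).
    return [min(group, key=null_count) for group in groups.values()]

# Hypothetical sample data.
records = [
    {"CustomerId": 1, "FullName": "Ana Diaz", "Email": None},
    {"CustomerId": 1, "FullName": "Ana Diaz", "Email": "ana@example.com"},
    {"CustomerId": 2, "FullName": "Bo Chen", "Email": None},
]
winners = dedupe_by_key(records)
```

Here the two records that share CustomerId 1 collapse into the one that has an email address, because it has fewer null values.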
Include enriched entities (preview)
If you enriched entities on the data source level to help improve your unification results, select them. For more information, see Enrichment for data sources.
On the Duplicate records page, select Use enriched entities at the top of the page.
From the Use enriched entities pane, choose one or more enriched entities.
Select Done.
Define deduplication rules
On the Duplicate records page, select an entity and select Add rule to define the deduplication rules.
In the Add rule pane, enter the following information:
- Select field: Choose the field that you want to check for duplicates from the list of available fields in the entity. Choose fields that are likely unique to each customer. For example, an email address, or the combination of name, city, and phone number.
- Normalize: Select from the following normalization options for the selected attributes.
- Numerals: Converts other numeral systems, such as Roman numerals, to Arabic numerals. VIII becomes 8.
- Symbols: Removes all symbols and special characters. Head&Shoulder becomes HeadShoulder.
- Text to lower case: Converts all characters to lower case. ALL CAPS and Title Case become all caps and title case.
- Type (Phone, Name, Address, Organization): Standardizes names, titles, phone numbers, addresses, etc.
- Unicode to ASCII: Converts Unicode notation to ASCII characters. \u00B2 becomes 2.
- Whitespace: Removes all spaces. Hello World becomes HelloWorld.
- Precision: Set the level of precision to apply for this condition.
- Basic: Choose from Low (30%), Medium (60%), High (80%), and Exact (100%). Select Exact to only match records that match 100 percent.
- Custom: Set a percentage that records need to match. The system will only match records passing this threshold.
- Name: Name for the rule.
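To make the normalization options more concrete, the following Python sketch approximates four of them (Symbols, Text to lower case, Unicode to ASCII, and Whitespace) using the examples from the list above. The function names are hypothetical and the product's actual normalization logic may differ, for example in which characters count as symbols.

```python
import re
import unicodedata

def to_ascii(text):
    """Approximate 'Unicode to ASCII' via compatibility decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def remove_symbols(text):
    """Approximate 'Symbols': drop everything except word characters and spaces.

    Note: \\w keeps underscores, which the product may treat differently.
    """
    return re.sub(r"[^\w\s]", "", text)

def remove_whitespace(text):
    """Approximate 'Whitespace': remove all spaces."""
    return re.sub(r"\s+", "", text)

def normalize(text, steps):
    """Apply the selected normalization steps in order."""
    for step in steps:
        text = step(text)
    return text

symbols = normalize("Head&Shoulder", [remove_symbols])    # "HeadShoulder"
lowered = normalize("ALL CAPS", [str.lower])              # "all caps"
ascii_only = normalize("\u00b2", [to_ascii])              # "2"
no_spaces = normalize("Hello World", [remove_whitespace]) # "HelloWorld"
```

Normalizing both values before comparing them means that cosmetic differences, such as casing or stray punctuation, don't prevent two records from being recognized as duplicates.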
Optionally, select Add > Add condition to add more conditions to the rule. Conditions are connected with a logical AND operator, so the rule matches only if all conditions are met.
Optionally, select Add > Add exception to add exceptions to the rule. Exceptions are used to address rare cases of false positives and false negatives.
Select Done to create the rule.
Optionally, add more rules.
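As a rough illustration of how precision thresholds and AND-connected conditions combine, consider the sketch below. It uses Python's difflib.SequenceMatcher as a stand-in similarity measure. That choice is an assumption made for this example, because this article doesn't describe the scoring algorithm the product uses. Exceptions aren't modeled.

```python
from difflib import SequenceMatcher

# Basic precision levels from the text, expressed as fractions.
PRECISION = {"Low": 0.30, "Medium": 0.60, "High": 0.80, "Exact": 1.00}

def similarity(a, b):
    """Stand-in similarity score in [0, 1]; not the product's algorithm."""
    return SequenceMatcher(None, a, b).ratio()

def rule_matches(rec_a, rec_b, conditions):
    """conditions is a list of (field, threshold); all must pass (logical AND)."""
    return all(
        similarity(rec_a[field], rec_b[field]) >= threshold
        for field, threshold in conditions
    )

# Hypothetical records, assumed to be normalized already.
a = {"FullName": "john smith", "Email": "js@example.com"}
b = {"FullName": "jon smith", "Email": "js@example.com"}

rule = [("FullName", PRECISION["High"]), ("Email", PRECISION["Exact"])]
is_duplicate = rule_matches(a, b, rule)
strict_match = rule_matches(a, b, [("FullName", PRECISION["Exact"])])
```

With these inputs, the two-condition rule treats the records as duplicates, while a rule that requires an exact match on FullName doesn't.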
Select an entity and then select Edit merge preferences.
In the Merge preferences pane:
Choose one of three options to determine which record to keep if a duplicate is found:
- Most filled: Identifies the record with most populated attribute fields as the winner record. It's the default merge option.
- Most recent: Identifies the most recent record as the winner record. Requires a date or a numeric field to define the recency.
- Least recent: Identifies the least recent record as the winner record. Requires a date or a numeric field to define the recency.
In the event of a tie, the winner record is the one with MAX(PK), that is, the larger primary key value.
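The three basic merge preferences and the MAX(PK) tie-breaker could be sketched as follows. This is an illustration under assumptions rather than the product's implementation: the field names (CustomerId, Updated), the preference strings, and the sample records are hypothetical.

```python
from datetime import date

def filled_count(record):
    """Number of populated (non-null, non-empty) fields in a record."""
    return sum(1 for v in record.values() if v is not None and v != "")

def pick_winner(duplicates, preference="most_filled",
                pk="CustomerId", recency_field=None):
    """Select the winner record; ties go to the larger primary key value."""
    if preference == "most_filled":
        best = max(filled_count(r) for r in duplicates)
        tied = [r for r in duplicates if filled_count(r) == best]
    elif preference in ("most_recent", "least_recent"):
        choose = max if preference == "most_recent" else min
        best = choose(r[recency_field] for r in duplicates)
        tied = [r for r in duplicates if r[recency_field] == best]
    else:
        raise ValueError(f"unknown preference: {preference}")
    return max(tied, key=lambda r: r[pk])  # MAX(PK) tie-breaker

duplicates = [
    {"CustomerId": 10, "Email": "a@example.com", "Phone": None,
     "Updated": date(2023, 1, 5)},
    {"CustomerId": 11, "Email": "a@example.com", "Phone": "555-0100",
     "Updated": date(2022, 6, 1)},
    {"CustomerId": 12, "Email": "a@example.com", "Phone": "555-0100",
     "Updated": date(2022, 6, 1)},
]
most_filled = pick_winner(duplicates)
most_recent = pick_winner(duplicates, "most_recent", recency_field="Updated")
least_recent = pick_winner(duplicates, "least_recent", recency_field="Updated")
```

Records 11 and 12 tie on both fill level and date, so the Most filled and Least recent options both return record 12, while Most recent returns record 10.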
Optionally, to define merge preferences on individual attributes of an entity, select Advanced at the bottom of the pane. For example, you can choose to keep the most recent email AND the most complete address from different records. Expand the entity to see all its attributes and define which option to use for individual attributes. If you choose a recency-based option, you also need to specify a date/time field that defines the recency.
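Attribute-level merging, such as keeping the most recent email and the most complete address, might look like the sketch below. Interpreting "most complete" as the longest non-empty value is an assumption made for this example, as are the function name, the preference strings, and the Updated recency field.

```python
from datetime import date

def merge_attributes(duplicates, preferences, recency_field="Updated"):
    """Build one merged record by choosing each attribute independently."""
    merged = {}
    for attribute, preference in preferences.items():
        # Only records that actually have a value can supply the attribute.
        candidates = [r for r in duplicates if r.get(attribute)]
        if not candidates:
            merged[attribute] = None
        elif preference == "most_recent":
            source = max(candidates, key=lambda r: r[recency_field])
            merged[attribute] = source[attribute]
        elif preference == "least_recent":
            source = min(candidates, key=lambda r: r[recency_field])
            merged[attribute] = source[attribute]
        elif preference == "most_complete":
            # Assumption: "most complete" means the longest value.
            source = max(candidates, key=lambda r: len(str(r[attribute])))
            merged[attribute] = source[attribute]
        else:
            raise ValueError(f"unknown preference: {preference}")
    return merged

# Hypothetical duplicate records for one customer.
duplicates = [
    {"Email": "old@example.com", "Address": "1 Main St, Springfield, 12345",
     "Updated": date(2021, 3, 1)},
    {"Email": "new@example.com", "Address": "1 Main St",
     "Updated": date(2024, 2, 9)},
]
merged = merge_attributes(
    duplicates, {"Email": "most_recent", "Address": "most_complete"}
)
```

The merged record takes its email from the newer record and its address from the older one, which is the kind of result that record-level merge preferences can't produce.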
Select Done to apply your merge preferences.
After defining the deduplication rules and merge preferences, select Next.