MIIS/ILM Code Experiment: Joining using name tables for abbreviated and similar names

 

(formerly posted at https://blogs.technet.com/shawnrab/archive/2007/11/07/miis-ilm-rocket-science-joining-using-name-tables-for-abbreviated-and-similar-names.aspx ) 

I am tasked this week with performing joins.  We have an Active Directory environment and a Lotus Notes environment and we're simply trying to get the similar accounts joined.  Biggest problem is that we don't have anything consistent between the two.  No Employee ID and the Notes ShortName is not consistently the same as samAccountName in Active Directory.  Most of the identities have a first name, middle initial and a last name.  Some of them do not have a middle initial.

 We have around 10000 total identities and we were able to get all but 1700 joined with these three mappings:

1.) Lotus Notes (FirstName and MiddleInitial and LastName) ---direct mapping---> MV via Active Directory (givenName and initials and sn)

2.) LN(FirstName and LastName) ---direct mapping---> MV via AD (givenName and sn) with LN(ShortName) ---rules extension mapping--->MV via AD(samAccountName)

3.) LN(FirstName and LastName) ---direct mapping---> MV via AD (givenName and sn) with LN(ShortName) ---rules extension mapping--->MV via AD(first eight letters of the samAccountName)

 Where one of the entries of ShortName was "close" to samAccountName, however ShortName is multivalued

of the 1700 left, a large chunk is test accounts, service accounts and other fun items, but we found around 450 "potential" joins where we had LastName and one of the ShortName-samAccountName mappings correct however the first name was not correct.  I found a pattern where we would have a successful join if we could correlate similar or abbreviated names.  Example (names have been faked to protect the innocent):

 Similar/Misspelled Name-->Phillip D. Jones (pdjones) vs. Philip D. Jones (pdjones)

Abbreviated Name-->Richard J. Smith (rjsmith) vs. Rich J. Smith (rjsmith)

I created a file with entries comma seperated where these associations were made. Here are some example lines:

 Andrew,Andy,Drew

Edward,Eduard,Eddy,Eddie,Ed,Edwin

Jennifer,Jenifer,Jen,Jenny,Jeni,Jennie

Katherine,Kathleen,Kathrine,Catherine,Cathleen,Cathy,Kathy,Kate,Cate,Cathrine

Timothy,Tim,Timmy

...etc...

 

We were able to come up with over 100 names that could be abbreviated, spelled differently or easily misspelled.  I then created a multi-valued Metaverse Attribute called AltFirstName and flowed in these names.  To do so, I saved my flat file to the Extensions Directory (typically C:\Program Files\Microsoft Identity Integration Server\Extensions) and added the following logic to my Management Agent rules extensions (NOTE: the Notes Extension is below, the Active Directory Extension is similar, yet the attribute name givenName is used where in LN, FirstName is used.)

Public

Class MAExtensionObject

Implements IMASynchronization

 

    Dim AltNameArr As String() ' global array to populate similar names

    Public Sub Initialize() Implements IMASynchronization.Initialize

AltNameArr = File.ReadAllLines(Utils.ExtensionsDirectory +

"\names.txt") ' single call to push the lines of the file to

        ' a String Array - each string represents the possible alternate names

    End Sub

 

...

Public Sub MapAttributesForImport(ByVal FlowRuleName As String, _

ByVal

csentry As CSEntry, ByVal mventry As MVEntry) Implements _

IMASynchronization.MapAttributesForImport

Select Case FlowRuleName

Case "AltFirstName"

If Not mventry("AltFirstName").IsPresent Then ' this is rather expensive so only do it once

For Each nameArr As String In Me.AltNameArr ' for each line

If nameArr.IndexOf(csentry("FirstName").Value + ",", StringComparison.CurrentCulture) > -1 Then ' check to see if the name is present

' I added the comma so "Jo" short for "Joanne" didn't return an index for "Joe" and the case sensitivity

' StringComparison.CurrentCulture so "Ian" doesn't get confused with "Brian"

Dim AltFNValues As String() ' Initialize a single string array for each possibility

AltFNValues = nameArr.Split(

",") ' Populate that array

For Each altFN As String In AltFNValues ' for each entry in the array

mventry(

"altFirstName").Values.Add(altFN) ' add a new value to the AltFirstName metaverse entry

Next

End If

Next

End If

...

 

I then created these join mappings 

1.) LN(FirstName and LastName) ---direct mapping---> MV via AD (AltFirstName and sn) with LN(ShortName) ---rules extension mapping--->MV via AD(samAccountName)

2.) LN(FirstName and LastName) ---direct mapping---> MV via AD (AltFirstName and sn) with LN(ShortName) ---rules extension mapping--->MV via AD(first eight letters of the samAccountName)

 

Now, I am sure there is a more efficient way to do this, using SQL or XML rather than a flat file to import the name associations, or using a two-dimensional array instead of looping the file into one array and splitting to get a second array, but as a first test, we were rather successful with zero false positives.  We were able to recover 425 of the 450 "potential" joins this way. 

I'd like to mention that I received a lot of help from the customer on creating the large list of names (over 400 names) - Christine M. and my shadow this week, Raffe who also came up with a few as well.  Both were also very good voices of reason ensuring me that I wasn't unreasonable trying to get "Brian" to map to "Brien" when the last name was Smiswartszkoether (completely made-up last name intended to sound unique) on both sides. 

I am sure I am not the first person to do this, but if I can help someone out there out with their joins by posting this, I've done exactly what I've set out to do. 

Have fun, and good luck

--Shawn

UPDATE: 11/28 - For those people running SQL 2005, it appears that Alex Tcherniakhovski found a way to do this using functionality built in to SQL 2005 SSIS - see https://blogs.gotdotnet.com/alextch/archive/2005/07/15/init-record-linking-via-ssis.aspx

or

https://blogs.msdn.com/alextch/archive/2005/08/09/miis-and-ssis.aspx

to find out more.  

 

This posting is provided "AS IS" with no warranties, and confers no rights.

And Just in case...

This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment. THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of the Sample Code.