BI Labs and Fuzzy Match for Excel
Just saw that the Business Intelligence folks have set up BI Labs. It will be interesting to see all the prototypes and concepts as they come out and how they can be used for scientific research and exploration. I’m currently looking at the Fuzzy Lookup Add-In to see how it performs on environmental datasets.
The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel. It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data. For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match. While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages.
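To get a feel for what a similarity score between rows looks like, here is a minimal Python sketch of one common approach: Jaccard similarity over word tokens. This is an illustration only and is not the add-in's actual algorithm. The function names (`tokens`, `jaccard`, `fuzzy_join`) and the 0.4 threshold are my own assumptions.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two strings' token sets, between 0.0 and 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def fuzzy_join(left, right, threshold=0.4):
    """Pair each left row with its best-scoring right row.

    Rows whose best score falls below the (hypothetical) threshold
    are left unmatched.
    """
    matches = []
    for l in left:
        best = max(right, key=lambda r: jaccard(l, r), default=None)
        if best is not None and jaccard(l, best) >= threshold:
            matches.append((l, best, jaccard(l, best)))
    return matches


if __name__ == "__main__":
    for row in fuzzy_join(["Hill, Andrew R."], ["Mr. Andrew Hill", "Jane Doe"]):
        print(row)
```

A plain token comparison like this copes with reordering and punctuation, but it scores “Andy Hill” against “Mr. Andrew Hill” quite low because it knows nothing about nicknames or abbreviations. Handling those cases is where a purpose-built tool such as the add-in would be expected to do better.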
BI Labs is a collection of experimental business intelligence projects and useful applications made available from internal sources across Microsoft. These projects are prototypes and concepts, and there are no current plans to include them in Microsoft products. New ideas can pop up at any time, so please check back often to see what's new. We look forward to your feedback. Enjoy!
Cross Posted from Dan Fay's Blog (http://blogs.msdn.com/dan_fay)