Make this simple change to improve your Reltio matches and to reduce the number of potential matches
Customer
American luxury retail company with more than $3 billion in revenue
Problem
Large volumes of data tend to contain inaccuracies, especially when inconsistent data arrives from multiple sources.
As a company that works with the Reltio Connected Data Platform every day, our main objective is to attain the highest data quality standards.
Recently, we came across a topic on match rule strategies in the Reltio documentation that helped us improve the accuracy of our matches.
We had 10 million crosswalks, which generated 2.5 million entities of type Individual in one of our Reltio tenants. We noticed too many potential matches caused by misspellings and nicknames in first name attribute values. Our goal was to reduce those potential matches, lower the overall count of Individuals, and achieve more accurate record matching.
Solution
We used a combination of DoubleMetaphoneComparator and Name Dictionary Cleanser, as suggested in Reltio’s documentation, but incorporated them into a couple of automatic merge rules instead of suspect match rules. The DoubleMetaphoneComparator covered cases where first name values are phonetically equal even though they are spelled slightly differently. The Name Dictionary Cleanser covered nicknames that should be treated as semantically equal.
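To illustrate the idea outside of Reltio, here is a minimal Python sketch of the two steps: a nickname lookup followed by a phonetic comparison. It is not Reltio's implementation. The NICKNAMES table is a hypothetical stand-in for the Name Dictionary Cleanser, and the phonetic key uses classic Soundex as a simpler substitute for Double Metaphone, which is a more elaborate algorithm. The function names are ours and exist only for this example.

```python
# Hypothetical nickname dictionary: maps a nickname to a canonical first name.
# A real name dictionary would be far larger.
NICKNAMES = {
    "bill": "william",
    "will": "william",
    "bob": "robert",
    "liz": "elizabeth",
    "beth": "elizabeth",
}

# Soundex digit for each consonant; vowels, Y, H and W have no digit.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def canonical_first_name(name):
    """Lowercase and trim the name, then replace a known nickname."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)


def soundex(name):
    """Return the 4-character Soundex key (a stand-in for Double Metaphone)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(letter, "")
        if code and code != previous:
            key += code
        previous = code  # a vowel resets the previous code
    return (key + "000")[:4]


def first_names_match(a, b):
    """True if the names are equal after nickname cleansing or sound alike."""
    name_a, name_b = canonical_first_name(a), canonical_first_name(b)
    if name_a == name_b:
        return True
    return soundex(name_a) == soundex(name_b)
```

With this sketch, first_names_match("Bill", "William") succeeds through the nickname table, and first_names_match("Steven", "Stephen") succeeds through the phonetic key. Phonetic keys can also produce false positives (here, "Robert" and "Rupert" share a key), which matters more when the rule merges automatically rather than flagging a suspect.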
There are a few things to consider when changing your match rule structure:
- Analyze your data before making any changes.
- Test different comparators to see which one works best for your use case.
- Assess the impact on your downstream systems before changing the merge rules.
Impact
This change reduced the count of Individuals by 5%, of which 80% had one or more potential matches. It increased the consolidation rate and decreased the potential match count, which reduced the workload on the data stewards.
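For a sense of scale, the percentages can be turned into rough absolute numbers. The short calculation below rests on two assumptions that are not stated explicitly above: that the 5% applies to the 2.5 million Individuals mentioned in the Problem section, and that the 80% refers to the Individuals that were merged away. The variable names are ours.

```python
# Figures from the case study.
entities_before = 2_500_000   # Individuals before the rule change
reduction_percent = 5         # reported reduction in Individual count
with_matches_percent = 80     # share of those that had potential matches

# Assumption: the 5% is measured against the 2.5 million baseline.
merged_away = entities_before * reduction_percent // 100
entities_after = entities_before - merged_away

# Assumption: the 80% applies to the Individuals that were merged away.
merged_with_potential_matches = merged_away * with_matches_percent // 100

print(f"Individuals merged away: {merged_away:,}")
print(f"Individuals remaining:   {entities_after:,}")
print(f"Of those merged, had potential matches: {merged_with_potential_matches:,}")
```

Under these assumptions, the change removed about 125,000 Individuals, roughly 100,000 of which had at least one potential match beforehand.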