Power BI – Fuzzy Match Maximum Number of Matches
By: David Rohlfs
If you have worked with fuzzy match before, you know that there are a lot of complexities to how you can create your merge. One of the most useful of these is the Maximum Number of Matches option. This feature allows you to limit how many matches there are with each match in a fuzzy match merge.
This blog is building off several other EPM Strategy blogs that have focused on Fuzzy Match. Because of this, we will not be covering some of the basics of fuzzy match. If you want to read through those blogs, they will be linked below.
In this blog we will go over when to use the maximum number of matches. We will not be going through a walkthrough in this blog because it will be extremely short, so there will be good examples inside of the when to use it portion of the blog.
When to use it
Let’s start off with what is the maximum number of matches option inside of the fuzzy match.
The maximum number of matches feature in the fuzzy match allows a developer to limit the number of matches from Table A to Table B when using a fuzzy match. This becomes a very useful feature when the dataset size is smaller, the data is very structured to only have a certain number of matches per row, and/or there is a low similarity threshold being used and there is an overabundance of matches.
Let’s go through an example where using the Maximum Number of Matches will help our merge. We have Table 1 and Table 2 as our data. Note that both examples only have the key column as their dataset. I have done a fuzzy match merge between the two to create Merged Table 1. Notice that Chicken Tikka Masala from Similarity Column 1 is repeating.
We can fix this by setting our Maximum Number of Matches to 1. Doing this has created Merged Table 2 which has the data that I expected.
To explain what has happened, we do need to go into the background of how a fuzzy match works. What a fuzzy match/fuzzy logic is doing is finding the percentage of similarity between each of the options from table 1 to table 2. So, the similarity between “Chicken Tikka Masala” and “Tikka Masala” is higher than the similarity between “Chicken Tikka Masala” and “Chicken Caesar Salads”. When we limit the maximum number of matches, Power Query will go through and choose the top n results of a match to determine which matches make it to the merge (along with limits due to the similarity threshold). This is why we can have the reduction of 1 row in Merged Table 2 compared to Merged Table 1.
It can be hard to describe the all the specific reasons where you need to use this feature, but the general idea is that you have too many matches from table a to table b. If you know your data well, then I suggest you use the method described in Utilizing a Group By Function (blog linked below) to help discover these match errors inside of your data.
Because we are not going to do a walkthrough, here are a couple of screenshots of similar situations to the first example that you can look through to get a better understanding of what the maximum number of matches does.
I hope that you enjoyed this blog. I know that it is more of a niche topic, but it can help your use of fuzzy match merges a lot. There are several blogs linked below that are aimed at merging and fuzzy match merges that I highly recommend reading through to help expand your knowledge on the subject.
Links to Related Articles: