Power BI Remove Duplicates
BY: David Rohlfs
Having duplicates (or multiple occurrences) of a data field in your data can become a big issue very quickly. It can make your data inaccurate and not credible. There is an easy built-in way to check for duplicates in Power BI by using the Remove Duplicate Rows button.
Using this function can be an easy solution to your problems if there are duplicates in your data that is skewing your results. Many of the problems associated with duplicates cause inaccuracy and can harm a dashboards credibility.
When to use it
Remove duplicates will remove all the duplicated rows in a specific column and leave the topmost duplicate as the original. This function will work when there is one or more duplicates in the column you are using it on.
This is very easy to use for when you have duplicated rows in your data. If you want to check for duplicated rows look at the EPM Strategy article about Group By
Using a Specific Column
When you are using the Remove Duplicates function you will want to make sure that you use a column that is associated with specific values for that row. You want to do this because Power BI will only look through the selected column to search for duplicates. This means if you have a row that sold the same product, it will delete the second row, even if they were on different sale orders (Look at Common Problems).
To reduce the chance of this happening I suggest limiting this function to columns that use data that won’t be repeated in the table you are in. This data should be included with the source data. These columns may include:
- Sales Order Numbers
- Product Identification Numbers
- Customer account Numbers
- Date Tables
If you are using the Remove Duplicates on a column that is not specific for a specified set of a row (see Using a Specific Column above), then this function will probably remove more than what you intend it to.
For Example, this is a data set with many products and some of them are duplicates, but not all. We should be using the SO_Number, but in this demonstration, we are using the product column.
After we have removed the duplicates in the product column, it removed much more than we wanted it to. It has removed all the apples except for the first one, while we wanted to keep most of them.
If we would have removed the duplicates from the SO_Number then we would have 12 rows, but since we used the product column, we only have three.
To demonstrate this function, we will do a quick walkthrough of where to find the remove duplicates and how it is used.
After you have loaded in your data, you will want to go to the Power Query.
Next, you will want to click on the column that you are removing the duplicates from. I chose Sales Order Number.
Note: you will want to use a specific column for this. Something like a Processing number, Sales Order Numbers, Product Identification Numbers, Customer account Numbers. To see what this looks like read the Using a Specific Column section above.
Now you can select the dropdown on Remove Rows. This will come up with multiple ways to remove rows, but for now we will select the Remove Duplicates option.
After you have clicked on Remove Duplicates, your query should have taken out all duplicates that were in the selected column. If you want to remove this transformation you can click on the ‘X’ in the applied steps of your query.
Now you just need to click the Close and Apply in the top left corner.
These are a couple of visuals that I set up to show the differences when there are duplicates in your data.
You can see that the Total Sales Amount for the second matrix is correct because the duplicates of SO_01 have been removed.
You can also see how the Product column says First Product in the first visual and only Product in the second visual. This is because in the first visual the product column finds multiple data points for SO_01
Sometimes, your data is duplicated for some reason in your data source and you need to remove those duplicates to get an accurate report. Using the Remove Duplicates function is a great way to accomplish this task.