Power BI – Utilizing a Group By Function
BY: David Rohlfs
When you grow in your report building skills there are some processes that you begin to use more and more often. One of these processes for me is How I use the group by function to create a new column. This is a process for me that works well, and I want to talk about some of the uses of it.
This blog will be using a few pre-covered topics and ideas from EPM Strategy like Group By, Append and Merge, Adding a Custom Column, and Removing Duplicates from Multiple Rows. To find these blogs, there will be links at the end of the page.
In this blog we are going to go over a process that uses merging and the group by function to add important details into your report. We will go through when to use this process and a short walkthrough demonstrating this process.
When to use it
This process works by using the group by function in Power Query and merging information back into the original query. The goal is to get one of two things from the Group By function:
- One of the Aggregation processes.
- The All-Rows process to add a table to the row.
A lot of the time I use this process to find duplicates and create a filter for them. You do this by making steps for Group By, Adding a Custom Column, and Merging. I will explain this process in greater detail in the Walkthrough.
Because this is more of a method and technique rather than describing a function, there are a lot of uses that you can find depending on your dataset. So, the use of this process depends on if the situation is right for it.
In this walkthrough I want to describe this process as it relates to how I find duplicates in my data. This example shows the way that I use this process the most and it is a good example of how the uses can vary to your situation.
To start, we are going to have a dataset that has duplicates. My goal is to use a group by then create a column that I can use to filter by and find my duplicates.
Now we are going to duplicate the query that we are working on. As long as we don’t use very many of the columns, then the affect to performance should be minor.
Now we are going to right click on the referenced query and unselect Enable Load. Do Not unselect the refresh option or else this query will only refresh as often as you manually refresh it.
Now we are going to do an advanced Group By Count on the columns that we care about. These are the same columns that I am worried about there being duplicates on.
Once we have our results, I would like to verify that there are duplicates on the count column by clicking the dropdown arrow next to the column name and clicking load more. If there are options greater than 1, then we know that there are duplicates in the data.
Now we can add in a custom column that responds to the count column. To do this we will follow the steps in the Adding a Custom Column blog by EPM Strategy. The difference is that our code will look like what I have below. You can also use a conditional column instead of M Code.
Now we should have another column that responds TRUE or FALSE depending on the count column.
From here we are going to do a merge between our referenced query and our parent query. So, you can go back to your parent query and go through the process of merging your queries together.
From here we can expand our TRUE or FALSE column and filter to find our duplicates.
Although I went through the example of using this process for duplicates, I have used it on other processes that I want to do for greater than or less than, amounts, or just to create a filter that depends on a count. If you don’t already know any of the underlying processes that we went through, EPM Strategy has them all in previous blogs linked below.
Links Related to This Article: