Power BI – Cleaning Data by Fill Down and Remove Alternate Rows
By: David Rohlfs
Introduction
It is almost inevitable that when you connect to a new data set you will need to clean the data before being able to report. There are always test rows, misspelled values, and sometimes just table formatting issues. It isn’t common, but also not un-common to deal with tables that are set up to work for a specific reason, but that reason isn’t reporting. In this blog, I will go through a process that is similar to a Pivot function but has more capabilities. This blog will include a section on some of the basic and use cases of this process and a walkthrough of the process itself.
When to use it
The issue with creating blogs that are so centered on a specific process is that it is hard to know when to use that process. Before trying this, I would highly encourage you to read through Pivot in Power Query linked below.
Sometimes data can be stacked vertically where you have column names in one column and their respective values in the second column. While this isn’t common, I have come across it in the real world. Usually a Pivot, Transpose, or Unpivot would fix this issue, but occasionally your data will have nuances that can be hurtful. Sometimes these nuances create issues where a function won’t work, and you have to resort to doing a process similar to this.
Originally, like many other blog topics, this process came from a real-world situation. It is important to note, this process was specifically created because of discrepancies in a dataset where irregularities caused the Pivot function to break. Because of the inability to fix the error before pivoting, this process was created.
Walkthrough
For this walkthrough, we are going to use a set of sample data to help set the scenario. We have been given a query from a location that doesn’t use a normal layout. Instead, they have a column name and a value.
We do know that the column names are all the same and there is always a value present in each row of the Value column. We also know that we will only ever have 5 columns in the data. Because of these statements, we can use the method of fill down and removing alternate rows.
To start, we are going to go into our query and ensure everything looks correct. No out of place columns or missing values. (While you should always do this, it is extra important when you are talking about removing rows in the data)
Now we are going to start creating conditional columns. You will need to create one conditional statement for each column you have. In this scenario I have five different columns (GUID, Product Name, Weight, Product Origin, and Sale Number) and each of these will have their own conditional column statement.
In the conditional column pop-out, I am going to state that if my Column Name column is __ then return the value. My __ (blank value) will be my column name. Your example should look like this:
Now repeat this for all of your columns.
Once you finish, you should get something that looks like this:
Go ahead and change all of these columns to a Text data type.
- The column MUST be a text for this to work. If you have values that are not text, you can change them later on.
Next you can remove your original Column Name and Value columns. These are no longer needed.
We are finally ready for our Fill Down action. To do this, you can select all of your columns (either through ctrl click or shift click), then under the Transform Ribbon, do a Fill Down.
Now, I want you to notice a pattern in this data. Every fifth row is correct! But if you look really closely, al rows except those that are a multiple of five has at least one field wrong.
What we need to do is remove every row except for the fifth rows. Fortunately for us, Power BI has the perfect function for this. Under the Home Ribbon > Remove Rows > Remove Alternate Rows, we can plug in our rules for what rows to delete.
Here are my values for the pop-out when I want to only keep every fifth row. You may need to do some trial and error to make sure your values that you enter are correct to your data.
And here is my result!
Comparing this to my original data, you can see that it correctly displays the rows of data in a way that Power BI can easily work with the data.
Conclusion
While I don’t expect you to need this process every day, when you need to do something complex, a standard process can be helpful. If you enjoyed this blog, I have linked a few similar blogs below for you to enjoy!
Links Related to This Blog:
Comments
Leave a Reply
You must be logged in to post a comment.