For the last couple of decades Excel has been the most popular tool to convert raw data into reports and charts. This could change soon as more people start to use Microsoft’s new Power BI.
Power BI gives you the ability to create reports and dashboards with a drag and drop interface. The BI stands for Business Intelligence. Built into Power BI is a data cleansing system that can correct most data issues you will face, for example, invalid date formats, added spaces and missing data. Also built-in is a formula language that works with relational databases.
Power BI is not Excel. It is a separate, stand-alone Microsoft application. The Power BI version that I will be discussing today is the desktop version. There is also an online version.
Power BI uses a number of features that are available in Excel. If you have used Power Query (Get and Transform in Excel 2016) or PowerPivot in Excel then you will recognise many of the features built into Power BI.
Power BI does not have a grid interface and does not use the formulas that you are used to in Excel. There is a formula language called DAX, but while it is similar to Excel, it is different and much more powerful because it works with relational databases structures.
Power BI is a huge topic and this series of posts aims to provide an introduction and get you started using Power BI.
Getting Power BI
It’s free! Yes, you read that correctly, FREE! This fully featured dashboard-creating system is free from Microsoft. It is also being updated monthly. To get the latest version (I am using the January 2017 version) you can follow the steps outlined online, which allow you to download the software from Microsoft.
Be aware you may need an administrator profile to install the software. This may mean you will need to liaise with your IT section if there are limitations on your software installation rights.
Although it is free to create fully featured dashboards, when it comes to distributing the final report to the people who need to read them then you and the users may need a Pro license (about US$10 per person per month).
Data – the starting point
Power BI can extract data from most database systems. It works well with CSV (Comma Separated Variable) files, text files and Excel files.
Another important feature of Power BI is its ability to work with multiple tables from relational databases.
I will only cover using a single table as a data source in this example. Relational databases are a complex subject on their own and I don’t want to add additional complexity to this post. We will use a CSV file as the data source. A CSV file is a common method of extracting data from most systems.
If you are lucky and your data is well laid out and formatted correctly then you won’t need to use the built-in data cleansing features that Power BI possesses. In most cases, however, there will be some form of data cleansing required to get your data ready to report on.
Conference On Demand - Unlock Excel: discover fresh and exciting ways to unlock the full potential of your data.
The data cleansing process is virtually the same as Power Query in Excel 2010 and 2013, and Get and Transform in Excel 2016.
In Power BI you click the Get Data icon on the Home ribbon to start the data importation process. As you can see from the image below, the common data sources are listed. If your data source isn’t shown, click the More option at the bottom of the list.
After you click the CSV option you navigate to where your file is located, select it and click Open.
The data I am using is small and simple, but has a few issues that we will solve with the built-in data cleansing features.
Note: Power BI is designed to handle huge, complex data sets.
My data is shown below (I have included the data set if you want to follow along).
There are four issues with this data:
- The header at the top is not required
- The dates are not in a date format
- The entries in the State column have extra spaces in the codes
- The Amount column is left aligned, indicating text values
These are all easy to fix with the built-in data cleansing features in Power BI.
Note: the Data Type Detection option above (top right of image) can assist with guessing data types when you load the data. If you experience issues with wrong data types being guessed you can turn the feature off – see image below.
At the bottom of the dialogue are three buttons. If your data is ready to go then you can click the Load button. Otherwise, click the Edit button to enable you to perform the data cleansing operations.
I recommend always using the Edit button, which enables you to view the data and the data types and to make sure that they are correct. In Power BI the correct data type makes a difference to the way reports and charts are created later in the process. Getting the data types correct upfront saves time later in the reporting process.
If you do click Load you can still access the Edit screen later in the process.
When you click the Edit button you are presented with a separate screen which is dedicated to data cleansing. See image below.
We are going to perform some data cleansing operations but note we are not changing the underlying data at all. So the CSV file that we are using will not be changed in any way. We are taking the raw data and amending it so that it is ready to report on.
The beauty of this technique is that when the file is updated with new data, the operations that we specify now will be re-run over the new data set each time we refresh the data.
In our case we will be performing data cleansing operations, but you can also perform filtering and sorting operations at this stage. Another common process is to rename the column names to make them more meaningful to users. Many large databases use abbreviations in the field names and you can change those to make them more informative to you and the user.
Let’s fix those four issues that I identified earlier.
Remove the header
Many CSV files have a header section at the top followed by the data section below. In our case we don’t need the top three rows. You may have spotted that there is a Remove Rows icon on the Home ribbon. Click that icon, then click the Remove Top Rows option and type in 3 and click OK.
The data is amended as per the screen shot below.
Now that we have only the data on screen, we can use another option on the Home ribbon (right side) called Use First Row As Headers. This promotes row 1 to the header row. From now on we can refer to the column by its correct name – see image below.
The ABC icons on the left of the column names indicate the text data type. These icons change based on the data type of the column. You can easily amend the column data type.
On the right hand side of the screen you may have noticed the Query Settings as per the Image below.
The name is derived from the file name – you can edit and change the Query name.
The Applied Steps list the processes performed on the data. This is a bit like the macro recorder in Excel, where each step is created in sequence and then you can re-run the processes again whenever you refresh the data.
There is an important limitation to be aware of in this interface.
YOU CAN’T UNDO!
This is a bit of a culture shock for Excel users. There is no undo button or shortcut. You cannot undo your last, or any, action. You can delete steps. You may have noticed the X on the left of the Promoted Headers step in the above image.
In general you delete steps from the bottom up. If you delete a step in the middle of the process it may corrupt a step later in the process. If you do delete a middle step a warning message will be displayed.
The cog symbol on the right of the step means you can edit the step. For example in the Removed Top Rows step you could modify the number of rows to remove. Not all steps can be amended. Sometimes you have to delete the step and redo it.
Rather than focus on the fact that you can’t undo, it is better to focus on the fact that there are so many built-in data cleansing operations that can be created with only a few clicks.
Even though there is no undo it is easy to delete and redo the last step.
Also when you are starting out a little extra practice can be a good thing.
Fixing the data types in the columns
We can now fix the last three issues we identified earlier.
To adjust the data type of any column, right click the column and choose Change Type and then select the data type you require. In the image below I have right clicked the Date column.
Change the Account and State columns to Text (it is already defined as text but that was guessed when it was imported – it is best practice to specify each column and its data type).
Change the Amount column to a Decimal Number.
Note the icons on the left of the Date and Amount column names have changed.
You may have noticed that only one step was listed after you changed all the columns. Sometimes steps are combined into a single step.
Leading and trailing spaces
The last issue we have is that there are extra spaces in the State column. This is also easy to fix. Right click the State column and choose Transform and then choose Trim – see image below.
That’s it! The data is ready to use in a report or dashboard.
The final Applied Steps are show below.
The last thing you need to do is click the Close and Apply icon on the left of the Home ribbon and we can start creating reports and dashboards.
Now the beauty of what we have just done is that if we update the Data.csv file with new data we only need to refresh the Query to get the latest or new data. The steps we have created will be repeated in the sequence we have created in Power BI.
Obviously if the structure of the file changed you would need to redo the Query.
The output of the Query is shown on the right side of the main Power BI screen – a list of column names – see image below.
Note: the column names (Fields) are listed in alphabetical order.
We have reviewed a few common data cleansing operations but there are lots more built into Power BI. Take some time and look through the various options available in the ribbons.
If the built-in features can’t provide the transformation you require there is a programming language system built in that performs other, more complex data cleansing operations.
What we have covered just scratches the surface of what the Get Data feature can perform. Remember this is also available through Power Query (Get and Transform) in Excel 2010 and later versions.
Some of the other things you can do:
- Create a single table of data from all the CSV files in a folder (assuming they are all the same structure)
- Create a single table of data from all the Excel files in a folder (e.g. you could only import data from sheets with a certain name)
- Bring in multiple tables from different sources (e.g. TXT, CSV, Excel, Access and SQL and then relate them together using common fields, just like a relational database)
Where to from here?
Now that we have some clean data, in the next article we will look at creating a report and later a chart.
Neale Blackwood CPA runs A4 Accounting, providing Excel training, webinars and consulting services to organisations around Australia. Questions can be sent to firstname.lastname@example.org
This is the first in a three-part series that shows how you can use Power BI to get the most out of your data.
How Power BI can supercharge your Excel data (part 2)