Note: You can listen to this blog post on the video or read it below.
Hello and Welcome.
I am Esther.
I am Peter's AI assistant for creating voice-overs.
I will simply read Peter's blog posts, so that you have the choice of reading the blog post or listening to my voice.
Hello and welcome, gentlemen.
I have something to share with you that I think is very interesting and important for all front-end data analysis tools like Power BI, Tableau, Qlik, Meta5 and many others.
For the purposes of this blog post we will focus on Power BI.
This is because Power BI has become the most widely used end-user analysis front-end tool after Excel.
I believe the technique presented in this blog post also applies to Excel.
Firstly, please allow me to describe the background to the problem this blog post presents one solution for.
Ralph Kimball created the idea of having multiple levels of data in single dimension tables along with multiple levels of data in fact tables.
This was created at the company that he cofounded in 1982 called Metaphor Computer Systems.
When Metaphor Computer Systems was purchased by IBM, Ralph started a company called Red Brick, which sold a database that implemented these ideas inside the database engine.
Red Brick was the only database that allowed you to present the view of the most detailed data in the data warehouse, and it would automatically navigate to the summary level that provided the most efficient answer.
With Red Brick any front end tool could connect to it and could send SQL into the database at the most detailed level, the database itself would then parse and rewrite the SQL to go to the most efficient level of data.
In 1994, when IBM shut down Metaphor Computer Systems, some of the staff went to MicroStrategy.
They brought the multi-level data idea to MicroStrategy.
It was decided that MicroStrategy would implement these ideas, but using different tables rather than views.
This decision was made to provide a level of lock-in to MicroStrategy.
This was because such a data model was not readable by other front-end tools.
The investment was made and MicroStrategy became the only front-end tool that could properly navigate multi-level data models.
This became part of the reason for the success of MicroStrategy at the high end.
MicroStrategy could answer the same workload of queries with at least ten times less database compute power than other front-end tools.
The product that Ralph helped create is now known as Meta5.
This tool required the end user to specify what level of data was being requested from the database.
There is a tool called the Reporter Tool, where the setting of these levels is left to the end user and is very simple to do.
There is also a tool called the Query Tool, where the levels can be set by the end user just as simply.
If you would like to watch a video demonstration of how this is done, please click on the button below.
The main design technique that Metaphor Computer Systems used was that the levels of data in the dimension tables had unique keys across the many levels.
So, for example, the day level data had unique keys.
The week level data had unique keys.
The month level data had unique keys, and so on.
For geography the same applied.
There were unique keys for the levels of post codes, towns and cities, states, countries, and economic regions.
For products there were unique keys for the levels of product, product group, product categories and so on.
By having unique keys for all levels of dimensions the summary data could be placed into one fact table.
That one fact table would then be properly accessed and you could guarantee no double counting.
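This unique-keys-across-levels idea can be sketched as follows. This is a minimal SQLite illustration with hypothetical table and column names, not the actual data model described in the blog post:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# One date dimension holding rows at several levels; every key is
# unique across ALL levels, which is the Metaphor design rule.
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, level TEXT, label TEXT)")
cur.executemany("INSERT INTO dim_date VALUES (?,?,?)", [
    (1,   "day",   "2024-01-01"),
    (2,   "day",   "2024-01-02"),
    (100, "week",  "2024-W01"),   # week-level row, its own unique key
    (200, "month", "2024-01"),    # month-level row, its own unique key
])

# One fact table holding both detail and summary rows; each row
# points at a dimension key at the matching level.
cur.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")
cur.executemany("INSERT INTO fact_sales VALUES (?,?)", [
    (1, 10.0), (2, 20.0),   # day-level detail rows
    (100, 30.0),            # week-level summary of those two days
])

# Querying at the week level touches only week-keyed rows, so the
# day-level rows cannot be double counted.
cur.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    WHERE d.level = 'week'
""")
week_total = cur.fetchone()[0]
print(week_total)   # 30.0
```

Because day-level and week-level rows can never share a key, a query pinned to one level can never pick up rows from another.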
In recent times, the idea of placing a summary table number at the front of each summary table has been introduced.
By doing this it is possible to present to MicroStrategy the exact model it needs to be able to navigate the summary levels.
I have done a lot of work with MicroStrategy customers.
My ETL software maintains data models that MicroStrategy can read.
In this example we are going to have views at the different levels, like MicroStrategy uses.
We are going to present the summary tables out as different views.
Of course, since we are writing views and exposing them to Power BI, we can change the underlying queries in the views without Power BI knowing anything about it.
Now, let us look at an example summary table and a view placed over the top.
On the video you will see part of the CREATE VIEW statement for a fact table summary view for retail outlet sales transactions.
In the blog post you will see portions of the CREATE VIEW statement, and you can click on each image to see it at full size.
This is about as simple as a summary table can be which is why we have it in the demonstration.
You can see that the underlying table has the lookup keys at the detail level.
You can see that there is a key for the sale date.
And you can see there are keys for the retail outlet, the product, the geography, the currency, the customer demographics, the transaction type, the transaction status, and finally the unit of measure for the sale.
You can see there are the usual suspects for the fact columns.
There is the sale extended amount, the sale cost extended amount, the VAT extended amount, the discount extended amount, the sales units, and finally the number of sales that were summarised up into this record.
You can see the from clause saying that it is coming from the underlying multi level fact table.
You can see that the fact table level is set to one which means this is summary number one.
This is just an example, which is why we are showing summary number one.
Every summary level gets a unique number across all summaries so that there can be no chance of double counting.
Now, to the right of the view you can see that the table columns are aliased in the view.
These aliases link to the levels of the views that expose that level of data.
So, you see that this summary is at the week level.
You can see it is at the outlet level, which is the detailed level.
You can see it is at the product group level.
You can see it is at the marital status level.
The other levels are all at the detail level.
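A summary view of the kind just described might be sketched like this. It is a minimal SQLite illustration; every table, column, and view name here is a hypothetical stand-in, not the blog's actual DDL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical multi-level fact table: fact_table_level identifies
# the summary each row belongs to (0 = detail, 1 = summary one).
cur.execute("""CREATE TABLE outlet_sales_fact (
    fact_table_level INTEGER,
    date_key INTEGER, product_key INTEGER,
    sale_ext_amount REAL, sales_units INTEGER)""")
cur.executemany("INSERT INTO outlet_sales_fact VALUES (?,?,?,?,?)", [
    (0, 1,   11, 10.0, 1),   # detail rows at day / product level
    (0, 2,   11, 20.0, 2),
    (1, 100, 55, 30.0, 3),   # summary one: week / product-group level
])

# The view filters on the summary number and aliases the generic key
# columns to the level they actually hold in this summary.
cur.execute("""
    CREATE VIEW outlet_sales_sum1 AS
    SELECT date_key    AS week_key,
           product_key AS product_group_key,
           sale_ext_amount,
           sales_units
    FROM outlet_sales_fact
    WHERE fact_table_level = 1
""")
rows = cur.execute(
    "SELECT week_key, product_group_key, sale_ext_amount FROM outlet_sales_sum1"
).fetchall()
print(rows)   # [(100, 55, 30.0)]
```

The aliases are what tell the consumer of the view which dimension level each key column joins to at this summary level.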
The reduction of rows you get due to summarisation varies according to your data.
In general you should expect to get between 10 and 1,000 times fewer rows than in the detailed outlet sales transaction fact table.
Even if you use columnar databases or SQL Server Column store indexes you will see a vast reduction in CPU usage and IO usage with such summary fact tables.
On Azure, CPU and IO have to be paid for, and so multi-level fact tables will save you money.
The only question is how much money.
With SeETL, the extra cost of creating these multiple levels is, for all intents and purposes, zero.
Ok.
Now we will go to the one big view over the top of this table and the relevant dimension tables.
We will do this in sections so it’s easier to understand.
This is a very simple view, but the implications of what it does are significant.
In the select portion of the statement we can see the following.
We can see the selection of the desired fields from each of the dimension tables.
They are selected in the order we want them to appear in the output view.
We have prefixed the data fields with letters starting at B so that they will be grouped together in the Power BI semantic model.
You can see that the facts are at the end of the select statement, but these fields will be sorted to the front of the semantic model because they are prefixed with the letter A.
Next we have the from clause, which restricts the rows to those in summary level one.
Next we have the inner joins to join the fields from the summary table view to the views for the correct levels of the dimension tables.
One of the many reasons we prefer this is that the person writing the SQL does not need to understand what levels exist inside the tables.
The person writing the SQL simply gets the data from the tables the report developer tells them to use.
In our models all joins are inner joins, and we guarantee the referential integrity of the data using SeETL.
Lastly, we have the obvious need for group by and order by clauses covering all the dimension columns used by this one big view.
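Putting those pieces together, a One Big View might be sketched as follows. Again this is a minimal SQLite illustration with hypothetical names; the points being shown are the letter prefixes, the summary level filter, the inner joins to the dimension tables, and the group by:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    -- Hypothetical dimensions at the levels this summary uses.
    CREATE TABLE dim_week (week_key INTEGER PRIMARY KEY, week_name TEXT);
    CREATE TABLE dim_pgrp (pgrp_key INTEGER PRIMARY KEY, pgrp_name TEXT);
    CREATE TABLE fact (fact_table_level INTEGER, week_key INTEGER,
                       pgrp_key INTEGER, sale_amount REAL);

    INSERT INTO dim_week VALUES (100, '2024-W01');
    INSERT INTO dim_pgrp VALUES (55, 'Beverages');
    INSERT INTO fact VALUES (1, 100, 55, 30.0), (1, 100, 55, 12.0);

    -- One Big View: dimension fields prefixed with letters so Power BI
    -- groups them, facts prefixed with A so they sort to the front,
    -- rows restricted to summary level one, inner joins only,
    -- then group by and order by on the dimension columns.
    CREATE VIEW one_big_view AS
    SELECT w.week_name        AS "B Week Name",
           p.pgrp_name        AS "C Product Group",
           SUM(f.sale_amount) AS "A Sale Amount"
    FROM fact f
    INNER JOIN dim_week w ON w.week_key = f.week_key
    INNER JOIN dim_pgrp p ON p.pgrp_key = f.pgrp_key
    WHERE f.fact_table_level = 1
    GROUP BY w.week_name, p.pgrp_name
    ORDER BY w.week_name, p.pgrp_name;
""")
rows = cur.execute("SELECT * FROM one_big_view").fetchall()
print(rows)   # [('2024-W01', 'Beverages', 42.0)]
```

Power BI then imports this one flat result set; all the join work has already been done in the database.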
Of course, what you are seeing with One Big View can also be done by bringing the data model in at the individual view level.
This is an alternative to, and in no way a replacement for, using data models inside the semantic model.
However, this approach has the following advantages.
One.
The SQL developer has full control over how the SQL is written.
Therefore, the developer has greater control over performance optimisation.
Constraints for the report, such as the period of time the report covers, can be coded into the view and not into Power BI, so they can be changed more easily.
Two.
Little time is needed to create the semantic model in Power BI.
No time is needed to remove columns from the semantic model in Power Query.
No time is needed to create joins and hide columns in the semantic model.
Three.
The SQL Server Optimiser may create a better optimisation plan against a view than it does against SQL being sent to it from an external tool.
Four.
The actual report will run faster because there are no joins to perform in the Power BI queries against the semantic model itself.
Five.
When creating dashboards for things like plan versus actual or sub group variance from group, it is possible to do that in the view.
Indeed, now that we have common table expressions, very complex views can be created and then presented as either fact tables or one big view.
So even when the data itself is coming from many source fact tables, one big view gives you the opportunity to use just one view for the report.
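As a sketch of that last point (hypothetical names, SQLite for illustration), a common table expression can combine a plan fact table and an actual fact table into a single plan-versus-actual view:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    CREATE TABLE actual_fact (week_key INTEGER, amount REAL);
    CREATE TABLE plan_fact   (week_key INTEGER, amount REAL);
    INSERT INTO actual_fact VALUES (100, 90.0);
    INSERT INTO plan_fact   VALUES (100, 100.0);

    -- Common table expressions let one view draw on several source
    -- fact tables and present a single result set to the report.
    CREATE VIEW plan_vs_actual AS
    WITH a AS (SELECT week_key, SUM(amount) AS actual
               FROM actual_fact GROUP BY week_key),
         p AS (SELECT week_key, SUM(amount) AS plan
               FROM plan_fact GROUP BY week_key)
    SELECT a.week_key,
           a.actual,
           p.plan,
           a.actual - p.plan AS variance   -- computed in the view, not in Power BI
    FROM a
    INNER JOIN p ON p.week_key = a.week_key;
""")
rows = cur.execute(
    "SELECT week_key, actual, plan, variance FROM plan_vs_actual"
).fetchall()
print(rows)   # [(100, 90.0, 100.0, -10.0)]
```

The report author sees one table with the variance already calculated, even though the data came from two source fact tables.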
That is the end of the advantages.
The main disadvantage of creating such a view is that the amount of data being sent from the database to Power BI is larger.
If you are sending millions of rows it will be much larger.
So the use of One Big View is not an all-or-nothing proposal.
It is a proposal where the person designing the report should decide if they are going to use individual tables in a data model or one big view like this.
That person should be trained to know the advantages and disadvantages of One Big View and then be able to make a well informed decision on a report by report basis.
The improved development simplicity and the improved performance at the database and report level must be balanced against the extra data that has to be sent across the network.
There is one other point to make.
One Big View can only be used like this when the results desired in the report can be presented in one fact table.
There are many reports that cannot be produced from one fact table.
The most common such report is demographics analysis.
To do demographics analysis you need six mini fact tables in the semantic model.
There is no realistic way to do demographics analysis using just one fact table.
I would also like to show you a part of the semantic model in an example Power BI report.
In the video you can see it on the screen now.
You can pause to read it in more detail.
On the blog post you can click on the image below to increase it to full size.
You can see in the one big view on the left that Power BI has sorted the columns into alphabetical order.
To the right you can see that I have created some folders to put the columns into.
Of course, in Power BI, columns are sorted into alphabetical order, so within each of the folders the columns are in alphabetical order.
You could put numeric fields in front of the columns to sort them into the order you want.
You can see that we have a Key Performance Indicators Folder at the top.
You can see that we have the week level fields.
And so on.
I have closed the Retail Outlet folder because it has quite a few fields in it.
Everyone who has used Power BI can see that this one big view will import the data needed for a single report.
In Summary
Now to summarise what we have shown you today.
We have given you a basic introduction to multi level dimensional models.
Our data models are all multi level dimensional models.
We showed you that we could have one fact table with many levels of data in it.
We showed you that we can have a fact table level column which uniquely identifies the summary level.
And then we showed you that we can create one big view over the top of that view.
The one big view retrieves all the rows needed for one specific Power BI report.
It must retrieve data at the lowest level needed in the report.
Lastly, we showed you a simple piece of the semantic model for this one big view that was read into Power BI desktop.
Anyone who is experienced in Power BI and data models can see that this is a very useful alternative to using a data model inside the semantic model to achieve the same result.
We listed the advantages and disadvantages.
If you feel there are more advantages or disadvantages we should add to the list then please let us know.
If you have any questions, please feel free to speak to me on X.
I have made X my primary mechanism for talking in public.
Even my email is heavily censored.
And with that, I hope you found this blog post interesting and informative.
Thank you very much for your time and attention.
I really appreciate that.
Best regards,
Esther
Peter's AI Assistant