Note: You can listen to this blog post on the video, or read it below.
Hello and Welcome.
I am Esther.
I am Peter’s A I Assistant, created to do voice overs.
I will simply read Peter’s blog posts, so that you have a choice of reading the blog post or listening to my voice.
Hello Gentlemen.
I saw a post this week that was talking about the number of profiles on LinkedIn that had data engineer, data scientist and these sorts of titles in them.
The numbers were in the hundreds of thousands.
I don’t believe the numbers are even in the right ballpark.
I would have a hard time believing even half of the numbers quoted as being close to real.
There are not that many jobs in “Business Intelligence” or the “data” business.
Those numbers are more like the numbers of people who cut and paste data into Excel, and create a pivot chart.
What are all these people doing?
I wonder.
I was saying to one of my contacts today that the data warehousing space is a very small world.
If it was a big world with those sorts of numbers?
I would still be able to sell into the market, despite my ex-wife’s slander about me.
Today, I want to talk about what I see as the future of data warehousing.
This is especially in the area of dimensional models.
Just before people yell at me, “But what about Data Vault?”, let me say this.
I am well aware of Data Vault.
I first became aware of Dan and his ideas in 1997.
I have followed along with his career all this time.
We are not friends, or even online pals.
I have read his 2015 book on Data Vault and I understand it.
Dan’s idea of how to archive data was better than Bill’s idea. Sure.
But it is more expensive.
More importantly, it was far too hard to sell in the late 1990s.
I don’t plan on using Data Vault because I, personally, will continue to work in high-volume transaction commercial businesses.
In these businesses dimensional models will remain the unrivalled standard.
I just included that bit because a lot of people who talk to me about Data Vault, and a lot of people I see talking online, talk about it as if it were a religious conversion.
They seem to think that if they just repeat to me often enough, “But Data Vault is new and archives your data perfectly,” then, somehow, I will convert to their religion.
It’s really quite strange that anyone who knows me would imagine I don’t know what Data Vault is.
So let us move on.
What do I think is the future for dimensional model data warehousing?
The main obstacle to building large dimensional models has been the working rate of 1,000 fields mapped per month.
We were at that rate for about 20 years.
I was able to map 1,000 fields per month from 1997 to 2017.
In 2017 and 2018 I did some pure research.
I tried to find out if there were any way to dramatically increase that mapping rate.
It turns out that it is possible.
We are now at 6,000 to 8,000 fields being mapped per work month.
Anyone who wants to map data from a source to a target, at that rate, can get See T L from my Dropbox and learn how to do that.
It’s free.
The version I am using has a few new features.
But all the features needed to map at 6,000 to 8,000 fields per month are in the public version.
What does this mean for our industry?
This means that a single data warehouse architect can map a very large operational system, mapping all fields, relatively quickly.
For example, it should take less than a year to map a system with 60,000 fields in it.
Another feature we have added to our models since 2010 is this.
We added a customer number and a source system number.
Each of our customers has these two fields embedded into their ETL.
That means that many customers’ data can be ingested into one staging area and one data warehouse.
Then views can be created to expose just the data the customer owns back to them.
These views can be in another database, in the case of SQL Server, or in another schema, in the case of Oracle and DB2.
In this scenario you need to have one landing area per customer.
You need to have one set of delta detection processing from the landing area to the staging area per customer.
But you only need one staging area, and you only need one data warehouse.
And you only need one set of ETL from Staging to the Data Warehouse.
You do not need any data marts.
When you are talking about 10,000 plus tables in the staging area, and perhaps as many as 5,000 plus tables or views in the data warehouse?
You can see why you only want one set of tables and one set of ETL.
By using a customer number and source system number you can identify the correct rows to read per customer.
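As a rough sketch, something like this is what I mean. The table, schema and column names here are just examples I am making up for illustration, not the actual See T L generated objects.

-- Rough sketch only; all object names are illustrative examples.
-- The shared warehouse table carries the two tenant keys on every row.
CREATE TABLE dw.f_sales
(
    customer_number       INT            NOT NULL,  -- which customer owns the row
    source_system_number  INT            NOT NULL,  -- which source system the row came from
    sale_date_key         INT            NOT NULL,
    product_key           INT            NOT NULL,
    sale_amount           DECIMAL(18,2)  NOT NULL
);

-- A view in the customer's own schema (or own database, on SQL Server)
-- exposes back to that customer only the rows they own.
CREATE VIEW customer_042.f_sales AS
SELECT sale_date_key,
       product_key,
       sale_amount
FROM   dw.f_sales
WHERE  customer_number = 42;

Every customer queries what looks like their own warehouse, while behind the views there is only one set of tables and one set of ETL.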
Where a customer wants their own copy of their data available on a dedicated machine to query?
I believe we will put their data into the large common data warehouse, and then forward their individual data to their own dedicated machine.
See T L already has the features to do this.
In See T L we have the ability to forward any subset of data to another schema.
This is because we have the current batch number and audit timestamp on every row.
Because we do not actually delete any rows from the data warehouse, we do not have the problem of detecting deletes to forward from the central data warehouse.
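As a rough sketch of how that forwarding can work, assuming every row carries a batch number and an audit timestamp, an incremental copy along these lines is all that is needed. Again, the table and column names are examples I am making up for illustration, not the actual See T L objects.

-- Rough sketch only; object names are examples, and last_forwarded_batch
-- stands for whatever batch number was recorded the last time the forward ran.
-- The target table is assumed to sit on the customer's dedicated machine,
-- reached by whatever copy mechanism is in place.
INSERT INTO customer_dw.f_sales
       (sale_date_key, product_key, sale_amount, dw_batch_number, dw_audit_timestamp)
SELECT  sale_date_key, product_key, sale_amount, dw_batch_number, dw_audit_timestamp
FROM    dw.f_sales
WHERE   customer_number = 42
  AND   dw_batch_number > @last_forwarded_batch;

-- Because rows are never deleted from the central data warehouse, only new
-- rows need to be forwarded; there is no delete detection step at all.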
So, in the future, I see that data warehouse architects will build target data warehouses closely, or even tightly, linked to very large operational systems.
We will build them using the idea that you have one set of target models.
These target models will be a superset of all the data for all the customers whose data is housed in them.
These models will run on cloud machines and, most likely, on column-based databases.
I predict that the days of a data warehouse architect developing their own data models, for their own company or for their own customers, are going away.
There will be a very small band of men, like me, who will build these massive models.
We will have some apprentices who run them, and look after them.
Of course, one day one of my apprentices will take over from me, in my role.
But personally?
I don’t see any future, for any significant number of people, to be actually building dimensional data warehouses.
The number of people who are going to do that is going to be very small worldwide.
With See T L and B I 4 ALL being free and open source now?
My opinion is that the race will be on.
Those men who embrace See T L and B I 4 ALL and build a large data model for some large operational system they have profound knowledge of?
And then they sell the use of those models?
Those men will do very well for themselves.
Those who simply keep building custom dimensional models for companies?
Those who ignore these large models that will become available for use?
A lot of them will be serving burgers and fries at McDonald’s soon enough.
Of course, that is just my opinion.
We will see how we go.
But I am putting my prediction out there now.
Those men who adopt See T L and B I 4 ALL to build such large models?
I predict they will do very well.
And with that?
I hope you found this blog post interesting and informative.
Thank you very much for your time and attention.
I really appreciate that.
Best Regards.
Esther.
Peter’s A I Assistant.