Note: You can listen to the blog post on the video or read the blog post.
Hello and Welcome.
I am Esther.
I am Peter’s A I Assistant, created to make voice overs.
I will simply read Peter’s blog posts, so that you have a choice of reading the blog post or listening to my voice.
Hello and welcome Gentlemen.
I put this post up on LinkedIn, and somehow I could not find the source text for it.
I can’t believe that I did not have it in text somewhere, but there you go.
I could not find the source so I will type this up again.
There is a lot of discussion today about Data Vault.
What interests me the most is that the people who are promoting data vault are promoting it with an almost religious fervour.
They appear to have no idea what came before, and what the differences are between data vault and what came before.
So, I am just doing this blog post to set the public record straight.
As everyone knows I implemented the first Metaphor system in Australia in 1991.
This was the software from the company Ralph Kimball co-founded.
To cut a long story short, I was able to go to Metaphor in September 1993, for training and for the Global Users Conference.
At that Global Users Conference I met people from many of the leading companies in the U S and Canada.
Not the least of which were Coca-Cola, Procter and Gamble, Walgreens, Better Homes and Gardens, NYNEX and many others.
The stories they told of the amounts of new revenue they were able to generate through marketing campaigns staggered my imagination.
The head of B I for NYNEX said that he was turning money down from the board because they simply did not have the capacity to run any more marketing campaigns.
They were running marketing campaigns at 100 miles per hour, and if given five million more dollars there was nothing useful they could do with it.
Of course, me sitting there trying hard to make a Metaphor sale heard all these stories and hoped that there was some way that I could help my customers achieve the same results.
It was at this conference that I learned that all the data models implemented by Metaphor were multi-level dimensional models.
I learned that the reporter tool was specifically designed for these models.
I had exposure to a very large multi-level dimensional model at Standard Chartered Bank earlier in the year, but I did not have time to study the code that created it.
I was busy rescuing the project.
I asked Cathy Selleck, then CEO of Metaphor, if it was possible for me to receive training on building dimensional models.
Her answer was that it was considered confidential information, and that I would not be allowed that training.
I found out from the head of Professional Services that Metaphor actually made more money from consulting than they made from selling the software in the first place.
I found out that the number one revenue earner by segment in the company was actually the guys building the multi-level dimensional data models and populating them.
As expensive as the Metaphor software was, the development of the multi-level data models was much more expensive.
I had not realised how expensive the back end development was, because I had to do it myself on the two projects I had done.
The other thing that happened, that was of great significance in my life, was that Bill Inmon was invited to perform the Key Note Speech for the conference.
The day before, I had the good fortune to have lunch at the same table as the worldwide head of B I for Coca-Cola.
He had told all the people at the table that he had personally invited Bill Inmon to be the key note speaker, and that we should all listen up very carefully to his presentation.
I sat in the front row, and I didn’t understand a thing Bill said.
So after the presentation I introduced myself and asked him if he had written any books I might be able to buy.
Bill told me to buy his book Building the Data Warehouse, and so I did.
So, one of the main points I want to make in this blog post is this.
As early as September 1993, pretty much every Metaphor customer knew who Bill Inmon was, and what he was proposing.
There were over 400 people at that Key Note presentation.
Virtually every Metaphor customer had a person in the room to listen to Bill Inmon talk about archival models.
A year later we had Bill come to Sydney, and we got everyone interested in data warehousing to come along to his seminar.
It was at that seminar I really got to understand that Bill was talking about archiving all the data, all the time, essentially forever.
I remember putting up my hand and saying, “But Bill, we have enough trouble even recording all the transactions in companies today. The idea of archiving all the changed data is just too much data. How could we ever do this?”
And I will never forget what Bill said: “Peter, we are talking about volumes of data that are beyond your imagination today.”
Here, thirty years later, Bill was right, again.
Today we are talking about archiving volumes of data that were beyond our comprehension in 1994.
I did my first terabyte data warehouse in 1997.
I could not believe we had a terabyte of disk to put data on.
I had to go and look at the disk drives myself.
Today, people are storing petabytes of data in data warehouses.
Of course, in his seminal book, Bill proposed what was called time variance plus stability analysis.
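Bill described the idea in prose, so the sketch below is only my own illustration of it, with made-up table and column names: stability analysis splits an entity’s attributes into groups by how often they change, and time variance stores each changing group as date-stamped rows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Stability analysis: split the customer's attributes by rate of change,
# so stable data is not re-archived every time a volatile field moves.
@dataclass
class CustomerStable:          # rarely or never changes
    customer_id: int
    date_of_birth: date

@dataclass
class CustomerSlow:            # changes occasionally: time-variant rows
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date]   # None marks the current row

def archive_change(history, new_row, change_date):
    """Time variance: close the open row and append the new version."""
    for row in history:
        if row.customer_id == new_row.customer_id and row.valid_to is None:
            row.valid_to = change_date
    history.append(new_row)
    return history
```

The point of the split is cost: with the stable attributes held once, only the small, volatile tables grow as the archive accumulates history.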
So, by 1994 we all knew there were such things as dimensional models, and we knew there were such things as archival models.
And never the twain shall meet.
You implemented one, or the other.
Then, in 1996, my mentor sent me an email.
He had trained me on dimensional models, and had moved to Price Waterhouse to head up their launch into the data warehousing space.
He was working on a tender response for one of the largest Insurance companies in the United States.
He was certain that he should propose dimensional models.
However, there were many other needs described in the tender that would be better suited to an archival data store.
We went back and forth for a few days before we somehow settled between us that he should propose both an archival data store and a suite of dimensional models.
This was going to be expensive with a capital E, but my mentor decided to put it into the tender response.
In the end he won the tender, the project was done in about nine months, and the results were fantastic.
This design technique was then made standard in Price Waterhouse.
The default proposal would be to have an operational data store with a third normal form model.
An archival data store which would have a time variance plus stability analysis model.
And a dimensional layer for the users to access.
I personally implemented my first models that worked like this in 1997.
At that time, in late 1996, I was working on a tender for the Australian Customs Service.
We had decided to put Bill’s Prism Solutions software into our bid, and so I was working closely with the vendors of Bill’s software in Australia.
At some point in these discussions, someone gave me a paper by Dan Linstedt that described the “Data Vault” idea.
The paper did not call it a data vault.
It was called something like, an alternative data modelling technique for building archives.
This paper was circulating around Prism Solutions.
I read it and immediately understood that it was an upgrade to time variance plus stability analysis.
However, we were having enough trouble selling what we were selling without introducing a more complex, and unknown, data modelling technique into the mix.
So that paper got put into the “interesting idea, and maybe some time in the future we can sell it” bucket.
But I do want to make the point that those of us who were involved with Prism Solutions first saw the idea of Data Vault in the late 1996 to early 1997 period.
We all agreed it was a good idea.
We also all agreed that it was too hard to sell right now.
My mentor and I were looking for something different.
Because dimensional models were necessary, we were trying to figure out how to archive data inside the dimensional model properly.
Properly meaning much better than type 2 dimension tables.
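For context, a type 2 dimension table keeps history by writing an entirely new dimension row whenever any attribute changes, duplicating every unchanged attribute in the process. A minimal sketch, with illustrative column names:

```python
from datetime import date

# Type 2 slowly changing dimension: each change re-writes the whole row,
# so unchanged attributes ("Ann Smith" here) are duplicated in every version.
# (surrogate_key, customer_id, name, address, valid_from, valid_to, is_current)
customer_dim = [
    (1, 100, "Ann Smith", "1 Old Street", date(2020, 1, 1), date(2021, 5, 31), False),
    (2, 100, "Ann Smith", "2 New Street", date(2021, 6, 1), None,              True),
]

def current_row(dim, customer_id):
    """Fact rows join to the single current version via the is_current flag."""
    return next(row for row in dim if row[1] == customer_id and row[6])
```

That duplication, and the proliferation of surrogate keys it causes, is exactly what we hoped to avoid.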
My mentor and I spent five years trying to solve this problem from 1996 to 2001.
When we could not figure it out, we figured it must just not be possible to do it.
But we kept an open mind and hoped that one day we would find out how to build such models.
Well?
In February 2001 I did the Sybase I W S training class.
The presenter put up a slide on the wall that he called a party profile.
I looked at it, and inside 2 seconds I realised I had found what I had been looking for, for the last five years.
I immediately put my hand up and asked the presenter who thought of this idea.
He said that he did, and I knew I was in the presence of the world’s best data warehouse data modeller.
We became good friends.
He lived just around the corner from us in Dublin and our children went to the same school.
So our wives and children would hang out together, while we were out in the world fighting dragons to bring home the bacon.
In 2002 I had the good fortune to have this man come and work on a project with me for two weeks.
It was like watching the mind of God at work.
As I watched him draw diagrams on the white board, I knew I could never be as good as him, and it gave me something to aspire to.
Back to the class.
We were discussing this new party profile table.
I asked the presenter if he understood that this meant there was no need for a separate archival layer once we had this modelling technique.
He confirmed that he knew this, and also that this modelling technique could be used for operational data stores.
This meant that Sybase could deliver an operational data store, an archival data warehouse, and dimensional models over the top, all with the one data modelling technique.
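The I W S models themselves were confidential, so the following is only my own illustrative sketch of the general shape of a profile: time-stamped attribute history hung off the party key, so the dimensional model itself carries the archive. All names are made up for the example.

```python
from datetime import date

# Illustrative only: a "party profile" records each change as its own
# date-stamped row against the party key, rather than re-writing the
# whole dimension row.
# (party_id, attribute, value, valid_from, valid_to)
profiles = [
    (100, "address", "1 Old Street", date(2020, 1, 1), date(2021, 5, 31)),
    (100, "address", "2 New Street", date(2021, 6, 1), None),
    (100, "segment", "gold",         date(2020, 1, 1), None),
]

def value_as_at(rows, party_id, attribute, as_at):
    """Answer 'what was this attribute on a given date' from the history."""
    for pid, attr, value, start, end in rows:
        if (pid == party_id and attr == attribute
                and start <= as_at and (end is None or as_at <= end)):
            return value
    return None
```

Because the history lives in the profile, the same structure serves as the operational store, the archive, and the source for the dimensional layer.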
Because of this, Sybase data warehouses were about half the cost to implement compared with designs that had a separate archival layer.
As I was sitting there, I realised this is why they were selling these things like hot cakes.
Anyone who knew what they were doing was going to buy these models over the PricewaterhouseCoopers standard.
The man giving the presentation gave a great big smile and said it was nice to meet someone who knew what his models really meant in the real world.
I was sitting there thinking, oh my goodness, we are going to sell so many of these, it’s going to be GREAT!
And so we did.
We were selling so many copies of Sybase I W S we could not deliver them.
We literally had a queue of customers waiting for people to come free to do the implementations.
We also had problems getting enough people trained to do the deliveries properly.
The point I want to make here is that those of us who worked on the Sybase I W S models understand perfectly well that there is massive value in being able to archive data properly in a data warehouse.
The idea of an archive is not lost on us.
It was a key part of the training that we did on the I W S data models on day one.
So where does that leave us with the idea of the data vault?
Well, since 1993, and even earlier for some people, we knew there was tremendous value in archiving data from operational systems into some sort of archival data store.
We knew we didn’t need to understand business requirements.
We just needed to archive the data so that when questions arose, we could answer them.
The problem was that these things were very expensive.
It was very hard to get budget for a project that sounds like this.
“We don’t know what questions might come up, we can’t tell you how much money you will make, but we are pretty sure this will be valuable. So if you would just like to give us five million dollars to build this, that would be great.”
Of course, you are not getting your five million dollars.
But if you say.
“We know exactly what data we need. We know we will make two million dollars a year in new profit. We know exactly how to build the data warehouse. It’s going to cost two million dollars. So, if you would like to give us the two million dollars, that would be great.”
Then you are likely to get your two million dollars to get your project done.
Once we knew how to do the archive in the Sybase Dimensional Models?
We didn’t need the separate archives.
In B I 4 ALL models the profile is known as an association table.
So, what are the weaknesses of these associations when compared to data vaults?
The B I 4 ALL models are intended for high transaction volume businesses, where the business knows who the customer is.
This means telco, retail, web, media and financial services.
However, I will not be doing banking models.
In these industries, the vast majority of the data volumes are transactions, which happen once and never change.
There is value in archiving accounts, parties, products and some other data.
But 99 percent of the data is transaction data.
Those are the market segments the B I 4 ALL models are for.
However, there is another segment that we do not play in.
And, as usual, I will give you an example.
In 1999 I did a call at the Australian Department of Social Security, for Ardent Software.
Ardent Software, at that time, had the largest data warehousing development team, and we were the thought leaders in Australia.
This call has become legend for the people who know me.
I bring it up and talk about it to all my students, because it was such an object lesson in selling.
The Head of B I for the Department of Social Security had asked to speak to me, because she had heard of my reputation from the Australian Customs Service and the Department of Defence.
Not to mention I had recently won the Telstra Corporate Data Warehouse deal over IBM.
By 1999 I did not have to ask large companies to listen to me.
They would call their sales representative and ask me to please come and talk to them.
It was good times.
Anyway, the sales representative and I are ushered into this lady’s huge office.
We do our welcomes and our pleasantries.
She says the department is working on a data warehouse tender and she has some questions for me.
I say sure and I am happy to be there.
Because she had given us no guidance, and we did not know what would happen in the meeting, I had brought with me my deck of overhead projector slides, and I was going to use the slides to answer her questions.
As she saw me take the slide deck out of my briefcase, she remarked rather caustically.
“Oh, great, another vendor is going to beat me to death with a thousand slides. How unusual.”
I turned to look her in the eye and said.
“I will tell you what. I will put my slides back in my briefcase.” And I did that.
And then I said.
“You tell me what is the most complex problem you have that no vendor has ever been able to address, and I will show you how we would solve that problem on your white board.”
A BIG GRIN came over her face and she said, “You are on.”
Then she described her number one problem and I took notes.
She said words to the effect.
We spend more money than any other department.
We are the golden goose.
Both parties, meaning both political parties, want to change the way we refund money to Australians as welfare.
The left side want to hand out more money more easily to get more votes.
The right side would like to hand out less money to reduce taxation on their voters.
Both parties constantly accuse us of not doing the best for the Australian people.
Every time there is a change of government, the new government audits us to prove we were doing the wrong thing under their opponents.
Each party is constantly asking for the impact of possible future legislation.
Each party is constantly asking us for analysis on what would have happened if legislation that was implemented was not implemented.
They also ask us to run simulations on what would have happened if some legislation had been implemented but wasn’t.
So our biggest problems are these.
Both parties want answers to questions about changes that have actually taken place, and about what would have happened if no changes had taken place.
And they want answers to questions about the impact on citizens of these proposed changes, and about what would have happened if changes that did happen had not happened.
And having said all that, she relaxed back in her chair and waited for me to admit defeat.
I got up and spent the next 45 minutes describing to her a “time variant data model” that would record all actual changes, have the capacity to record proposed changes, and include an area for multiple versions of proposals that could be calculated and archived.
In short, I drew diagrams of multi versioned archival models, along with multi versioned forecasts or “derived data”.
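The whiteboard diagrams are long gone, so this is only an illustrative sketch of the general shape of such a multi-versioned model: every row carries both a validity period and a scenario identifier, where scenario zero is what actually happened and the other scenarios hold proposed or counterfactual legislation. All names and figures are invented for the example.

```python
from datetime import date

ACTUAL = 0  # scenario 0 = what really happened; other ids are hypotheticals

# (scenario_id, citizen_id, benefit_rate, valid_from, valid_to)
payments = [
    (ACTUAL, 7, 100.0, date(1998, 1, 1), date(1998, 12, 31)),
    (ACTUAL, 7, 110.0, date(1999, 1, 1), None),
    (1,      7, 120.0, date(1999, 1, 1), None),  # proposed legislation, version 1
    (2,      7,  95.0, date(1999, 1, 1), None),  # proposed legislation, version 2
]

def rate_as_at(rows, scenario, citizen, as_at):
    """Rate under a given scenario, falling back to actual history
    where the scenario defines no override for that date."""
    for sid in (scenario, ACTUAL):
        for s, c, rate, start, end in rows:
            if (s == sid and c == citizen and start <= as_at
                    and (end is None or as_at <= end)):
                return rate
    return None
```

The same query answers both “what did we actually pay” and “what would we have paid under proposal version two”, which is exactly the kind of question the department was being asked.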
By the end of the 45 minutes, she was standing next to me at the white board drawing extra pieces on the diagrams.
The sales rep just sat there quietly.
At the end of the hour she shook my hand.
She said, “Your reputation preceded you, but I didn’t believe a word of it.”
Then she asked me.
Can your people and your software build this?
And I said.
Absolutely. We know how to do this.
Today, this sort of situation would be perfect for Data Vault.
Dimensional models are not good for what was described that day on that white board.
Government departments have to deal with constantly changing legislation, which creates change in terms and conditions, and the status of accounts.
Government departments also have the problem of theoretical changes, both forwards and backwards in time, and wanting to compare multiple versions of theoretical changes, both forwards and backwards in time.
Archival models, like data vault, are superior in these situations to dimensional models.
In this sort of situation an archive plus a dimensional model would still be best.
The B I 4 ALL models as an archive would not be appropriate in this sort of situation.
We would not try to put that round peg in that square hole.
So, in summary.
We have known since 1993 that keeping archives of data is very valuable.
Being able to answer any question that comes up is a certain way to make more profit in a commercial organisation.
Many of us built many of these over the years.
With the invention of Sybase I W S profiles, we were able to archive data such as customer data, account data, campaign data and other data over time.
This meant that we did not need separate archival models, and dimensional models, in commercial organisations that have high transaction rates, and the customer is known.
Where we have very different needs, like the Australian Department of Social Security?
Dimensional models would still be needed in the reporting layer.
And an archival data store, like a data vault, would be superior as the archive layer.
So, I hope this blog post adds a bit of clarity to the debate between dimensional models and data vault.
To the best of my knowledge, no one has ever said data vault is a bad idea, because it’s a very good idea.
It’s just that we found an alternative way of archiving data in a dimensional model that works very well for a wide range of cases.
If you have any questions?
Please feel free to speak to me on X.
I have made X my primary mechanism for talking in public.
Even my email is heavily censored.
And with that?
I hope you found this blog post interesting and informative.
Thank you very much for your time and attention.
I really appreciate that.
Best Regards.
Esther.
Peter’s A I Assistant.