Note: You can listen to this blog post via the video / mp3 below, or read it.
There are some extra comments in the audio.
Hello and welcome to my latest blog post.
Thank you very much for dropping by today; I really appreciate it!
In today's blog post I want to address a number of aspects of the product and processes being promoted by Incorta.
I have sat in on a number of their presentations recently and they are creating quite a buzz in the BI industry.
I am sure they have some happy customers who are able to do things they never thought were possible before and that’s great.
For more than 30 years I have promoted data warehousing and BI, and I would like to see more successful projects, fewer failures, and as few outright disasters as possible.
I am one of the most experienced people in the world in the area of ETL and data models underlying analytical applications that deliver the foundation for sustainable long term profit growth for companies.
In fact, I was talking to an old friend the other day and he mentioned that, now that Ralph Kimball has retired and the two men who helped train me are not public figures, I would be the world's highest profile man talking about dimensional models in public.
When I thought about it, he's probably right. There is no one else out there publicly talking about leading edge dimensional models and how to build them who has anywhere near the experience I do.
So…
If you do not know what Incorta is you can go and watch a demonstration via the button below.
This blog post is for people who have seen a demo of Incorta and are considering the product.
It is also for those who have the product and who are facing the rather predictable issues that will come with trying to do what Incorta is recommending.
Firstly, what is Incorta recommending?
What does their product do? Basically?
Basically, the Incorta product is a GUI-based and simplified way to copy data from operational systems, usually ERPs or cloud-based services, and deliver that data into a cloud-based parquet “database”.
Then, on top of that parquet “database”, you can run queries through a metadata layer that makes the underlying copy of the source data somewhat understandable.
Sure, there is a bit more to it but that’s the gist of it.
- Copy / replicate data from ERPs / Cloud services into another cloud database using the same models as the source systems.
- Put a metadata layer over the top to simplify the underlying model to some extent.
- Query that metadata layer as if it were a real model useful for BI.
- Make it all go fast because it’s on a columnar database.
Now.
As I have said.
Incorta may have many happy customers and those customers may like what they have. That is not my point.
The purpose of this blog post is to point out that what Incorta is doing has been done many times before.
It is to point out that there is a reason why Bill Inmon, Ralph Kimball, and I have NOT recommended that companies build data warehouses like this for more than 30 years.
The top line reason is that they don’t work properly.
But we will go into some of the details.
In the early 90s Bill Inmon defined the data warehouse to be a…
- Subject Oriented.
- Integrated.
- Non-Volatile.
- Archival Data Store.
To be used for the purpose of supporting the management decision-making process.
In 2008 Bill brought out his book on Data Warehouse 2.0, which extended the ideas of what could properly be called a data warehouse and what all of its components were.
So let me be straight.
Prior to 1997 it used to be pretty hard to build a data warehouse.
The bit that was time-consuming and expensive in people terms was the building of the ETL systems.
But in 1997 I invented a new way of designing and building ETL Systems.
In that year I did what I would call my first “second generation” data warehouse using everything we had learned from 1991 to 1997.
That included the then COBOL version of what has since evolved into SeETL.
We built the data warehouse for Manulife Hong Kong for services fees of just USD300K.
This was about the same as the cost of Bill's Prism software at the time.
Basically, I had invented a way of generating COBOL code as the ETL system that reduced the development time to about 1 work month per 500 fields mapped through to the data warehouse.
In 2003 a colleague of mine invented a way to get to 1,000 fields mapped through to the data warehouse per work month in the ETL development.
We stayed at 1,000 fields per work month until 2018 because no one else was even close to that productivity rate, so there was no point going any faster.
In 2018 we finally invented a way to get to 6K-8K fields mapped through to the data warehouse per work month.
We are looking for AI or other means to improve on that as well.
But even quite large improvements on that rate are fairly meaningless now.
The fact is, today, in 2022, the cost of building ETL for a dimensional data warehouse is so low that it is no longer a major portion of the overall budget.
The maintenance of the ETL is also no longer a major portion of the overall budget.
If you are paying a lot of money for the development and / or support of ETL?
Then you should talk to me about what it is you are doing.
It is very likely one of my customers using my software can do that ETL development or support for you much cheaper than you can do it for yourself.
One of the major arguments Incorta is making is that the cost of ETL to build the data warehouse is high and so if you use Incorta you will avoid that cost.
I can tell you that the cost of building and supporting large complex data warehouses from ERPs is now counted in the range of 6K-8K fields per work month in development.
Indeed, I would do that work for USD8K per work month if you wanted to hire me personally!
That is about 30% of what I was charging to do 1,000 fields per work month 10 years ago.
So, on the basis of cost per 1,000 fields mapped into the data warehouse?
I can deliver at least 18X better pricing than I was delivering 10 years ago.
And 10 years ago we were already the lowest cost provider of ETL development services in the world.
Further, because ETL development is now so cheap, one of my customers is going into the business of ERP BI product development.
This is rather like what Incorta is doing.
So, let’s deal with the first piece first.
Incorta is proposing that the “data warehouse” basically be an image copy of the operational data.
Yes, they do offer some versioning and some other features that come with parquet.
But the basic idea is that the “data warehouse” is an image copy of the operational data.
I am sorry to inform Incorta, and the public, that we had the forerunner to this idea in 1996 and first implemented it in 1997.
In 1996 I joined Hitachi Data Systems and our basic job was to sell disk drives.
Given that what we were selling was disk drives, it made a whole lot of sense to put more data on disks and stop putting it on tapes.
So we decided, in 1996, that Hitachi Data Systems would propose that the “staging area” would be on disk drives and not on tapes as we had previously done.
This meant that we were going to roughly double the amount of disk we sold per deal.
Of course, the staging area would contain tables that looked almost identical to the source system and would simply contain three new flags at the back of each record.
A fourth flag was added as standard for the tables in the staging area in 2000 when I did a project at Qantas Cargo in Australia.
If you would like to read about what these four fields are, you can press the button below to read the blog post about them.
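To make the idea concrete, here is a minimal sketch of what such a staging table might look like. The control field names below are purely illustrative assumptions; the actual four fields are described in the linked blog post.

```sql
-- Minimal sketch of a staging table that mirrors a source table and appends
-- a handful of ETL control fields. The control field names are illustrative
-- only; the actual fields are described in the linked blog post.
CREATE TABLE stg_customer
(
    -- columns copied 1:1 from the source system table
    customer_id        INTEGER       NOT NULL,
    customer_name      VARCHAR(100)  NULL,
    customer_status    VARCHAR(20)   NULL,

    -- illustrative ETL control fields appended to every staging table
    stg_source_system  VARCHAR(20)   NOT NULL,  -- which source / ERP instance the row came from
    stg_batch_number   INTEGER       NOT NULL,  -- ETL batch that inserted or updated the row
    stg_row_status     CHAR(1)       NOT NULL,  -- I = inserted, U = updated, D = deleted in source
    stg_load_datetime  DATETIME      NOT NULL   -- when the row landed in the staging area
);
```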
So. By 1997, we were already implementing, as standard, a staging area in a relational database that was pretty much a replica of the data that was coming to us from all source systems.
This is almost exactly what Incorta is talking about doing today.
Twenty four years later.
Sure, they are talking about having a GUI and talking about having faster hardware.
But what Incorta is talking about is building what we have called a “staging area” for the last 30 years.
And we put it on to disk, in a relational database, for the first time, in 1997.
In IT terms that is ancient history.
But we didn’t stop just with the data in the staging area.
We propagated the data to the data warehouse.
We used Bill's Time Variance plus Stability Analysis models for the archival layer.
And we used Ralph's dimensional models for the end user query layer.
Later, in 2002, as a result of standing on the shoulders of the greatest data warehouse data model designer that I know, I was able to add one very simple idea that enabled us to store the archival layer in a dimensional model design.
That man is not Bill or Ralph, just by the way.
You can read about that idea on the blog post by pressing the button below.
So, to summarise.
By 2002 we had dimensional data models that could store archival data.
Thus, we eliminated the need to use Bill's Time Variance plus Stability Analysis style models to archive data.
We had one data modeling technique that we could use in the Operational Data Store, the Archival Layer, and the Analytical Layer of the data warehouse.
By having just one modeling technique we were able to vastly reduce the overall cost of building the ODS, the Archive Layer, and the Analytical Layer components.
We could also build these components at a rate of about 1,000 fields from source tables per work month.
In short, by 2002 the development cost of the ETL systems we were delivering had reduced by a factor of about 3 over what we were doing just 10 years earlier.
As I mentioned, because we were so far ahead of everyone else at 1,000 fields mapped per work month, we never bothered to try and find a faster way until 2016.
And it took us until 2018 to find that faster way.
The Data Vault is very well suited to businesses that have high rates of change to their records of data and do not have high rates of transactions against those records.
A very good example of an industry area where Data Vault would be preferable to our models is welfare management for governments.
If you were going to implement a data warehouse for a government department that manages welfare payments, you would be much better off implementing Data Vault rather than our models.
I also want to give you an idea of how fast it is now possible to build a staging area.
By staging area I mean implementing the extraction processing, the delta detection processing, and the update processing from the delta detection into the staging area.
This is required to add the four fields that we have had as standard since 2000, as mentioned earlier.
One of my customers, using SeETL, built the staging area for an ERP with 35,000 fields in it in just 4 days.
This was on SQL Server.
This included partitioning the largest tables across multiple hard drives in their own storage groups.
There were proper storage group allocations for all of the 1,800+ tables.
To summarise what was created:
- All the tables were generated.
- All the indexes were generated.
- The staging area was built so that it would accept inputs from multiple versions of the ERP. A feature which has proven to be extremely useful.
- All the code was generated to extract, land, perform delta detection, and forward the inserted and updated data into the staging area. (A plain SQL sketch of the delta detection idea follows this list.)
- There is an option to detect deletes as well and mark rows as deleted from the operational system in the staging area. This is usually only turned on for smaller tables.
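Purely as an illustration of the general shape of those generated snippets (the table and column names here are assumptions, not actual SeETL output), delta detection against a landed extract can be expressed in plain SQL like this:

```sql
-- Illustrative sketch only: land_customer holds today's full extract from the
-- source, stg_customer is the staging image carrying the control fields.
DECLARE @batch_number INT = 12345;   -- illustrative batch identifier

-- 1. Forward rows that are new in the source and not yet in the staging area.
INSERT INTO stg_customer (customer_id, customer_name, customer_status,
                          stg_source_system, stg_batch_number, stg_row_status, stg_load_datetime)
SELECT l.customer_id, l.customer_name, l.customer_status,
       'ERP01', @batch_number, 'I', CURRENT_TIMESTAMP
FROM   land_customer AS l
WHERE  NOT EXISTS (SELECT 1 FROM stg_customer AS s WHERE s.customer_id = l.customer_id);

-- 2. Forward rows that already exist but have changed since the last load
--    (NULL handling omitted for brevity).
UPDATE s
SET    s.customer_name     = l.customer_name,
       s.customer_status   = l.customer_status,
       s.stg_batch_number  = @batch_number,
       s.stg_row_status    = 'U',
       s.stg_load_datetime = CURRENT_TIMESTAMP
FROM   stg_customer AS s
JOIN   land_customer AS l
  ON   l.customer_id = s.customer_id
WHERE  l.customer_name   <> s.customer_name
   OR  l.customer_status <> s.customer_status;
```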
Each source system table requires 4 tables to be created in the staging area.
So the staging area had 4 times 1,800 which equals 7,200 tables created.
All these tables had to be placed into the correct storage groups.
All the indexes had to be created and placed into correct storage groups.
There were 6 fields added to every source table as part of the process.
Each source table required 7 SQL Snippets to be created that must be run sequentially and should only be run if the prior snippet does not fail.
So there were 7 times 1,800 which equals 12,600 snippets of SQL that were generated.
They were all placed into our scheduler and enabled to be run with parallel streams for faster processing.
The entire processing of the deltas from the ERP and migration across to the staging area takes about 30 minutes at that particular customer.
And Gentlemen, the number to be very conscious of is this.
All that was done in just one work week.
Just forty hours of work.
That is what the customer paid on their invoice for that work.
So.
The situation is this.
As long as the source ERP tells you all the things you need to know, like primary keys, data types, etc, you can now have a staging area built for your ERP in about one week.
Of course, the people who do that would have to check a number of items.
But that’s what it cost in one case I am aware of.
Of course, on that staging area you can place as many indexes as you want to improve your query performance.
And if you really want it to go faster you can always copy it into Sybase IQ which is a very, very fast columnar database.
Because the staging area marks the rows that have been updated in the ETL processing it is very easy to build loading programs for a Sybase IQ target that takes those rows and puts them into a Sybase IQ copy.
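As a sketch of that idea, using the same illustrative control fields as above, the rows touched by the current batch can simply be selected out of the staging table; those are the rows that get loaded into the Sybase IQ copy.

```sql
-- Illustrative only: select the rows marked by the current batch so they can
-- be applied to the Sybase IQ copy of the same staging table.
SELECT  customer_id, customer_name, customer_status,
        stg_source_system, stg_batch_number, stg_row_status, stg_load_datetime
FROM    stg_customer
WHERE   stg_batch_number = 12345          -- the batch just processed
  AND   stg_row_status IN ('I', 'U');     -- only inserted or updated rows
```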
So that is the first thing I wanted to deal with in this blog post.
It is possible to build a staging area that adds 6 columns to each table for increased functionality of the staging area, for a 1,800 table, 35,000 field SQL Server based ERP, in a week.
It will take longer to run the initial loading of the staging area.
If you asked me to do it?
I would charge you USD2,000 to do that weeks work.
For all intents and purposes?
This is free.
Now. Why didn’t we just stop at copies of operational data?
Why is that not enough to support the management decision making process?
There are a number of reasons.
- You don’t get to keep your history properly.
- Data within an ERP is not properly integrated and consistent. You need to define rules on how to join items.
- Data within ERPs is not subject oriented, the data is transaction oriented.
- Data from multiple sources does not have matching keys. Those matching keys need to be created and maintained somehow.
- Data within ERPs has notoriously high rates of data errors, especially integration errors.
We found out the hard way, long ago, that even with the best will in the world, asking questions on the staging area prior to the ETL processing was fraught with danger.
We often got the wrong answer without knowing we got the wrong answer.
Nothing will kill a data warehouse project faster than giving the business numbers you believe are correct that later turn out to be not correct.
That happens often when answering questions on the staging area so great caution must be taken.
While it is theoretically true that you should be able to answer any question from the staging area, meaning an image copy of all the data that is coming from the sources, this is not true in practice.
There is one, underlying, very good reason for this.
Source data in ERPs, or any operational system, is not 100% accurate.
In any source system you should expect to see errors in the data.
These errors need to be corrected BEFORE you give the data to end users to query and BEFORE you create reports or dashboards for end users to use.
Let me talk about the two most common errors that occur when querying source data.
1. Accidental Cartesian Products.
When querying an ERP data model and formulating questions it is extraordinarily simple to accidentally create a query that contains a cross product within it. That is, a join that returns too many rows for some of the data in the query.
These accidental cross products are extremely hard to find because it is the data inside the table that is causing the cross product.
Further, such cross products usually only apply to a very small proportion of the data in the query so the query result does not look incorrect.
It has proven almost impossible to notice these errors in queries that produce just a very few extra rows in the output result set.
Further, such errors may be introduced during the normal day-to-day running of the ERP, so reports and queries that were tested and found to be correct can become incorrect at a later date due to changes in the data.
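A contrived example of how this happens, with assumed table names: if a product happens to have two overlapping rows in a related table, every order line for that product is counted twice, but only for that product, so the total still looks plausible.

```sql
-- Contrived illustration of an accidental partial cross product (assumed names).
-- If a product has two overlapping rows in price_agreement, each of its order
-- lines matches twice and its revenue is doubled - for that product only, so
-- the overall result does not look obviously wrong.
SELECT   ol.product_id,
         SUM(ol.quantity * pa.unit_price) AS revenue
FROM     order_line ol
JOIN     price_agreement pa
    ON   pa.product_id = ol.product_id
   AND   ol.order_date BETWEEN pa.valid_from AND pa.valid_to   -- overlapping validity ranges => duplicate matches
GROUP BY ol.product_id;
```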
At one client I worked with, there were about 80 reports in Business Objects based on a set of replicated databases with views over the top to enable Business Objects to query these copies of the production data.
When we replaced these 80 reports with reports built on an underlying dimensional model, almost all of the originals were found to have small errors in them.
The reports we were producing were correct.
The client had been living with these very small errors in their reports for years.
2. Dropped Rows Due To Referential Integrity Failures.
In all ERPs you will find instances where data that is supposed to be available in another table as part of a join is missing.
It was not entered.
It was entered incorrectly.
Or it is simply delayed in being entered.
When writing complex joins, the data that is missing for some of the joins is extremely hard to find.
This missing data can be introduced during the normal processing of an ERP.
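A simple illustration of the second problem, again with assumed table names: an inner join quietly drops every order line whose product code is missing or mistyped in the product master, and no error is raised.

```sql
-- Illustration (assumed names): the inner join silently drops any order line
-- whose product_id has no matching row in the product master, so the total
-- is understated without any error being raised.
SELECT  SUM(ol.quantity * ol.unit_price) AS revenue
FROM    order_line ol
JOIN    product    p
  ON    p.product_id = ol.product_id;    -- rows with no matching product vanish here

-- The rows that silently disappeared can only be found by looking for them explicitly.
SELECT  ol.*
FROM    order_line ol
LEFT JOIN product  p
  ON    p.product_id = ol.product_id
WHERE   p.product_id IS NULL;
```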
With the best will in the world and the most painstaking work in the world, it is simply not possible to avoid these two problems occurring in the source data of an ERP or any large operational system.
These problems have to be dealt with and, to the best of my knowledge, they cannot be dealt with simply by running a lot of data validation routines, because you do not have an answer to the question:
“When I find an accidental cross product, or find dropped records due to a lack of referential integrity, how do I fix that?”
The reason you do not have an answer to that question is that it has to be fixed in the operational system.
Now. Two of the features of dimensional models are this.
- You cannot produce a cartesian product accidentally. Indeed, you cannot produce one even if you try.
- When there is a failure in referential integrity, a “zero key” is placed into the fact record to mark that the break in referential integrity happened.
One of the many features designed into dimensional models is the GUARANTEE that it is not possible to accidentally produce cartesian products or lose rows due to broken referential integrity.
Even if no other advantage was offered by dimensional models, this single design feature would be enough for me to insist on using dimensional models for end user query and end user reporting.
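In ETL terms, the dimensional approach deals with the same broken reference once, at load time, rather than on every query. A minimal sketch of the “zero key” idea, with assumed names: every dimension carries a reserved row with surrogate key 0, and the fact load falls back to it when the lookup fails, so no fact rows are ever lost.

```sql
-- Minimal sketch of the "zero key" idea during a fact load (names assumed).
-- d_product contains a reserved row with product_key = 0 meaning "no matching
-- dimension member", so fact rows are kept and the break is made visible.
INSERT INTO f_sales (date_key, product_key, quantity, revenue)
SELECT  s.date_key,
        COALESCE(dp.product_key, 0) AS product_key,   -- 0 = referential integrity break, row is kept
        s.quantity,
        s.revenue
FROM    stg_sales s
LEFT JOIN d_product dp
  ON    dp.product_source_id = s.product_id;
```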
In fact, back in the early 90s when I was first promoting dimensional models, one of the very common discussions I had with CIOs was that the data in the operational systems was simply too complicated for end users to comprehend.
Therefore, giving end users access to this information without vetting by IT was fraught with danger.
When I was explaining to CIOs that there was a new modeling technique that would make it possible for business users to reliably query data, I was often laughed at and sometimes even asked to finish up speaking and leave. The vendor equivalent of being thrown out.
In the early 90s, it was very well known and very well understood, that the promise of being able to copy operational system data into a relational database and reliably ask it questions was a false promise.
Many had tried, including us at IBM.
All of us had failed.
These small data inconsistencies in operational systems meant that the answers to questions asked of them contained small errors that were just impossible to find and guard against.
This is why, when I first started reading about dimensional models, I knew and understood that they solved the problems of cartesian products and broken referential integrity as a matter of data model design.
As early as 1993 I knew that dimensional models were going to win the battle of the modeling techniques.
I just didn’t learn how to build one until 1994-5.
To summarize this section.
Incorta is proposing that data be copied from operational systems, such as ERPs, across to a “fast database”.
Then a “business directory” is placed over the top and that copy of operational data is made available for query.
I see this as fraught with the same dangers we understood so well in the early 90s.
Those being accidental cartesian products, and accidental loss of records due to referential integrity failures.
These will produce small errors in the final reports that may go un-noticed for some significant period of time.
Once they are noticed there is usually significant concern about all the data in the data warehouse.
Further, we have the ability to replicate exactly this sort of thing today with SeETL very quickly and very cheaply.
One example being the creation of a proper staging area, with 6 fields added to each table, for an ERP with 1,800 tables and 35,000 fields, in just one week.
It took longer to run the initial load than it took to create the staging area in the first place.
Now. I want to move on to more complex topics than just the very well understood issues of getting incorrect results from queries on complex ERP data models.
Because I have recommended building staging areas inside relational databases since 1996, it has always been possible to query those staging areas to answer questions.
It has always been possible to add indexes to speed up those questions.
In short, we have been doing what Incorta is talking about since 1997: creating a queryable database that uses the data models of the source systems.
I am not disputing that such a staging database is very useful, and a very handy backup for investigating queries that turn out to be impossible to ask of the dimensional models for some reason.
The argument that “building ETL from the staging area to the dimensional models is too expensive” is now a dead argument.
I will personally guarantee to have dimensional models built for any ERP at the rate of USD8K per 6,000 fields mapped.
It might even be cheaper.
But I will guarantee right here on this blog post that it won’t be more.
That’s about USD60K services for an ERP with 50,000 fields in it as the base layer of data into the dimensional model.
Since 1997, on every project I have done, I have taken ALL fields from the ERP / operational systems (minus purely system operational fields) into the data warehouse.
This proposal of “non-lossy” data warehouses removed the discussion of “what fields should we put into the data warehouse?”
What else do dimensional models give us that we don’t have, and can’t have, in our staging area?
1. The ability to easily introduce groupings and hierarchies.
We all know that there are hierarchies in businesses that business people use to drill down on their data.
There are time hierarchies, geographic hierarchies, product hierarchies, demographic hierarchies, and relationships, just to name the most obvious.
In the process of going from the staging area to the data warehouse the ETL system builds these hierarchies.
If you take the Incorta idea of having these hierarchies defined at query run time, rather than at ETL run time, your CPU consumption will be vastly higher.
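For illustration, with assumed column names: the ETL resolves each hierarchy once and stores it as plain columns on the dimension row, so every subsequent query is a simple GROUP BY rather than a run-time hierarchy resolution.

```sql
-- Illustration (assumed names): the product hierarchy is resolved once at ETL
-- time and stored as plain columns on the dimension.
CREATE TABLE d_product
(
    product_key        INTEGER      NOT NULL,   -- surrogate key
    product_source_id  VARCHAR(30)  NOT NULL,   -- key in the ERP
    product_name       VARCHAR(100) NULL,
    product_group      VARCHAR(50)  NULL,       -- hierarchy level resolved at ETL time
    product_category   VARCHAR(50)  NULL,       -- hierarchy level resolved at ETL time
    product_division   VARCHAR(50)  NULL        -- hierarchy level resolved at ETL time
);

-- Drilling from division to category is then just a change of GROUP BY column.
SELECT   dp.product_division,
         SUM(f.revenue) AS revenue
FROM     f_sales f
JOIN     d_product dp ON dp.product_key = f.product_key
GROUP BY dp.product_division;
```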
2. The ability to maintain many levels of summaries.
It is still stunning to me, 25 years later, that the vast majority of data warehouse architects did not implement Ralph's ideas of multi-level summary data in the one fact table and multi-level dimensions in the one dimension table.
The only vendor who picked up on this was MicroStrategy.
This was because a number of the Metaphor people trained by Ralph finished up over at MicroStrategy.
However, MicroStrategy recommends that each level of each dimension be in a separate physical table, and each level of each fact table be in a separate physical table.
I presume this was done because it stopped other contemporary vendors like Business Objects and Cognos from stealing their customers.
Business Objects and Cognos cannot properly query the models that were created by MicroStrategy consultants.
One of the main reasons that MicroStrategy remains “king of the hill” for super large user base and super large volume data warehouses is exactly because they implemented Ralph's ideas of multi-level dimensions and multi-level fact tables.
They just implemented these ideas in a slightly different way, so as to keep their customers for longer.
Dimensional models allow you to maintain many levels of summary in the one fact table.
Ralph Kimball's old database company, Red Brick, performed aggregate navigation on top of these summary levels, which is what made Red Brick so fast.
Today, we can maintain multi level summaries in fact tables using only SQL.
If you want to read the blog post about how this is done click on the button below.
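The linked post describes the actual technique. Purely as an indication of the general idea, and not as that technique, here is a sketch in which the fact table carries an assumed level indicator and a monthly summary level is maintained from the daily detail level with ordinary SQL:

```sql
-- Indicative sketch only (assumed names and level convention): level 1 is the
-- daily detail, level 2 is a monthly summary held in the same fact table.
DELETE FROM f_sales_ml
WHERE  level_key = 2                      -- monthly summary level
  AND  month_key = 202201;                -- month being refreshed

INSERT INTO f_sales_ml (level_key, month_key, date_key, product_key, quantity, revenue)
SELECT  2            AS level_key,        -- write the rows at the monthly summary level
        202201       AS month_key,
        0            AS date_key,         -- dates are rolled up away at this level
        product_key,
        SUM(quantity),
        SUM(revenue)
FROM    f_sales_ml
WHERE   level_key = 1                     -- read from the daily detail level
  AND   month_key = 202201
GROUP BY product_key;
```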
We all know that in Business Intelligence we go from high level summary data down to detailed level data on dashboards, in reports, as well as in ad hoc queries.
That drill down from higher level to lower level is something that was observed in the 70s for business reporting.
The hardware vendors like to tell you that you just need faster hardware to make this more responsive.
The software vendors like to tell you that you just need their cube products like Essbase, SSAS, TM1, Cognos to make this faster.
Tableau and Qlik will tell you that you need their in memory databases to make this faster.
Since I first found out how this works, in 1993, I have been repeating Ralph's ideas on using multi-level dimension tables and multi-level fact tables to make these queries go faster.
In fact, I have just invented a new way to query these databases that makes the levels easier to navigate no matter what query tool you are using.
Ralph and I talked about these multi-level models on DWLIST in the late 90s until we were blue in the face.
People just did not want to learn how to implement them.
I guess Michael Saylor has been rubbing his hands together all these years because no one else really jumped on the bandwagon of building multi-level models and navigating them properly.
By summarizing the data from the detailed transactions up to the higher level multi-level summaries, it is possible to support many of the “high level” dashboards, reports and queries from this highly summarized data.
Then, as the user drills down to more detailed levels the aggregate navigator can query the database at the level the data exists closest to the level being asked for.
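Continuing the assumed level convention from the earlier sketch, this is what that looks like in query terms: a high level dashboard is answered entirely from the summary level, and only a drill-down below it touches the detail level.

```sql
-- A company-wide monthly revenue view is answered from the summary level...
SELECT   month_key,
         SUM(revenue) AS revenue
FROM     f_sales_ml
WHERE    level_key = 2                    -- monthly summary level (assumed convention)
GROUP BY month_key;

-- ...and only a drill-down to individual days touches the detail level.
SELECT   date_key,
         SUM(revenue) AS revenue
FROM     f_sales_ml
WHERE    level_key = 1                    -- daily detail level
  AND    month_key = 202201
GROUP BY date_key;
```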
You can’t do this with the “staging area” approach of Incorta because you don’t have a place to store the rules for the aggregations.
And you don’t have the ETL processing to perform incremental updates of multi-level summaries.
It should not be forgotten that if the claim is:
“if you just buy faster hardware and more processing power you will speed up your summarization processing”
then if you apply that same amount of hardware to already summarized data you are going to get blinding speed.
More usually, companies settle for the speed they want, and if you have multi-level summaries in a dimensional model, you are going to get much better query performance with much less hardware.
Meaning, multi-level summaries, which cost almost nothing to create and maintain, will save you a lot of user wait time and a lot of hardware and software costs.
With SeETL it is possible to add multi-levels to dimensions and to fact tables by the mere documenting of what you want in the control tables.
To maintain the multi-level summary fact tables in SQL there are a few SQL snippets that have to be hand written, because we have not yet seen the value in generating those few snippets from the SeETL workbook.
3. The ability to link attributes and to add attributes that are not linked in the ERP.
The next point I want to comment on is the fact that simply copying ERP and source system data does not do anything for you in terms of being able to add additional attributes from other sources and integrate them with the ERP data.
A key component of a dimensional data warehouse is the ability to much more easily integrate data from other sources, including sources external to the company.
The issue with many sources from many systems, as we all know, is that the keys don’t match.
So we have to build and maintain cross reference tables to match keys between systems.
Then we need to link them. And we all know that integers are the best data type to use for joining tables in relational databases.
So no matter what we do:
We need to be able to bring data from many different sources together.
We need to have cross reference tables.
And we are best advised to link the final results via integer keys.
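A minimal sketch of the cross reference idea, with assumed names: each source system's natural key is mapped once to a single integer surrogate key, and everything downstream joins on that integer.

```sql
-- Minimal sketch (assumed names) of a key cross reference table.
CREATE TABLE x_customer_key
(
    customer_key        INTEGER      NOT NULL,   -- integer surrogate key used everywhere downstream
    source_system       VARCHAR(20)  NOT NULL,   -- e.g. 'ERP01', 'CRM', 'WEB'
    source_customer_id  VARCHAR(50)  NOT NULL,   -- natural key in that source system
    CONSTRAINT pk_x_customer_key PRIMARY KEY (source_system, source_customer_id)
);

-- Resolving a CRM record to the shared customer key is then a single lookup.
SELECT x.customer_key
FROM   x_customer_key x
WHERE  x.source_system      = 'CRM'
  AND  x.source_customer_id = 'C-000417';
```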
You do not get any of these functions with a copy of data from operational systems.
All these features have to be added.
The customer dimension association table is a good example of how such data is brought together.
This sort of data structure is simply not possible unless you are using integer keys for data that is sourced from many places and used to profile parties, most importantly customers.
Please note this is the same link as previously presented in this post.
4. The ability to create very Complex Fact Tables.
I know this blog post is epic.
But I wanted to be thorough.
The last topic I wanted to cover is very complex fact tables.
Businesses are complicated.
If you can deal more effectively with those complications, you have a better chance of being more successful in the business.
In some businesses there are some very important business processes that are best supported by some very complex fact tables.
Some examples are pipeline fact tables, snapshot fact tables, plans vs actuals, period to period fact tables, period to date fact tables and all these sorts of things.
Depending on your business these can be incredibly important and provide you with very significant opportunities for profit improvement.
In such complex fact tables you have to bring data from many transaction level fact tables and place them into one very complex fact table.
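As a heavily simplified indication of the shape of that work (the names and grain here are assumptions, nothing like the real fact table described below), several transaction-level facts are combined at a common grain into one row set:

```sql
-- Heavily simplified indication (assumed names) of populating a complex fact
-- table by combining several transaction-level facts at a vendor-by-month grain.
INSERT INTO f_vendor_negotiation (month_key, vendor_key, purchase_value, rebate_value, return_value)
SELECT   m.month_key,
         m.vendor_key,
         SUM(m.purchase_value),
         SUM(m.rebate_value),
         SUM(m.return_value)
FROM (
         SELECT month_key, vendor_key, amount AS purchase_value, 0 AS rebate_value, 0 AS return_value
         FROM   f_purchases
         UNION ALL
         SELECT month_key, vendor_key, 0, amount, 0
         FROM   f_rebates
         UNION ALL
         SELECT month_key, vendor_key, 0, 0, amount
         FROM   f_returns
     ) AS m
GROUP BY m.month_key, m.vendor_key;
```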
Building such a fact table through a series of views on top of data in a fast database is simply not possible.
I don’t mean “it’s slow”.
I mean it is simply not possible.
Let me give you an example from a customer I did years ago.
This was a retail customer whose buyers had to negotiate the contracts for their products with the large vendors selling those products.
The large vendors pushed hard on margins and the retail buyers were getting beaten in their negotiations.
So we decided that it was worth building a very sophisticated tablet application that would put in front of each buyer all the data about the vendor relationship that he needed in his negotiations on price.
Because you could not be sure where and in which directions the negotiations would go, we decided to give the buyers an incredibly comprehensive suite of data.
But, of course, response times had to be sub second.
So all the metrics that needed to be available had to be made available in just two fact tables, one of which has more than 50 metrics in it.
It gathered data from more than a dozen different transaction tables as well as from snapshot fact tables.
Everything a buyer would want in a negotiation with a multi-billion-dollar multi-national was at his fingertips.
That application can not be built on top of “a copy of the data from the ERP running on a fast database”.
In fact, when we first tested creating the metrics for just ONE MONTH the run time was over FOUR HOURS.
We wanted to have these metrics up to “close of business yesterday”.
Because of how the ERP worked, we needed to re-build the current month, as well as the prior two months.
So we were looking at 12 hours processing time just to update one fact table.
In the end I personally spent about two weeks re-writing this ETL into very high performance stored procedures to get it to run fast enough to be included in the daily processing.
You can not get this result simply by using a “fast database”.
If you did use that approach?
You would be vastly overpaying for the processing power versus 2 weeks of development effort to speed something up that will run, daily, for many, many years.
There are a wide variety of such complex fact tables needed in businesses.
If you don’t have a number of these then perhaps you are not thinking hard enough about what would be most useful for your company.
So. To summarize the whole blog post.
Incorta is proposing that “Business Intelligence Systems” can be created by simply copying source data from such things as ERPs, adding a business dictionary and a few other niceties, and putting it all on a “fast database”.
Though there is some utility in this, and many people have tried to do this, there are some serious issues that have to be handled.
These are issues that, to the best of my knowledge, no one has ever been able to properly handle with this approach.
Most notably they are:
- The data in the ERPs is not integrated or subject oriented.
- The data in the ERPs will have instances of broken referential integrity.
- The data in the ERPs will have instances where queries produce cartesian products.
- The data in the ERPs is not naturally “multi-level” so summaries are not maintained.
- The data in the ERPs cannot be integrated with data from other systems, so that “data integration” has to be programmed and run repeatedly at query time.
That would seem to be a reasonably sufficient summary of the issues that we know arise from trying to query “staging areas” or other structures that are based on the data models of the operational systems.
In the end, I think it is enough to say, that over the last 40 years, many really, really smart people like Ralph, Bill, Charles Irby and others, took a really, really long and deep look at what was needed to support the management decision making processes.
They came up with variations that all come down to what can be expressed as Bill's Data Warehouse 2.0.
Data Warehouse 2.0 can be implemented on Data Vault models or on dimensional models.
Data Warehouse 2.0 cannot be implemented on “a copy of the operational data”.
Those who are looking to use Incorta might be well advised to inform themselves of what we have learned over the last 40 years.
You might thank me for pointing this out one day.
As ever.
Thank you very much for reading or listening to my blog post.
I am sorry this one is such an epic.
But I thought it worth my time to put this into the public domain.
If people get themselves into trouble with “copy the ERP into a fast database” then at least I did my best to help them avoid that.
I would be more than happy to recommend one of my partners to fix any such messes!
Best Regards
Peter Andrew Nolan.