4 Key Questions About Your BI Methodology
Hello and welcome to this new blog post. Today we are going to talk about one of the most important ingredients in the success of a data warehousing project. This is a “do not pass go without it” situation.
I mean the methodology that is going to be used during the project. I see so many projects that have failed, and when I ask “What methodology were you using?” the answer is very often “Agile”, or sometimes even “no specific methodology at all”.
Let me say this right here, right now. If you are not using a data warehousing specific methodology then your chances of success are very significantly compromised. It is not that a methodology will guarantee you success. Far from it. But NOT using one is asking for trouble.
I have a massive amount of experience with IT development methodologies in general and BI methodologies specifically. The first methodology for BI that I was exposed to was the Metaphor Computer Systems Methodology in 1993. I wrote the BI Methodology for SAS Australia in 1995. I wrote the Hi-STAR Warehouse Methodology for Hitachi Data Systems in 1996. I used the PricewaterhouseCoopers Methodology in 1998. I used the Prism Data Warehousing Methodology in 1999-2000. I used the Sybase Systems Integration Methodology from 2001 to 2005.
I do not know anyone else who has been exposed to and used as many of the leading data warehousing methodologies as I have. If there is someone out there, I would be interested to meet you! So I know data warehousing methodologies. I know good from bad.
We have developed our own methodology, which we have adapted to be tightly linked to our products. It is the result of the many years of work I have been involved in. We have produced a one hour video about our methodology. You can download the presentation slides from this link.
Naturally, I am not going to repeat our methodology presentation in this blog post. You can go and hear what we have to offer whenever you would like.
In this blog post I am going to give you four key questions to ask about the methodology you are using. If your vendor cannot answer these questions sensibly, off the top of their head? You should start asking some harder questions.
1. Is it a waterfall methodology?
This slide is the most important slide in our whole sales kit. It shows the way in which our software enables us to alter the way we deliver projects.
The traditional approach to building a data warehouse is to follow a waterfall methodology of some sort. The phases run something similar to:
- Requirements
- Data Model Design
- Data Mapping
- ETL Design and Build
- Applications Building
- Testing
- Move to Production
Once application building has begun and new ideas emerge about what would be valuable, those ideas go into “version 2.0”, which may or may not ever be implemented.
Typically these phases have some level of overlap, and the greater the overlap, the greater the risk that the project will be unsuccessful. In an ideal world, where you had unlimited time and money, you would not overlap phases at all, so as to reduce the risk. But we do not live in such an ideal world. Not by a long shot.
The biggest problem with this style of data warehouse implementation is that you do not get to start building applications and you do not get a populated data model until after you have coded a significant portion of the ETL in your ETL tool.
We have always tried to build prototypes to reduce the risk. But in the 90s, when we were dealing with complex ETL and large volumes, we could not build prototypes in anything more robust than Microsoft Access. These were tiny, simple prototypes that very often had significant flaws in them. We did our best, but often our best was fraught with problems. It was hard, in those days, to get the data models as good as they could be before sitting down to code the ETL in a vendor ETL tool, which is what we did when I was the Professional Services Manager at Ardent.
When I wrote my list of features that an ETL tool needed, it included:
- Typeless.
- Mappingless.
By this I meant that the ETL tool itself should have no idea of data types, so that there cannot be a problem with data type mismatches. Data type mismatches were the bane of our ETL developers at Ardent. The slightest mismatch would generate an error that we would have to go and fix.
I also meant that the ETL tool itself should not know what mappings exist. The ETL tool should discover the mappings that are needed at run time. I proposed that there should be no dictionary in which the mappings, or the source and target tables, were defined.
I proposed that the ETL engine should be told which source and which target it was to use this time and that it should open the source, open the target, discover the column names, build a map of movements according to column names, and then process the data based on column names.
I proposed that the dictionary should be minimalist and that most data should be stored in the database catalog itself and read using ODBC to give greater transportability.
By doing this, the processing could be abstracted into one program per type of processing, and that one program could perform that processing for all mappings.
Naturally, the ideal place to store the sources and the targets was inside the database as a view and not in some external catalog that would be used to define the processing.
This was implemented in the SeETL C++ engine and it was very successful.
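To make the idea concrete, here is a minimal sketch of such a “typeless, mappingless” mover in Python. This is not SeETL itself: the function name, the use of pyodbc, and the batching details are my illustrative assumptions. The engine is told only which source and which target to use; everything else is discovered from the database catalog at run time.

```python
# Minimal sketch of a "typeless, mappingless" data mover.
# Illustrative only; the real SeETL C++ engine is not shown here.
import pyodbc

def move_by_column_name(conn, source_view, target_table, batch_size=1000):
    """Copy rows from source_view to target_table, matching columns by name.

    No data types and no mappings are declared anywhere: the column
    lists are discovered from the database catalog at run time.
    """
    cur = conn.cursor()

    # Discover the column names of the source and the target.
    src_cols = [c.column_name for c in cur.columns(table=source_view)]
    tgt_cols = set(c.column_name for c in cur.columns(table=target_table))

    # The "mapping" is simply the intersection of the two name lists.
    common = [c for c in src_cols if c in tgt_cols]
    if not common:
        raise ValueError("no common columns between source and target")

    col_list = ", ".join(common)
    placeholders = ", ".join("?" for _ in common)
    insert_sql = f"INSERT INTO {target_table} ({col_list}) VALUES ({placeholders})"

    # Move the data in batches, never inspecting or converting types;
    # the ODBC driver and the database do any conversion that is needed.
    # (Assumes the driver allows two open cursors on one connection.)
    ins = conn.cursor()
    cur.execute(f"SELECT {col_list} FROM {source_view}")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        ins.executemany(insert_sql, rows)
    conn.commit()
```

One small program like this covers every mapping of its type, because nothing mapping-specific is hard coded: you simply call it once per source and target pair.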
Building the engine this way had the added benefit that it became possible to implement the prototypes at full scale before migrating the SeETL ETL to the vendor tool of choice.
Because SeETL enabled us to build the prototype at full scale, we were able to collapse the “waterfall” methodology so that data model design, data mapping, and ETL design and build all happen simultaneously. This shortens the timeline of the project, drastically increases productivity, and reduces risk.
Further, because application building can start much earlier, feedback from the application build can be fed back into the data model design process.
If the client insists on using an already installed vendor ETL tool then the SeETL to ETL tool migration happens right at the very END of the project. And we have tools and methods of doing that very quickly and easily with good QA processes.
So the very first question you should ask of your methodology is this: “Is it a waterfall methodology?” And if it is? That is a major contributor to risk. It is not appropriate and it is unnecessary.
You can compare the diagram we use with what you are doing to see where you stand.
2. Do you have to spend major dollars before you are sure of what you will get?
I have bid for and won the largest data warehousing tender of the year in my country twice. Once for the Australian Customs Service in 1997, and a second time in 1998 for Telstra, the formerly government-owned telco in Australia. I have also won many millions of dollars of BI business in other tender responses.
One thing that I never particularly agreed with in the tender process is that the customer has to make a decision, and back it up with a major amount of money, BEFORE they know what it is they are getting. That is fine when you are buying hardware or software, where you can read the specifications or the manuals and be sure of what you are getting. You can sue, or withhold payment, if the delivered product does not match the manual or the specs.
But building a data warehouse is very different. As Bill Inmon famously said in the early 90s:
“You don’t really know what you want from your data warehouse until after it is built and you start using it.”
This absolutely reflected my experience as well in the early 90s. My very first data warehouse “Business Requirement” was “I want to be able to ask any question I think of. I do not know what those questions will be. And I want the answer before I forget why the hell I asked the question in the first place.”
I mean? How do you tender for a system to do something like that?
You can’t.
Too often companies buy their data warehouse via a tender process in which they have to make a decision and a major financial commitment before they know what they are going to get for their money. Sure, they have “business processes” to go through, but it is a business process that increases the risk of failure far more than it reduces it.
Tendering for projects is supposed to reduce risk and reduce costs by getting the vendors to bang their heads against each other over features, functions and prices. In areas where the delivery is well defined I agree. It does do this. You want to buy hardware? You want to buy software? Put it to tender and get the vendors to beat each other up on price. Great!
But to build your data warehouse? You should get the “almost finished” product before you put down the big dollars, pounds, euros etc.
If your vendor wants to lock you into spending big dollars and making a large financial commitment before you know what you are going to get? That should ring alarm bells very, very loudly in your ears.
Specifically: you should be able to have a reasonably sized prototype built before you put your money down for your production hardware or software. And that includes ETL tools, which are quite pricey.
This all comes back to the ability to prototype your data warehouse before you have to put the big money down, and that means prototype ETL, or ETL that is free until you are mostly finished.
We make SeETL free for those people who sign a contract saying they will not deploy the ETL into production. It is our way of giving the “try and buy” option.
If our clients also use BI4ALL, they can inspect the data models to a great extent before buying. For example? We have 7 hours of video that take our prospects through the data models end to end, so that they can understand what they are buying before they have to spend any money.
Further? We now license the BI4ALL models on an annual basis. If you do not like what you are getting, then you do not pay the license fee and you decommission the data warehouse in favour of whatever else you want to implement.
But the fact remains: if your vendor is driving you to spend the big dollars before you know what you are getting? That should ring alarm bells. Your vendor should explain how you are free to stop at any time, right up until the prototype is signed off as being what is desired and ready to be productionised in the chosen production tools.
Of course, for your part, you must understand that prototypes are not production-ready systems, and you need the patience to give your IT people the time to turn prototypes into production systems. Too often I see business people pressure IT people to rush a prototype into production rather than making sure the job is done properly.
3. What is the level of transparency of progress?
It never ceases to amaze me how vendors get away with so little transparency in the project work they do. Really. As a vendor, on many of my projects I went out of my way to make sure that progress was as transparent as possible, warts and all. This makes for more successful projects. Hiding problems from your client might be good in the short term, but it is bad for business in the longer term.
To give you an example of the level of transparency you should be asking for, if not demanding, here is how we make transparency possible.
All of the SeETL workbooks can be loaded into our dictionary. We are the only people who have an end to end BI development dictionary. Where a product does not make its metadata readily available, as with the reporting tools, we have proxy documentation that allows us to enter metadata about those tools into our dictionary. We do the same for stored procedures.
As the project progresses, spreadsheets are versioned and loaded so that old versions can be queried. This means there is no temptation to “fiddle the numbers” from week to week in reporting cycles, because last week’s data is still in the dictionary. Any fiddling of the numbers would be exposed quickly and easily, so it is not done.
The dictionary is open and easy to query. We provide SQL Server Reporting Services reports for all the information in the dictionary. If you would like another query tool to report on the dictionary, you can re-create the reports using any tool you like.
Our tools record everything from the Business Requirements and Project Objectives all the way through to every field in the data warehouse, every field on every report, and every piece of ETL.
During ETL development we track the progress of each mapping through each stage of the development process.
Creating the weekly “Progress/Status Report” is little more than running reports off our dictionary and presenting them to the project management group that reviews such projects. There is simply no way for a project manager to hide problems when our methodology is married up to our tools.
We can give counts of fields mapped, percentage progress, and the level of testing each mapping has been put through. Everything. Our clients find our level of transparency better than that of any other vendor they deal with.
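To give you a feel for what “running reports off our dictionary” means, here is a small sketch in Python. The table and column names (dict_mappings, status) are hypothetical names I have invented for the example; the real dictionary schema differs, but the principle, that progress is a query rather than an opinion, is the same.

```python
# Sketch of a weekly progress query against a development dictionary.
# dict_mappings and its columns are hypothetical, for illustration only.
import pyodbc

def mapping_progress(conn):
    """Return (status, mappings, pct) rows: one row per mapping status."""
    cur = conn.cursor()
    cur.execute("""
        SELECT status,
               COUNT(*)                                 AS mappings,
               100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct
        FROM   dict_mappings
        GROUP  BY status
        ORDER  BY status
    """)
    return cur.fetchall()

# Example: print the weekly status report.
# conn = pyodbc.connect(dsn)  # connection details omitted
# for status, mappings, pct in mapping_progress(conn):
#     print(f"{status:<20} {mappings:>5}  {pct:5.1f}%")
```

Because every version of every workbook is kept, the same kind of query can be pointed at last week’s rows, which is what makes fiddling the numbers so visible.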
We even have project planning worksheets if you wish to use those instead of a normal project management tool.
If your vendor’s solution does not have full transparency in reporting? If you have to “take the vendor’s word” for progress? That should be ringing alarm bells for you.
4. How does the methodology reduce communication paths?
Software development is all about communication paths.
How many are there?
How is information communicated along those communication paths?
The more communication paths you have, and the less fixed the format of the communication, the higher the risk and the higher the chance of failure. Communication paths have been a subject of study in software development since Frederick P. Brooks wrote “The Mythical Man-Month” in the 70s: among n people who must all talk to each other there are n(n-1)/2 paths, so three specialists create three paths and ten create forty-five.
Again. Let me explain how we address this issue so that you can compare your situation.
Our workbooks allow the tasks of data modelling, data mapping, and ETL design and build to collapse into one activity, performed at the same time in the same tool: an Excel spreadsheet.
There is no separate exercise of designing a data model and putting it into a data modelling tool.
There is no separate exercise of documenting source to target mappings in Word or Excel and then handing those specifications to a programmer to code into an ETL tool.
The data modeller, ETL designer, and ETL coder are one and the same person, so no communication path to anyone else is needed. This rather remarkable situation depends solely on the ability of the workbook to capture all that information. We actually recommend that all data modelling, ETL design, and ETL implementation for the prototype be done by one person per project. Of course, this is a very highly skilled job because of the diversity of skills needed. It has normally been my job on projects over the last 12 years.
On some projects, clients like to do some of that work themselves in order to learn. Our general approach to skills transfer is to make the project DBA the reviewer of all this work. It is the DBA who has to take the workbooks and physically implement the database, but that is only a part-time job, so the DBA is also used to chase down answers to questions on the designer’s behalf, so that the designer does not have to.
The mechanism for “communication” is not imprecise text in a Word document. It is source code in a workbook that can be used by a generator. This is a much more precise and reliable way of communicating what needs to be communicated.
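To show what “source code in a workbook” can mean in practice, here is a minimal sketch of such a generator. The workbook layout, a mapping sheet exported to CSV with target_table, target_column, source_view, and source_column headings, is my assumption for the example, not the actual SeETL workbook format.

```python
# Sketch of a generator that turns workbook mapping rows into SQL.
# The CSV layout is a hypothetical stand-in for the real workbook format.
import csv
from collections import defaultdict

def generate_insert_selects(mapping_csv_path):
    """Yield one INSERT ... SELECT statement per (target, source) pair."""
    groups = defaultdict(list)
    with open(mapping_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["target_table"], row["source_view"])
            groups[key].append((row["target_column"], row["source_column"]))

    for (target, source), cols in groups.items():
        tgt_cols = ", ".join(t for t, _ in cols)
        src_cols = ", ".join(s for _, s in cols)
        yield (f"INSERT INTO {target} ({tgt_cols})\n"
               f"SELECT {src_cols}\nFROM {source};")

# Example: emit the generated ETL as a script for review.
# for stmt in generate_insert_selects("customer_mappings.csv"):
#     print(stmt, end="\n\n")
```

The point is not the SQL itself. The point is that the workbook row is the single, machine-readable statement of the mapping: a generator, not a person reading a specification, turns it into code.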
So compare this to your project. If you have one person designing the database, another designing the ETL, and another group coding the ETL? Then you have numerous communication paths, any of which can fail along the way.
The reduction of communication paths and the preciseness of communication by using coded workbooks vastly reduces time, effort and risk when designing and building a data warehouse.
This is true whether you use a packaged model or whether you are going to design a custom model for your company.
One of our clients insisted on having everything that needed to be remembered in the workbooks and in the dictionary, so that as staff turned over they knew with 100% certainty that everything they needed to know was documented in the workbooks.
Compare and contrast this to your situation. Do you have different pieces of code in different places and different tools so that it is possible to “forget” some component when you are attempting changes?
That is quite normal in companies that do not use our tools. We see it all the time.
Summary
In this short blog entry I have presented 4 very specific questions you should be asking about the methodology you are using to implement and/or maintain your data warehouses.
They are again:
1. Is it a waterfall methodology?
2. Do you have to spend major dollars before you are sure of what you will get?
3. What is the level of transparency of progress?
4. How does the methodology reduce communication paths?
These are, in my mind, the 4 most important questions you need to be asking about the methodology that you are using.
If your BI vendor/consultants cannot present you with a sensible answer to these questions, you should be hearing alarm bells.
The methodology you use to deliver your data warehouse project will be one of the most important factors towards your success. It will not guarantee success but it will go a long way towards that goal in and of itself.
And, of course, a bad methodology will increase your risk.
If you want me to review the methodology and approach that your vendor is using? I would be more than willing to provide you a written report for a reasonable consulting fee.
If you got this far? Thank you for reading and/or listening. I hope this was useful to you.
Best Regards
Peter