Hello and Welcome.
This is Peter and I’d like to say thank you very much for coming along and listening to my latest blog post.
Please forgive the abbreviated thumbnail.
I could not fit “Using Excel To Augment GUI Input Processing” on the thumbnail.
In this blog post I have taken the slightly different approach of dictating the blog post in to word and then updating it to make it just how I want.
Having done that I have read the finished blog post back into the public record.
If you are listening to this blog post then you are hearing the finished result.
This blog post has been rattling around in my head for a long time.
It’s been like that itch that you just can’t scratch.
I am not really sure why.
So here we go anyway!
Many people have asked me:
“Pete how come you keep pushing the idea of using Excel as good way for building data warehouses?”
They point out that there are perfectly viable tools to build ETL systems, perfectly good tools to design databases, perfectly good tools that are schedulers.
“Why would you regress back to using Excel as a tool for building databases, for building ETL, and for doing so many other things within a data warehousing project?”
Those are good questions.
In my mind the answer is rather obvious.
It saves a LOT of money.
However, that other people are asking these questions would indicate that the answer is anything but obvious in their minds.
So I will explain again here.
To build a data warehouse you actually have to use many tools.
For ETL systems you need to be able to define all the source data that you’re going to use before you have built anything.
Then you need to do analytics on that source data to learn it and understand it.
Then you need to go through the process of documenting your mappings through to your target data models.
Then you need to go through the process of actually building your mappings through to the target data models using whatever ETL tool you have chosen.
Of course, for the landing area, staging area, and data warehouse itself, you need to build the underlying tables in the databases.
On top of those tables in those databases you need to build views.
For these views you need good documentation to go into the query tools and reporting tools so that the people using these tools have a fair chance of knowing what the hell it is they are looking at.
All these things that need to be done and they require someone, a human being, tell a computer, what it is they want the computer to do.
There is a human to machine interface required at each of these points for all of these tools to express what needs to be done by the machine for all of the fields involved.
Today we are building data warehouses with field counts in the 30,000-50,000 range.
No. I am not kidding.
My clients are now working on data warehouses with field counts in the 30,000 to 50,000 field range.
Just 10 years ago 4,000 fields mapped to a data warehouse was considered quite a considerable undertaking.
Today we are building data warehouses from a ERPs where the ERP has 30,000+ fields in it.
We want to take every field, with its correct name, and data type, through the landing area, the staging area and send them in to the data warehouse.
All the fields.
All the time.
It is simply not viable to do that in any of the leading ETL tools today.
It’s not “hard” or “difficult” or time consuming.
It is financially a no-go.
Today it is too expensive to build such an ETL system using any of the current GUI based tools.
Too expensive means “impossible” because you can’t get the budget approved.
So when you look at the dominant and emerging ERPs, and look at the number of fields that are now available to be mapped through to the data warehouse?
It soon becomes clear that there is quite a lot of metadata required to build a data warehouse and then even more metadata to actually run that data warehouse and keep it in good working order.
You don’t have a choice in doing all this.
It has to be done.
This is how data warehouses and business intelligence solutions are built.
Some people will make promises to the contrary.
I would warn you that most of those promises have gone unfulfilled over the last 25 years.
In the past there have been many efforts to figure out how to make building data warehouses faster, cheaper, easier and more reliable.
We have way too many project failures in our industry.
My work from 25 years ago was one of those efforts to make building data warehouses cheaper and easier.
I was very successful at that.
As long ago as 1991, I noticed how hard it was to build the ETL systems for data warehouses.
I noticed that if I could decrease the amount of time and cost for building the ETL system, I would be able to build data warehouses faster and cheaper than anybody else.
By 1995 I realised what the design patterns were for building dimensional data warehouses were and I built my first ETL software.
In 1996, with my ETL software in tow, I was hired to start up the Hitachi Data Systems Data Warehousing Practice in Asia Pacific.
We rebranded my Software the Hi-Star Warehouse Toolkit and we sold a number of massive deals before I moved on.
SeETL, as the “Hi-Star Warehouse Toolkit” proved itself in the real world of massive multi-national companies building data warehouses.
While I was at Hitachi in the 96-97 period we were able to deliver large data warehouses for our clients with the cost of the consulting around the same price as Prism Solutions were selling their ETL Software.
This gave us a tremendous competitive advantage in the marketplace.
I realised then that building ETL software was a pretty good idea.
But I want to take a slightly different view in this blog post and talk about more general issues that just building ETL software.
What we have seen happen, not only in the business intelligence area, but in virtually all areas of computing usage, is the emergence of the GUI as the predominant interface for users of an application.
For those of you who don’t know, I was programming screen based data entry computer systems before the idea of a GUI even made it out of Xerox PARC.
Now, we all know that GUIs have been helpful and useful.
We all know GUIs provide a really nice, simple way for the average person to interact with the computer.
My point is that not all people are average.
Although it is true to say that the average person is assisted by a GUI.
It is also true to say that the smartest person in the room is almost always hindered and hampered by the exact same GUI.
For the smartest person in the room, the GUI gets in the way and slows them down.
We saw that this was true in the development of ETL systems.
When people like Bill Inmon and Ardent Software brought out ETL development tools they used GUIs as the human to machine interface.
The idea that everyone had with ETL tools was to make the ETL programming simple enough that the average programmer could do it.
Programmers were the expensive part of developing ETL systems so it made sense that the early focus was on tools to make it possible to hire cheaper programmers.
They may not have the same productivity numbers as the top flight programmes, but the overall cost was less and cheaper programmers were in more plentiful supply.
Because the intended user of the ETL tool was the average programmer, the idea of having a GUI seemed like a really good idea to pretty much everyone.
What we did in the ETL world was to separate the work of designing the ETL subsystem, designing the data model, and implementing the ETL subsystem through a GUI in an ETL tool, to separate people.
So in most data warehousing projects, the people who design the ETL system, design the database, and write the ETL system in the GUI, are three different people.
Very often they are using three different tools.
In the early 2000s when I was working for myself again.
I looked at this situation and I studied it in some more detail.
I knew and understood that if I could use just one tool to do these three pieces of work, and have that one tool be useful by the smartest person in the room, then I could dramatically cut the cost of implementing data warehouses and business intelligence solutions.
Of course, I was thinking that tool should be used by me.
One, because I was thinking I was the smartest person in the room.
Two, I was an independent consultant and if I could make this happen then I could charge higher fees because I would be more productive.
And three, I would also “eat the lunch” of the ETL programmers and get paid for doing their work.
All this led to the development of the tool you know today as SeETL.
As you know I sold SeETL very successfully in the 00s.
I was also very successful in using SeETL on projects even when the final ETL system was delivered in DataStage or Informatica.
Back in the 00s my customers were astonished at the prices I quoted for building ETL systems because they were often less than half of the next lowest quote.
Just by the way, that is still true today.
If you are going to build a new ETL system, perhaps in an ERP migration project or telco Billing Systems Migration Project?
You can download SeETL and use it for free.
You will be able to build your DataStage / Informatica ETL systems at about 50% of your expected cost.
Similarly, if you want to migrate away from DataStage or Informatica to save money you can download SeETL and migrate to it.
You will reduce your people time support costs by between 20 and 50% depending on how poorly your current ETL is written.
You can do that today for free and you are invited to do so with my blessings.
But there is something deeper here for you to understand.
Out there in the real world there is a vast amount of code being written every day.
Indeed, people are writing more code daily today than has ever been written, per day, in the history of mankind.
If you haven’t noticed that code is being written in GUIs.
Further, the amount of data entry that is being done has also exploded.
If you haven’t noticed, that data entry is being done mostly through GUIs as well.
The GUI is being used by the average person at their role.
But what if there was a way of getting the smartest person in the room to do that work many times faster for, say, a 50% increase on costs?
That was the idea behind SeETL.
The idea behind SeETL was we can use the Excel interface to capture the human to machine data input and then from the Excel spreadsheet generate the code that is necessary.
By eliminating the GUI, and eliminating the amount of time that the GUI takes to navigate, suddenly we were able to build data warehouses at least twice as fast as we were able to previously.
This applies to virtually any application that has a GUI interface.
The GUI interface is there to capture definitions of things, such as data models or many other things, and to make it simple enough that the average person can do that data entry.
But what if we used the smartest person in the room who didn’t need the GUI?
We could vastly improve the rate at which the work can be done.
The human to machine interface can be the humble Excel spreadsheet.
An application can read that spreadsheet, interpret the human to machine interface expression of what needs to be done, and then go and do that task, whatever that task is.
Now that task may be to take some data from the spreadsheet and just place it into a database.
Or it might be to generate code that will be used later.
Here are two examples from SeETL.
We all know that SeETL can generate SQL language as the ETL subsystem.
That is a case of taking the definitions in the spreadsheet and generating code.
But SeETL also has a scheduler as a part of the product that not many people know about.
Like everything else the schedules are expressed as a worksheet.
So for the scheduler, SeETL simply takes the expression of the schedule in the worksheet and loads it into the dictionary tables for the scheduler to use at a later date.
These two processing patterns can be used for almost anything.
So, if you have a product that uses a GUI as the human to machine interface, then there is a very good chance that you can emulate that interface into Excel plus an app and increase productivity of the people who use your GUI.
Now I am not arguing that you should replace your GUI with Excel, like we have with SeETL.
I am simply suggesting that it is very possible to augment the GUI with excel to increase productivity of entering information in the first instance, to later perhaps be edited via the GUI in the second instance.
Either way, it is a very interesting idea.
For example, and all my subscribers know I love giving examples.
I recently watched a demonstration of a product called Incorta.
Just like DataStage or Informatica, they are using a GUI to define translations and data movements.
The demonstration was with Oracle Enterprise Business Systems (EBS).
It is impossible to build data warehouses taking all the fields from sources to target using GUIs because of the sheer number of fields.
Everyone complains about the cost of developing ETL.
(Well, everyone except my SeETL customers. LOL!)
“There has to be a better way” people say.
I would contend we invented the better way 17 years ago and that the better way is using an Excel spreadsheet to define these very large numbers of fields and mappings.
Then you can load these definitions it into whatever metadata dictionary or engine you want to use.
Sure, you can also have a GUI for the average person to be able to edit these mappings.
But to get them loaded in the first place, you could save a lot of time and money if you used Excel and had a tool to load the workbooks into the dictionary or to generate the code that is needed, just like SeETL does.
Now that I have published the source code of SeETL?
You are welcome to go and read that source code and to determine the following for yourself.
Would it be a good idea to define Excel workbooks that are then read and processed by an app such that you reduce the amount of time and effort that is necessary to generate whatever code it is you need to generate?
To give you a valid comparison in the world of ETL systems.
We have found it faster and cheaper to build ETL subsystems using SeETL and then to migrate to either of Datastage or Informatica at the end of the project.
This is about twice as fast as using DataStage or Informatica during the project to build the ETL system.
At the end of the project you still get your ETL written in either data stage or Informatica, it’s just that it costs you half the money to do it.
I would contend there are many other applications out there like DataStage and Informatica where developing an Excel input option would save a great deal of time and money.
Now you actually have the SeETL source code to go to you can find that out for yourselves.
In my opinion, virtually every product that has a GUI to define something to generate code, or to load up metadata that is interpreted for later processing, may very likely benefit from using the exact same idea as SeETL.
Of course there is only one way for you to find out, and that is download the SeETL code and read it to see exactly what it does.
The simplest description is that it:
- Loads the spreadsheet in to an in memory xml document.
- Extracts the cells from the workbooks into arrays.
- Reads the arrays and populates in memory structures based on the data in the arrays.
- Then reads these in memory structures to produce whatever the output that is needed to be produced.
The code itself is actually very simple.
It’s the idea that was so revolutionary.
The idea being that Excel Worksheets could be used as source code to generate other code or to load data into dictionaries so that a GUI did not have to be written as the human machine interface.
I would encourage those of my subscribers who work with products that have GUIs where a great deal of time and effort is expended by programmers using the GUI as the human machine interface to take a second look.
Is this GUI the best way to get the information needed out of the head of the programmer and into the machine readable form?
Or is your case similar to DataStage and Informatica?
Is your case similar to data modelling tools?
Is your case similar to scheduling tools?
In all these cases the GUI can be augmented by Excel to do the vast majority of the work, or even all of the work, at a vastly reduced cost ongoingly.
I think it may well be worth many of my subscribers asking that question.
Especially of your tools vendors.
Lastly, a number of people have asked me:
“Pete, why don’t you add the feature where SeETL can generate the actual XML document for DataStage or Informatica to ingest as mappings?”
My answer to that is this:
“At the end of a sizable project, say 4000 fields mapped in a multi-billion dollar telco, the SeETL migration effort is 2 weeks for either DataStage or Informatica as the target ETL tool.
So the time saving per project is 2 weeks, 80 hours, approximately.
The project itself might be 36-48 work months.
So, firstly, the saving is nominal compared to the overall project.
And secondly? Who is going to pay me to write that interface?
There is no point me writing it myself if I don’t get paid.
My customers are happy to pay the two weeks work because we are still 50% cheaper than using DataStage or Informatica during development work.
Therefore, it has never made sense for me to write either of those interfaces.”
Now, that’s the itch I have wanted to scratch.
The idea I wanted to get out of my head and into the public.
The idea of using Excel as the human to machine interface to capture “data”, that will be used as source code, by an application, to generate code or simply put some records into a database via ODBC.
An idea that has served me very well for 17 years.
Perhaps it will serve you well too.
I have created a separate permanent link to the SeETL VB code for this post.
It may become out-dated from the release version so if you want to get the latest release version please make sure you go to the downloads page.
It’s just that there are a lot of things downloaded in the full download and I wanted to give you a one click link to the SeETL VB code on this page.
As always, if you have any questions please feel free to email me and ask.
As always, now I am banned off linkedin, please feel free to share my posts on your linkedin and let other people know that this source code for SeETL is freely available from this page.
Thank you again for being my subscriber.
I really appreciate you!
I look forward to your comments.
If you have questions, or ideas about future blog posts, please feel free to email me.