The History Lesson for SeETL
I have decided to start a new blog to speak up about today’s issues in the area of Data Warehousing, Business Intelligence and Big Data. I have been relatively silent on these issues over the last 10 years. I have decided to speak up again for a variety of reasons I will explain in another post.
As a prelude to starting this new Blog I wish to explain the history of SeETL and BI4ALL to demonstrate that these products are what I say they are and do what I say they do.
It is very disappointing to me that even though I have been in business for 32 years and I have never even been accused of any sort of dishonest activity in all that time I am tarred with the same brush as the charlatans and liars who plague our industry.
It is sad that in this new century a man's reputation, built up over decades, counts for nothing.
It is sad that the time-honoured small town practice of naming and shaming the charlatans and liars is as dead as the dodo. How little courage men have today.
Our sons and our grandsons will not thank us for devaluing a man's reputation to nothingness, of this you can be sure.
The History of SeETL
As many of you will know I did my very first data warehouse prototype project starting at Easter 1991. It was a pilot project for the Data Interpretation System from Metaphor Computer Systems. Metaphor was a company that was co-founded by Ralph Kimball and included many members from the famed Star project at Xerox PARC. If you do not know about Metaphor Computer Systems it is really worth finding out about them.
Along with Ralph were David Liddle, Don Massaro and Charles Irby. I got to meet Charles in 1993. What a privilege that was! I met Ralph face to face in 96 and that too was a real thrill for me.
In that first project it immediately became clear that THE problem was ETL. In those days we still had to use tapes because we simply could not build the staging area on disk. Disk was USD20,000 per GB in those days. So it was tape and COBOL all the way.
That first data warehouse was a massive 5GB and it had to be refreshed each month. We had no way to determine deltas so there was no history other than what was in the operational systems. It was a prototype. After one false start trying with 3NF I invented a new way to store data in DB2 for high performance query that was widely adopted by large companies in Australia.
The most notable company that adopted my original ideas was the National Australia Bank. These ideas were the key to the NAB doubling their profit from AUD 1 Billion to AUD 2 Billion between 1992 and 1997. The ideas I gave them enabled them to perform analysis that no other bank could. This was despite the fact that we offered to teach the same ideas to each of the big 4 banks for free. Just like we did with the NAB.
It was all part of the “IBM Difference” in those days. Provide the world's best ideas to the biggest companies and point to the value when it came time to order new hardware.
In 1994 I left IBM to work as a consultant to my client the Mutual Life Company. It was known as MLC in Australia and it is now owned by the National Australia Bank. We had finally gained a small budget to do a research project to replace the now 3 year old prototype with a star schema data warehouse. This was a research project to consider adopting across the group of companies that MLC was a part of.
We had a colleague from Metaphor come out and design the dimensional models we would need to get started with. We then worked on the problem of populating the dimensional models. I had tried to populate dimensional models for a bank in 1993 for a prototype and found it very difficult. We had no incremental update in that prototype so we had not figured out how to do incremental updates. Nor had we figured out how to generate the meaningless integer keys.
It took us 18 months to figure out the COBOL code needed to load this database. As I said, it was a research project so we knew that we would have to spend some extra money to figure out things we didn’t know.
My next project was back at IBM where I was asked to design and build a data warehouse for the RB 2020 banking system which was a mid tier banking system IBM was developing for sale. The basic dimensional models took only about 100 hours to design. That was our starting point.
We then moved on to work out how to load them from the operational system. As we were doing this design work the thought occurred to me that there were a lot of similarities in the TYPES of processing to be done even though the field names were different.
At this time MLC had been merged with IBM Australia's IT Shop to create the first instance of an IBM services company. It was called ISSC if memory serves me. Given that my old client was now a part of IBM I proposed at a meeting that we see if we could get a copy of the code from my last project and use it on this project to reduce the time and cost. I was asked to estimate the time saving. I said that we would save about 250 or 300 hours.
Right in front of me the Project Manager (meaning the IBM Manager for the project) told the Project Manager for our project: “Find out how the contracts work. If we have to pay, offer them AUD10,000 for a copy of the code. Accept up to AUD20,000. If they want more than AUD20,000 tell them no.”
Of course, as an independent consultant sitting at the table what I heard was:
“If you had a version of this code that you owned I might pay you up to AUD20,000 for a copy of it.”
So I decided then and there that I must buy a COBOL compiler and I must create template code that I could sell to my next client. And so I did. This was original code that solved many of the problems we did not get to solve at MLC. But it was heavily based on my experience in building my first two proper dimensional data warehouses at MLC and IBM.
Those templates were christened “The Instant Data Warehouse” because compared to the 18 months we just took to do the code at MLC we were able to get much more work done on the IBM project with only my guidance to go on. The new ISSC did not want to sell the code we had developed to IBM for some reason. As far as I am aware no price was ever offered. ISSC simply turned IBM down. I have no idea why that happened. I was not involved.
Over the next few months I spent every spare minute I could working on “The Instant Data Warehouse”. It was a full set of templates for each type of processing. We simply copied the template and then changed it by hand and voilà! We had the new program needed.
The “Instant Data Warehouse” was a HUGE factor in me getting my next job which was as the Practice Manager for the newly formed Data Warehousing Practice at Hitachi Data Systems. This was a very difficult role to fill because it would entail a lot of travel and the person who filled it would have to have a depth and breadth of skills second to none. So I was the man for the job.
My new manager made it clear to me that his reading of the IDW product was a major factor in deciding to offer me the position. He, like me, figured we could make a product out of it that would differentiate us in the marketplace. And so we did. For Hitachi it was branded “The Hi-STAR Warehouse Toolkit” as we promoted Ralph's book and promoted dimensional modelling so that we were different to Teradata.
Early in the life of “The Hi-STAR Warehouse Toolkit” one of my staff came to me and said that he felt that the templates could be parameterised and that he could write another program that would take a dictionary and a template and generate the COBOL code so that we would not have to make the changes by hand. Since he was pretty confident and he was brilliant I gave him approval to spend the week figuring out if he could do this and to come back to me with an estimate of what it would cost.
Well? I said he was brilliant. And by the end of the week he had written the translation program and had it fully tested. So the second version of “The Hi-STAR Warehouse Toolkit” was a version where we would simply create a dictionary in text files for what the staging tables and the data warehouse tables would look like. We had templates for each type of processing. And we had the generator. Once we had modelled the staging area and the target data warehouse, generating the code was just a set of commands to run.
With “The Hi-STAR Warehouse Toolkit” we had a more than 50% advantage in the marketplace and we knew it. Everyone else was still writing ETL by hand in COBOL. The only reasonable product in the marketplace was Bill Inmon's PRISM product which cost about AUD300,000 at the time. There was another one called Carleton Passport which was finally bought by Oracle for USD5 Million. Why Oracle bought Carleton I will never know. ETI was out there and IBM finished up supporting ETI. But it was a C generator in a time when COBOL was king.
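The dictionary-plus-template idea can be sketched roughly as follows. This is only a minimal illustration in Python, not the COBOL tooling described above: the dictionary layout, the template text and the names `DICTIONARY`, `COPY_TEMPLATE` and `generate` are all assumptions made up for the example, and the generated text is a short SQL-like statement rather than a COBOL program.

```python
from string import Template

# Hypothetical dictionary entry describing one staging-to-warehouse load.
# The original toolkit kept its dictionary in text files; this layout is invented.
DICTIONARY = {
    "program_name": "LOADCUST",
    "staging_table": "STG_CUSTOMER",
    "target_table": "DIM_CUSTOMER",
    "columns": ["CUSTOMER_NO", "CUSTOMER_NAME", "POSTCODE"],
}

# Hypothetical template for one "type of processing" (a straight copy load).
# The $-placeholders are filled in by the generator.
COPY_TEMPLATE = Template(
    "-- PROGRAM $program_name (generated, do not edit by hand)\n"
    "INSERT INTO $target_table ($column_list)\n"
    "SELECT $column_list FROM $staging_table;\n"
)


def generate(template: Template, entry: dict) -> str:
    """Fill one processing template from one dictionary entry."""
    values = dict(entry)
    values["column_list"] = ", ".join(entry["columns"])
    # substitute() raises KeyError if the template needs a value
    # that the dictionary entry does not supply.
    return template.substitute(values)


if __name__ == "__main__":
    print(generate(COPY_TEMPLATE, DICTIONARY))
```

The point of the design is that the template is written once per type of processing, while each new table only needs a new dictionary entry.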
As a result of this and putting together the “Hi-STAR Warehouse Methodology” I was instrumental in winning a number of big deals for Hitachi. The most notable of which was the Australian Customs Service tender at the beginning of 1997. This was the largest Data Warehouse tender for the year in Australia. Winning the Australian Customs Service put HDS on the Data Warehousing map in Australia. We won quite a few other follow on projects in Australia and Asia in 1997 before I left HDS towards the end of the year due to political difficulties which will be discussed in a subsequent blog post.
By the time I left Hitachi at the end of 1997 it was clear to me that ETL was THE problem in building data warehouses. It was just as clear that any and all tools to deal with this problem were going to vastly reduce the costs of the project. The toolkit had been extended greatly in my time at HDS. My contract with HDS said that in the event I resigned HDS would be entitled to a copy of “The Hi-STAR Warehouse Toolkit” in perpetuity. I would be entitled to a copy as it was at the time of resignation. This agreement was honoured by both sides.
I was very disappointed to leave HDS. We had great people and a great team. I never had any problems with my team mates or manager. They were great. The problem we had was that the branch managers wanted to control the delivery side of the data warehousing deals I won and they would bring in their own people who would not be able to deliver the projects.
This happened in both Canberra and Melbourne, though, to be fair, I hired the person in Melbourne but she lied to me in the interviews and the Branch Manager was able to persuade her to turn against the standards I had set down. Since I was not able to enforce standards and ensure successful delivery my position became untenable and I chose to resign rather than be associated with a Data Warehousing Practice that was sure to implode under the weight of the branch managers trying to create mini-practices one per branch.
Indeed, the Canberra Branch Manager, one of the ringleaders of refusing to allow the various practice managers to run their businesses as needed, later turned around and claimed he had a “nervous breakdown” from a “hostile work environment”. Maybe if he had not spent 2 years lying to his colleagues and stabbing them in the back they might have been a little less “hostile” to him!
In my next two jobs I did not use this toolkit. PricewaterhouseCoopers was absolutely not into productivity tools that would take billable hours out of a project. And Ardent were the makers of DataStage. Having an ETL tool written by one of the staff you just hired is a non-starter at a company that sells an ETL tool.
In 2001 I relocated to Ireland to work with Sean Kelly implementing the then Sybase Industry Warehouse Studio data models. More on them in the next section.
I did a project at North Jersey Media Group in New Jersey USA. Just before I went on that project I had written an article for Ralph that was published in DBMS Magazine. The article was a brief introduction of this presentation.
My article for Ralph in DBMS Magazine was well received and Ralph suggested that if I wanted to write another article he would be happy to consider getting it published. On the day that he happened to send me that email I was tearing my hair out at Informatica.
I was a DataStage expert and Informatica was driving me crazy with its deficiencies. I literally had to change the data models in order to be able to load them with Informatica. The need to alter a data model in order to load it with the ETL tool is abhorrent to a data modeller and should be abhorrent to ETL tool vendors too! So I sent back this really frustrated note to Ralph about how bad Informatica was.
Then Ralph sent a reply that would change my life, again, forever. He said something very close to:
“Well, if you are so smart and know so much about ETL tools, why not write an article listing all the features ETL tools should have? I will publish that for sure.”
That sounded like a really good idea and I promised to write the article as soon as I could. The problem was that we were working 13 days a fortnight on the project for North Jersey because I would not be allowed to work more than 9 months in the US without an H-1B visa. I did not want to go into the IRS system so there was a hard date by which I had to leave the country. We were working feverishly to get the work done before I had to leave. I also recommended a colleague to take over from me and he had to be brought up to speed.
After I finished this contract at the end of June 2002 I went on a long holiday around Europe as we planned to return to Australia in January 2003. This was the “holiday of a lifetime”. When I got back to Ireland in September I kept my promise to Ralph and wrote the article. All the things that the ETL vendors should put in to their products.
Unfortunately Ralph had decided to end his association with DBMS magazine and told me that he would not be able to publish the article as promised. That was fine and I thanked him for letting me know. After I sent the email I went and read the article again. It was a good article.
Then the thought occurred to me:
“You know, if I wrote an ETL tool that could do all these things? I would make a lot of money. Everyone needs an ETL tool with all these features. Everyone.”
By now it was October 2002. We had a firm plan to return to Sydney after spending Christmas in Ireland. Our two year visas were expiring in February 2003. I was not planning on taking any new contract. I was planning on getting a job or going consulting when I was back in Australia. The two years experience had left me with more skills than any other person in Australia so I knew I would command good fees.
But the thought stuck in my head. If I could write this ETL tool I would make a lot of money. But I could not write C++ so I would have to learn it in order to write the ETL tool. Also, I had never written ODBC code so I would have to learn ODBC too.
The short story is that it turned out to be possible to write the ETL tool with all the features that I wanted. It took a month to write the first program to create the classes that would form the heart of the software. After a month of 14 hour days I knew the ETL tool I wanted could be written. It was now just a matter of time to write all the features. Naturally, that would take a LONG time and I would need a contract to pay the bills while I worked on this software. It was also at this time we decided we would stay in Ireland and try for Irish Citizenship for the benefit of our children's future.
As everyone knows, after 9/11, 2002 was a difficult year for many companies including tech companies like Sybase. Another recession was getting underway and it was time to cut back development projects in many large companies. Selling Data Models during a recession is a tough task and sales of Sybase IWS were slowing in 2002.
Sean Kelly was experiencing difficulties with Sybase and there was a parting of the ways at the end of 2002. Sean did me the favour of signing my new contract and visa extension before he moved on from Sybase. But then he moved on to form another company called Comhra to develop and sell a new and innovative product that was not related to the data models he had sold to Sybase.
So by 2003 I was aware that we had the best data models possible and now the challenge was to write the ETL tool in my “abundant spare time” so that I could sell it and make money from it. I will refer to it as its current name of SeETL rather than the name that it had at the time.
My next IWS implementation for Sybase was at Saudi Telecom in Saudi Arabia. We were developing on Windows and would be going in to production on Solaris. The Sybase partner had sold DataStage rather than the Sybase recommended Informatica which pleased me no end. This project presented a great opportunity to develop my ETL software.
So during 2003 I worked on the implementation of the IWS Telco data model and worked on the development of SeETL as a C++ engine to populate the IWS Telco models. It was a smashing success. We even implemented memory mapped IO for SeETL so that the dimension tables could be loaded into shared memory maps. That way only one image of a set of dimension table keys was needed while we could have as many attribution processes running as we liked. We were able to totally max out an 8-CPU machine image when doing our testing. Saudi Telecom had 20 million customers and produced about 60 million CDRs per day back in the 2003 timeframe.
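To illustrate the memory mapped idea, here is a minimal sketch in Python. It is not how the SeETL C++ engine was written: the fixed-width record layout, the file of sorted keys and the names `RECORD`, `build_key_file` and `lookup` are all assumptions for the example. Each attribution process would map the same key file read-only, so the operating system can keep a single copy of the dimension keys in memory however many processes are running.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: 16-byte natural key, 8-byte integer surrogate key.
RECORD = struct.Struct("<16sq")


def _pad(natural: str) -> bytes:
    """Encode a natural key as a fixed-width, zero-padded byte string."""
    return natural.encode("ascii").ljust(16, b"\0")


def build_key_file(path, key_pairs):
    """Write (natural_key, surrogate_key) pairs sorted by the padded natural key."""
    records = sorted((_pad(natural), surrogate) for natural, surrogate in key_pairs)
    with open(path, "wb") as f:
        for key, surrogate in records:
            f.write(RECORD.pack(key, surrogate))


def lookup(mm, natural):
    """Binary search the mapped key file; return the surrogate key or None."""
    target = _pad(natural)
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, surrogate = RECORD.unpack_from(mm, mid * RECORD.size)
        if key == target:
            return surrogate
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dim_customer.keys")
    build_key_file(path, [("C300", 3), ("C100", 1), ("C200", 2)])
    with open(path, "rb") as f:
        # Every attribution process would open and map the same file like this.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        print(lookup(mm, "C200"))
        mm.close()
```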
That the SeETL C++ engine was able to process the CDRs faster than the DataStage engine showed that SeETL was ready for prime time and sales activity. We were able to build 100% of the ETL required by Saudi Telecom in SeETL for our testing. Once we had the data model finalised and the ETL fully tested we converted the SeETL ETL to DataStage.
The Saudi Telecom project set the record in Sybase for the greatest number of fields ever mapped to an IWS Data Model. Because of SeETL the work was done in less than 50% of the person days that we had taken just a year earlier for North Jersey Media Group. SeETL had proven itself in a national telephone company. Clearly it could handle the processing volumes found in more modestly sized companies.
Given that I did my first data warehouse project in 1991 I was very pleased with the situation in 2003. We now had the best data models possible and the best ETL tool to load that data model. I was expecting to sell these things like hot cakes.
I put SeETL up for sale for AUD20,000 and started selling it to all the people I knew. I sold three copies to Kerry Packer companies as a way for those companies to get their data warehouses done more cheaply. I had placed the consultant that was now responsible for these projects on the original Australian Consolidated Press project so he was well aware that SeETL would do what I said it would do. I even designed the data model for one of those projects as part of the project.
However, much to my surprise, my other friends and colleagues did not buy SeETL when I sold it to them. They resisted and stayed with tools like DataStage and Informatica. I was very surprised by this. I spent a lot of time on emails explaining to my friends and colleagues that the product was sound and we had implemented the entire processing stream for Saudi Telecom with the product in test. But no more sales were closed. Not because the tool was not good enough, but because of the politics of the situation. People did not want to alienate IBM or Informatica.
My next project for IWS was Orange Romania where we actually did two data warehouses. We did one specifically for Finance using Oracle Apps. This was a “test” for my client for the real data warehouse build. Orange Romania wanted to replace their billing system and that would require a new data warehouse. Since no one in Romania had ever built such a data warehouse my client was being “tested” on the finance data warehouse before being given the opportunity to do the billing systems data warehouse which would be an order of magnitude more difficult.
SeETL was used again on both projects. This time on AIX. We also ported it to Red Hat Linux during this project. So I now had the world's best ETL tool running on Windows, Solaris, AIX and Linux. All with memory mapped IO and being able to max out the CPUs on any of these platforms during the attribution process.
It was during this project in 2004 that one of the developers, Adrian Nagy, came up with the idea of saving the mapping spreadsheet, which was purely documentation at that time, as an XML document. We would then write a VB program to read the XML and to generate the code we used to write by hand. Once we were able to read the Excel spreadsheet with VB and generate code it was clear that we had taken the next BIG step forward.
The Orange Romania Finance Project placed more than 9,000 fields into the IWS data model. This was more than 4x the number of fields mapped on the Saudi Telecom project which was the then record holder for fields placed into an IWS data model. The amount of effort expended was less than half again of the Saudi Telecom project. This made the field for field cost of the Orange Romania project about 80% less than that of the Saudi Telecom project which was already half the effort of the North Jersey Media Group project! Great improvements in just 3 projects!!
By the time I finished on the Orange Romania projects in late 2005 I knew that with IWS and SeETL we had a combination that would deliver success 100% of the time as long as the client did what we advised them to do.
By late 2005 things were not going well for Sybase. Times were tough. There is very little money in data models when compared to database licenses and Sybase had let most of the consultants implementing IWS go. Contracts were not being renewed and very little sales effort was being made in preference to the database products. Eventually Sybase was bought out by SAP which has its own SAP Business Data Warehouse. SAP withdrew the IWS models from marketing.
My next project was Electronic Arts which needed a new data warehouse for the roll out of the PS3 scheduled for 2006. I started on a 3 month contract for EA in October 2005 and ended up staying there until the following November 2006. This was not an IWS project.
With no new sales of SeETL happening among my friends and colleagues I simply kept on developing the product to suit the needs of my clients. Every time we needed to do something that we could not do with SeETL it was added to the product and then used on the project.
In 2007 I finally got the “breakthrough sale” that I was looking for. This was with a company called Key Work Consulting in Germany. I had known the co-owner, a man by the name of Tobin Wotring, since 1999 when he founded his company. I had answered many questions for him and his staff on the DWLIST forum. So now, 8 years later, my patient answers paid off when Key Work decided to migrate all their ETL from DTS to SeETL. They had tried SSIS and they had found it lacking. It was simply not reliable enough. We created a custom version of SeETL for Key Work and they own the source code for their version.
This was a “breakthrough” sale because Key Work has very talented people and can argue that they are the best Fashion Retail BI people in Germany. Having Key Work as a reference to PROVE that a top flight consulting company had settled on SeETL in preference to everything else and migrated all their ETL to SeETL would, I thought, convince all the “doubting Thomases” that were my friends and colleagues. Key Work runs 100% of their ETL for their clients on their custom version of SeETL. The co-owner, Tobin Wotring, has done a very nice reference for SeETL.
The project after Key Work was Carphone Warehouse and the project was the Talk Talk data warehouse. Talk Talk were doing a 3 year billing systems migration project which was the reason they needed a new data warehouse. You can listen to a little about Carphone Warehouse in this video.
This was to be the first time Sean Kelly and I had worked on a Data Warehousing project from initial sales calls through to the move to production and support. It was one of the great privileges and pleasures of my professional life to work with Sean on this project. It was an amazingly smooth project despite numerous setbacks that were outside of our control. We implemented the BI4ALL models using SeETL in development and Informatica in production.
The big innovation at Talk Talk was to test generating the ETL system as SQL rather than using the C++ engine. This was the brainchild of Brian Ganly who is on the video above. Brian was a big supporter of Netezza and worked shoulder to shoulder with me on the project. He wanted to adopt all our ideas for Carphone once the Talk Talk project was over. It was Brian who “pestered” me time and again with “you have everything in the spreadsheet you need to generate SQL as the ETL and the Netezza machine will handle it.”
We were working long hours and it did not seem sensible to spend time testing this hypothesis. But Brian can be a very persuasive man. So one weekend I decided to try processing 80 million CDRs through SQL rather than through SeETL.
Amazingly it worked…in TWENTY MINUTES! We were able to perform a select insert with 25 joins for 80 million CDRs in just TWENTY MINUTES. I will never forget when the query ended. I just sat there…stunned. I couldn’t believe it had finished successfully.
This was only a fairly small and old development Netezza machine. I was expecting hours if it was successful at all. We later processed TWO BILLION CDRs in one statement on the new fast production machine! When you can join TWO BILLION CDRs to 25 dimension tables in a select insert you can do ANYTHING in SQL.
So on that day, when the Netezza machine responded and had completed the processing properly, it was clear. We had seen the future. The future was that ETL was going to be performed inside the database in SQL. Goodbye DataStage and Informatica!
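For readers who have not seen SQL used as the ETL, the shape of such a statement can be sketched as below. This is only an illustration in Python under stated assumptions, not the SeETL generator itself: the mapping rows, the table and column names and the functions `generate_fact_load` and `demo` are invented, two dimensions stand in for the 25 on the project, and SQLite stands in for Netezza.

```python
import sqlite3

# Hypothetical mapping rows, as they might be read from a mapping spreadsheet:
# (dimension table, natural key column in staging, surrogate key column in fact)
DIMENSION_MAPPINGS = [
    ("dim_customer", "customer_no", "customer_key"),
    ("dim_product", "product_code", "product_key"),
]
MEASURES = ["duration_secs"]


def generate_fact_load(staging, fact, dims, measures):
    """Build one INSERT ... SELECT that resolves every dimension key with a join."""
    target_cols = [key for _, _, key in dims] + measures
    select_cols = [f"d{i}.{key}" for i, (_, _, key) in enumerate(dims)]
    select_cols += [f"s.{m}" for m in measures]
    joins = [
        f"JOIN {table} d{i} ON d{i}.{natural} = s.{natural}"
        for i, (table, natural, _) in enumerate(dims)
    ]
    return (
        f"INSERT INTO {fact} ({', '.join(target_cols)})\n"
        f"SELECT {', '.join(select_cols)}\n"
        f"FROM {staging} s\n" + "\n".join(joins)
    )


def demo():
    """Run the generated statement against a tiny in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE dim_customer (customer_key INTEGER, customer_no TEXT);
        CREATE TABLE dim_product (product_key INTEGER, product_code TEXT);
        CREATE TABLE stg_cdr (customer_no TEXT, product_code TEXT, duration_secs INTEGER);
        CREATE TABLE fact_cdr (customer_key INTEGER, product_key INTEGER, duration_secs INTEGER);
        INSERT INTO dim_customer VALUES (1, 'C100');
        INSERT INTO dim_product VALUES (7, 'P1');
        INSERT INTO stg_cdr VALUES ('C100', 'P1', 42);
        """
    )
    con.execute(generate_fact_load("stg_cdr", "fact_cdr", DIMENSION_MAPPINGS, MEASURES))
    rows = con.execute("SELECT * FROM fact_cdr").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(generate_fact_load("stg_cdr", "fact_cdr", DIMENSION_MAPPINGS, MEASURES))
    print(demo())
```

The work of looking up every dimension key is handed to the database as a set of joins, which is why the speed of the database machine decides how fast the load runs.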
With Talk Talk in 2008/09 we had proven we could deploy our new generation of BI4ALL data models using SeETL faster, cheaper, better, with higher quality than any other project we had undertaken to that point in time.
We teamed up with Netezza and started talking to EVERYONE. The video above was made by Netezza before the IBM buy out. Since it was one of the best customer references Netezza had IBM kept it and just put the IBM logo on it. After all, IBM paid USD1.8 Billion for Netezza. That entitles them to claim the successes of Netezza.
Sean and Brian spoke at the Netezza users conference in 2009 and the audience were absolutely stunned at the numbers they were quoting. We got a lot of leads from that presentation. We did a lot of joint calls with Netezza as well. We won another UK telco in 2010 to do a very narrow implementation. That client also spoke at the Netezza UK users conference and quoted numbers that staggered the audience.
In 2010 our client actually got a keynote speech for all attendees to the conference. Not just a breakout session as we did the year before. 400 people heard how the client had built 100% of their ETL on Netezza using an “innovative tool developed by our consultants”. We could not be named for legal reasons but everyone knew who he was talking about.
Then, of course, disaster struck. IBM bought Netezza. Just like they had bought Sequent, PwC Consulting, Informix and Ascential previously!! IBM buys EVERYTHING I have success with!
The last thing IBM wanted was Sean Kelly and Associates talking to what were now IBM customers about our Data Models and SeETL. IBM has their own data models and DataStage. We were kicked out and the Netezza people were told not to talk to us any more.
Thanks Big Blue! Yet again a major vendor did all they could to stop a small company from providing an innovative solution to their clients. I often wonder when IBM customers will figure out that IBM stopped innovating in the late 80s and has been stomping on good ideas ever since. This stomping on good ideas because they were “not invented here” has not done IBM customers any favours at all. And that comes from someone who expected to spend his whole career at IBM in 1986.
Even with IBM severing the Netezza relationship, Sean and I were convinced. We now had two major telco clients with the new models and who both used SeETL in development. We would be able to take that to the marketplace and win. There was still the issue of 50% failure rates. We knew we had a package that we could put in to a Telco where we could have success be 100% guaranteed. Would people really keep buying from the likes of IBM over us when it was so clear how the large vendors were doing all they could to stifle innovation?
The answer was yes. People still doubted us and still did not believe that we could do what we said we could do. Despite the fact that Sean and I were two of the most experienced people in Europe in Business Intelligence, people openly doubted us when we explained what SeETL and BI4ALL could do for their data warehouse projects. In 2011 we didn’t close a new sale. So we decided to just keep going about our various business efforts and let the world pass up the opportunity to save time, money and effort by using SeETL and BI4ALL.
In early 2012 Sean and I agreed that I would put license keys into SeETL and sell SeETL as a commodity product even though that might introduce competition for ourselves in our consulting business. The idea was to sell SeETL to consulting houses on an annual license fee making it extremely cheap to do so. Again, amazingly to me, people have not adopted SeETL despite the fact that it is basically free at EUR800 per year per person.
This is where we find SeETL today. It is a commodity product that costs EUR800 per user per annum for anyone who wants to cut down the time and effort of constructing ETL.
As you can see from the 20-year history of the product it has gone through numerous complete re-writes to get to where it is today. Today we have the legacy C++ engine for anyone who wants to use that. And we have the workbooks that maintain all the metadata for virtually all aspects of a data warehousing project. The most important workbook is the one that generates SQL as ETL.
The way the ETL is generated is able to be modified via parameters. There is not just one way that the ETL gets generated. SeETL is, quite simply, the best way to build ETL subsystems. Even if you are going to use something else in production.
It is my gift to my industry to help everyone else have fewer failures and more successes.
You are welcome!