Hi! Thanks for dropping by my blog! It is much appreciated.
So Microsoft and Salesforce have joined up to take Informatica private.
Talk about “News”!
You want to know something really strange?
Last week I got a whole bunch of emails to this effect:
“Hey, Peter, you are Mr. ETL what do you think of this?”
First of all.
I never knew I was “Mr ETL”. That is a new one on me.
Back in 1991 I was “Mr. SCLM”. People used to even call me “Mr. SCLM” rather than just Peter.
In 1995 the then Managing Director of SAS Institute Australia called me “Mr. Star Schema”. This was because he claimed that every time he heard me talk, or I was talked about, it was always in relation to how hard I was pushing Star Schemas as the new way to design analytical data warehouses.
But no one ever called me “Mr. ETL” before last week.
I wonder how THAT happened!
Anyway, yes, I used to be the Professional Services Manager for Ardent Asia Pacific.
Yes, I used to be an expert in both DataStage and Informatica.
When I did Saudi Telecom in 2004 we leveraged what others had done to create a cool new way to share dimension tables in memory to reduce the memory requirements of DataStage. Something that was not available in Informatica as late as 2010 as far as I know.
But hey, it has been quite a few years since I used DataStage or Informatica in anger, so “Mr. ETL”?
I am not too sure about that.
Even so. This is BIG news.
What is my opinion about this?
Well, my opinion about ETL tools has been the same since about 1995 when I wrote my first ETL tool in COBOL that I called “The Instant Data Warehouse” at the time.
It was “Instant” because we could do about 18 months work in 3 months with this new tool. And that was an amazing improvement.
What was my opinion back then?
My opinion, in 1995, was this:
“The databases are going to get faster and faster and in the end all ETL will be done inside the database. This will be 20 years away but this will happen.”
Now why did I hold this opinion?
Well, because in 1986 I started work for IBM on a project called COBRA. It was a billing application that produced invoices for IBM Countries. The database inside COBRA 1.0 was only small and the vast majority of processing was done in files.
That was because a read of a record from a file took about 100,000 fewer instructions than a read of a record from an IMS database at the time. And billing is something that has to go FAST. And I mean REALLY FAST.
So in COBRA we used the IMS database only for control records. It was not until COBRA 2.0 that there was a query function to review an invoice online. And it did not work very well in its first release.
So I watched with some interest as DB2 went from “unusable” in version 1.0 to “pretty ok” by version 2.3. I could see that the databases were going to go faster and that CPU and storage were going to get cheaper.
In 1993 we even started using the OS/2 version of DB2 to perform prototyping of data warehouses and it served as a very reasonable database for smaller volumes. And when I say smaller volumes I mean up to about 1 million records at the time.
So, of course, in 1995, COBOL was king. ETL was written in COBOL for exactly the reason that there was no license fee associated with the run time portions; you just paid for your CPUs. On the other hand, all the database vendors licensed their databases based on CPUs back in those days.
So if you wanted to do ETL inside the database you were going to pay for that big time.
Also, the #1 task in ETL that consumes MASSES of resources is the attribution phase. This is where the transactions were read and the string keys were translated to integer keys that were generated inside the ETL subsystem.
You could do these translations with binary searches in memory, which was at least 100x faster than doing them using the database manager.
Simply put, in 1995, the processing advantage and cost advantage of using COBOL over using the database was AT LEAST 100 times. And when something is 100 times cheaper and 100 times faster? You use it, right?
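To make the attribution step concrete, here is a minimal sketch in Python rather than the COBOL we used at the time. It assumes a tiny, made-up customer dimension and a handful of made-up transactions. The names `build_lookup`, `lookup_key` and the "unknown" key of 0 are purely illustrative and are not from any of my tools.

```python
from bisect import bisect_left

def build_lookup(dimension_rows):
    """Sort (string_key, integer_key) pairs once so they can be binary searched."""
    pairs = sorted(dimension_rows)
    string_keys = [s for s, _ in pairs]
    integer_keys = [i for _, i in pairs]
    return string_keys, integer_keys

def lookup_key(string_keys, integer_keys, value, unknown_key=0):
    """Binary search for value; return its integer key, or unknown_key if absent."""
    pos = bisect_left(string_keys, value)
    if pos < len(string_keys) and string_keys[pos] == value:
        return integer_keys[pos]
    return unknown_key

# Hypothetical dimension: string key -> integer key generated by the ETL subsystem.
customer_dim = [("CUST-B", 2), ("CUST-A", 1), ("CUST-C", 3)]
string_keys, integer_keys = build_lookup(customer_dim)

# Hypothetical transactions carrying the string key from the source system.
transactions = [("CUST-C", 10.0), ("CUST-A", 5.5), ("CUST-X", 1.0)]

# Attribution: swap each string key for its integer key, all in memory.
attributed = [(lookup_key(string_keys, integer_keys, cust), amount)
              for cust, amount in transactions]
print(attributed)
```

The point is simply that the whole lookup table sits in memory and every translation is a handful of comparisons, with no call to the database at all.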
But I had already seen the speed up of IBM mainframes from 1982 to 1995. It was an astonishing increase in price performance ratios. And another 20 years like that would make computing almost free.
So, as early as 1995 I was saying:
“eventually everything will be in the database, the RDBMS, there will be no reason to have any data anywhere else. Certainly no financial reason.”
So in those days I watched the ETL tools come to market with some scepticism. My view was that ETL tools were great to overcome the issues of the day. They were well cost justified against the costs of the database licenses that would be needed to perform the same functions. No question.
I had no problem being paid to sell DataStage because at that time DataStage was 4 to 5 times cheaper than trying to do ETL inside the database or manually. The only tool that was available that was better than DataStage was my ETL tool and it was “on the shelf” while I was at Ardent.
But, over time, it was clear to me that the database vendors would eat the lunch of the ETL tool providers. No question. The databases would get faster and faster. The investment in them would be leveraged. And the ETL tool vendors would see themselves attacked by the massive likes of Oracle, IBM, Sybase, and Informix who were the leading database vendors of the day. Of course Microsoft jumped on that boat later too.
The amount of dollars put in to database research meant that no small company was ever going to be a serious threat to those guys. A few years investment in new technology and they would “eat the lunch” of anyone who had built up a bit of a lead.
I said the same about dimensional databases like IRI Express, Essbase, Analysis Services etc. These dimensional databases had a feature, function, price, performance advantage over the relational databases. No question.
But with Ralph Kimball making the mechanisms for creating multi-level star schemas in relational databases public, it would only be a matter of time before people adopted those techniques, Microsoft opened up Excel for access to databases, and the need for dimensional databases went the same way as the need for horses.
And so we come to Informatica.
The relatively slow growth of Informatica over the last few years was as predictable as the sun coming up in the morning. The days of ETL and data integration being done in high cost specialty tools are over.
The databases, as I predicted 20 years ago, are now fast enough to do the vast majority of the ETL and data integration inside them. A lot of people will not like me for what I say. But hey, that is nothing new to me, right?
I did my very first 100% ETL inside SQL inside a database in 2010. That was on a Netezza machine admittedly. But it was a stunning event. When our client announced that they had done this to the 400-strong audience in a keynote speech at the London Netezza users conference in 2010 you could have heard a pin drop.
“What? No ETL tool?”
And this was a telco that had ETL tools at their disposal.
They wanted to see if they could do all the ETL for a data warehouse in SQL inside Netezza. And we were just the guys to do it.
The writing was on the wall for ETL tools in 2009 when I was working at Carphone Warehouse and we were able to process 80 million call detail records in 20 minutes on the development Netezza box. When I saw the query end in 20 minutes I thought it must have died. So I went checking everything to see what happened.
Nope. It worked.
80 million CDRs were joined to 25 dimension tables and the attribution process was completed properly. We later processed more than 2 billion rows through the same statement successfully on the new Netezza production box.
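For readers who have not seen attribution done in SQL, here is a rough sketch of the shape of such a statement. It is not the statement we ran at Carphone Warehouse. It uses SQLite from Python as a stand-in for Netezza, two dimension tables rather than 25, and table and column names that are entirely made up. Unmatched string keys fall back to an assumed "unknown" key of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical staging, dimension and fact tables with a few sample rows.
conn.executescript("""
CREATE TABLE customer_dim (customer_code TEXT PRIMARY KEY, customer_key INTEGER);
CREATE TABLE tariff_dim   (tariff_code TEXT PRIMARY KEY, tariff_key INTEGER);
CREATE TABLE cdr_staging  (customer_code TEXT, tariff_code TEXT, duration_secs INTEGER);
CREATE TABLE cdr_fact     (customer_key INTEGER, tariff_key INTEGER, duration_secs INTEGER);

INSERT INTO customer_dim VALUES ('CUST-A', 1), ('CUST-B', 2);
INSERT INTO tariff_dim   VALUES ('PEAK', 10), ('OFFPEAK', 20);
INSERT INTO cdr_staging  VALUES ('CUST-A', 'PEAK', 60),
                                ('CUST-B', 'OFFPEAK', 120),
                                ('CUST-X', 'PEAK', 30);
""")

# One set-based statement does the attribution: every string key is
# translated to its integer key by joining to the dimension table.
conn.execute("""
INSERT INTO cdr_fact (customer_key, tariff_key, duration_secs)
SELECT COALESCE(c.customer_key, 0),
       COALESCE(t.tariff_key, 0),
       s.duration_secs
FROM cdr_staging s
LEFT JOIN customer_dim c ON c.customer_code = s.customer_code
LEFT JOIN tariff_dim   t ON t.tariff_code   = s.tariff_code
""")

fact_rows = conn.execute(
    "SELECT customer_key, tariff_key, duration_secs "
    "FROM cdr_fact ORDER BY duration_secs"
).fetchall()
print(fact_rows)
```

The real thing simply has many more joins and many more rows. The database does all the work and no row ever leaves it.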
Good bye ETL tools.
Now. As many people know. I worked for IBM for many years and I have many friends both at IBM and ex IBM. Alas, in the late 90s when I competed with IBM in Australia, I took so much business off them that someone over there holds a bit of a grudge. Every time IBM can do something to make my life that little more difficult they do. And so when IBM took over Netezza they cut us out of further discussions to make sure they could sell more DataStage licenses to their unsuspecting clients.
But you, dear reader, should make no mistake. IBM knew very well that when they bought Netezza we had invented a way of pushing all the ETL in to the Netezza machine so as to avoid an ETL software license. And they hid this information from their clients so as to get more money from them.
This was one of the reasons that I ended up competing with IBM despite wishing to co-operate with them. An IBMer, one Victor Grasty, engaged in some unethical business practices against me back in 1995 after I had left IBM.
I took up this issue with the General Manager of the Finance Branch who was my ex boss and also a friend of mine. He decided that he would not enforce IBM policies with respect to business ethics. It would be too much trouble and “what can you do to IBM” was how that conversation went.
Well, the answer was “plenty”! LOL! As I said. I took so much business off IBM that even 15 years later someone over there is holding a bit of a grudge. All very childish if you ask me.
The refusal of IBM to inform their Netezza customers that two Netezza customers had built very robust ETL systems in SQL only, while at the same time selling them copious copies of DataStage, gives a clear indication of the business ethics of IBM today.
A far cry from the business ethics of IBM in the 80s, that’s for sure.
All that is to say that I have known about, and predicted, the end of the high cost ETL tools for a very long time now.
The ETL tools I have written over the last 20 years have shown the way for how ETL will be written in the future.
SeETL, today, is little more than a set of 80 Excel workbook definitions and an 80,000 line VB.NET program that reads the spreadsheets and generates SQL for the ETL side of things. It is available for EUR800 per year per ETL developer. And it will do 99%+ of everything needed in a data warehouse project. The other 1% being covered by the older C++ versions and the trusty C++ compiler.
So to shell out big dollars for an independent ETL tool, or even to lock yourself in to an ETL tool by getting it “free” with your database like Microsoft's Integration Services, is not a good idea.
What you want from an ETL tool in 2015 is exactly what we wanted from ETL tools in 1995, 20 years ago.
- Best possible performance.
- Lowest possible price.
- Easy to use.
- 110% database independent.
The product I wrote in 1995 called “The Instant Data Warehouse” was all those things.
None of the ETL tools the “vendors” brought out were any of those things.
And now we are 20 years later and we are seeing that the ETL tool vendors are starting to suffer the inevitable consequences of trying to beat database vendors and hardware developers.
You can’t beat database vendors and hardware developers over the longer term.
They have too much money from legacy installs and can re-invest that money like there is no tomorrow. The best you can do is exactly what I did. Develop niche tools to leverage the ever increasing power and performance of the hardware and databases and stay “just in front” on the innovation curve.
Because, although it is inevitable that the database vendors will remain supreme, they are massive organisations and it takes some time for them to change course or add in new features and functions. So there is a time gap on the innovation curve where us independent consultants can make quite a nice living creating new tools that the database vendors will eventually copy and make obsolete.
And this is what is happening to Informatica.
Informatica is a specialist “data integration” tools vendor. And that business is going away in favour of putting all the data in a very big and very powerful database engine and crunching it inside that engine.
No one is going to outdo the database vendors in that space.
And certainly not “all things Hadoop”.
One of the things I wonder about the Hadoop hysteria is this.
The logo is a dancing elephant.
Has anyone who so enthusiastically promotes Hadoop and their little dancing elephant actually ever SEEN a dancing elephant?
I have. I went to circuses when I was a kid. They all have “dancing elephants”.
As amusing and interesting as it is to see a “dancing elephant” it is not a creature of great grace and dexterity of movement. They are large lumbering animals that actually respond and move quite slowly because of the mass of their bodies.
How anyone came up with the idea of “hey, let's use a dancing elephant to create the image of flexibility, dexterity, nimbleness and quick responses” is beyond me. I guess some marketing guy who had never seen a dancing elephant came up with THAT one!
So. The future of Informatica?
Microsoft is the company that just wrote off $10 BILLION due to the endless car crash that was the Nokia acquisition.
In a market place that is ON FIRE, smart phones, the world's largest software seller managed to lose $10 BILLION.
Let me repeat that just in case anyone is wondering just how badly Microsoft messed that one up.
In a market place that is ON FIRE, where Apple makes money hand over fist for third rate over priced phones because some people mistakenly think the phones are “cool”… the world's largest software maker joined with the prior leader of smart phones and managed to lose $10 BILLION.
What does that tell me?
It tells me that Microsoft has got the acquisition and exploitation of smaller companies down to the same fine art that IBM has always had it! LOL!
After all? In the early 80s, when IBM was dominant, they bought the world's leading digital telephone company called ROLM. I had a ROLM phone when I was at IBM. It was a miracle of modern technology in 1986.
But no one knows who ROLM were now. They went the same way as so many other IBM acquisitions.
Relegated to the dustbin of “world's coolest technology acquired by IBM and killed off.”
If Microsoft cannot make a go of Nokia? In a market that is booming?
The prospects for our friends and colleagues at Informatica do not look so bright I am afraid.
That is pretty much my take on the situation.
As far as the ETL world goes?
The future of ETL was invented in 2004.
The idea was provided by a friend of mine. I didn’t think of it myself. I do not want to take credit where it is not due to me.
That future is to put as many definitions as possible, for as many aspects of building a data warehouse as possible, in to an Excel spreadsheet stored as XML. And then to read that spreadsheet and generate whatever code or perform whatever functions are necessary for design time work from that spreadsheet.
That is the future of ETL development and the vast majority of BI development.
Because it meets all 4 criteria that are the major needs of ETL developers.
- Best possible performance.
- Lowest possible price.
- Easy to use.
- 110% database independent.
Not to mention it is 10 times faster in development, soon to be 100 times faster. SeETL is about to be 100 times faster in development than doing the same thing in a GUI that is pasted on the front of any of the “leading” ETL tools.
And we all know that developer time costs money. Even if the developers are in India or China.
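To give a feel for the idea of generating ETL code from definitions held as XML, here is a very small sketch in Python. It is not how SeETL works internally and it does not read a real Excel workbook. The XML layout, the `generate_insert_select` function and the table and column names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping definition. A real workbook saved as XML would be far
# richer; this only shows the "definitions in, SQL out" idea.
MAPPING_XML = """
<mapping target="sales_fact" source="sales_staging">
  <column target="sale_date" source="txn_date"/>
  <column target="amount" source="CAST(txn_amount AS DECIMAL(12,2))"/>
</mapping>
"""

def generate_insert_select(xml_text):
    """Read one mapping definition and return an INSERT ... SELECT statement."""
    root = ET.fromstring(xml_text.strip())
    columns = root.findall("column")
    target_cols = ", ".join(c.get("target") for c in columns)
    source_exprs = ", ".join(c.get("source") for c in columns)
    return (f"INSERT INTO {root.get('target')} ({target_cols}) "
            f"SELECT {source_exprs} FROM {root.get('source')}")

generated_sql = generate_insert_select(MAPPING_XML)
print(generated_sql)
```

The developer only ever edits the definitions. The SQL is regenerated whenever they change, and because it is plain SQL it can be pointed at whatever database you happen to be using.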
So. To all those people who asked me.
“Hey, Peter, you are Mr. ETL what do you think of this?”
That is my opinion on the matter.
Things are not bright for our colleagues at Informatica, in my humble opinion.
Conversely, things are VERY BRIGHT for any enterprising young men who want to be leaders in the development of data warehouses because they can use and leverage SeETL for free in development and for the equivalent of one day's consulting work per year to put the generated code in to production.
When you can build ETL subsystems with a tool that costs you EUR800 a year?
Who wants to pay the license fees for Informatica and DataStage?
Not me. That is for sure!
Ok. That’s a wrap for this blog post.
Thanks for dropping by!
I really appreciate it!
Until next time!