If you download the demonstration database on the button below you will find a fully worked example of delta detection for an old Dot Net Nuke site I used to run. Indeed, it is still running but I am not following it closely any more.
The delta detection processing puts a flag on the “delta 01” table to indicate if the delta is an insert, update or delete. It uses this flag to know whether to perform an insert, update or mark a row as deleted in the staging area.
Now that we have made delta detection pretty much normal, if not mandatory, before sending data in to the staging area the following idea occurred to me the other day.
Today, in the detection of deltas for very large tables such as telco call records tables we rely in the “application system source system key” to be set in the ETL processing. This source system key is used to determine if the record will be an insert or an update.
In very large sites the index for this key, even on partitions, can get quite large and the time taken to test the record for insert / update can be quite lengthy.
It occurred to me that we could forward the “delta indicator” from the “delta 01” table to the staging area. Then, for tables where there was purely a row going forward which is inserted or updated this “delta indicator” could be used on read rather than the index being checked on insert / update.
I would imagine this will improve the performance of very large ETL processing loads such as occur in telcos.
I have not introduced this feature in to SeETL at the time of writing. I am putting this idea “out there” to let people who work in the area of very large numbers of rows for ETL that I have come up with this idea and it would be easy to implement if you wanted it added to the product.
The rather obvious proposal being that if you choose to use SeETL for such an environment I will add the feature if you would like to have it.
Most of our customers who have very large processing loads, like the many telcos I have worked in over the years, have gone in to production with Informatica or DataStage as a corporate standard.
For our smaller customers this feature is very much in the camp of “nice to have but it won’t make much difference”.
So if you are doing ETL in a very large number of rows environment and you try out SeETL and you want to have this feature? Please let me know and we will look in to adding it for you!