SeETL042 – Generating SQL ETL In C++ Programs

0
879

Listen To Blog Post


Hello and welcome back to our latest post.

In this blog post we are going to talk about the idea that the sql that is currently generated in SeETL Design Time could also be generated at run time and thereby not require the SQL to be stored in any location.

So, if you download the demonstration database on the button below you will find a fully worked example of how SQL is generated by SeETL DesignTime to be later execute by the scheduler. If you want to follow along with this blog post it might be useful to download the example data warehouse so you can see exactly that I am talking about.

IBI Downloads

SeETL DesignTime is able to generate complete SQL Subsystems as ANSI standard SQL.

There are 3 different configurations for the SQL Generation which produces three different SQL statements series to produce the same result. These were asked for by one of our telco customers and so we added it to our product.

The way that the SQL ETL subsystem is generated is that the SeETL Console command reads the workbook and reads the processing directives and then generates SQL code according to those processing directives.

The SQL ETL DML code is then placed in an output file. One file for each step that is restart repeatable.

This file is then loaded in to the SQL Statements table where it can be run by the scheduler in a secure and auditable environment.

The mapping details of the spreadsheet are also loaded in to the SeETL Dictionary for reporting purposes.

A number of years ago the idea came up that we could use a C++ program, which was passed appropriate parameters, to read the SeETL Dictionary and to generate the SQL needed rather than to generate the SQL at design time.

I proposed this to a number of my customers at the time and their answer was “no thanks”.

The reason they gave was:

“It is hard enough to manage large ETL subsystems where we have the SQL code that is operating in front of us to read. The idea of the code being generate and only sent to a log would concern us when there was a problem.

We would be in the position of having to read the code in the log and then in some way editing the mapping and editing the tables involved to get the ETL to complete. Today we go to the SQL statement involved in the file system. We test run it. We identify the problem. We fix it. We reload it in to the statements table. We restart the scheduler.

That is very fast, very easy, and very reliable. Generating the code in C++ would introduce complexity in to the problem resolution process that would concern us.”

Since I got a number of customers give me pretty much the same answer I decided never to try and build the C++ programs in question.

However, I thought I would put this up as a blog post for anyone to read because this is still an interesting idea, just perhaps not so useful as I originally thought all those years ago!

What is your opinion?

Feel free to say what you want below in the comments section!

Best Regards

Peter

Previous articleSeETL041 – Placing Delta Indicators On Staging Tables
Next articleSeETL043 – SQL Execution From The File System
Peter Nolan
Peter Nolan is one of the worlds leading thought leaders in Business Intelligence. Across his 29+ years in BI Peter has consistently invented new and innovative ways of designing and building data warehouses. SeETL now stands alone as the worlds most cost effective data warehouse development tool.