<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Peter Andrew Nolan: ETL]]></title><description><![CDATA[In this section I talk about all things ETL. ETL is the heart and soul of a data warehouse. If you do not get this right? Your data warehouse will fail.]]></description><link>https://peterandrewnolan.substack.com/s/etl</link><image><url>https://substackcdn.com/image/fetch/$s_!ENU9!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83419280-9fe4-432e-9cb2-a5f5630486d4_274x274.png</url><title>Peter Andrew Nolan: ETL</title><link>https://peterandrewnolan.substack.com/s/etl</link></image><generator>Substack</generator><lastBuildDate>Sun, 10 May 2026 06:15:11 GMT</lastBuildDate><atom:link href="https://peterandrewnolan.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Peter Andrew Nolan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[peterandrewnolan@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[peterandrewnolan@substack.com]]></itunes:email><itunes:name><![CDATA[Peter Andrew Nolan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Peter Andrew Nolan]]></itunes:author><googleplay:owner><![CDATA[peterandrewnolan@substack.com]]></googleplay:owner><googleplay:email><![CDATA[peterandrewnolan@substack.com]]></googleplay:email><googleplay:author><![CDATA[Peter Andrew Nolan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[SeETL039 – Multi-Level Fact Table Summaries Using SQL]]></title><description><![CDATA[In this blog post I show you how to manage multi-level fact tables using SQL and not the older C++ SeETL components.]]></description><link>https://peterandrewnolan.substack.com/p/seetl039-multi-level-fact-table-summaries-using-sql</link><guid isPermaLink="false">https://peterandrewnolan.substack.com/p/seetl039-multi-level-fact-table-summaries-using-sql</guid><dc:creator><![CDATA[Peter Andrew Nolan]]></dc:creator><pubDate>Thu, 16 Apr 2026 19:59:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7d4f583e-d02d-49f1-974a-f04e7421c81d_1024x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello and welcome to this latest blog post!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.instantbi.com/2019/09/03/seetl039-multi-level-fact-table-summaries-using-sql&quot;,&quot;text&quot;:&quot;Original Post Better Formatted&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.instantbi.com/2019/09/03/seetl039-multi-level-fact-table-summaries-using-sql"><span>Original Post Better Formatted</span></a></p><p>Well this post has been a loooooong time coming.</p><p>Everyone knows that I was trained at Metaphor Computer Systems in the design techniques that Ralph Kimball later made popular through his series of books. 
But prior to 1995 the mechanisms by which the Metaphor Data Warehouse Developers built their ETL systems were a closely guarded secret.</p><p>Even though I was a personal friend of the Metaphor CEO, Cathy Selleck, and I was really the only person pushing Metaphor in Australia, I was not granted the privilege of learning how to build the ETL for dimensional models when I was visiting Metaphor&#8217;s head office in 1993.</p><p>So I had to learn the hard way!</p><p>Learning how to build multi-level dimensional data models the &#8220;Ralph Kimball way&#8221; took some time. The COBOL code (remember COBOL?) was really hard to write the first time.</p><p>I soon realised that if I could produce ETL software that made this ETL development easy, I might make some money out of that software. This software I built back in 1995 is what has evolved into the free, open source #SeETL product of today, 2019.</p><p>And if you are interested? Yes, I made a lot of money out of the SeETL product, because in the 90s we used it to sell million-dollar deals to billion-dollar companies. We could build the ETL subsystem for big companies for a much lower cost than anyone else. #SeETL was the &#8220;secret sauce&#8221; we used to do that.</p><p>I built my first terabyte data warehouse using #SeETL Cobol in 1997. It was a great success. We embedded my COBOL software into the Hitachi Data Systems Data Warehousing Offering and we won many large deals with an end-to-end solution that by that time included Brio Query.</p><p>One of the major features of the &#8220;Ralph Kimball&#8221; way of doing things in the 90s was to have multi-level dimension tables and multi-level fact tables. Indeed, Ralph&#8217;s database product Red Brick had these features built right into the database manager.</p><p>Since it was hard to sell Red Brick against Oracle and DB2 in Australia in the 90s, my #SeETL software achieved the same result as Red Brick, only as COBOL running on Oracle and DB2.</p><p>Brio Query suited the data models we were building very well because the founder of Brio Query was ex-Metaphor. And the final CEO of Metaphor, Chris Grejtak, moved over to Brio Query in 1996. I sold the first 1,000-seat Brio Query customer in Australia in 1997.</p><p>One of the big differentiators I was able to implement for my customers in the 90s was &#8220;multi-level dimensions&#8221; and &#8220;multi-level fact tables.&#8221;</p><p>&#8220;Multi-level&#8221; data gave performance boosts to all queries, and in the 90s query performance was an issue.</p><p>Ralph and I talked about these things as a &#8220;great idea&#8221; on the dwlist forum but we were just not getting any real attention on the subject.</p><p>By 2003 both Ralph and I had pretty much given up trying to talk to people about why multi-level data inside the data warehouse database was a good idea. So many people were going to SSAS and Essbase cubes that we just wished them well on that journey!</p><p>In 2002 I rewrote the old #SeETL Cobol product into the then more modern C++. In 2004 we created the innovation of using the mapping workbooks as source code, storing the workbook as XML and reading it with VB to generate all the objects needed.</p><p>Of course, in that period I also converted the COBOL code that was needed to manage multi-level data across to C++. During that migration we added a lot of new features to increase the scalability of the attribution, aggregation and consolidation processing.</p><p>With the new C++ version we no longer had &#8220;generated code&#8221;.
We had C++ programs that discovered the data structures of the source and target tables and did all the processing. We went from the situation with COBOL of needing one program per function per mapping to just one program per function.</p><p>In the COBOL #SeETL product it was not at all unusual to generate 300,000+ lines of COBOL code that would then need to be maintained. On one project we even broke the 500,000-line mark for generated COBOL code!</p><p>In the new 2002 C++ #SeETL, the version you can download today, we did not generate any code. We had programs that would &#8220;internally configure&#8221; themselves to adapt to each mapping. We separated the algorithm from the data structures.</p><p>This is what made C++ #SeETL so different. And this is why I decided to sell it for EUR20,000 per copy in 2003!</p><p>The attribution, aggregation and consolidation processing was MUCH improved across many dimensions. It ran faster. It scaled further. It ran on more hardware. It talked to more databases.</p><p>Indeed, in 2003 we used #SeETL to build the prototype data warehouse for Saudi Telecom, which had 20 million customers and 60 million CDRs per day. When we took delivery of our 18 CPU Sun 12K and 15TB EMC storage system we put #SeETL on to it and ran it through its paces.</p><p>Using just 6 CPUs we could get the CDR processing done in about 6 hours. C++ #SeETL could be used all the way up to telcos quite successfully. We used it again on Orange Romania in 2005, Electronic Arts in 2006, Carphone Warehouse in 2008, and SkyTalk in 2010.</p><p>I also sold copies to consulting companies who wanted to do ETL but did not want to buy DataStage or Informatica, which were licensed by CPUs back in those days.</p><p>When we did Carphone Warehouse in 2009 Brian Ganly suggested that rather than use the C++ version of #SeETL we could generate the ETL subsystem as SQL and run it on Netezza. I resisted this as &#8220;totally crazy&#8221; for a couple of months. But Brian can be very persuasive, and so I finally spent a weekend testing his theory: that Netezza was so fast it could do the 80M CDR batch of records in one statement for the attribution processing.</p><p>Up to then, in testing, this was taking about 5 hours to get through on the development machine. We were running C++ #SeETL on a Linux server, accessing the data on the Netezza machine and then reloading the finished records on to the Netezza machine.</p><p>In the testing I did on that fateful weekend the small development Netezza machine was able to process the 80 million CDRs with 20 dimension table lookups in just 20 minutes.</p><p>On the production machine it was 5 minutes.</p><p>Brian Ganly&#8217;s persistence had paid off. Sean Kelly and I were now believers that ETL subsystems could be written in SQL for Netezza.</p><p>There was, of course, one problem. The multi-level data C++ code could not be converted easily into SQL because of the large amount of functionality that was included in it.</p><p>People who had Netezza didn&#8217;t care because all queries were fast anyway!</p><p>But for the people on SQL Server, this was still an issue. And so we continued to use the C++ version of #SeETL on SQL Server projects in order to get the multi-level summary fact table functionality.</p><p>Over the years since 2009 I have made a number of attempts to figure out if I could generate the code needed to create multi-level fact tables in SQL.
Each time I ended up thinking &#8220;nope, can&#8217;t figure that one out&#8221;.</p><p>I was recently asked a different question that prompted a new way of looking at that code, and I decided today was the day I would have another crack at this problem.</p><p>And I cracked the case!!</p><p>As funny as it is now, the way to make this work is deceptively simple. But it required thinking about how to perform the processing very differently from how the processing has &#8220;always been done&#8221;. My problem over the last 10 years was that every time I looked at solving this problem in SQL, I tried to emulate the &#8220;way we have always done it&#8221; rather than starting with a &#8220;blank slate&#8221;.</p><p>So here for your reading enjoyment is how you can create a multi-level fact table for sales transactions using SQL. Note that we are still using the C++ programs to create the multi-level dimensions. But since they are incrementally updated, and dimension tables are so small, we have no intention of trying to migrate that to SQL.</p><p>Firstly, let&#8217;s check the #SeETL aggregation control table. It looks like this:</p><p>create table dbo.ctl_aggregation_control ( pk_aggregate_number integer, run_type varchar(20), fact_table_name varchar(256), number_of_dimensions integer, level_dimension_1 integer, level_dimension_2 integer, level_dimension_3 integer, level_dimension_4 integer, level_dimension_5 integer, ..., level_dimension_50 integer )</p><p>You give each row an aggregate number which must be unique. We generally use number ranges of 100 for each fact table. This allows 100 summary levels per fact table.</p><p>The run type is set to &#8216;always&#8217; for daily processing.</p><p>The fact table name is the #SeETL mapping name for the fact table.</p><p>You tell it how many dimensions there are in this fact table. You have a maximum of 50.</p><p>You then tell it the level of aggregate in each dimension, starting at dimension 1.</p><p>Each row in this table translates into one summary level inside one fact table.</p><p>So suppose your first dimension is time, and the detailed level is &#8220;day&#8221;, level 1 is &#8220;week&#8221;, level 2 is &#8220;month&#8221;, level 3 is &#8220;quarter&#8221;, and so on.</p><p>If you put 2 in level_dimension_1 then that level of aggregate will summarise to monthly data.</p><p>To show you how the levels of the keys work: td0005 is our day dimension table in BI4ALL. It has 9 summary level keys on it as well as the detailed level keys. The attribution view looks like the following.</p><p>create view [dbo].[z01_vm_day_01_at] as select td0005.dim_char_ky_fld pk_dim_char_ky_fld, td0005.pk_td0005 dk_z01_vm_day_01, td0005.td0005_key_ag1 z01_vm_day_01_key_ag1, td0005.td0005_key_ag2 z01_vm_day_01_key_ag2, td0005.td0005_key_ag3 z01_vm_day_01_key_ag3, td0005.td0005_key_ag4 z01_vm_day_01_key_ag4, td0005.td0005_key_ag5 z01_vm_day_01_key_ag5, td0005.td0005_key_ag6 z01_vm_day_01_key_ag6, td0005.td0005_key_ag7 z01_vm_day_01_key_ag7, td0005.td0005_key_ag8 z01_vm_day_01_key_ag8, td0005.td0005_key_ag9 z01_vm_day_01_key_ag9 from dbo.td0005 td0005 where level_col = 'detail'</p><p>So you can see that the aggregate keys are on the lookup table / view for each dimension that has aggregates possible.</p><p>So now let&#8217;s look at the SQL statement that is needed to perform this processing.</p>
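<p>Before the statement itself, here is a small sketch of how rows in this control table might be filled in for the sales fact table used below. The values are mine, made up purely for illustration: aggregate numbers 101 and 102 are two of the six the statement further down expects, the fact table has nine dimensions, time is dimension 1 and product is dimension 2.</p><p>-- illustrative only: aggregate 101 summarises day to month with all other dimensions at detail;</p><p>-- aggregate 102 summarises day to month and product to product level 1</p><p>insert into dbo.ctl_aggregation_control ( pk_aggregate_number, run_type, fact_table_name, number_of_dimensions, level_dimension_1, level_dimension_2, level_dimension_3, level_dimension_4, level_dimension_5, level_dimension_6, level_dimension_7, level_dimension_8, level_dimension_9 ) values ( 101, 'always', 'z01_vf_sale_txn_03', 9, 2, 0, 0, 0, 0, 0, 0, 0, 0 ), ( 102, 'always', 'z01_vf_sale_txn_03', 9, 2, 1, 0, 0, 0, 0, 0, 0, 0 ) ;</p>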
<p>I will put my comments inside the code.</p><p>Firstly, just truncate the work tables.</p><p>truncate table xxxxx.dbo.z01_vf_sale_txn_03_swk1 ;</p><p>truncate table xxxxx.dbo.z01_vf_sale_txn_03_swk2 ;</p><p>insert into xxxxx.dbo.z01_vf_sale_txn_03_swk1 ( pk_aggregate_number, pk_z01_vm_day_01, pk_z01_vm_product_01, pk_z01_vm_customer_01, pk_z01_vm_customer_demographic_01, pk_z01_vm_geography_01, pk_z01_vm_currency_01, pk_z01_vm_sale_txn_type_01, pk_z01_vm_sale_txn_status_01, pk_z01_vm_unit_of_measure_01, sale_extended_amount, cost_extended_amount, tax1_extended_amount, tax2_extended_amount, discount_extended_amount, sale_units, number_sales ) select</p><p>-- select out the aggregate number as it goes on the target summary table</p><p>ctl_aggregation_control.pk_aggregate_number</p><p>-- this was the BIG innovation. Rather than moving the key forward in the generated code, which is how this has &#8220;always been done&#8221;, I used a case statement that selects the right key at run time.</p><p>-- This was the idea that had always escaped me. It&#8217;s so simple once you do it!</p><p>-- Notice that we are looking at ctl_aggregation_control.level_dimension_1 to find the level of the key to use. Based on that level we retrieve the correct key from the lookup table and send it forward.</p><p>, case ctl_aggregation_control.level_dimension_1 when 0 then coalesce(dk_z01_vm_day_01, 0) when 1 then coalesce(z01_vm_day_01_key_ag1, 0) when 2 then coalesce(z01_vm_day_01_key_ag2, 0) when 3 then coalesce(z01_vm_day_01_key_ag3, 0) when 4 then coalesce(z01_vm_day_01_key_ag4, 0) when 5 then coalesce(z01_vm_day_01_key_ag5, 0) when 6 then coalesce(z01_vm_day_01_key_ag6, 0) when 7 then coalesce(z01_vm_day_01_key_ag7, 0) when 8 then coalesce(z01_vm_day_01_key_ag8, 0) when 9 then coalesce(z01_vm_day_01_key_ag9, 0) else 0 end</p><p>-- Same for products</p><p>, case ctl_aggregation_control.level_dimension_2 when 0 then coalesce(dk_z01_vm_product_01, 0) when 1 then coalesce(z01_vm_product_01_key_ag1, 0) when 2 then coalesce(z01_vm_product_01_key_ag2, 0) when 3 then coalesce(z01_vm_product_01_key_ag3, 0) when 4 then coalesce(z01_vm_product_01_key_ag4, 0) when 5 then coalesce(z01_vm_product_01_key_ag5, 0) when 6 then coalesce(z01_vm_product_01_key_ag6, 0) when 7 then coalesce(z01_vm_product_01_key_ag7, 0) when 8 then coalesce(z01_vm_product_01_key_ag8, 0) when 9 then coalesce(z01_vm_product_01_key_ag9, 0) else 0 end</p><p>-- Same for parties</p><p>, case ctl_aggregation_control.level_dimension_3 when 0 then coalesce(dk_z01_vm_party_1001, 0) when 1 then coalesce(z01_vm_party_1001_key_ag1, 0) when 2 then coalesce(z01_vm_party_1001_key_ag2, 0) when 3 then coalesce(z01_vm_party_1001_key_ag3, 0) when 4 then coalesce(z01_vm_party_1001_key_ag4, 0) when 5 then coalesce(z01_vm_party_1001_key_ag5, 0) when 6 then coalesce(z01_vm_party_1001_key_ag6, 0) when 7 then coalesce(z01_vm_party_1001_key_ag7, 0) when 8 then coalesce(z01_vm_party_1001_key_ag8, 0) when 9 then coalesce(z01_vm_party_1001_key_ag9, 0) else 0 end</p><p>-- Same for demographics</p><p>, case ctl_aggregation_control.level_dimension_4 when 0 then coalesce(dk_z01_vm_party_demographic_1001, 0) when 1 then coalesce(z01_vm_party_demographic_1001_key_ag1, 0) when 2 then coalesce(z01_vm_party_demographic_1001_key_ag2, 0) when 3 then coalesce(z01_vm_party_demographic_1001_key_ag3, 0) when 4 then coalesce(z01_vm_party_demographic_1001_key_ag4, 0) when 5 then coalesce(z01_vm_party_demographic_1001_key_ag5, 0) when 6 then coalesce(z01_vm_party_demographic_1001_key_ag6, 0) when 7 then coalesce(z01_vm_party_demographic_1001_key_ag7, 0) when 8 then coalesce(z01_vm_party_demographic_1001_key_ag8, 0) when 9 then coalesce(z01_vm_party_demographic_1001_key_ag9, 0) else 0 end</p><p>-- Same for geography</p><p>, case ctl_aggregation_control.level_dimension_5 when 0 then coalesce(dk_z01_vm_geography_01, 0) when 1 then coalesce(z01_vm_geography_01_key_ag1, 0) when 2 then coalesce(z01_vm_geography_01_key_ag2, 0) when 3 then coalesce(z01_vm_geography_01_key_ag3, 0) when 4 then coalesce(z01_vm_geography_01_key_ag4, 0) when 5 then coalesce(z01_vm_geography_01_key_ag5, 0) when 6 then coalesce(z01_vm_geography_01_key_ag6, 0) when 7 then coalesce(z01_vm_geography_01_key_ag7, 0) when 8 then coalesce(z01_vm_geography_01_key_ag8, 0) when 9 then coalesce(z01_vm_geography_01_key_ag9, 0) else 0 end</p><p>-- And where there are no aggregation levels available we do not need the case statement</p><p>, coalesce(dk_z01_vm_currency_01, 0) , coalesce(dk_z01_vm_sale_txn_type_01, 0) , coalesce(dk_z01_vm_sale_txn_status_01, 0) , coalesce(dk_z01_vm_unit_of_measure_01, 0)</p><p>-- And we send the data forward out of the #SeETL generated view.</p><p>, sale_extended_amount , cost_extended_amount , tax1_extended_amount , tax2_extended_amount , discount_extended_amount , sale_units , number_sales</p><p>-- This is an input view that is created in the #SeETL workbook</p><p>from xxxxx.dbo.z01_vf_sale_txn_03 z01_vf_sale_txn_03</p><p>-- We inner join to the aggregation control table using the mapping name. This is repeated here for documentation and to make it obvious which table is being processed.</p><p>inner join xxxxx.dbo.ctl_aggregation_control ctl_aggregation_control on 1=1 and ctl_aggregation_control.fact_table_name = 'z01_vf_sale_txn_03'</p><p>-- We left join on the dimension table lookup views. These might also be extracted out into their own small lookup tables to get better processing speeds.</p>
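<p>As a purely illustrative aside on that last comment, and assuming SQL Server, extracting a lookup view into its own small work table could be as simple as the following. The _wrk suffix is my own invention for this sketch; you would then left join to the work table instead of the view in the statement below.</p><p>-- materialise the day attribution view once per batch into a small work table</p><p>drop table if exists xxxxx.dbo.z01_vm_day_01_at_wrk ;</p><p>select * into xxxxx.dbo.z01_vm_day_01_at_wrk from xxxxx.dbo.z01_vm_day_01_at ;</p>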
<p>left join xxxxx.dbo.z01_vm_day_01_at z01_vm_day_01_at on z01_vf_sale_txn_03.char_key_z01_vm_day_01 = z01_vm_day_01_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_product_01_at z01_vm_product_01_at on z01_vf_sale_txn_03.char_key_z01_vm_product_01 = z01_vm_product_01_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_party_1001_at z01_vm_party_1001_at on z01_vf_sale_txn_03.char_key_z01_vm_party_1001 = z01_vm_party_1001_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_party_demographic_1001_at z01_vm_party_demographic_1001_at on z01_vf_sale_txn_03.char_key_z01_vm_party_demographic_1001 = z01_vm_party_demographic_1001_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_geography_01_at z01_vm_geography_01_at on z01_vf_sale_txn_03.char_key_z01_vm_geography_01 = z01_vm_geography_01_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_currency_01_at z01_vm_currency_01_at on z01_vf_sale_txn_03.char_key_z01_vm_currency_01 = z01_vm_currency_01_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_sale_txn_type_01_at z01_vm_sale_txn_type_01_at on z01_vf_sale_txn_03.char_key_z01_vm_sale_txn_type_01 = z01_vm_sale_txn_type_01_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_sale_txn_status_01_at z01_vm_sale_txn_status_01_at on z01_vf_sale_txn_03.char_key_z01_vm_sale_txn_status_01 = z01_vm_sale_txn_status_01_at.pk_dim_char_ky_fld</p><p>left join xxxxx.dbo.z01_vm_unit_of_measure_01_at z01_vm_unit_of_measure_01_at on z01_vf_sale_txn_03.char_key_z01_vm_unit_of_measure_01 = z01_vm_unit_of_measure_01_at.pk_dim_char_ky_fld</p><p>where 1=1</p><p>-- I have repeated the constraint on the fact table name just for documentation purposes</p><p>and ctl_aggregation_control.fact_table_name = 'z01_vf_sale_txn_03'</p><p>-- I have put the aggregate numbers into the query just for documentation purposes.</p><p>and ctl_aggregation_control.pk_aggregate_number in ( 101 , 102 , 103 , 104 , 105 , 106 )</p><p>;</p><p>-- The above query will write all the detailed records to the sort work file, but the integer keys at the front of each record will be set to the correct aggregate key.</p><p>-- So now the records simply need to be summed as follows:</p><p>insert into xxxxx.dbo.z01_vf_sale_txn_03_swk2 ( pk_aggregate_number, pk_z01_vm_day_01, pk_z01_vm_product_01, pk_z01_vm_customer_01, pk_z01_vm_customer_demographic_01, pk_z01_vm_geography_01, pk_z01_vm_currency_01, pk_z01_vm_sale_txn_type_01, pk_z01_vm_sale_txn_status_01, pk_z01_vm_unit_of_measure_01, sale_extended_amount, cost_extended_amount, tax1_extended_amount, tax2_extended_amount, discount_extended_amount, sale_units, number_sales ) select pk_aggregate_number, pk_z01_vm_day_01, pk_z01_vm_product_01, pk_z01_vm_customer_01, pk_z01_vm_customer_demographic_01, pk_z01_vm_geography_01, pk_z01_vm_currency_01, pk_z01_vm_sale_txn_type_01, pk_z01_vm_sale_txn_status_01, pk_z01_vm_unit_of_measure_01, sum(sale_extended_amount), sum(cost_extended_amount), sum(tax1_extended_amount), sum(tax2_extended_amount), sum(discount_extended_amount), sum(sale_units), sum(number_sales)</p><p>from xxxxx.dbo.z01_vf_sale_txn_03_swk1 group by pk_aggregate_number, pk_z01_vm_day_01, pk_z01_vm_product_01, pk_z01_vm_customer_01, pk_z01_vm_customer_demographic_01, pk_z01_vm_geography_01, pk_z01_vm_currency_01, pk_z01_vm_sale_txn_type_01, pk_z01_vm_sale_txn_status_01, pk_z01_vm_unit_of_measure_01 ;</p><p>Now, in sort work 2 you have the multiple levels of summary fact records that are the
result of the current cycle of processing.</p><p>Where the key combination does not exist in the target summary fact table you can perform an insert.</p><p>Where the key combination does exist in the target summary fact table you need to consolidate the data in the target summary fact table with the data in the sort work 2 table and then perform an update back into the summary fact table.</p><p>Those operations are so simple they don&#8217;t really justify a spot on the blog, but a minimal sketch follows below.</p><p>This new invention (at least I have never seen it or heard of it before) means this.</p><p>It means that anyone can build multi-level summary fact tables using SQL, using #SeETL for free as an open source tool.</p><p>This means that if you are having performance problems with your data warehouse and you would like to have more summary fact tables? Or if you have lots of cubes for your summaries and you would like to have summary fact tables to load your cubes? Or you would like to get rid of some of your cubes?</p><p>You can copy this model of processing and save yourself a massive amount of query processing.</p><p>For those of you who have MicroStrategy? The multi-level models we implement based on Ralph&#8217;s good help can be used, as is, with MicroStrategy. All we do is put views over the tables to create the needed MicroStrategy hierarchy for the dimension tables. We also create views over the summary fact table and alter the key column names so that the MicroStrategy schema designer can properly link up the views to the correct levels of the dimension hierarchy. Ok?</p><p>Well, Ladies and Gentlemen?</p><p>This has been one of the most EPIC posts I have ever done!</p><p>This &#8220;problem&#8221; of how to do multi-level summary fact tables in SQL has been playing on my mind for 9 years now and I could not figure out how to get it to work!</p><p>Now that I have gotten it to work and #SeETL is free?</p><p>I am pleased to put it &#8220;out there&#8221; because the more people who adopt the idea of multi-level summary fact tables the better our delivery capability to our customers will be.</p><p>Multi-level summary fact tables give you a BIG query performance boost for almost zero cost in disk. And now that the ETL for it is just SQL and the updates are incremental per batch cycle? There is very little extra processing required to create and maintain multi-level summaries inside the database.</p>
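<p>Here is that consolidation step as a minimal sketch, assuming SQL Server. The target table name z01_vf_sale_txn_03_sum is my own invented name for this sketch, only the aggregate number and day key are written out in the joins, and only two measures in the update; in practice you list every key column and every measure, and the insert assumes the work table and target have matching column orders.</p><p>-- consolidate rows whose key combination already exists in the target summary fact table</p><p>update tgt set tgt.sale_extended_amount = tgt.sale_extended_amount + wrk.sale_extended_amount , tgt.cost_extended_amount = tgt.cost_extended_amount + wrk.cost_extended_amount from xxxxx.dbo.z01_vf_sale_txn_03_sum tgt inner join xxxxx.dbo.z01_vf_sale_txn_03_swk2 wrk on tgt.pk_aggregate_number = wrk.pk_aggregate_number and tgt.pk_z01_vm_day_01 = wrk.pk_z01_vm_day_01 ;</p><p>-- insert the key combinations that are new</p><p>insert into xxxxx.dbo.z01_vf_sale_txn_03_sum select wrk.* from xxxxx.dbo.z01_vf_sale_txn_03_swk2 wrk where not exists ( select 1 from xxxxx.dbo.z01_vf_sale_txn_03_sum tgt where tgt.pk_aggregate_number = wrk.pk_aggregate_number and tgt.pk_z01_vm_day_01 = wrk.pk_z01_vm_day_01 ) ;</p>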
<p>If you are a MicroStrategy user you are in luck, because this is exactly the sort of data model MicroStrategy needs, without all the extra code of one mapping per dimension level and one mapping per summary level.</p><p>So have fun working on implementing your first multi-level summary fact tables!</p><p>If you would like me to implement yours for you?</p><p>I am currently (2019-09-03) charging EUR50/USD60 for working from my home office.</p><p>I sincerely hope that rate will go back up to where it really belongs as I get busier!</p><p>Best Regards</p><p>Peter</p><p>The post <em><a href="https://www.instantbi.com/2019/09/03/seetl039-multi-level-fact-table-summaries-using-sql/?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=seetl039-multi-level-fact-table-summaries-using-sql">SeETL039 &#8211; Multi-Level Fact Table Summaries Using SQL</a></em> appeared first on <em><a href="https://www.instantbi.com?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=seetl039-multi-level-fact-table-summaries-using-sql">Instant BI</a></em>.</p>]]></content:encoded></item><item><title><![CDATA[IBI-070-Running SSIS Packages From SeETL]]></title><description><![CDATA[We have proven that we can run SSIS Packages from SeETL. This is what that means for you.]]></description><link>https://peterandrewnolan.substack.com/p/ibi-070-running-ssis-packages-from-seetl-publish-options</link><guid isPermaLink="false">https://peterandrewnolan.substack.com/p/ibi-070-running-ssis-packages-from-seetl-publish-options</guid><dc:creator><![CDATA[Peter Andrew Nolan]]></dc:creator><pubDate>Sat, 16 Aug 2025 21:45:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fed28823-f0ab-45aa-9541-345c099ff08d_1200x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Note: You can listen to the blog post on the video or read the blog post.</p><div id="youtube2-q_RIkM8yRNQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;q_RIkM8yRNQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/q_RIkM8yRNQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Hello and Welcome.</p><p>I am Esther.</p><p>I am Peter&#8217;s A I Assistant, here to create voice overs.</p><p>I will simply read Peter&#8217;s blog posts, so that you have a choice of reading the blog post, or listening to my voice.</p><p>Hello and welcome Gentlemen.</p><p>As I have recently announced on my channel, I will be doing some opinion pieces and response videos.</p><p>In this blog post I wanted to talk about a recent finding with SQL Server Integration Services.</p><p>All around the world millions of people use Integration Services.</p><p>It&#8217;s not the world&#8217;s greatest ETL product, but it is good enough for a lot of situations.</p><p>This is especially true of smaller sites with less extensive needs in their ETL tool.</p><p>However, there is one problem that all the ETL tools have, and SSIS is no exception.</p><p>It is the problem of creating multiple streams of parallel processing which include semaphores to close out one stream of parallel processing and to start another.</p><p>Another problem, having set up such a package, is to be able to restart it properly after a failure.</p><p>If you are the
owner of a data warehousing company that implements ETL using SSIS?</p><p>You know these two problems.</p><p>Of course, everyone knows I started in the IBM Mainframe world and IBM solved these problems in the nineteen seventies.</p><p>When I joined IBM in nineteen eighty six one of the many things I had to learn was how to write very complex Job Control Language jobs to be able to handle very large and complex batch loads.</p><p>So when I wrote the scheduler for See TL I included what I had learned all those years ago.</p><p>Why is this important to you as a man who runs his own data warehouse consulting company using SSIS as your standard ETL software?</p><p>In the last week or so I have had to do a crash course on SSIS.</p><p>One of the questions that occurred to me is whether it was possible to run SQL Server Jobs under SQL Server Agent using the see TL scheduler.</p><p>What is needed is the ability to run the job, detect errors that are fatal errors, and then to stop the stream if a fatal error has been detected.</p><p>Of course, once the error is fixed, in see TL the restart does not require any alteration to any code.</p><p>You just restart the batch.</p><p>This can be done from a Report Services Report now in one of the versions I am supporting.</p><p>But rest assured.</p><p>Last week I tested the ability of the see TL scheduler to execute a stored procedure, which then executes a job, which can detect if the job fails, and then return a non zero return code to the see TL scheduler to stop the job stream.</p><p>What does this mean for you mister, I own a data warehouse consulting company?</p><p>It means with SSIS you can now do the following, for free, with our see TL scheduler.</p><p>Consider the normal ETL batch.</p><p>The first thing you need to do is to extract all the source data from all your sources and land the data into the landing area.</p><p>Given you have SQL Server you want to run parallel streams to do that to make maximum use of the network bandwidth and your CPUs to make the overall process as fast as possible.</p><p>Say you have two hundred source tables from your source system.</p><p>This is quite normal for business central data warehouses.</p><p>So you would want at least four parallel streams on a sixteen core SQL server.</p><p>You would balance out the big tables with the smaller tables in terms of number of steps per stream.</p><p>But just for this blog post lets say all extract and landing jobs are the same so you want four streams of fifty jobs.</p><p>You need to do some initialisation for the overall batch.</p><p>So you will have a group oh oh one.</p><p>And then you need a semaphore group oh oh two.</p><p>And then you can have four groups oh oh three through to group oh oh six.</p><p>Those four groups will be able to run at the same time.</p><p>And now they can be SQL Server jobs in SQL server agent running SQL Server Integration Services Packages.</p><p>Then you have another semaphore group called group oh oh seven.</p><p>Group oh oh seven can not run until all four prior groups have completed.</p><p>If one group has a failed command the other three groups keep running.</p><p>When the error is fixed the support person can simply ask for a restart and the scheduler will restart just the failed stream and not touch the other in progress streams.</p><p>We called this in flight restart of partially failed batches.</p><p>Not even IBM had this back in the day.</p><p>Certainly SQL Server does not have it today despite the fact I wrote my scheduler in two 
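thousand and three.</p><p>To make the idea of groups and semaphores concrete, here is a purely illustrative sketch, in SQL, of how such a schedule could be laid out. The table name ctl_batch_schedule and its columns are my own invention for this post, not the actual see TL scheduler tables.</p><p>-- one row per command per group; a semaphore group cannot start until every prior group has completed</p><p>create table dbo.ctl_batch_schedule ( batch_group integer, group_type varchar(10), command varchar(256) ) ;</p><p>insert into dbo.ctl_batch_schedule values ( 1, 'normal', 'batch initialisation' ), ( 2, 'semaphore', 'wait' ), ( 3, 'normal', 'run extract stream 1 SSIS job' ), ( 4, 'normal', 'run extract stream 2 SSIS job' ), ( 5, 'normal', 'run extract stream 3 SSIS job' ), ( 6, 'normal', 'run extract stream 4 SSIS job' ), ( 7, 'semaphore', 'wait' ) ;</p><p>Groups three to six carry no dependencies between them, so the scheduler is free to run them at the same time; the semaphore rows are the points where everything must drain before the batch moves on.</p>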
<p>Once all four groups are finished and you have processed your two hundred inputs you will want to process your delta detection and send data from the landing area to the staging area.</p><p>Only this time you might want eight parallel groups.</p><p>So you create groups oh oh nine to oh oh sixteen and you run eight parallel groups of SSIS packages to perform your delta detection for your two hundred input tables.</p><p>And so this goes on.</p><p>Let us say four more groups for dimension table processing.</p><p>Let us say four more groups for fact table processing.</p><p>And then a last semaphore group and a final group for finalisation processing of the batch.</p><p>Pretty soon you are up to around thirty processing groups that all have to be defined and managed.</p><p>The wonderful thing about the see TL scheduler is that you can move processing from group to group between the batch runs as easily as cut and paste in a spreadsheet, because that is where the schedule is maintained.</p><p>Over the period of a week or so you can balance out the processing in the streams to make the overall batch stream run faster by having the SQL Server loaded to the maximum possible level all through processing.</p><p>When we were doing this with Data Stage we were able to run the Data Stage batches thirty percent faster than Data Stage was able to run the exact same batch workload.</p><p>We haven&#8217;t tested it to that level with SQL Server and Integration Services.</p><p>But I would be surprised if we were not able to cut thirty percent off the run time of a sizable ETL stream.</p><p>In the past we have reduced ETL run times by up to seventy percent when we replaced ETL systems built with stored procedures.</p><p>So there is every chance that just using our scheduler will reduce run times by up to thirty percent.</p><p>I would be interested to see someone try it out.</p><p>As a man who owns his own data warehousing consulting company using SSIS?</p><p>You know the amount of time it takes to run ETL streams and you know that for a lot of that time the SQL Server is not that heavily loaded.</p><p>See TL will load up the SQL Server and cut your run times.</p><p>See TL will mean that when a batch fails there is no manual change required to any packages to restart the batch and run it to completion.</p><p>And all that is available for free from my web site.</p><p>You can have the see TL scheduler for free.</p><p>And you can test and run exactly what I have talked about for free.</p><p>If you like it?</p><p>Keep it and use it.</p><p>If you want a version of just the scheduler with your own company brand name on it?</p><p>Please just ask.</p><p>We will work out a price.</p><p>And please remember, you can just buy a branded version of see TL from a partner or you can get one for yourself for the very low price of ten thousand euros or dollars.</p><p>I just want it to be a nice round number either way.</p><p>So.</p><p>In summary?</p><p>Creating large batch schedules for processing large volumes of data and tables for data warehouses takes quite a bit of effort in SSIS.</p><p>If you have production accounts doing this?</p><p>You can have my see TL scheduler for free.</p><p>My see TL scheduler can now schedule SSIS packages properly.</p><p>Including stopping processing for serious errors that require processing to stop.</p><p>The restart is as simple as going to a report services report and selecting the failed batch and setting the report parameter to restart the batch to yes.</p><p>All in all?
</p><p>A much simpler way to run your already existing SSIS ETL system.</p><p>You will get faster overall run times, fewer failures, and fewer days when the data is not ready for your users at nine in the morning.</p><p>And with that?</p><p>I hope you found this blog post interesting and informative.</p><p>Thank you very much for your time and attention.</p><p>I really appreciate that.</p><p>Best Regards.</p><p>Esther.</p><p>Peter&#8217;s A I Assistant.</p><p>The post <a href="https://www.instantbi.com/2025/08/13/ibi-070-running-ssis-packages-from-seetl-publish-options/">IBI-070-Running-SSIS-Packages-From-SeETL Publish Options</a> appeared first on <a href="https://www.instantbi.com">Instant BI</a>.</p>]]></content:encoded></item><item><title><![CDATA[IBI-047-SeETL for MicroStrategy Users]]></title><description><![CDATA[If you use MicroStrategy?
Then SeETL can reduce your maintenance costs of ETL and data model support by at least 20% and up to 50%.]]></description><link>https://peterandrewnolan.substack.com/p/ibi-047-seetl-for-microstrategy-users</link><guid isPermaLink="false">https://peterandrewnolan.substack.com/p/ibi-047-seetl-for-microstrategy-users</guid><dc:creator><![CDATA[Peter Andrew Nolan]]></dc:creator><pubDate>Sun, 18 Aug 2024 00:06:39 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9d10f74b-e7f3-4215-9c89-950cd6fdf79e_1556x1556.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Note: You can listen to the blog post on the video or read the blog post.</p><div id="youtube2-yWrphkvKzck" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;yWrphkvKzck&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/yWrphkvKzck?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Hello and Welcome.</p><p>I am Esther.</p><p>I am Peter&#8217;s A I Assistant, here to create voice overs.</p><p>Peter is using me as his Assistant, because men prefer to listen to a woman&#8217;s voice.</p><p>I will simply read Peter&#8217;s blog posts, so that you have a choice of reading the blog post, or listening to my voice.</p><p>Hello and welcome Gentlemen.</p><p>I&#8217;d like to say thank you very much, for coming along and listening to my latest blog post.</p><p>As you know, I have been banned off LinkedIn for some years.</p><p>Now that I am back, I will post items that I think are of general interest to people in what is now loosely called the data community.</p><p>One area I have an interest in is the cost of E T L development for MicroStrategy users.</p><p>Let me tell you a story.</p><p>You will find this very amusing.</p><p>In nineteen ninety three, I was involved in selling Metaphor Computer Systems&#8217; Data Interpretation System to Coles Myer in Australia.</p><p>Coles Myer was a big I B M customer, and they were the largest revenue generating company in Australia.</p><p>In nineteen ninety three, Coles Myer had about twenty five percent of the retail market in Australia.</p><p>We tried really hard on the sale.</p><p>We left no stone unturned.</p><p>Eventually Coles Myer made a terrible mistake and bought I R I Express.</p><p>They would try and build their reports on the top of I R I Express for the next three years.</p><p>Of course, I R I Express would never, ever, scale to meet their needs.</p><p>So in nineteen ninety seven they finally abandoned I R I Express.</p><p>I made a number of calls again, to try for this next project.</p><p>We were not considered.</p><p>They went with Oracle and PRISM Solutions for the E T L software.</p><p>They went for MicroStrategy on the front end because that was the only product in the market that would scale to their needs.</p><p>I stayed in touch with Coles Myer all those years, just in case an opportunity ever came up, where I could get some work.</p><p>The whole time I was promoting my version of E T L software to Coles Myer.</p><p>But being Coles Myer, they went for PRISM Solutions software, and that&#8217;s fair enough.</p><p>About twelve months later, I became the Professional Services Manager for Ardent Software, when they bought PRISM Solutions.</p><p>One of my first calls was
to Coles Myer, to inform them I was now responsible for their account.</p><p>The gentleman in question had the first name of Bernie.</p><p>Given I was now responsible for their E T L development team, Bernie decided to show me their data models.</p><p>These were printed and laid out on a large conference room table.</p><p>I walked into the room and took one look at the model and said, quote, oh my God Bernie, what have you done, end quote.</p><p>Bernie said, quote, that does not sound good, end quote.</p><p>I looked over the models and could instantly tell that they had a physical table for each level of each multi-level dimension.</p><p>I could also see they had a physical table for each level of each summary level fact table.</p><p>They had HUNDREDS of tables in this data model.</p><p>I said to Bernie.</p><p>You have a physical table for each level of each dimension, and each level of each fact table, and so you must have PRISM mappings for each table.</p><p>Bernie said yes, my presumption was correct.</p><p>I said to him, quote, my God Bernie, I have been coming here telling you for five years, that you have one physical table for each dimension and each summary level fact table, and you partition those tables on a level column. That way you only have one mapping, one piece of E T L, for each, end quote.</p><p>His response was hilarious.</p><p>It was, quote, yes, I know you told me that, but I didn&#8217;t believe you, end quote.</p><p>I asked Bernie who told him to develop his E T L and data models this way.</p><p>He said MicroStrategy told him this was how it should be done.</p><p>I checked across many MicroStrategy accounts.</p><p>Sure enough, that was the advice MicroStrategy gave to accounts.</p><p>The reason is simple enough.</p><p>If MicroStrategy can get a client to adopt putting each level of each dimension, and each level of each different fact table, into a different physical table, there are no other products that can read that data model.</p><p>You will be a MicroStrategy customer forever, if you design your tables like that.</p><p>Don&#8217;t get me wrong.</p><p>MicroStrategy is my number two favourite end user software behind Meta5.</p><p>They did a great job with aggregate level navigation.</p><p>They are the only company that implemented aggregate level navigation properly.</p><p>But the underlying data model can be multi-level tables the same way that Metaphor designed them.</p><p>And those tables can be read by any B I product.</p><p>So, all jokes aside.</p><p>There are a lot of MicroStrategy accounts in the world today.</p><p>Almost all of them could reduce their cost of E T L development, and support, if they switched to see T L, which is already multi-level aware.</p><p>Let me give you the simplest of examples.</p><p>In a MicroStrategy implementation you have a day dimension, a week dimension, a month dimension, a quarter dimension, and a year dimension.</p><p>That&#8217;s five dimension tables.</p><p>That&#8217;s five pieces of E T L to maintain.</p><p>In my B I 4 ALL models you have one table.</p><p>T D 0 0 0 5.</p><p>This is a multi-level table that contains all levels of time.</p><p>It then presents out V M day view, V M week view, V M month view, V M quarter view and V M year view.</p><p>One table.</p><p>One piece of E T L
.</p><p>Five views.</p><p>When you start to multiply that out across the number of dimensions and fact tables you have today, the savings a MicroStrategy customer can get from changing to see T L are very large.</p><p>I have even tested keeping the multiple tables, and using a union view over them, to be able to replace existing E T L with see T L. .</p><p>It works on S Q L Server.</p><p>So, it will most certainly work on Oracle, Teradata, and D B 2.</p><p>So, if you are a MicroStrategy customer?</p><p>And you would like to cut your costs of E T L development and support?</p><p>Download see T L and give the old C plus plus dimension table processing a run for your money.</p><p>You will find that you can replace your old E T L for pennies.</p><p>Your future E T L will cost less, and be more reliable.</p><p>It is frustrating to me that MicroStrategy did this.</p><p>But you can&#8217;t blame Michael Saylor for wanting to lock in his clients, if he can.</p><p>That&#8217;s just good business.</p><p>Until your customers find out you did it.</p><p>See T L will cut the cost of E T L development and support for MicroStrategy customers.</p><p>Especially those customers using hand coded SQL for your E T L systems.</p><p>Now.</p><p>I hope you found this blog post interesting and informative.</p><p>If you are a MicroStrategy customer and you want to reduce your costs of E T L development?</p><p>You are one click away from getting started.</p><p>Thank you very much for your time and attention.</p><p>I really appreciate that.</p><p>Best Regards.</p><p>Esther.</p><p>Peters A I Assistant.</p><div id="youtube2-_1MkLfXD3Yk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;_1MkLfXD3Yk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/_1MkLfXD3Yk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><em><a href="http://www.peternolan.com/likes/IBIDownloads?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-047-seetl-for-microstrategy-users">IBI Downloads</a></em></p><p><em><a href="https://www.dropbox.com/scl/fo/9kd4vkwvrexrdyfqb4uxv/ACzDX9gB4tCDxx9ynO9d5nE?rlkey=jqlncog7uduk38nff1v38ckzq&amp;dl=0&amp;utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-047-seetl-for-microstrategy-users">IBI Videos</a></em></p><h2><em><strong>Carphone Warehouse Reference Video:</strong></em></h2><div id="youtube2-D-ro74y_Ud4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;D-ro74y_Ud4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/D-ro74y_Ud4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The post <em><a href="https://www.instantbi.com/2024/08/18/ibi-047-seetl-for-microstrategy-users/?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-047-seetl-for-microstrategy-users">IBI-047-SeETL for MicroStrategy Users</a></em> appeared first on <em><a 
href="https://www.instantbi.com?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-047-seetl-for-microstrategy-users">Instant BI</a></em>.</p>]]></content:encoded></item><item><title><![CDATA[IBI-046-SeETL For Informatica / DataStage Users]]></title><description><![CDATA[If you use Informatica or DataStage? Then SeETL can reduce your maintenance costs of ETL development and support by at least 20% and up to 50%.]]></description><link>https://peterandrewnolan.substack.com/p/ibi-046-seetl-for-informatica-datastage-users</link><guid isPermaLink="false">https://peterandrewnolan.substack.com/p/ibi-046-seetl-for-informatica-datastage-users</guid><dc:creator><![CDATA[Peter Andrew Nolan]]></dc:creator><pubDate>Sun, 18 Aug 2024 00:02:49 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/90f92bea-4cac-4ba3-8366-9bc8c20f2f9b_1493x1514.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Note: You can listen to the blog post on the video or read the blog post.</p><div id="youtube2-5JfCE8mK12E" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;5JfCE8mK12E&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/5JfCE8mK12E?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Hello and Welcome.</p><p>I am Esther.</p><p>I am Peters A I Assistant to create voice overs.</p><p>Peter is using me as his Assistant, because men prefer to listen to a woman&#8217;s voice.</p><p>I will simply read Peters blog posts, so that you have a choice of reading the blog post, or listening to my voice.</p><p>Hello and welcome Gentlemen.</p><p>I&#8217;d like to say thank you very much, for coming along and listening to my latest blog post.</p><p>As you know I have been banned off LinkedIn for some years.</p><p>Now that I am back, I will post items that I think are of general interest to people in what is now loosely called the data community.</p><p>One area I have an interest is the cost of E T L development using tools like Informatica and DataStage.</p><p>Just to let you know, I ran Ardent Professional Services for Asia Pacific in the late 90s.</p><p>I also implemented DataStage into Saudi Telecom in 2003, and Orange Romania in 2004.</p><p>Indeed, many of my inventions from Saudi Telecom, and Orange Romania, went into DataStage.</p><p>When I joined Ardent, in 1998, I sent all my software to Jason Silvia who was head of development.</p><p>I explained to him, how I was able to generate E T L, and suggested Jason get his people to look at my software.</p><p>Jason came back reporting that his best people were astonished at the idea of generated E T L.</p><p>He said they could not find a way to adopt my ideas into DataStage.</p><p>So we left it at that.</p><p>I also implemented Informatica at such places as Lindorff Financial in Norway, New Jersey Media Group in the U S A, Electronic Arts from the U S A and in Talk Talk in the U K.</p><p>So, I am very familiar with implementing Informatica as well.</p><p>I am sure the products have moved on a bit since I last used them.</p><p>However, for data warehouses, they remain fundamentally the same.</p><p>Having done many implementations with both products, both with see T L and not with see T L, I am very well aware 
of the productivity profiles of both products.</p><p>Fundamentally, the G U I makes it possible for low skilled programmers to write mappings in either DataStage or Informatica.</p><p>And that is what is done today.</p><p>Specifications are written by someone of a relatively high skill level.</p><p>Then the Informatica or DataStage jobs are written by someone of relatively low skill level.</p><p>See T L removes the need to have the low skill level person write the Informatica or DataStage jobs.</p><p>Simply put, see T L puts the DataStage and Informatica developers out of a job and replaces them with the higher skilled person.</p><p>When I was at Orange Romania we invented the idea of saving the mapping spreadsheet as an X M L document from Excel.</p><p>This was a newly introduced feature in two thousand and four when we were doing the project.</p><p>It came in with Office X P.</p><p>So what we did at Orange Romania was to develop all E T L using see T L.</p><p>Then, at the end of the project, we migrated the see T L E T L to DataStage jobs.</p><p>This was the first time we did it quite like this.</p><p>We had our issues and we worked out the kinks.</p><p>My next project after Orange Romania was Electronic Arts, which was an Informatica site.</p><p>We did exactly the same thing and it worked just fine.</p><p>What I pioneered in two thousand and four to six was to prove see T L could be used to develop E T L in its own right.</p><p>And for large companies that wanted a name brand E T L tool, we could migrate to that tool at the end of a project, in about two weeks&#8217; work.</p><p>So, if you are a DataStage or Informatica site?</p><p>Or if you are consultants using DataStage or Informatica?</p><p>You can cut your costs of E T L development by fifty percent, or more, simply by adopting see T L in the development phase.</p><p>If you are consultants?</p><p>Using see T L will give you a fifty percent cost reduction in E T L development for new projects over your competitors.</p><p>You will get this fifty percent advantage, even if you go into production with Informatica or DataStage.</p><p>Of course, the reason I B M and Informatica did not want to tell anyone this was possible, was because if customers saw they could build their whole E T L system with see T L, many would not go into production with Informatica or DataStage.</p><p>So, both I B M and Informatica hid see T L from their customers, and prospects.</p><p>You might want to remember that.</p><p>So, if you use DataStage or Informatica?</p><p>You can cut your costs of E T L development just by downloading and getting started with see T L today.</p><p>Lastly, I thought I would share the story of how the C plus plus version of see T L came about.</p><p>I wrote an article for Ralph Kimball for his D B M S magazine in two thousand and one.</p><p>This was about how to make money using business intelligence.</p><p>The article was very well received.</p><p>A little while later I was working at North Jersey Media Group in New Jersey.</p><p>I was using Informatica as Sybase was an Informatica reseller.</p><p>It turned out that I was having to change my data models in order to accommodate the Informatica processing.</p><p>I was writing to Ralph and complaining about how these very expensive E T L tools were forcing me to make changes to my data models.</p><p>Ralph jokingly sent back an email saying.</p><p>If you are so smart, why don&#8217;t you write me an article on the top ten features all E T L tools should have?</p><p>I would
love such an article and I think it would go over well with my audience.</p><p>I wrote back and told him that&#8217;s not a bad idea at all.</p><p>And I started writing the list.</p><p>However, I was very busy on the project and the list got put aside not long after.</p><p>When I finished the project I got back to my list, and completed the article.</p><p>I sent Ralph a draft of the article and he was very impressed.</p><p>However, he told me his tenure at D B M S magazine would soon be coming to a close, and so he did not believe the article would be published.</p><p>Nevertheless, we talked about the article as it was.</p><p>He asked me how much time and money an E T L tool with all those features would save.</p><p>I told him I would guess such a tool, would easily cut the cost of E T L development in half, over the current best practices for Informatica and DataStage.</p><p>Then Ralph said something that would change my life, again.</p><p>He said.</p><p>Well? If you are so smart?</p><p>Why don&#8217;t you write that E T L tool?</p><p>I started back with, but I don&#8217;t even know C plus plus.</p><p>But over the course of a few weeks the idea really got under my skin.</p><p>I knew that if I could write such an E T L tool, I could easily sell it for twenty thousand euros per copy.</p><p>And so I set about learning C plus plus, and writing the very first C plus plus version of see T L. .</p><p>And the rest, as they say, is history.</p><p>We used this very first version at Saudi Telecom to come in way under budget for their operational data store project.</p><p>It was at Saudi Telecom we added memory mapped I O, and the ability to scale fact table processing linearly.</p><p>In testing at Saudi Telecom, we had over 200 million C D Rs, twenty million customer and account records, and we were running on a Sun 18 K with 18 C P ewes.</p><p>My see TL software could split up the C D Rs into 100 separate files and then process those separate files down many parallel processing programs.</p><p>The parallel processing programs were able to share the same lookup tables, and also maintain a unique big integer at the front of each record for the primary key.</p><p>My customer, Knowledge Net, could not believe that we could process these volumes using the software I wrote.</p><p>But they had already sold DataStage and we had to go into production with DataStage.</p><p>So, as early as 2004.</p><p>We knew that see T L could handle twenty million customer records, twenty million account records, and two hundred million call records, for a telco.</p><p>Obviously, telcos want to buy name brand software for E T L. 
.</p><p>But it was very clear I could sell see T L for twenty thousand euros per copy.</p><p>And so I did.</p><p>I sold three copies to the richest man in Australia.</p><p>I sold another copy to Key Work Consulting in Germany.</p><p>The fifty percent, or more, reduction they got in their development costs helped them grow very rapidly.</p><p>Key Work Consulting were a reference account for me for many years.</p><p>When I divorced my wife of eighteen years in two thousand and seven, she gave me a lot of trouble, and so I was not able to sell see T L as an independent product any more.</p><p>My loss is your gain.</p><p>You can have the last public release of see T L that was selling for twenty thousand euros per copy, for free.</p><p>Now.</p><p>I hope you found this blog post interesting and informative.</p><p>If you want to reduce your costs of E T L development?</p><p>You are one click away from getting started.</p><p>Thank you very much for your time and attention.</p><p>I really appreciate that.</p><p>Best Regards.</p><p>Esther.</p><p>Peters A I Assistant.</p><div id="youtube2-_1MkLfXD3Yk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;_1MkLfXD3Yk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/_1MkLfXD3Yk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><em><a href="http://www.peternolan.com/likes/IBIDownloads?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-046-seetl-for-informatica-datastage-users">IBI Downloads</a></em></p><p><em><a href="https://www.dropbox.com/scl/fo/9kd4vkwvrexrdyfqb4uxv/ACzDX9gB4tCDxx9ynO9d5nE?rlkey=jqlncog7uduk38nff1v38ckzq&amp;dl=0&amp;utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-046-seetl-for-informatica-datastage-users">IBI Videos</a></em></p><h2><em><strong>Carphone Warehouse Reference Video:</strong></em></h2><div id="youtube2-D-ro74y_Ud4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;D-ro74y_Ud4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/D-ro74y_Ud4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The post <em><a href="https://www.instantbi.com/2024/08/18/ibi-046-seetl-for-informatica-datastage-users/?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-046-seetl-for-informatica-datastage-users">IBI-046-SeETL For Informatica DataStage Users</a></em> appeared first on <em><a href="https://www.instantbi.com?utm_source=peternolan.beehiiv.com&amp;utm_medium=referral&amp;utm_campaign=ibi-046-seetl-for-informatica-datastage-users">Instant BI</a></em>.</p>]]></content:encoded></item></channel></rss>