In my previous post, Why truncating a target table is a bad practice in ETL?, I strongly suggest that we do not truncate our target tables.
In this blog, I’ll tell you the worst possible way I’ve seen to truncate a target table.
Before I get into that, I’ll share with you the experience a business user has shared with me:
….Sherry, can you check the report again? It has failed with the new data load today again.
….What caused it to do that? Has been happening sporadically lately where the whole report just goes down.
….I thought the issue has been resolved since last change.
Before the “last change”, the report has no staging data. It’s pulling large amount of data directly from our data source via OPENQUERY (you can read my other posts to know my opinions about OPENQUERY). If you are lucky, the report will send back data within 5 minutes.
So the last change was made is to pull data every morning and store it in a table.
Here is how this table (in my opinion, this is your target table, not a staging table) is populated every morning:
It is truncated first, then with an attempt to populate it. Unfortunately, almost 5 out of 10 days, the OPENQUERY failed, leaving the target table empty 5 days out of 10. In those 5 lucky days, the target table will remain empty for about 10 minutes, which is how long it takes for the table to be fully populated.
That explains the business users’ experience with the report.
Do you still need more convincing not to truncate your target tables?
I’ve done ETL design and development in both integration projects and in reporting environment.
In master data integration projects, it’s not hard to distinguish staging area from the target area. In the staging area, truncating tables with old data and loading new data is a pretty common practice. I’ have not seen any truncating tables in the target area. This is a good thing. It means that developers understand the role of staging, and understand that loading data into target area needs to be carefully designed, or “reconciled”, as a consultant called it. Depending on the business rules, we will end up either creating new records in the target, or updating existing records, or deleting obsolete records.
In the reporting environment, however, developers have very different attitude and very different understanding of staging VS. target.
This is a very typical architecture I’ve seen so far. The flaw in this design is that there is no target area, only staging. When developers truncate the table in the staging, the report shows empty data between the time when table is truncated and new data finished loading. Or even worse, the report hangs when the loading takes a long time to finish.
You might argue the chance of the coincidence that users run the report at the same time when the data is in the middle of refreshing. Based on my experience, if I hear about it once from the business users, then the chance is big, because the possibility for it to happen again is always there.
Well, the above is not the only type of architectural design I’ve seen. Here is another one. There is no physical staging area in this design. The report is accessing the data source directly with T-SQL code with OPENQUERY. The developer argued that this type of design will eliminate the need of maintaining the staging area on the SQL server. That part is true. But it does not eliminate the emails and phone calls we get from the business users when they are so frustrated because the report takes for ever to load the data. Reporting Services routinely report time out when trying to retrieve large amount of data via OPENQUERY.
The third type of architectural design can avoid many of the problems of long-running reports, or reports with empty data. By separating target from staging, and not truncating the target tables, the reports will always look good.
Back to the title of this post, “why truncating target tables is a bad practice in ETL”. I have to confess that I was one of those developers who made such mistakes until I received enough emails and phone calls from our business users.
There is more to discuss on how to move data from staging to target. Stay tuned.