
Why truncating a target table is a bad practice in ETL

I’ve done ETL design and development both in integration projects and in reporting environments.

In master data integration projects, it’s not hard to distinguish the staging area from the target area. In the staging area, truncating tables with old data and loading new data is a pretty common practice. I have not seen any truncating of tables in the target area, and this is a good thing. It means that developers understand the role of staging, and understand that loading data into the target area needs to be carefully designed, or “reconciled”, as a consultant once called it. Depending on the business rules, we end up creating new records in the target, updating existing records, or deleting obsolete records.
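
As a sketch of what that reconciliation can look like in T-SQL (all table and column names here are hypothetical), a MERGE statement covers all three cases in one pass:

-- Illustrative only: stg.Customer and dbo.Customer are made-up names.
-- Reconcile the staged load into the target instead of truncating the target.
MERGE dbo.Customer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
    UPDATE SET Name = src.Name,
               City = src.City                       -- update existing records
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, Name, City)
    VALUES (src.CustomerID, src.Name, src.City)      -- create new records
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                                          -- delete obsolete records

MERGE requires SQL Server 2008 or later; on earlier versions the same reconciliation can be written as separate UPDATE, INSERT, and DELETE statements.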

In the reporting environment, however, developers have a very different attitude and a very different understanding of staging vs. target.

This is a very typical architecture I’ve seen so far. The flaw in this design is that there is no target area, only staging. When developers truncate a table in staging, the report shows empty data between the time the table is truncated and the time the new data finishes loading. Or even worse, the report hangs when the load takes a long time to finish.

[Diagram: data source → staging tables (truncated and reloaded) → report]
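
A minimal sketch of the refresh pattern in this design (table names are hypothetical): between the TRUNCATE and the end of the INSERT, any report that queries the table either returns zero rows or blocks behind the load.

-- Anti-pattern: the table the report reads is the same table being reloaded.
TRUNCATE TABLE rpt.SalesSummary;         -- from this moment the report sees no rows

INSERT INTO rpt.SalesSummary (ProductID, SaleDate, Amount)
SELECT ProductID, SaleDate, Amount
FROM   SourceServer.SourceDb.dbo.Sales;  -- if this runs long, reports stay empty or hang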

You might argue that it’s unlikely users will run the report at the exact time the data is in the middle of refreshing. Based on my experience, if I hear about it even once from the business users, then the chance is big, because the possibility of it happening again is always there.

Well, the above is not the only type of architectural design I’ve seen. Here is another one. There is no physical staging area in this design; the report accesses the data source directly, using T-SQL code with OPENQUERY. The developer argued that this design eliminates the need to maintain a staging area on the SQL Server. That part is true. But it does not eliminate the emails and phone calls we get from frustrated business users when the report takes forever to load the data. Reporting Services routinely times out when trying to retrieve a large amount of data via OPENQUERY.

[Diagram: report → OPENQUERY over a linked server → data source]
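
The pattern looks roughly like this (the linked server name and the query are hypothetical); every report execution pushes the full query across the linked server, which is where the timeouts come from:

-- The report's dataset queries the source directly; nothing is staged locally.
SELECT *
FROM   OPENQUERY(SourceLinkedServer,
                 'SELECT ProductID, SaleDate, Amount FROM dbo.Sales');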

The third type of architectural design avoids many of the problems of long-running reports and reports with empty data. By separating the target from staging, and not truncating the target tables, the reports will always look good.

[Diagram: data source → staging → target tables (never truncated) → report]
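
As a hedged sketch of the staging-to-target move (hypothetical names again), the reconciliation can run inside one explicit transaction, so a report reads either the old rows or the new rows, and never an empty table:

-- The target is never truncated; changes are applied in a single transaction.
BEGIN TRANSACTION;

UPDATE t                                    -- update rows that changed
SET    t.SalesAmount = s.SalesAmount
FROM   dbo.FactSales AS t
JOIN   stg.FactSales AS s
  ON   s.ProductKey = t.ProductKey
 AND   s.AccountKey = t.AccountKey
 AND   s.DateKey    = t.DateKey;

INSERT INTO dbo.FactSales (ProductKey, AccountKey, DateKey, SalesAmount)
SELECT s.ProductKey, s.AccountKey, s.DateKey, s.SalesAmount   -- add new rows
FROM   stg.FactSales AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.FactSales AS t
                   WHERE  t.ProductKey = s.ProductKey
                     AND  t.AccountKey = s.AccountKey
                     AND  t.DateKey    = s.DateKey);

COMMIT TRANSACTION;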

Back to the title of this post, “why truncating a target table is a bad practice in ETL”: I have to confess that I was one of those developers who made this mistake, until I received enough emails and phone calls from our business users.

There is more to discuss on how to move data from staging to target. Stay tuned.

  1. Adrian
    April 18, 2011 at 5:15 am

    Hi Sherry

    Firstly, I am relatively new to the BI world. In my current place of work, the fact tables and some dimension tables are truncated and reloaded each night in the target database.

    I understand that the correct practice is to use an SCD for dimensions. I’d like your opinion on the best way to update the data in the fact tables in the target database.

    Thanks.

    Adrian


  2. Sherry
    April 19, 2011 at 12:17 am

    Hi Adrian,

    Although I strongly believe that truncating either fact tables or dimension tables is bad practice, in reality there are situations where truncating is necessary. If your target table does not have a unique key, and you cannot distinguish an update from an insert, then truncate/insert is your only bet. Developers also truncate and insert into target tables out of convenience, especially when the target table is small enough (or the truncating happens during the night, when users are mostly offline).

    In my opinion, the above two situations are exceptions, and should not be the common practice. This is especially true for your fact tables. Before a fact table is designed, we must determine its granularity. The granularity corresponds to how you define an individual low-level record in that fact table; an example of the granularity of a fact table is “product by account by day”. So if the majority of the fact tables are loaded in the truncate/insert fashion, I’d question the design/implementation strategy behind it.
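
    For illustration, a fact table at the “product by account by day” grain might be keyed like this (all names hypothetical); that composite key is exactly what makes an incremental load possible instead of truncate/insert:

    CREATE TABLE dbo.FactSales
    (
        ProductKey  INT   NOT NULL,
        AccountKey  INT   NOT NULL,
        DateKey     INT   NOT NULL,   -- one row per product, per account, per day
        SalesAmount MONEY NOT NULL,
        CONSTRAINT PK_FactSales PRIMARY KEY (ProductKey, AccountKey, DateKey)
    );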

    Sherry


  3. R.nagaraj
    November 9, 2011 at 2:40 am

    Hi Sherry
    You said that we would discuss more on how to move data from staging to the destination, but I did not see that post. Can you please explain?

    Thank you
    sql


  4. Martin Aleman
    November 24, 2011 at 9:35 pm

    You cannot say that truncating a target table is a bad practice in general, because it depends on the type of tables and the type of requirements from the customer. Usually there is a window to run the processes, during which the user must know that the information is being updated. And if a job fails while updating a table, I am surprised that the support team does not know about it and fix it immediately. That said, use truncate appropriately and carefully, but do not start thinking that it is a bad practice. 15 years of working with data warehouses has taught me that it is just another tool for BI ETL.

