Home > SSIS Best Practices > SSIS #114 – How many ways can you do INSERT-UPDATE?

SSIS #114 – How many ways can you do INSERT-UPDATE?

For the purpose of this blog, my answer is 4.

I’ve always wanted to do a benchmark comparison so I can proudly say that my “normal” way is the best in terms of run time.

    1. STAGE-TSQL: Use the data flow to bring the raw data into staging, and use do the INSERT-UPDATE in TSQL. This is my “normal” way. And the best way according to my testing. The name STAGE-TSQL implies: 1) two steps are involved, 2) raw data is staged first, 3) INSERT-UPDATE are done in TSQL only.
    2. UPDATE-ALL: Only one step is involved. INSERT-UPDATE is done in one data flow step. 1) Transformation OLE DB Command is used for UPDATE, 2) Destination OLE DB Destination is used for INSERT.
    3. UPDATE-STAGE: Two steps are involved. 1) INSERT is done in the data flow step, 2) but the matching rows are saved to a staging table, and UPDATE is done in TSQL using the matching rows.
    4. UPDATE-DELTA: similar to number 2 UPDATE-ALL. In stead of directly sending all matching rows to transformation OLE DB Command for UPDATE, the Script Component transformation is used to determine if there are actually changed rows. Send data to Transformation OLE DB Command only if there are changes in the matching rows.

      1. STAGE-TSQL

    Here is the Control Flow. 

clip_image001

There is nothing exciting about the Data Flow. All it does is to bring the source data into a staging table.

clip_image002

The TAREGT_TSQL task does all the wonderful work of INSERT-UPDATE. TAREGT-TSQL is the winner with only 3 seconds and 885 mini-seconds.

2. UPDATE-ALL

The Control Flow is very simple. Only one Data flow is used. No truncating staging tables because no staging tables are used.

clip_image003

But the Data Flow step has two more transformation components: Lookup and OLE DB Command. The Lookup transformation is used to split outputs into No Match Output and Match Output. No Match Output are the rows that will be inserted, and Match Output are the rows that will be updated.

clip_image004

This seems to be an elegant solution but the run time is the worst, comparing almost 3 minutes to only about 4 seconds in STAGE-TSQL. 

The Transformation OLE DB Command is to blame for the poor performance here.

3. UPDATE-STAGE

Since I blamed the Transformation OLE DB Command for the poor performance, I am going to swap it with a staging table, and finish the UPDATE in a TSQL task. 

clip_image005

In the Data Flow step, matching rows are no longer sent to the OLE DB Command transformation, instead, they are saved into a staging table. The actual UPDATE are done in the TSQL task TARGET_MyChangingTable_UPD.

clip_image006

The run time has improved from almost 3 minutes to only 8 seconds and 291 mini-seconds.

4. UDPATE-DELTA

Here is another way to improve that 3 minutes run time.

clip_image007

The Control Flow looks the same, but I’ve added a Script Component (SC) Transformation. The SC is used to create two outputs. Only the rows that have changed data are sent to the UpdateRow path, otherwise the rows will be sent to a Row Sampling and are ignored. Adding the SC has indeed improved the run time from 3 minutes to only 7 seconds and 494 mini-seconds.

clip_image008

5. STAGE-TSQL is the winner!

It’s a relief knowing that our skills in TSQL are serving us well.

Notes:

  1. My testing involved only INSERT-UPDATE. No DELETE is done in data flow or in TSQL.
  2. I’ve made sure that all four runs are against the same Source data and Target data to make sure that the run time comparison is valid.
  3. I don’t think the data size matters much in my testing. No data size is given.
  4. Using staging tables means more disks spaces are needed. Most teams would prefer faster run time over disks spaces for their ETL processes.
Categories: SSIS Best Practices Tags:
  1. Echo
    October 25, 2012 at 3:25 pm

    Hi Sherry, would you mind offering the script for “script component” in “UDPATE-DELTA” approach? Thank you very much!

    Like

  2. kklotzle
    June 20, 2014 at 12:44 pm

    but what is the best way to determine whether or not there are any updated rows? in other words, is there a faster way to perform/determine updates than a lookup?

    Like

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: