Home > SSIS Best Practices > SSIS #110 – Case Sensitivity of Lookup Data Flow Component in SSIS 2008

SSIS #110 – Case Sensitivity of Lookup Data Flow Component in SSIS 2008

I have a blog, SSIS #108 – The Makeover of the Lookup Data Flow Component in SSIS 2008, a few months ago. Because of the two improvements of the lookup data flow component in SSIS 2008, the cache mode and the redirecting rows to no match output, I find myself giving more and more special favor to the lookup component in loading lookup tables, mapping tables, and dimension tables.

This comparison part in the lookup data flow component is case sensitive

In SSIS, things are generally case-sensitive. Variable name is one. If you have a variable named varRecipients, but you used  varRecipientS in an expression, you will get a clear message that  varRecipientS cannot be found.

clip_image001

What does the case-sensitivity have to do with the lookup data flow component? Quite a bit. The main part of the lookup component is to compare certain data elements between the source and the destination. This comparison part is case sensitive (There is an exception to this statement. See the last part of the blog.).

Take care to ensure that the comparison in the lookup data flow component is also case insensitive

SQL servers I’ve worked with are all configured to use case-insensitive collation. In most data warehouse design, case shouldn’t matter. An existing record of Courtesy Waiver in a dimension table should prevent “Courtesy waiver” being added to the same dimension table.

If your SQL server is using case-insensitive collation and your data warehouse is also case insensitive, then you should take care to ensure that the comparison in the lookup data flow component is also case insensitive. Otherwise, you will end up with either

  1. a duplicate record of “Courtesy Waiver “ and “Courtesy waiver “ being inserted into your dimension table, or
  2. your lookup data flow component will fail during runtime if you have taken care to create an alternate key on your dimension table.

Since I always have alternate (unique) key defined on my dimension tables, the second scenarios will happen if I hadn’t done the following steps.

Use Upper (or Lower) Function for comparison, but preserve the case for data to be inserted

clip_image002

In the OLE DB Source, I will need two columns, one for comparison, and one for inserting into my lookup table (if it has not existed yet). The one for comparison Reason_Compare will need the Upper (or Lower function will do too). The one that will be potentially inserted into my destination table needs to preserve the case, so no Upper function is used.

clip_image003

In the Lookup Transformation Editor, also use the Upper function to query from the destination lookup table.

clip_image004

And also make sure to use the Reason_Compare column for comparison.

clip_image005

Finally in the In the OLE DB Destination Editor, the mappings are done without the Reason_Compare column.

clip_image006

Partial cache or No cache result in the comparison being done directly on the SQL Server which would result in case-insensitive comparison

This blog will not be complete if I don’t mention the Cache mode. There are several blogs mentioning that if you change the cache mode to Partial cache or No cache, the comparison will be done directly on the SQL Server which would result in case-insensitive comparison (provided the SQL Server uses case-insensitive collation, which is the default setting of SQL Server).

clip_image007

Here are the links to a couple of them:

I’ve always chosen to use the Upper function for comparison and preserve the case for data for two good obvious reasons.

  1. I like Full cache and 
  2. As a developer, how the servers are configured is totally out of my control
    Enjoy using the lookup data flow transformation!
  1. Mark Wojciechowicz
    May 28, 2012 at 6:03 pm

    Thanks for a great post explaining this issue. I really think this a product defect. We live in a case insensitive world. If i were to lookup a product id, two IDs that differed by case have the same meaning. There really should be an option in the lookup transform with case insenstive as the default. Hope that gets figured out in the next release

    Like

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: