SSIS #110 – Case Sensitivity of Lookup Data Flow Component in SSIS 2008
I have a blog, SSIS #108 – The Makeover of the Lookup Data Flow Component in SSIS 2008, a few months ago. Because of the two improvements of the lookup data flow component in SSIS 2008, the cache mode and the redirecting rows to no match output, I find myself giving more and more special favor to the lookup component in loading lookup tables, mapping tables, and dimension tables.
This comparison part in the lookup data flow component is case sensitive
In SSIS, things are generally case-sensitive. Variable name is one. If you have a variable named varRecipients, but you used varRecipientS in an expression, you will get a clear message that varRecipientS cannot be found.
What does the case-sensitivity have to do with the lookup data flow component? Quite a bit. The main part of the lookup component is to compare certain data elements between the source and the destination. This comparison part is case sensitive (There is an exception to this statement. See the last part of the blog.).
Take care to ensure that the comparison in the lookup data flow component is also case insensitive
SQL servers I’ve worked with are all configured to use case-insensitive collation. In most data warehouse design, case shouldn’t matter. An existing record of Courtesy Waiver in a dimension table should prevent “Courtesy waiver” being added to the same dimension table.
If your SQL server is using case-insensitive collation and your data warehouse is also case insensitive, then you should take care to ensure that the comparison in the lookup data flow component is also case insensitive. Otherwise, you will end up with either
- a duplicate record of “Courtesy Waiver “ and “Courtesy waiver “ being inserted into your dimension table, or
- your lookup data flow component will fail during runtime if you have taken care to create an alternate key on your dimension table.
Since I always have alternate (unique) key defined on my dimension tables, the second scenarios will happen if I hadn’t done the following steps.
Use Upper (or Lower) Function for comparison, but preserve the case for data to be inserted
In the OLE DB Source, I will need two columns, one for comparison, and one for inserting into my lookup table (if it has not existed yet). The one for comparison Reason_Compare will need the Upper (or Lower function will do too). The one that will be potentially inserted into my destination table needs to preserve the case, so no Upper function is used.
In the Lookup Transformation Editor, also use the Upper function to query from the destination lookup table.
And also make sure to use the Reason_Compare column for comparison.
Finally in the In the OLE DB Destination Editor, the mappings are done without the Reason_Compare column.
Partial cache or No cache result in the comparison being done directly on the SQL Server which would result in case-insensitive comparison
This blog will not be complete if I don’t mention the Cache mode. There are several blogs mentioning that if you change the cache mode to Partial cache or No cache, the comparison will be done directly on the SQL Server which would result in case-insensitive comparison (provided the SQL Server uses case-insensitive collation, which is the default setting of SQL Server).
Here are the links to a couple of them:
- Jorg Klein : SSIS – Lookup is case sensitive
- Case insensitive comparison in SSIS 2008 LookUp transformation
I’ve always chosen to use the Upper function for comparison and preserve the case for data for two good obvious reasons.
- I like Full cache and
- As a developer, how the servers are configured is totally out of my control
- Enjoy using the lookup data flow transformation!