SSIS #107 – Parameterized Query in OLE DB Data Source: parameter information cannot be derived from SQL statements with sub-select queries
I have a few posts blogging about how to pass variables into parameterized queries into Execute SQL Task and OLE DB Data Source in a data flow.
- SSIS – Pass a variable to a OLE DB Source in a Data Flow
- SSIS – Pass a list of values to a data flow source
As you gain more confidence in SSIS, you start to design your ETL process with higher efficiency and higher portability. That means using more variables, and more package configuration files. This is when you will start to hit the limitations. One of the limitations that is relevant to this topic comes from this unfortunate message:
Parameter information cannot be derived from SQL statements with sub-select queries
This message is generated from an OLE DB Data Source in a data flow. It has a simple SQL command, but with a parameter inside a sub-query.
This is what you will see when you try to Preview or click the Parameters… button.
This is a very interesting message actually. It is telling us why SSIS refuses to accept the query because a parameter is inside a sub-query. It also gives us a very interesting recommendation, that is to use the “SQL command from variable” mode.
Another recommendation Microsoft might have given us here is to create a stored procedure with parameters. It didn’t, because creating stored procedures is essentially taking us away from Integration Services, and back to the SQL Server land. I guess the Integration Services designers wouldn’t want that to happen.
Integration Services’ inability to parse SQL statements with parameters inside sub-select queries comes from the OLE DB provider. In my case, we are using the native OLE DB native provider. You would think that this provider will be “smarter” than other providers, such as OLE DB provider for Oracle. Unfortunately, no privilege here.
Use “SQL command from variable” for OLE DB Data Source
I don’t want to go back to the SQL Server land, so following the above recommendation is my choice. The screenshots and steps I am sharing with you here are really backwards in order. If you want to implement this recommendation in your own package, you will need to follow the steps below in backward order.
To follow the recommendation from Microsoft, you will need to change the Data access mode from SQL command to SQL command from variable.
Select the user variable you have created that stores the query as an expression. You should see that your parameter(s) are actually showing the values you have assigned to the variable(s).
Create a variable to store the parameterized query
This is the package-scope variable I’ve created to store the parameterized query.
To make this work, there are a few things you will need to pay attention to:
1) The data type of this variable should be String. A word of caution here. My query has sub-query, but it’s really a very simple query. If you have a query that is very complicated, and possibly exceeds the max length of a String data type, then the recommendation from Microsoft is not really a good choice for you. Going back to the SQL Server land to create a stored procedure will be a more manageable solution.
2) In the Properties window, set EvaluateAsExpression to True. This will give you a chance to use the Expression Builder.
3) Open the Expression Builder by clicking on the eclipse button in Expression
Use the Expression Builder to build the parameterized query
Expression Builder is a invaluable tool in SSIS, but it’s not a tool I look forward to in my daily job. The reason is that everything needs to be wrapped around the double quote. When our variable itself is a String data type, it’s actually very easy to construct the query.
String is one of the most used data types, but it’s not the only data type. In my case, my variable varDAYS_HRS is set to 60, and it’s an integer.
In the above screenshot, the two red boxes should be 60, if I hard-code it. Since all needs to be string, I must use some data type casting function to cast it as string. (DT_WSTR, <<length>>) is a way to do just that.
A more generic syntax for explicitly converting an expression from one data type to a different data type is here:
See Cast (SSIS Expression): Convert SSIS Data Types from MSDN for details.
Always use the Evaluate Expression button on the Expression Builder screen to make sure that your expression is acceptable by Integration Services.
Create another variable to hold the dynamic parameter value
Variable varDAYS_HRS is created to hold the dynamic parameter we need to pass to the query. This is a simple user variable:
“SQL command from variable” for OLE DB Data Source is a valuable recommendation
Next time, when Integration Services cannot parse your parameterized queries, give this recommendation a try if your query is reasonably simple and short.
NonEmpty() is a very powerful MDX function for improving your MDX query performance. It is primarily used to remove unwanted empty sets in a very fast manner.
According to MDX Function Reference, it returns the set of tuples that are not empty from a specified set, based on the cross product of the specified set with a second set. The syntax is simple enough:
Look closely, you will see that the usage of the set_expression2 is optional. I’ve been using NonEmpty() without the set_expression2 for many of my SSRS reports. Going from “out-of-memory” on the report server to only a few seconds to render my reports, I was so happy that I’ve never thought about testing record counts or some measures, until I finally had time.
What I saw in my test results shocked me. After some research and reading, I realized that the problem is in the omission of the set_expression2. By including the appropriate set_expression2, I got the perfect results.
In this blog, I’ll try to share with you this little lesson I’ve learned.
Non-empty is a characteristic of the cells references by the tuples, not the tuples themselves
This is an important note from the above MDX Function Reference from Microsoft. Now that I am reading this note, I start to visualize:
1) set_expression1 will be typically the cross product of members of some hierarchies, for example,
DEDESCENDANTS([Queue].[Queue].[Level 03].[Online Email], [Queue].[Queue].[Level 05], self) *
2) Since the non-empty is really a characteristic of the cells, and we want to remove sets that have no certain measures, does that that mean we really need to specify the measures we are referring to? Now it makes sense to me that we do need to specify the measures we are referring to.
If a second set is not provided, the expression is evaluated in the context of the current coordinates of the members of the attribute hierarchies and the measures in the cube
This is another important note from the above MDX Function Reference from Microsoft. So if the second set is not provided, we are blindly relying on the Analysis Services to find these “current coordinates”. “Current” really means at the moment of evaluation. Unfortunately, these “current coordinates” can be pretty random at the moment of evaluation, depending on how we setup our default members, and especially if we are defining calculated measures in our query. The default initial value of any calculated measure is null. This can lead to a very messy, confusing, and unexpected query results.
set_expression2 is not provided – wrong result
This is part of my initial query, where the second set expression is not provided.
Although the NonEmpty() function is used, rows with all nulls are plainly showing. Further testing showed that the record counts and measures are incorrect.
set_expression2 is provided – All looks good!
This is part of my revised query, where the second set expression is now provided with a measure.
There is no longer any rows that are all null. Further testing showed that all numbers are correct.
Conclusion – Always include a measure in the second set in NonEmpty() function
I will put this lesson in my MDX best practice list. Hope you will do the same.
I had many posts that share my experiences with using package configuration files to make my SSIS packages portable from my local development BIDS environment, to the test environment, and the production environment.
In the next few blogs, I’ll share with you a few simple steps to create your own XML configuration files and then how to use them to make your SSIS packages dynamic and portable.
Benefit of using package XML configuration files
There are many resources on this topic. In a nutshell, using package configuration files is a powerful way to make your SSIS package extremely versatile.
Example 1 – Make a connection string for a connection manager dynamic
We typically have our server name, database name, user name (password as well sometimes) embedded in our Connection Manager.
When we are ready to move our packages from development to test, or to production, the least we would expect is that the server name will be different.
In this case, the goal of using a package configuration file is:
1) no modification to our packages themselves
2) the only change we will make is to the package configuration file. I’ll need to change the Data Source and the Initial Catalog accordingly.
Example 2 – Make a Value for a variable dynamic
A typical example is a path where we fetch our incoming files. This file path will be most likely different from what is on our local PC and what is on our test or production server, especially the root folder will be most likely different.
Again the goal of using a a package configuration file in this case is:
1) no modification to our packages themselves
2) the only change we will make is to the package configuration file. I’ll need to change the ConfiguredValue accordingly when I move my package to the test or the production server.
So, it’s fair to summarize the benefits of using package configuration files as:
no modification to our packages themselves
the only change we will make is to the package configuration file
If it’s your first time to use a package configuration files, and you received it from a co-worker, you might wonder how the package configuration file is created.I’ll share two examples with you. One example is to create a package configuration file for a connection manager. Another example is for a variable.The steps of creating package configuration file is simple enough. There are two things that are not crystal clear though:1) The multiple choices of properties that are available to us. All the properties are exposed to us. Which one do we need?
2) What are we going to do about the configuration file?
I’ll continue to share with you these two tasks when I get time in the future.
A reader asked me how to add grand totals a matrix (A Sample of Matrix in SSRS).
It’s actually pretty easy to accomplish that in both SSRS 2005 and 2008. I am going to show a few screen shots here.
Suppose you have a very simple table with sales data for products sold from each store.
You want to rearrange the data so it shows the stores vertically, all products horizontally, and the Sales Amount at each intersection. Finally you want to show totals at the bottom and also to the right.
Matrix will be a perfect choice for this purpose with no extra coding on your part.
Solution – create a matrix and then add totals to row and column
The steps are very straightforward. I am going to show the screen shots from SSRS 2008 here.
Here is a matrix without any configuration yet.
Point the cursor to the Rows cell, an icon appear on the upper right corner. Click the icon, you will see the list of columns from your dataset. Select Store.
For the Columns cell, select Product.
For the Data cell, select SalesAmount.
Now, right-click the data cell, you will see choices under Tablix. Select Add Totals, and then Row.
Repeat the above step to create a total next the column Product.