Archive

Posts Tagged ‘SSIS’

SSIS #118–Is it an OLEDB or an SMO Server Connection?

December 17, 2014 2 comments

OLEDB is the native client for SQL Server. Every SSIS developer is familiar with how to use it. Almost every task in Integration Services can use an OLEDB connection. As a matter of fact, it's the preferred client to use when your backend is SQL Server.

However, there is one exception: the Transfer SQL Server Objects Task. This task uses an SMO Server connection only.

pict1

If you don't know this before you use the transfer task, there can be confusion during development and also during deployment.

Here are my lessons learned.

  1. Visual differences between the OLEDB and the SMO Server connections in Integration Services 2012 help

pic2

In SSIS 2012, I am happy to see the visual differences. If you are still stuck in 2008, you will not see any visual differences.

  2. Name your SMO Server connections properly

A good practice is to name the SMO Server connection properly. I used such names as:

  • SMOServer_Staging (this could be the source server for the transfer task)
  • SMOServer_Reporting (this could be the destination server for the transfer task)
  3. Rename the SMO Server connection properly

In SSIS 2012, you have three different places to create connections.

  • At project level: connections are available to all packages in the project
  • At package level: only available to the package in which the connection is created
  • At task level: also only available to the package in which the connection is created

The best practice is to create connections at the project level.

However, no matter where you create the connections, the SSIS UI always prefixes the connection name with the server name:

  • For OLEDB – servername.databasename: removing the servername is usually the best practice here, leaving just the database name.
  • For SMO Server – servername: since we cannot simply remove the servername, the best practice here is to rename it as suggested in section 2.
  4. Confusion during deployment over where to switch to the production/test server

In 2008, during deployment, I've seen DBAs do the following:

  • Create a new connection: because of the confusion about OLEDB vs. SMO Server connections, and even bigger confusion if your connection names have your development server name in them, DBAs would most likely choose to create new connections.
  • Rename the connection: again, if you have not named your SMO connections properly, they will tend to rename them.

Neither of the above practices is good. If you as a developer have done it properly, DBAs will not need to mess too much with the connections:

  • Name your SMO Server connections properly so they can tell them apart from the OLEDB connections.
  • Keep your development server name out of the connection names so they don't need to rename them.
  5. In 2012, deployment is a lot easier

In 2012, deployment is much more straightforward because

  • Connections can be created at the project level: there is only one place to manage the connections, including the connection name, server name, database name, and a bunch of other connection properties
  • Of course, you can take even greater advantage of features such as project-level parameters if you have chosen to deploy packages to the SSIS Catalog

More about SMO connection manager

SMO Connection Manager

Quote from the MSDN article:

An SMO connection manager enables a package to connect to a SQL Management Object (SMO) server. The transfer tasks that SQL Server Integration Services includes use an SMO connection manager. For example, the Transfer Logins task that transfers SQL Server logins uses an SMO connection manager.

When you add an SMO connection manager to a package, Integration Services creates a connection manager that will resolve to an SMO connection at run time, sets the connection manager properties, and adds the connection manager to the Connections collection on the package. The ConnectionManagerType property of the connection manager is set to SMOServer.

You can configure an SMO connection manager in the following ways:

  • Specify the name of a server on which SQL Server is installed.
  • Select the authentication mode for connecting to the server.
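
If you ever need to create the connection manager in code (for example, when generating packages programmatically), the SSIS runtime API can do it. The sketch below is only an illustration: "SMOServer" is the connection manager type the designer uses, but the exact connection-string format is an assumption on my part, so compare it against a connection manager created in the designer.

    using System;
    using Microsoft.SqlServer.Dts.Runtime;   // SSIS runtime API (ManagedDTS)

    class SmoConnectionDemo
    {
        static void Main()
        {
            Package package = new Package();

            // "SMOServer" is the connection manager type behind the SMO Server connection.
            ConnectionManager smoConn = package.Connections.Add("SMOServer");

            // Name it by role, not by server, so DBAs don't have to rename it at deployment.
            smoConn.Name = "SMOServer_Staging";

            // Assumed connection-string format: server name plus Windows authentication.
            smoConn.ConnectionString = "SqlServerName=localhost;UseWindowsAuthentication=True;UserName=;";

            Console.WriteLine(smoConn.CreationName);   // prints SMOServer
        }
    }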

More about Transfer SQL Server Objects task

Besides the data flow objects, the Transfer SQL Server Objects task is another task you can use to transfer not just data but also other types of objects between instances of SQL Server. For more info, check out this MSDN article.

Transfer SQL Server Objects Task
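
Under the covers, the transfer tasks rely on SMO. If you ever want to do the same kind of transfer from your own code (a Script Task, for example), the SMO Transfer class is roughly the equivalent. This is only a sketch: the server and database names are made up, and a real implementation would add error handling and pick objects more selectively.

    using Microsoft.SqlServer.Management.Smo;

    class TransferDemo
    {
        static void Main()
        {
            // Source: the staging server and database (names are examples only).
            Server sourceServer = new Server("localhost");
            Database sourceDb = sourceServer.Databases["StagingDB"];

            Transfer transfer = new Transfer(sourceDb)
            {
                DestinationServer = "localhost",
                DestinationDatabase = "ReportingDB",
                DestinationLoginSecure = true,   // Windows authentication
                CopyAllTables = true,
                CopySchema = true,
                CopyData = true
            };

            // Copies the selected table definitions and their data to the destination.
            transfer.TransferData();
        }
    }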

 

SSIS #117–What is the Location in the MSOLAP Connection String

December 13, 2014 Leave a comment

MSOLAP is a native OLE DB provider for Analysis Services

A common task during ETL is to process the OLAP dimensions and the cube after all the dimension and fact tables are loaded into the data warehouse. Integration Services provides two special control flow tasks for this purpose, the Analysis Services Processing Task and the Analysis Services Execute DDL Task.

Before you can use either of these two tasks, you will need to create an SSIS Connection Manager that knows where the Analysis Services server is. The Connection Manager is just the name of the GUI in Integration Services. Behind the fancy GUI, it's just a connection string that uses the native data provider, the Analysis Services OLE DB provider (MSOLAP DLL for short).

pic1

pic2

Location in the connection string causing “connection cannot be made”

In the above screenshot, I provided not only the Server name but also the Location, which I made the same as the server. The connection test in the GUI was successful.

After the GUI steps are completed, the final connection string looks like this:

Provider=MSOLAP.5;Data Source=localhost\MULTI2012;Integrated Security=SSPI;Initial Catalog=SSAS2012DEMO;Location=localhost\MULTI2012

The Analysis Services Execute DDL Task is usually a more flexible choice for processing Analysis Services objects than the Processing Task. But to make sure that my connection string was correct, I used the Analysis Services Processing Task, which is essentially the GUI version of the DDL task, to test it. I ran into this “connection cannot be made” error.

pic3

It turned out the problem was the Location. Removing the Location from the connection string solved the connection issue.

As a matter of fact, the shortest connection string that passes the connection test is just this:

Data Source=localhost\MULTI2012
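
If you want to test connection-string variations outside of SSIS, a few lines of plain OLE DB code will do it. This is only a sketch: a raw OleDbConnection needs the Provider keyword spelled out (in the SSIS connection manager the provider is implied by the provider you picked), and you can add the Location property back in to confirm whether it is the culprit in your environment.

    using System;
    using System.Data.OleDb;

    class MsolapConnectionTest
    {
        static void Main()
        {
            // The full string from the connection manager, minus Location.
            TryOpen(@"Provider=MSOLAP.5;Data Source=localhost\MULTI2012;" +
                    @"Integrated Security=SSPI;Initial Catalog=SSAS2012DEMO");

            // The shortest string that still passed the connection test for me.
            TryOpen(@"Provider=MSOLAP.5;Data Source=localhost\MULTI2012");
        }

        static void TryOpen(string connectionString)
        {
            try
            {
                using (var conn = new OleDbConnection(connectionString))
                {
                    conn.Open();
                    Console.WriteLine("Connected: " + connectionString);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Failed: " + connectionString + " -> " + ex.Message);
            }
        }
    }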

No explanation for Location in the connection strings for the MSOLAP and ADOMD.NET providers

I checked the connection string documentation in the MSDN library for both MSOLAP and ADOMD.NET, and couldn't find much information about what the Location is supposed to be.

AdomdConnection.ConnectionString Property

Connection String Properties (Analysis Services)

The first link, for the ADOMD.NET provider, simply states:

Location: Sets the name of the server.

In the second link, for the MSOLAP provider, I couldn't find any mention of Location at all.

Three types of data providers for Analysis Services

There are really three types of data providers for Analysis Services.

Data providers used for Analysis Services connections

Analysis Services provides three data providers for server and data access. All applications connecting to Analysis Services do so using one of these providers. Two of the providers, ADOMD.NET and Analysis Services Management Objects (AMO), are managed data providers. The Analysis Services OLE DB provider (MSOLAP DLL) is a native data provider.

More ways to access Analysis Services in SSIS than the DDL Task and the Processing Task

Check out this book if you want to know more ways than the DDL Task and the Processing Task to access Analysis Services from SSIS.

Microsoft® SQL Server® 2008 Integration Services Problem–Design–Solution

Quote from the book:

* Using the control flow object Analysis Services Processing Task is the most straightforward approach, and is discussed in the next section. This allows any SSAS objects (that involve processing) and their processing method to be selected. A few drawbacks are listed in the next section.

* Although the Analysis Services Execute DDL Task may sound as if it only allows objects to be modified, created, or deleted, its functionality goes way beyond Data Definition Language (DDL). This task runs an SSAS XMLA script. XMLA includes the capability not just to run DDL, but also query and process. Therefore, this task is very useful for SSAS processing, because the XMLA can be modified in an SSIS package before it is executed.

* Another approach involves using the Execute Process Task to call the ASCMD executable that comes with SSAS. This executable can run MDX, Data Mining Expressions (DMX), and XMLA, and also contains some command-line switches to make your processing more dynamic.

* The data flow in SSIS includes two destinations: the Dimension Processing destination and the Partition Processing destination. These allow data directly from the pipeline to be pushed into either an SSAS dimension or SSAS partition, respectively. Unlike all the other SSIS approaches discussed here, this approach is the only one where data is pushed directly to SSAS. The other approaches essentially tell SSAS to start pulling the data from a data source. Like the other SSIS approaches, this approach is described later in this chapter.

* Finally, the programming API for SSAS, called Analysis Management Objects (AMO), allows the interface into the SSAS object model and can leverage the processing methods to process any of the SSAS objects described. To use AMO for processing, you must leverage either the Script Task in the control flow, or the Script Component in the data flow.
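
For the last bullet, the AMO code you would drop into a Script Task looks roughly like the sketch below. The instance name comes from my demo environment; the database and dimension names, and the choice of ProcessUpdate followed by ProcessFull, are only examples.

    using System;
    using Microsoft.AnalysisServices;   // AMO

    class AmoProcessingDemo
    {
        static void Main()
        {
            Server server = new Server();
            server.Connect(@"localhost\MULTI2012");

            Database db = server.Databases.FindByName("SSAS2012DEMO");

            // Process one dimension first, then fully process the database.
            Dimension dim = db.Dimensions.FindByName("Date");   // "Date" is a made-up name
            dim.Process(ProcessType.ProcessUpdate);

            db.Process(ProcessType.ProcessFull);
            Console.WriteLine("Processing finished.");

            server.Disconnect();
        }
    }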

 

SSIS #116–What is the variable value at runtime?

January 29, 2014 1 comment

Using variables in SSIS packages can make your ETL process more dynamic and respond to different scenarios during runtime.

Not only can we use variables, we also have many different ways to set values for them. This is all good news for developers. But like many things in real life, when we have too many choices, we can get carried away by our enthusiasm. This is especially true with variables in SSIS packages.

5 different ways to set values for variables

In general, we have the following 5 different ways to set values for variables. The question we need to ask is: what is the order of precedence at runtime when all 5 ways are used to set values for the same variable?

  • In the Variable Editor in the Value field: although varCubeName is a String type, we do not need to use the double quotes to enclose the value Cube Name in Value.

image

  • In the Variable Expression Editor: variable values can also be set in the Expression field. We must enclose the value in double quotes. This is true even for an Integer type variable. Notice that the value in the Value field has changed from Cube Name in Value to Cube Name in Expression. Once I entered an expression, the Value and the Expression fields became the same.

image

  • In the package configuration file: I always prefer an XML configuration file, in which I set varCubeName to a value of Cube Name in Package Configuration.
image

  • In many control flow tasks, such as the Execute SQL Task, and in some data flow transformations: for the purpose of this blog, I picked the Execute SQL Task to set the value of varCubeName to Cube Name from Execute SQL Task at runtime.
  • In a Script Task, I can also modify the value of varCubeName. For the purpose of this blog, I set it to Cube Name in Script Task (see the sketch after the summary list below).
    To summarize, here are the values I set for the variable varCubeName using 5 different ways:

  1. Cube Name in Value: in the Variable Editor in the Value field
  2. Cube Name in Expression: in the Variable Editor in the Expression field
  3. Cube Name in Package Configuration: in the package configuration file
  4. Cube Name from Execute SQL Task: in the Execute SQL Task
  5. Cube Name in Script Task: in a Script Task
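
For number 5, the body of the Script Task is essentially a one-liner. A sketch of the Main method (the rest of the class is generated by the designer), assuming User::varCubeName is listed in the task's ReadWriteVariables:

    // Inside the Script Task's auto-generated ScriptMain class (SSIS 2012, C#).
    public void Main()
    {
        Dts.Variables["User::varCubeName"].Value = "Cube Name in Script Task";

        Dts.TaskResult = (int)ScriptResults.Success;
    }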

4 and 5 are essentially the same, so I'll just test one at a time. In my first test, I disabled the Script Task because I only wanted to test 1, 2, 3, and 4. I want to see what the value of my variable varCubeName really is at runtime. I enabled the package configuration so I could throw #3 into the mix.

I used two script tasks with the MessageBox function to show me the value at runtime, both before and after the Execute SQL Task.

image
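
The two MessageBox script tasks are just as small. Another sketch of the Main method, this time with User::varCubeName listed only in ReadOnlyVariables, since these tasks do not change it:

    // Inside ScriptMain of the "before" and "after" Script Tasks.
    public void Main()
    {
        System.Windows.Forms.MessageBox.Show(
            Dts.Variables["User::varCubeName"].Value.ToString(),
            "varCubeName at runtime");

        Dts.TaskResult = (int)ScriptResults.Success;
    }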

 

“Cube Name in Expression” took precedence over all other values

Before the Execute SQL Task…

image

After the Execute SQL Task…

image

The value is always Cube Name in Expression. It’s like the Execute SQL Task had never even executed.

“Cube Name in Package Configuration” took precedence over Cube Name in Value, but gives way to “Cube Name from Execute SQL Task”

Now we know that Cube Name in Expression has the highest precedence order. Let’s remove it and only leave the Cube Name in Value, as shown in the following screenshot.

image

Before the Execute SQL Task…

image

After the Execute SQL Task…

image

This time we see that Cube Name in Value, which is set in the Variable Editor in the Value field, never showed up. Cube Name in Package Configuration and Cube Name from Execute SQL Task both overwrote it.

Enabling the Script Task produces similar results: “Cube Name in Script Task” has the same precedence as “Cube Name from Execute SQL Task”.

Conclusion on the precedence order

  1. Cube Name in Expression: the value is set in the Variable Editor in the Expression field. A value set this way has the highest precedence and overwrites the values set in all other ways.
  2. Cube Name from Execute SQL Task: the value is set in the Execute SQL Task. It has the second highest precedence. Cube Name in Script Task has the same precedence.
  3. Cube Name in Package Configuration: the value is set in the package configuration file. It can only overwrite the variable's default value.
  4. Cube Name in Value: the value is set in the Variable Editor in the Value field. This value is usually called the default value, because it can be overwritten by all of the preceding ways at runtime.

Lessons learned

During the SSIS package deployment, your DBA decided to include all the variables along with all the connection strings as the property/value pairs in the package configuration files, and somehow your packages are not working the way you had expected.

During development, your packages are not working the way you would expect and you decided to debug your variables and found out that the variable value at runtime was incorrect.

The above two scenarios happened to me in the past and they have inspired this blog.

The lessons I’ve learned are:

  • During deployment, do not include variable property/value pairs in the package configuration file. Variables should be handled only inside the SSIS packages.
  • If a variable expression is sufficient, use the expression only.
  • If you need to combine different ways to set variable values at runtime, use only one of these two combinations:
    Default value +  Execute SQL Task, or

    Default value +  Script Task

Note

  • All screenshots are from Integration Services 2012.

SSIS C Sharp #115 – C# Script to detect delta

November 3, 2012 1 comment

A reader asked me to share a C Sharp script I used in this post, SSIS #114 – How many ways can you do INSERT-UPDATE?.

This code snippet determines whether the original value of MyFirstColumn is different from the incoming value in the pipeline. It is used in the last UPDATE-DELTA design.

image
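
Since the screenshot can be hard to read, here is the shape of the script, reconstructed from memory. Treat it as a sketch: the column names (MyFirstColumn, plus an Original_ prefix for the value returned by the Lookup) and the two output names (UpdateRow and IgnoreRow) are whatever you configure on the component, not fixed names.

    // Script Component (Transformation), Input0_ProcessInputRow override.
    // The component is configured with two synchronous outputs, "UpdateRow" and
    // "IgnoreRow", in the same exclusion group; that is what generates the
    // DirectRowTo... methods used below.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Compare the incoming value with the original value returned by the Lookup.
        bool changed =
            (Row.MyFirstColumn_IsNull != Row.Original_MyFirstColumn_IsNull) ||
            (!Row.MyFirstColumn_IsNull &&
             Row.MyFirstColumn != Row.Original_MyFirstColumn);

        if (changed)
        {
            Row.DirectRowToUpdateRow();   // send on for UPDATE
        }
        else
        {
            Row.DirectRowToIgnoreRow();   // nothing changed; drop the row
        }
    }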

Notes:

The Script Component needs some configuration. Without the proper configuration, the above code will not work.

Categories: SSIS Best Practices

SSIS #114 – How many ways can you do INSERT-UPDATE?

September 26, 2012 2 comments

For the purpose of this blog, my answer is 4.

I’ve always wanted to do a benchmark comparison so I can proudly say that my “normal” way is the best in terms of run time.

    1. STAGE-TSQL: Use the data flow to bring the raw data into staging, and do the INSERT-UPDATE in TSQL. This is my “normal” way, and the best way according to my testing. The name STAGE-TSQL implies: 1) two steps are involved, 2) raw data is staged first, 3) INSERT-UPDATE is done in TSQL only.
    2. UPDATE-ALL: Only one step is involved. INSERT-UPDATE is done in one data flow step. 1) The OLE DB Command transformation is used for UPDATE, 2) the OLE DB Destination is used for INSERT.
    3. UPDATE-STAGE: Two steps are involved. 1) INSERT is done in the data flow step, 2) the matching rows are saved to a staging table, and UPDATE is done in TSQL using the matching rows.
    4. UPDATE-DELTA: similar to number 2, UPDATE-ALL. Instead of directly sending all matching rows to the OLE DB Command transformation for UPDATE, a Script Component transformation is used to determine whether rows have actually changed. Data is sent to the OLE DB Command transformation only if there are changes in the matching rows.

1. STAGE-TSQL

Here is the Control Flow.

clip_image001

There is nothing exciting about the Data Flow. All it does is bring the source data into a staging table.

clip_image002

The TARGET_TSQL task does all the wonderful work of INSERT-UPDATE. TARGET_TSQL is the winner, with a run time of only 3 seconds and 885 milliseconds.
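
To give an idea of what the TARGET_TSQL task runs, here is the general UPDATE-then-INSERT pattern, wrapped in a small ADO.NET harness so it can be tried outside SSIS. Every table and column name below is made up for illustration; the real statement lives in the Execute SQL Task.

    using System.Data.SqlClient;

    class StageTsqlDemo
    {
        // The UPDATE-then-INSERT pattern run by the TSQL task (all names are examples).
        const string UpsertSql = @"
            UPDATE t
               SET t.MyFirstColumn = s.MyFirstColumn
              FROM dbo.MyChangingTable AS t
              JOIN dbo.STG_MyChangingTable AS s ON s.BusinessKey = t.BusinessKey;

            INSERT INTO dbo.MyChangingTable (BusinessKey, MyFirstColumn)
            SELECT s.BusinessKey, s.MyFirstColumn
              FROM dbo.STG_MyChangingTable AS s
             WHERE NOT EXISTS (SELECT 1 FROM dbo.MyChangingTable AS t
                                WHERE t.BusinessKey = s.BusinessKey);";

        static void Main()
        {
            using (var conn = new SqlConnection(
                "Data Source=localhost;Initial Catalog=TargetDB;Integrated Security=SSPI"))
            using (var cmd = new SqlCommand(UpsertSql, conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }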

2. UPDATE-ALL

The Control Flow is very simple. Only one Data flow is used. No truncating staging tables because no staging tables are used.

clip_image003

But the Data Flow step has two more transformation components: Lookup and OLE DB Command. The Lookup transformation is used to split the rows into the No Match Output and the Match Output. The No Match Output rows will be inserted, and the Match Output rows will be updated.

clip_image004

This seems to be an elegant solution, but the run time is the worst: almost 3 minutes, compared to only about 4 seconds for STAGE-TSQL.

The OLE DB Command transformation is to blame for the poor performance here.

3. UPDATE-STAGE

Since I blamed the OLE DB Command transformation for the poor performance, I am going to swap it for a staging table and finish the UPDATE in a TSQL task.

clip_image005

In the Data Flow step, matching rows are no longer sent to the OLE DB Command transformation; instead, they are saved into a staging table. The actual UPDATE is done in the TSQL task TARGET_MyChangingTable_UPD.

clip_image006

The run time has improved from almost 3 minutes to only 8 seconds and 291 milliseconds.

4. UPDATE-DELTA

Here is another way to improve that 3-minute run time.

clip_image007

The Control Flow looks the same, but I've added a Script Component (SC) transformation. The SC creates two outputs: only the rows with changed data are sent to the UpdateRow path; the rest are sent to a Row Sampling transformation and ignored. Adding the SC has indeed improved the run time from 3 minutes to only 7 seconds and 494 milliseconds.

clip_image008

5. STAGE-TSQL is the winner!

It’s a relief knowing that our skills in TSQL are serving us well.

Notes:

  1. My testing involved only INSERT-UPDATE. No DELETE is done in data flow or in TSQL.
  2. I've made sure that all four runs are against the same Source and Target data, so that the run time comparison is valid.
  3. I don't think the data size matters much in my testing. No data size is given.
  4. Using staging tables means more disk space is needed. Most teams would prefer faster run time over disk space for their ETL processes.
Categories: SSIS Best Practices

SSIS #113 – Querying SSAS Cube with MDX from SSIS

August 14, 2012 6 comments

I worked on an SSIS package a few months back to retrieve data from an SSAS cube with MDX queries. A co-worker is now developing an SSIS package to perform a similar task, but couldn't get the package to work. The data flow got lost somewhere and the package could never finish loading the data.

This is what it will look like in the Data Flow task. The OLE DB Source will stay yellow forever.

image

I couldn’t remember right away how my SSIS package was configured to make the data flow work. I had to find my old SSIS package and open it to refresh my memory.

Need to set Format=Tabular in the Data Link Extended Properties

After choosing the OLE DB Provider for Analysis Services 10.0 as the provider for the connection manager, we need to go to the All tab to set Format=Tabular in the Data Link Extended Properties.

image

There are many blogs on this little “secret” that developers discovered. Some claimed this was a fix for a bug in SSIS 2005 SP1. I had to do this in both SSIS 2005 and 2008 to make it work, so I am not sure whether it is a bug or a designed “feature” in SSIS.

Nor can I find more information about what this property value means. You are welcome to share if you have more insight.
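
You can reproduce the setting outside the designer with plain OLE DB code, which at least shows where Format=Tabular lives in the connection string. A sketch only: the provider version, instance, catalog, cube name, and MDX are all examples, and I have not verified whether the property changes anything for code running outside SSIS.

    using System;
    using System.Data;
    using System.Data.OleDb;

    class MdxTabularDemo
    {
        static void Main()
        {
            // Extended Properties="Format=Tabular" is the same setting made on the
            // All tab of the SSIS connection manager.
            string connectionString =
                @"Provider=MSOLAP.4;Data Source=localhost\MULTI2012;" +
                @"Initial Catalog=SSAS2012DEMO;Integrated Security=SSPI;" +
                @"Extended Properties=""Format=Tabular""";

            string mdx = "SELECT [Measures].DefaultMember ON COLUMNS FROM [MyCube]";

            using (var conn = new OleDbConnection(connectionString))
            using (var cmd = new OleDbCommand(mdx, conn))
            {
                conn.Open();
                var table = new DataTable();
                table.Load(cmd.ExecuteReader());
                Console.WriteLine("{0} rows, {1} columns", table.Rows.Count, table.Columns.Count);
            }
        }
    }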

The happy green color is what you will see once you set Format=Tabular in the Data Link Extended Properties.

  image

A couple of notes:

1. About the data type: whenever you try to access the column information, you will get a “nice” warning from SSIS that data type DT_WSTR (nvarchar) will be used instead. Again, if you have more insight on this, you are welcome to share.

image

2. I used a Data Conversion transformation to convert the DT_WSTR to DT_STR or DT_NUMERIC accordingly.

image

SSIS #111 – Looking for Conditional IIF() in SSIS Expression?

July 31, 2012 Leave a comment

There is no IIF statement in SSIS scripting functions

If you are looking for IIF() for SSIS expressions, you’ll be disappointed.

Most of us are familiar with the IIF statement in SSRS expressions (or in many other scripting languages).

image

But in SSIS, the Program Flow functions are conspicuously missing. Here are all the function categories you can see in the Expression Builder in SSIS; Program Flow is not one of them.

image

We can achieve IIF() using ( Condition? Value_when_true  : Value_when_false)

There is a very good explanation for the missing IIF statement in SSIS. Before I go too far on this topic, I want to give you the good news first: Integration Services does give us a tool that accomplishes the same thing as the IIF statement, only in a different disguise.

Here is what we can use:

( Condition? Value_when_true  : Value_when_false)

You can write any valid expression in the Condition part, but it only makes sense for the expression to include at least one variable, in order to achieve the dynamic behavior you set out to get from IIF in the first place.

The Value_when_true and Value_when_false parts can obviously include variables too.

Here is an example. For the user variable varSourceServerPrefix, I want to set it to the alias name of the linked server in our development environment, but set it to blank on the production server.

( @[User::varProduction] == 0? @[User::varLinkedServer]  : "")

image

Processing flows are largely controlled by the Precedence Constraints

There is an even better reason why Integration Services is missing the entire category of Program Flow functions.

In Integration Services, processing flows are largely controlled by the Precedence Constraints between tasks. These Precedence Constraints are almost completely controlled by conditions that are expressed using variables, while variables in Integration Services can be manipulated by Script Tasks.

For an example of how variables can be manipulated by a Script Task and then used in a Precedence Constraint, see the sketch below.
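
As a quick illustration (the variable name varProduction is borrowed from the example above, and the machine-name rule is entirely made up), a Script Task might set the flag like this; a precedence constraint set to “Expression and Constraint” with the expression @[User::varProduction] == 1 would then steer the flow:

    // Inside the Script Task's ScriptMain. "User::varProduction" must be listed
    // in the task's ReadWriteVariables.
    public void Main()
    {
        // Hypothetical rule: anything not running on PRODSQL01 counts as development.
        bool isProduction = (System.Environment.MachineName == "PRODSQL01");

        Dts.Variables["User::varProduction"].Value = isProduction ? 1 : 0;

        Dts.TaskResult = (int)ScriptResults.Success;
    }

Incidentally, the ?: operator used here is the same conditional operator the SSIS expression language offers in place of IIF().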

Note: the parentheses are not really required. I don't think they are required even in nested conditional expressions, but it certainly doesn't hurt to have them for readability.

Here is the MSDN link on this: http://msdn.microsoft.com/en-us/library/ms141680.aspx
