Archive

Archive for September, 2012

SSIS #114 – How many ways can you do INSERT-UPDATE?

September 26, 2012 2 comments

For the purpose of this blog, my answer is 4.

I’ve always wanted to do a benchmark comparison so I can proudly say that my “normal” way is the best in terms of run time.

    1. STAGE-TSQL: Use the data flow to bring the raw data into staging, and use do the INSERT-UPDATE in TSQL. This is my “normal” way. And the best way according to my testing. The name STAGE-TSQL implies: 1) two steps are involved, 2) raw data is staged first, 3) INSERT-UPDATE are done in TSQL only.
    2. UPDATE-ALL: Only one step is involved. INSERT-UPDATE is done in one data flow step. 1) Transformation OLE DB Command is used for UPDATE, 2) Destination OLE DB Destination is used for INSERT.
    3. UPDATE-STAGE: Two steps are involved. 1) INSERT is done in the data flow step, 2) but the matching rows are saved to a staging table, and UPDATE is done in TSQL using the matching rows.
    4. UPDATE-DELTA: similar to number 2 UPDATE-ALL. In stead of directly sending all matching rows to transformation OLE DB Command for UPDATE, the Script Component transformation is used to determine if there are actually changed rows. Send data to Transformation OLE DB Command only if there are changes in the matching rows.

      1. STAGE-TSQL

    Here is the Control Flow. 

clip_image001

There is nothing exciting about the Data Flow. All it does is to bring the source data into a staging table.

clip_image002

The TAREGT_TSQL task does all the wonderful work of INSERT-UPDATE. TAREGT-TSQL is the winner with only 3 seconds and 885 mini-seconds.

2. UPDATE-ALL

The Control Flow is very simple. Only one Data flow is used. No truncating staging tables because no staging tables are used.

clip_image003

But the Data Flow step has two more transformation components: Lookup and OLE DB Command. The Lookup transformation is used to split outputs into No Match Output and Match Output. No Match Output are the rows that will be inserted, and Match Output are the rows that will be updated.

clip_image004

This seems to be an elegant solution but the run time is the worst, comparing almost 3 minutes to only about 4 seconds in STAGE-TSQL. 

The Transformation OLE DB Command is to blame for the poor performance here.

3. UPDATE-STAGE

Since I blamed the Transformation OLE DB Command for the poor performance, I am going to swap it with a staging table, and finish the UPDATE in a TSQL task. 

clip_image005

In the Data Flow step, matching rows are no longer sent to the OLE DB Command transformation, instead, they are saved into a staging table. The actual UPDATE are done in the TSQL task TARGET_MyChangingTable_UPD.

clip_image006

The run time has improved from almost 3 minutes to only 8 seconds and 291 mini-seconds.

4. UDPATE-DELTA

Here is another way to improve that 3 minutes run time.

clip_image007

The Control Flow looks the same, but I’ve added a Script Component (SC) Transformation. The SC is used to create two outputs. Only the rows that have changed data are sent to the UpdateRow path, otherwise the rows will be sent to a Row Sampling and are ignored. Adding the SC has indeed improved the run time from 3 minutes to only 7 seconds and 494 mini-seconds.

clip_image008

5. STAGE-TSQL is the winner!

It’s a relief knowing that our skills in TSQL are serving us well.

Notes:

  1. My testing involved only INSERT-UPDATE. No DELETE is done in data flow or in TSQL.
  2. I’ve made sure that all four runs are against the same Source data and Target data to make sure that the run time comparison is valid.
  3. I don’t think the data size matters much in my testing. No data size is given.
  4. Using staging tables means more disks spaces are needed. Most teams would prefer faster run time over disks spaces for their ETL processes.
Categories: SSIS Best Practices Tags:

MDX #17 – Can we create a column alias?

September 17, 2012 2 comments

In SSRS, we often have another challenge. That is to control the column name.

Continuing from MDX #16 – Attribute [Month Name] Is Invisible.

With this MDX query, we got two columns back from the Query Designer in SSRS. The column [Month Name] represents all members from level [Date].[Month Name].[Month Name].

clip_image001

In SSRS, we often have another challenge. That is to control the column name. This query does not scream for column name control. I’ll only re-use it for demonstration purpose.

Need the month name to come back as “MyTimeInterval”

TSQL developers are all familiar with alias. Using table alias allow us to write more compact and easier-to-read queries. Using column alias allows us to control the column names. In the MDX + SSRS world, occasionally I find myself in the need to have the same column name from different hierarchies. The Date dimension is a good example. I want the column name to be “MyTimeInterval” no matter I am querying from the Month hierarchy or the Quarter hierarchy.

Query-scoped calculated set does not create column alias

My first reaction is to use query-scoped calculated set to “rename” the hierarchy. Calculated set, in some sense, does serve the purpose as the alias does in TSQL.

Interesting enough, the query-scoped calculated set name [MyTimeInterval]  is not what returns back from the Query Designer in SSRS. Instead the level name [Month Name] is the column name, even when I sort of “renamed” it to [MyTimeInterval].

clip_image002

A Query-Scoped Calculated Measure Can Create a Column Alias

Since the query-scoped calculated set didn’t really “rename” the column name for me, my next try is to create query-scoped calculated measure, [Measures].[MyTimeInterval], and then place it on the X axis.

clip_image003

I got a new column [MyTimeInterval] back, exactly as how I wanted it to be. Column [MyTimeInterval] and [Month Name] represent the same members from the Month Name attribute. The only difference is that [MyTimeInterval] is the column name I wanted it to be, and the [Month Name] is the original attribute name.

Categories: MDX Challenges Tags: ,

MDX #16 – Attribute [Month Name] Is Invisible

September 15, 2012 Leave a comment

In SSRS user-defined hierarchy Calendar returns the entire hierarchy

Continuing from my previous post, MDX #15 – Why do I get Year, Semester, Quarter when I only need Month?.

clip_image001

The above query gave me the months and each month’s sales amount in SSMS, but gave me the entire hierarchy in Calendar in SSRS, which includes Year, Semester, Quarter and Month.

Use Attribute Hierarchy Instead

I am not particularly fond of this unexpected feature in SSRS. In most of the SSRS reports, I need to have control of

  • which columns to return from the datasets, and
  • what the column names are for each dataset

Fortunately, SSAS provides so much flexibility in the dimension design that allows us the same level of flexibility (and more) in how we write our MDX queries.

Month is the fourth level down in the Calendar hierarchy. In the Properties window, we can see that the Month Name is the source attribute.

clip_image002

This is a good news. Instead of using the Calendar hierarchy, I can use the Month Name attribute in my query. Hopefully, SSRS will only give me back the Month.

On the Y axis, use [Date].[Month Name].[Month Name], we get exactly one column back, [Month Name] from the Query Designer in SSRS.

clip_image003

Use both user-defined hierarchies and attribute hierarchies in MDX queries

Don’t forget that we have the flexibilities to use both user-defined hierarchies and attribute hierarchies in MDX queries.

Even if attribute hierarchies are invisible

Attribute [Month Name] is not a visible attribute. In BIDS you can see that the AttributeHierarchyVisible property is set to False. When you browse the Date dimension in SSMS,  [Month Name] is no where to be found. This doesn’t mean that we cannot use it in our MDX queries.

clip_image004

Categories: MDX Challenges Tags: ,

MDX #15 – Why do I get Year, Semester, Quarter when I only need Month?

September 10, 2012 Leave a comment

Descendants() function gives me the exact level in the Date hierarchy in SSMS

In MDX #14, I used this query to get the internet sales amount for each month.

clip_image001

The results above are from SSMS, and they are what I was expecting.

In SSRS, the same MDX query will give more than what the Descendants() function is asking

Run this same MDX query in the Query Designer in SSRS, this is what you will get.

clip_image002

We’ve noticed two differences.

  1. Unlike the results from the SSMS where the returns on the Y axis are not given any column names, all attributes from the Date dimension has a column name, and each column name is the same as the level name in the Calendar hierarchy.
  2. Unlike the results from the SSMS where only the Month is returned, the SSRS returned 3 more levels in the calendar hierarchy  that are above the level Month.

Bad feature or good feature?

Having a column name is certainly a good thing in Reporting Services. But getting columns more than what I asked for can be annoying, although I can, on the other hand, also view it as a good feature, in case I need Year, Semester or Quarter for other purposes such as sorting. 

MDX #14 – Visualize User-defined Hierarchies

September 10, 2012 Leave a comment

User-defined hierarchies define one-to-many relationships between attributes in a dimension, and these relationships tell the MDX engine how to roll up measure values.

User-defined hierarchies in Microsoft SQL Server Analysis Services define the attribute relationships in a dimension. Once all the attributes and their properties are defined in a dimension, defining the attribute relationships will be the next step. A simple way to understand user-defined hierarchies (the attribute relationships)  is that they define one-to-many relationships between attributes in a dimension, and these relationships tell the MDX engine how to roll up measure values.

A good example – Date dimension in the AdventureWorksDW database

A good example is the Date dimension in the AdventureWorksDW database.

If you open the Attribute Relationships in the Date dimension. here is what you will see.

clip_image001

In addition to the obvious one-to-many relationships that are defined between Date and Month Name, Month name and Calendar Quarter, etc., it also tells us that we can roll up a measure value from Date to Month Name, from Calendar Semester to Calendar Year, etc.

Write up some MDX queries to help visualize the Calendar hierarchies

Another way to understand the user-defined hierarchies is to write up some MDX queries.

clip_image002

The above screen shot is from SSMS when you browse the [Adventure Works] cube.The triangle-shaped symbol next to the Date.Calendar indicates a user-defined hierarchy. The rectangle-shaped symbol next to Date.Calendar Year indicates an attribute hierarchy.

Here is a simple MDX query with the DESCENDANTS() function that allows us to get to the different levels in the Calendar hierarchy.

clip_image003

Changing the level number 4 to any number between 0 and 5 and observe the results, you will start to appreciate what the dimension designer has done.

  • level 0 = All Periods
  • level 1 = Calendar Year
  • level 2 = Calendar  Semester
  • level 3 = Calendar Quarter
  • level 4 = Month
  • level 5 = Date

Defining hierarchies in a dimension is a critical task

I certainly appreciate the thoughts that the dimension designer has put into it based on the business requirements.

MDX #13 – Use Crossjoin() to Get the number of cars each customer owned

September 1, 2012 Leave a comment

In a previous post, MDX #11 – How to get number of cars each customer owned using Properties() function?, I used the Properties() function to get the number of cars each customer owned.

There must be another way

Is that the only way? We (programmers/developers), with a trademark of being curious, never stopped thinking that “there must be another way”.

Here is another way using the Crossjoin() function. Its purpose is the same as in TSQL, to produce cross product of one or more sets (tables). Unlike in TSQL where cross join is sparsely used, Crossjoin() is one of the most used functions in MDX.

image

Comparing to the result from my MDX #11 post, the above screen shot shows the same returns, except that the number of cars owned no longer has a proper column name.

Although this query worked, I have more questions to answer now.

If Crossjoin() has the same purpose as in TSQL, why the result didn’t show the combination of each customer with all the possible numbers of cars?

If it does, the result wouldn’t be correct. The MDX engine must know something so it would not mechanically create every possible combinations. This “something” that the MDX engine does very well is commonly referred to as the “Auto Exists” feature.   

  • “Auto Exists” applies whenever multiple hierarchies from the same dimension are referenced.
  • When “Auto Exists” is at work, the resulting set is filtered to only the non-empty sets.

There are not many articles about the “Auto Exists” feature. My only explanation for the lack of more information is that the feature is so “intuitive” that most experts feel no need to mention it.

Are we using the [Number of Cars Owned] as an attribute hierarchy?

Yes, it is used as an attribute hierarchy in the above query rather than a member property as in MDX #11.

Is the * a short hand version of Crossjoin()?

Yes. Check out the MSDN MDX reference here: http://technet.microsoft.com/en-us/library/ms144816

Categories: MDX Challenges Tags: ,
%d bloggers like this: