
Archive for March, 2012

MDX #2 – An Example of “Context”

March 30, 2012

If you are familiar with T-SQL, then you are not a total stranger when it comes to MDX scripts. If you have written formulas in Excel, then you should be comfortable with calculations in MDX.

The difference is that in MDX, we don’t enter calculations in every cell. In other words, the MDX engine is context-aware. This is also where the trouble begins, or where our learning curve starts. Context and interpretation turn out to be quite difficult to visualize in the multi-dimensional cube space.

Every portion of our MDX query has a particular context within a cube’s space. This means that the MDX engine knows how to combine the explicit parts (what we say in the query) with the implicit parts (what we leave out of the query). This context awareness is what makes MDX powerful and useful. The interpretation refers to the interpretation of invalid data, missing data, and invalid members.
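
Here is a minimal sketch of what the “implicit parts” mean. The cube name [MyCube] is reused from the report query later in this post; the measure name and the hierarchy in the comment are placeholders, not names from a real cube. Any hierarchy the query does not mention still contributes a coordinate to the context: its default member, which is usually the All member.

SELECT
  [Measures].[Some Measure] ON COLUMNS
FROM [MyCube];
-- The engine evaluates this as if every hierarchy not named in the query
-- were sliced by its default member, e.g. ( [MyDim].[DimHier].[All], ... ).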

Take the above concept and start building SSRS reports against an Analysis Services database, and you will very soon realize that you have stepped into new territory again. A typical query auto-generated by the SSRS MDX Query Designer will look like this in the FROM and WHERE clauses.

FROM (
  SELECT ( StrToSet ( @Site, CONSTRAINED ) ) ON COLUMNS
  FROM [MyCube]
)
WHERE (
  IIf (
    StrToSet ( @Site, CONSTRAINED ).Count = 1,
    StrToSet ( @Site, CONSTRAINED ),
    [MyDim].[DimHier].CurrentMember
  )
)


It has a nested sub-select, and a WHERE clause that uses the same query parameter. You might think they serve the same purpose. Not exactly. Many times I have to experiment with both the sub-select and the WHERE clause to get the expected results.

In this blog, I’ll show you one example from my work to demonstrate that the where clause can be used to set the “context” for your query.

1. Without the where clause, there is no context.

The where clause is not the only place where query context is established. But in my example, I want to demonstrate that the sub-select (sub-cube) alone is not enough to establish the query context.

[screenshot: the MDX query without the WHERE clause]

Here is the result from the above query. The result shows the incorrect MEMBER_CAPTION for the member value 31306. As a matter of fact, it shows the ALL member from the hierarchy.

[screenshot: query result showing the ALL member caption for member 31306]
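
The query and its result only survive as screenshots here, so the sketch below is my reconstruction of the shape of the problem; the calculated caption member is an assumption, not the exact report query. With only the sub-select, the slicer axis is untouched, so CurrentMember still points at the default member of the hierarchy.

WITH MEMBER [Measures].[Site Caption] AS
  [MyDim].[DimHier].CurrentMember.MEMBER_CAPTION
SELECT
  [Measures].[Site Caption] ON COLUMNS
FROM (
  SELECT ( StrToSet ( @Site, CONSTRAINED ) ) ON COLUMNS
  FROM [MyCube]
)
-- With only the sub-select, the current member of [MyDim].[DimHier] is still the
-- default (All) member, so the caption comes back as the All member's caption.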

2. With the correct where clause, the query context is correct now.

I added the where clause with the same parameter as in the sub-cube to the query.

[screenshot: the MDX query with the WHERE clause added]

Here is the correct result.

[screenshot: query result showing the correct member caption]

3. Understanding the concept of “query context” is important

Some people might be able to visualize this, but I need to see concrete examples. If you are like me, go ahead and create some query scenarios next time to make sure the query context is what you would expect.

Categories: MDX Challenges

SSAS #27 – How are filtered (calculated) measures aggregated along dimensions?

March 28, 2012

In an SSAS cube I designed recently, I created a junk dimension with an attribute hierarchy called “Decision”, which holds either an Approved or a Declined value. Most of the queries will just focus on the approved count and amount, without showing the declined amounts. This naturally led me to create two calculated measures, Count Approved and Amount Approved.

Here is the simple MDX script to filter the count for Approved only.

CREATE MEMBER CURRENTCUBE.[MEASURES].[The Count Approved]
AS ([Measures].[The Count], [The Decision].[Decision].&[Approved]),
FORMAT_STRING = "#,#",
VISIBLE = 1;
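
The Amount Approved measure follows the same pattern. Here is a sketch; the base measure name [The Amount] and the format string are assumptions.

CREATE MEMBER CURRENTCUBE.[MEASURES].[The Amount Approved]
AS ([Measures].[The Amount], [The Decision].[Decision].&[Approved]),  -- base measure name is an assumption
FORMAT_STRING = "Currency",
VISIBLE = 1;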


To make sure that the calculated/filtered measure behaves as I expected, I did the simple browsing below (in BIDS or in SSMS).

1. How the calculated/filtered measure is aggregated along dimensions other than The Decision dimension.

It’s good to see that the measure is summed over the Time dimension as expected. Sum is the default Aggregation Function for the original measure.

[screenshot: The Count and The Count Approved browsed along the Time dimension]
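
The same check can be run as a query in SSMS. A sketch is below; the cube name and the [Date].[Calendar Year] hierarchy are assumptions, while the two measure names come from the script above.

SELECT
  { [Measures].[The Count], [Measures].[The Count Approved] } ON COLUMNS,
  [Date].[Calendar Year].[Calendar Year].Members ON ROWS
FROM [MyCube];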

2. How the calculated/filtered measure is aggregated along The Decision dimension.

It’s also good to see that the measure is NOT aggregated in any way along the Decision hierarchy. This is also expected. Notice that the Approved measure also stays the same for the different members of the attribute hierarchy. This looks a little odd at first, but it should also be expected: the tuple in the calculated member pins the Decision coordinate to the Approved member, so it overrides whichever Decision member is on the rows.

[screenshot: The Count and The Count Approved browsed along the Decision hierarchy]

Categories: SSAS

SSIS #109 – Wait for data with For Loop Container

March 10, 2012

Believe it or not, in the BI development world, waiting is a fact of life. We often have to wait for the data to be ready, or wait until it’s time to do certain things.

I had some examples in my blog showing how to use the Foreach Loop Container.

[screenshot: Foreach Loop Container]

I often use the For Loop Container in my SSIS designs to handle the wait-for scenario. One very common usage of the For Loop Container is to wait for data to become ready in the source before we pull it.

In this example, I want to wait for the prior day’s data before I start to pull the data. Because our data source does not give us a “load status”, I designed two wait-for-data steps before the data pull. The first For Loop Container detects the presence of the prior day’s data. The second For Loop Container determines whether the row counts and data counts for the past 7 days are stable.

[screenshot: control flow with the two wait-for-data For Loop Containers]

In this blog, I’ll just show you how I configured the first For Loop Container.

Goal – IF the prior day’s data is present, THEN move to the next task; otherwise, sleep for 5 minutes and check again.

Step 1 – Configure the For Loop Container

Step 0.9 is to create a variable, varDataReady_Prior_Day, with an initial value of 0.

[screenshot: the varDataReady_Prior_Day variable with an initial value of 0]

There are really only two things we need to configure on the For Loop Container:

  • InitExpression: @varDataReady_Prior_Day = 0; this tells SSIS that I don’t know yet whether my data is ready, so go ahead and start the loop.
  • EvalExpression: @varDataReady_Prior_Day == 0; this tells SSIS to loop through again if the data is not ready; but once the data is ready (@varDataReady_Prior_Day == 1), do not continue the loop.

[screenshot: For Loop Editor with the InitExpression and EvalExpression]

Step 2 – Configure a task to update the variable varDataReady_Prior_Day

In my example I used an Execute SQL Task to check if the data is ready or not. If it’s ready, the task updates the variable varDataReady_Prior_Day to 1.

Again, there are only two things we need to configure on the Execute SQL Task to make it happen.

  • Result Set = Single row
  • Pass the SQL query result to the variable

[screenshot: Execute SQL Task Editor with ResultSet set to Single row]

[screenshot: Result Set page mapping the query result to varDataReady_Prior_Day]
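
The query inside the Execute SQL Task is not shown above; here is a minimal sketch of the kind of readiness check it runs, with the table and column names as pure assumptions. The single-row result is then mapped to varDataReady_Prior_Day on the Result Set page.

-- does the prior day's data exist in the source? (table/column names are assumptions)
SELECT CASE
           WHEN EXISTS ( SELECT 1
                         FROM dbo.SourceTable
                         WHERE LoadDate = CAST(DATEADD(DAY, -1, GETDATE()) AS DATE) )
           THEN 1
           ELSE 0
       END AS DataReady;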

Step 3 – Configure a Script Task to sleep for 5 minutes

[screenshot: the Script Task inside the For Loop Container]

Two is a magical number in my writing today. Again I only need to configure two things for the Script Task.

  • ReadOnlyVariables: varDelayTime; this is a variable that I set to 5.
  • Script in C#: I used a C# script in SSIS 2008; SSIS 2005 only gives you the VB.NET script option.

[screenshot: Script Task Editor with ReadOnlyVariables set to varDelayTime]

Here is the simple C# script to sleep for n (varDelayTime) minutes. The Sleep() function takes milliseconds, so there is a conversion from minutes to seconds to milliseconds.


public void Main()
{
    // sleep for varDelayTime (5) minutes before rechecking
    Double sec = Convert.ToDouble(Dts.Variables["varDelayTime"].Value) * 60;
    Int32 ms = Convert.ToInt32(sec * 1000);
    System.Threading.Thread.Sleep(ms);
    Dts.TaskResult = (int)ScriptResults.Success;
}

Step 4 – Configure Precedence Constraint

I want to save myself 5 minutes when the data is ready on the first try, so I configured the precedence constraint to skip the sleep task when the data is ready.

[screenshot: Precedence Constraint Editor]
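
One way to wire this up (the exact layout is not visible in the screenshot, so treat this as an assumption): the constraint from the Execute SQL Task to the sleep Script Task uses Evaluation operation = Expression and Constraint with the expression below, so the sleep only happens while the data is not ready.

@varDataReady_Prior_Day == 0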

A side note

For less complex scenarios, I also often use SQL Server’s native WAITFOR statement to achieve the wait-for goal.

This code snippet uses the TIME clause of the WAITFOR statement to wait until it’s 8:00 CST before calling a SQL Agent job on another SQL Server.

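The original snippet only survives as a screenshot; the idea looks roughly like this, where the linked server name and job name are assumptions.

-- wait until 8:00 AM (server local time, CST in this case)
WAITFOR TIME '08:00';

-- start a SQL Agent job on the other server (assumes a linked server named RemoteServer)
EXEC [RemoteServer].msdb.dbo.sp_start_job @job_name = N'Load_Remote_Data';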

See WAITFOR (Transact-SQL) on MSDN for more info on how to use SQL’s native WAITFOR statement.

Categories: SSIS Best Practices

SSAS #26 – “Ignore Error” when processing a cube

March 10, 2012

I guess maintaining an Analysis Services database is not as easy as 1-2-3. Maintaining SSAS for both development and production environments with multiple developers can be hard. It’s even harder when there are many linked dimensions and measures. Now imagine that there is also a service level agreement that dictates a time window for cube processing. Also imagine that on top of all this, there is a proprietary data visualization tool that needs a lot of attention.

Fortunately there are more experienced co-workers who are supporting all of this. My job as a BI developer has a defined scope of responsibilities, that is, the projects I am assigned to for development work. Since it’s development work, the development server is my playground. I recently noticed that my cubes were “browseable” one moment, and “cannot be browsed” minutes later. Processing the main cube with its many linked dimensions and measures also generated errors.

It turned out that some shared dimensions had been processed during the day. This broke my cubes and also the main cube that is linked to those shared dimensions. Fortunately, it took me only a few minutes to reprocess my cubes.

The more troublesome one is the main cube, which has many linked dimensions and measures. It generated lots of errors during processing. If you read through the errors, there are really two types. One type is dimension key errors, where fact tables have key values that are missing from the dimension tables. The other type is missing object errors, where tables or views that are sources for fact tables or dimension tables are missing.

Although missing dimension key values is a pretty serious problem, in the development environment we do have the freedom to ignore the errors as shown in the screenshot below. Missing object errors can also be fixed easily by copying the objects from the production server.

From the settings before processing the cube, we can choose to ignore four types of dimension key errors.

  • Key not found
  • Duplicate key
  • Null key converted to unknown
  • Null key not allowed

[screenshot: the Change Settings dialog with the dimension key error options]

The information below is from the TechNet article Processing Options and Settings.

[screenshot: table of processing error options from the TechNet article]

I guess the real lesson is that we, as developers, should try to avoid these types of mistakes in our design.

  • Key not found: this happens because many of the dimension tables have no primary key constraints, which in turn prevents foreign key constraints from being created on the fact tables. When data is deleted from or missing in the dimension tables, the fact tables are left with many records with missing dimension key values.
  • Duplicate key: this is caused by the lack of primary key or unique index constraints on the dimension tables.
  • Null key converted to unknown: key values on the fact tables should not be left Null in general. If they truly have no values, populate them with “Unknown”, “N/A”, “Not Defined”, or something else that is not Null. Null is a SQL term that represents Not Defined, so let’s translate it into something that human eyes can read.
  • Null key not allowed: again, it’s better that we take care of the Null key values ourselves than to let Analysis Services guess what we meant in the first place (see the sketch after this list).
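
To make the bullets above concrete, here is a rough sketch of the kind of constraints and cleanup they suggest; the table names, column names, and the -1 unknown key are purely assumptions.

-- give the dimension table a primary key so fact rows cannot reference phantom members
ALTER TABLE dbo.DimCustomer
  ADD CONSTRAINT PK_DimCustomer PRIMARY KEY (CustomerKey);

-- replace NULL keys on the fact table with an explicit Unknown member
UPDATE dbo.FactLoan
  SET CustomerKey = -1          -- -1 = the "Unknown" row in dbo.DimCustomer
  WHERE CustomerKey IS NULL;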

It takes me a very short moment to choose “Ignore error”, and just a moment longer to reflect on my discipline as a BI developer.

Categories: SSAS