Archive

Posts Tagged ‘dimension’

MDX # 40 – CROSS JOIN does not return Cartesian product of two hierarchies from the same dimension

April 15, 2014 2 comments

Correction

4/25/2014

Thanks to Chris Webb’s comment (see the comment section). I am making a correction on this blog.

The title of this blog contains a few important words, “from the same dimension”. You might have guessed the opposite side of it is hierarchies “from different dimensions”.

But my correction is not about this. In trying to explain why CROSS JOIN in MDX does not return Cartesian product of two hierarchies from the same dimension, I originally stated:

With the CROSSJOIN operation in MDX, we get only the combinations that actually have fact data. We now know that out of the 10 colors of Bib-Shorts only the Multi color Bib-Shorts has sales data (fact data).

Unlike the SQL CROSS JOIN, where we had to join one fact table and two dimension tables to find out which color of bib-shorts have sales, in MDX, the CROSSJOIN function automatically took into the consideration of the fact table. I have to say that the twists and turns are for the best. 

I highlighted “fact table” above because that is what made my statement incorrect in the context of two hierarchies from the same dimension. When we crossjoin two sets of members from hierarchies from the same dimension, the fact table(s) does not come into play, rather, it’s the validate combinations from the dimension table(s) that will determine the resulting tuples.  In this case, the Multi color is the only color for the Bib-Shorts. This can be verified by joining the two dimension tables, DimProduct and DimProductSubcategory. Here is the SQL query and the result.

Multi-Bib-shorts-correction

 

 

 

 

 

Here is the original post.

Cartesian product can be illustrated by a Cartesian square

Most SQL developers are familiar with the concept of CROSS JOIN and Cartesian product. Cartesian product is the result that is produced from a CROSS JOIN.

In this Wikipedia article, Cartesian product, a Cartesian product is illustrated by a Cartesian square as shown below.

Cartesian Square

 

 

 

 

 

 

 

In SQL Server, CROSS JOIN returns the Cartesian product of rows from tables in the join

Both of the following articles claimed that the SQL CROSS JOIN produces a result set which is the number of rows in the first table multiplied by the number of rows in the second table, if no WHERE clause is used along with CROSS JOIN. This kind of result is called as Cartesian Product.

The above definition of the SQL CROSS JOIN can be illustrated with the following SQL script using the AdventureWorksDW database.

use AdventureWorksDW
go

with color as (
select      distinct Color
from  dbo.DimProduct
),

sub as (
select      distinct EnglishProductSubcategoryName
from  dbo.DimProductSubcategory
whereEnglishProductSubcategoryName = ‘Bib-Shorts’
)

select      *
from  sub CROSS JOIN color
order by
EnglishProductSubcategoryName, Color
–10

To simplify the result set, I am using only one product subcateogry ‘Bib-Shorts’. The above SQL script generates the results shown in the screenshot below. With 1 product category and 10 product colors, the result set is the every possible combination of these two lists, which is 10 (1 * 10) rows.

Multi-Bib-shorts

 

 

 

 

 

 

 

 

 

 

Keep in mind that the above 10 combinations of the colors and the subcategory do not necessarily have any sales. In order to find witch combinations have sales, we will need to do an INNER JOIN on these 3 tables:

dbo.FactResellerSales or dbo.FactInternetSales
dbo.DimProductSubcategory
dbo.DimProduct

The following 2 SQL queries can be used. It turned out that there is no internet sales for multi color bib-shorts, but there are reseller sales for them.

— There is no Internet Sales for Multi color Bib-Shorts
select      *
from  dbo.FactInternetSales fact join dbo.DimProduct prod
on          fact.ProductKey = prod.ProductKey
join  dbo.DimProductSubcategory sub
on          prod.ProductSubcategoryKey = sub.ProductSubcategoryKey
wheresub.EnglishProductSubcategoryName = ‘Bib-Shorts’
and         prod.Color = ‘Multi’
–0

— There is 756 Reseller Sales for Multi color Bib-Shorts
select      *
from  dbo.FactResellerSales fact join dbo.DimProduct prod
on          fact.ProductKey = prod.ProductKey
join  dbo.DimProductSubcategory sub
on          prod.ProductSubcategoryKey = sub.ProductSubcategoryKey
wheresub.EnglishProductSubcategoryName = ‘Bib-Shorts’
and         prod.Color = ‘Multi’
–756

In MDX, CROSS JOIN does not return the Cartesian product members from hierarchies

Enter the world of MDX. CROSS JOIN is an important operation in MDX, with some twists and turns.

The MDX query below is the equivalent of the SQL CROSS JOIN in the previous section. However, the result set contains only one row.

select
{ } on 0,
{ CROSSJOIN(
[Product].[Subcategory].[Subcategory].[Bib-Shorts],
[Product].[Color].[Color].members
)
} on 1
from
[Adventure Works]

Multi-Bib-shorts Crossjoin

 

 

 

 

 

With the CROSSJOIN operation in MDX, we get only the combinations that actually have fact data. We now know that out of the 10 colors of Bib-Shorts only the Multi color Bib-Shorts has sales data (fact data).

Unlike the SQL CROSS JOIN, where we had to join one fact table and two dimension tables to find out which color of bib-shorts have sales, in MDX, the CROSSJOIN function automatically took into the consideration of the fact table. I have to say that the twists and turns are for the best.

Please refer to the section ‘Finding related members in the same dimension“ in Chapter 5 of our book, MDX with SSAS 2012, to gain more knowledge on this topic and how it is related to the concept of auto-exists and the EXISTS and EXISTING function.

Three different ways to do crossjoining

In MDX, you can cross join different hierarchies using three different ways. Chris Webb has a blog here, The rather-too-many ways to crossjoin in MDX, which is better than anything I can write about CROSS JOIN.

 

MDX #22 – What to use for explicit sorting on dimensions

December 3, 2012 Leave a comment

I often think that I am lucky to work in a profession that I enjoy very much. It makes everything I do a fun thing to do. Majority of the fun comes from the fact that I have my hands in almost every aspect of building a data warehouse. From business analysis, to ETL, to star schema/cube design and development to reporting, and to customer service. My blogs most of the time reflect the different roles I play in my profession.

In this example, I am going from a reporting requirement to the MDX queries that allow me to meet the requirement.

Note: all the examples are demonstrated in the sample Adventure Works database (star schema or the multi-dimensional cube) from Microsoft.

Very often we need to produce reports that compare monthly data. Providing two Month reporting filters will be the first step.

My last two blogs dealt with some aspects of this kind of reporting requirements.

MDX #21 – Days in a month

MDX #20 – Last Data Date   

Reporting Requirements

In this example I need to accomplish the followings:

1) Get all the months that have data

2) Sort the months in descending order

3) Default the first filter to the most recent month that has data

4) Default the second filter to the previous month that has data

image

In this blog, I’ll try to accomplish 1) and 2).

To get all the Months, let’s put the Date dimension on Y axis

To make sure that the months I get have Sales data, let’s put the [Sales Amount] measure on X axis.

Note: If you’ve been using the Query editor in SSRS, you might have noticed that the only “Dimension” that is allowed on the X axis is the [Measures] dimension. So you cannot place the Date dimension on the X axis. This can create a bit confusion first, but it really doesn’t prevent me from writing MDX queries to my hearts desire. 

image

Use the key word non empty to remove the months that do not have data

Note: Both the keyword NON EMPTY and the NONEMPTY(set_expression1, set_expression2) function will evaluate the cells for "empty" condition. The first only works on the axis. The latter can be used in any parts of the MDX queries and covers much more ground because it can take any two sets as parameters (although the second one is optional). Both should work in this example.

 

image

So far, I have met the first requirement 1) Get all the months that have data.

Explicitly sort the months in descending order

In order to meet the second requirement, I need to explicitly sort the months in descending order. I know I need some values I can find in the cube about the month for the sorting. Before I can figure out that, I need some help on all the things I can find about the month. 

They are the member properties that I need to find out.

image

The Member Value is the one I can use to sort the months explicitly

image

All looks good, except the first member_key which returned a NULL value. It turned out that I cannot really use the member_key function here because the [Month Name] attribute has a composite key. Open the Date Dimension Editor, we can see the KeyColumns has a collection of Year and Month Number. The property Key0 and Key1 will give us the correct key values.

image

But the keys cannot provide help to sort the months. The Member Value is the one I can use to sort the months explicitly.

Use attribute [Month Name]  rather than user hierarchy [Calendar]

Note: Since I am using the attribute [Month Name] for sorting, I’ll need to use it also on the Y axis. Otherwise using the [Calendar] user hierarchy will not work.

 image

It took us a few tries along the way, but the final query is simple enough.

To summarize, here are what we’ve learned

1) Many member properties, including ID, key, caption, name, member value, can be potential candidates for explicit sorting.

2) I used to use member key for explicit sorting on dimensions. But it will not work on the attributes that have composite keys. To get to the composite keys, we need to use the Key0, Key1, etc.

3) Member Value can be a good choice if we need to sort by numerics. Sometimes, Member Name can also be a good choice if we need to sort alphabetically.

SSAS #24 – Implement Attribute Relationship in SSAS 2008

June 13, 2011 Leave a comment

Do you still remember the blue squiggly from my previous blog post, SSAS #23 – Implement dimensional hierarchy in SSAS 2008?

On the Hierarchies tab in the Dimension Designer, the blue squiggly says:

“Attribute relationships do not exist between one or more levels in this hierarchy. This may result in decreased query performance.”

You’ve already know that this is a Best Practice Warning and it is telling us to create attribute relationships.

Fortunately, in SSAS 2008, attribute relationships are no longer a “hidden” feature, as in SSAS 2005 where there is no graphic to show the relationships. Now we have a dedicated tab with graphics. How cool can it get?

After the drag-and-drop of the two attributes, TreatmentType and Treatment, to the Hierarchies tab, the initial graphics on the Attribute relationships shows:

image  

This is not exactly correct.

The correct graphics should be:

image

The trick here is to create the attribute relationships in reverse, that is, lower-order objects (many) then higher-order objects (one), instead of in the order of one-to-many.

There are two ways to get the relationships into the correct order.

One way –drag the lower-order object and drop in onto the higher-order object

Give it a try. Drag Treatment and drop it onto TreatmentType.

Another way – use the Attribute Relationships pane at the lower right corner.

Right click on the relationship. Select Edit Attribute Relationship…

image

Make sure these:

  • The One side of the relationship = Source Attribute
  • The Many side of the relationship = Related Attribute
  • Relationship type = Rigid (in my example, the relationship will not change over time)
  •  

image

The blue squiggly on the Hierarchies tab disappeared

Here is the proof.

image

Closing Remark

In SSAS #22, 23 and 24 posts, I made the following assumptions:

  • you will use the star schema in your data mart design, and
  • there are natural one-to-many relationships among dimensional attributes, therefore,
  • the dimensional hierarchical structures are stored in one single dimensional table, and
  • the dimensional table is a non-parent-child table
    There are many discussions about Snowflake schema verses Star schema. If you use a Star schema, the Dimension Wizard will not be able to detect the natural hierarchies that exist.

Dimension Role-Playing in Employee Comp Reporting

March 17, 2011 Leave a comment

The employee hierarchy table is just one physical table, with a date range to keep employee’s employment history. This same employee hierarchy table can be used sometimes to show different business functional teams, for example, LAS VS. Underwriters on the same fact table. 

I can simply create two SQL views to handle this, or just use two sub-queries if I don’t want to create too many SQL views.

image

image

This is called dimension role-playing. Role-playing in a data warehouse occurs when a single dimension simultaneously appears several times in the same fact table. The underlying dimension exists as a single physical table, but each of the roles should be presented in a separated labeled view.  

Another common example in the employee productivity reporting, is the date dimension. We will just need one single date dimension table, but very often, we need to have a second date, such as application date, besides the primary decision date, on the same fact table. Again we only need to create two different SQL views to handle this, with only one underlying physical date dimension table.

Introduction to Employee Compensation and Productivity Reporting

March 16, 2011 Leave a comment

I came back to my old job, but am not working on exactly same projects. I am very happy that I have new puzzles to solve now.

Obviously employee compensation and productivity reporting has become a hot topic in the business recently.

I am going to blog on this subject as I get to understand more about the business needs.

After completing several ETL processes and reports so far, I gradually start to put all the puzzles together.

In its simplistic form, I can envision a subset of the data warehouse bus matrix. This matrix will revolve as I understand more.

image

The dimensions will involve hierarchies. All the facts basically are from a transactional table. But as we put together all the fact tables for reporting, we will get all three types of fact tables, i.e. the transactional fact tables, snapshot fact tables, and also accumulating fact tables. The third type will be the most common.

The natural granularity for our business transactional fact table will be one row for each action a team member does for each account. For the purpose of employee compensation and productivity reporting, the facts associated with the business process typically just include the counts of the actions, such as the counts all “declines”, or the counts of all “approvals”.

In the next blog I’ll talk about whether I need to normalize my fact tables by fact types.  

Define the grain of fact tables with unique key constraints

March 13, 2011 Leave a comment

I have been working with SQL Server, and Integration Services and Reporting Services for years. Data from our data warehouse in DB2 are ETL’d to the SQL server to be ready to be consumed by SSRS report. Until recently, I’ve only been using unique key constrains on the fact tables sparsely. I’ve started to do so on all the fact tables, and dimension tables now.

Out of three basic fact tables, transactional fact table, fact table for periodic snapshots, and for accumulating snapshots, we very often deal with the first type of transactional fact table, which holds data at the most detailed level. Generally speaking, this type of fact table will have the most number of dimensions associated with it. Fortunately, it’s not that many, in the type of fact tables I deal with. The most common grain of the fact tables I deal with involves account number, transaction date, employee login code, type of product, type of actions (this can grow to a few sometimes). If it’s an accumulating fact table, several milestone dates can also be part of the grain.

OK, back to the creation of the unique key constraints. Most of the unique key constraints will just be a composite key that is composed of all the dimensions in a grain. The grain of a fact table represents the most atomic level by which the facts may be uniquely defined. There might be also some dimensions (such as employee’s location) that do not contribute to the uniqueness of the fact records. Do not include those dimensions in the unique composite key.

%d bloggers like this: