SSIS #118–Is it an OLEDB or an SMO Server Connection?

December 17, 2014

OLEDB is the native client for SQL Server. Every SSIS developer is familiar with how to use it, and almost every task in Integration Services can use an OLEDB connection. As a matter of fact, it's the preferred client to use if your backend is SQL Server.

However, there is one exception: the Transfer SQL Server Objects Task. This task uses an SMO Server connection only.

 

pict1

If you don’t know this before you use this transfer object task, there can be confusions during development and also during deployment.

Here are my lessons learned.

  1. Visual differences between the OLEDB and the SMO Server connections in Integration Services 2012 help

pic2

In SSIS 2012, I am happy to see the visual differences. If you are still stuck in 2008, you will not see any visual differences.

  2. Name your SMO Server connections properly

A good practice is to name the SMO Server connection properly. I used names such as:

  • SMOServer_Staging (this could be the source server for the transfer task)
  • SMOServer_Reporting (this could be the destination server for the transfer task)
  3. Rename the SMO Server connection properly

In SSIS 2012, you have three different places to create connections.

  • At project level: connections are available to all packages in the project
  • At package level: only available to the package in which the connection is created
  • At task level: also only available to the package in which the connection is created

The best practice is to create connections at the project level.

However, no matter where you create the connections, the UI in SSIS always prefixes the connection name with the server name:

  • For OLEDB – servername.databasename: removing the servername is usually the best practice here, leaving just the database name.
  • For SMO Server – servername: since we cannot simply remove the servername, the best practice here is to rename it to what is suggested in section 2.
  4. Confusion during deployment over where to change to the production/test server

In 2008, during deployment, I've seen DBAs do the following:

  • Create new connections: because of the confusion about OLEDB vs. SMO Server connections, and even bigger confusion if your connections have your development server name in them, DBAs will most likely choose to create new connections.
  • Rename the connections: again, if you have not named your SMO connections properly, they will tend to rename them.

Neither of the above practices is good. If you as a developer have done it properly, DBAs will not need to mess much with the connections:

  • Name your SMO Server connections properly so they can tell them apart from the OLEDB connections.
  • Keep your development server name out of your connection names so they don't need to rename anything.
  5. In 2012, deployment is a lot easier

In 2012, deployment is much more straightforward because

  • Connections can be created at the project level: there is only one place to manage the connections, including the connection name, server name, database name, and a bunch of other connection properties
  • Of course, you can take even greater advantage of many features, such as project-level parameters, if you have chosen to deploy packages to the SSIS Catalog

More about SMO connection manager

SMO Connection Manager

Quote from the MSDN article:

An SMO connection manager enables a package to connect to a SQL Management Object (SMO) server. The transfer tasks that SQL Server Integration Services includes use an SMO connection manager. For example, the Transfer Logins task that transfers SQL Server logins uses an SMO connection manager.

When you add an SMO connection manager to a package, Integration Services creates a connection manager that will resolve to an SMO connection at run time, sets the connection manager properties, and adds the connection manager to the Connections collection on the package. The ConnectionManagerType property of the connection manager is set to SMOServer.

You can configure an SMO connection manager in the following ways:

  • Specify the name of a server on which SQL Server is installed.
  • Select the authentication mode for connecting to the server.

More about Transfer SQL Server Objects task

Besides the data flow, the Transfer SQL Server Objects task is another task you can use to transfer not just data but also other types of objects between instances of SQL Server. For more info, check out this MSDN article.

Transfer SQL Server Objects Task

 

SSAS #39–Building Optimal and Well-Tuned Dimensions

December 14, 2014

The Analysis Services Performance Guide, written by a group of Microsoft Analysis Services experts, has been updated since the 2005 edition, for 2008 R2 and for 2012.

  • Microsoft SQL Server 2005 Analysis Services Performance Guide
  • Microsoft SQL Server 2008 R2 Analysis Services Performance Guide
  • Microsoft SQL Server Analysis Services Multidimensional Performance and Operations Guide

I highly recommend the guide. It is short, to the point, and it's FREE.

Here is my summary of how to build optimal dimensions from Part 1 of the guide.

  1. Efficient Use of Attributes Is a Key Design Skill to Master
  • Dimensions are composed of attributes, which are related to each other through hierarchies.
  • Studying and implementing the attribute relationships available in the business model can help improve cube performance.
  • Using the KeyColumns, ValueColumn, and NameColumn properties effectively can reduce processing time. They also reduce the size of the dimension and the likelihood of user errors. This is especially true for attributes that have a large number of members, that is, greater than one million members.
  • KeyColumns: defines the attribute and uniquely identifies each instance of the attribute. It is a best practice to assign a numeric source field, if available, to the KeyColumns property rather than a string field. Furthermore, use a single-column key instead of a composite, multi-column key.
  • NameColumn: displayed to end users; if empty, it defaults to the KeyColumns property.
  • ValueColumn: typically used for calculations; strongly typed and can be accessed through the MemberValue MDX function (see the sketch after this list).
  • Using the ValueColumn and NameColumn properties eliminates the need for extraneous attributes. This reduces the total number of attributes in your design, making it more efficient.
  • Consider using key fields (numeric) for the KeyColumns property, rather than alphanumeric fields.
  2. Consider Hiding Most Attribute Hierarchies
  • Set AttributeHierarchyVisible = false for most attribute hierarchies.
  • Hiding the surrogate key is a typical example.
  3. Consider Disabling Attribute Hierarchies
  • If an attribute is included only for ordering purposes, set AttributeHierarchyEnabled = false and AttributeHierarchyOptimizedState = NotOptimized to save on processing operations.
  4. Consider Disabling Attribute Ordering
  • For a hidden attribute that you use only for implementation purposes, where you don't care about the ordering (the surrogate key is one such case), set AttributeHierarchyOrdered = false to save time during processing of the dimension.
  5. Set Explicit Default Attribute Members Properly (carefully)
  • Default Attribute Members: Any query that does not explicitly reference a hierarchy will use the current member of that hierarchy.
  • The default behavior of Analysis Services is to assign the All member of a dimension as the default member.
  • Explicitly set default members only on dimensions with single hierarchies or in hierarchies that do not have an All level.
  6. Removing a Meaningless All Level
  • It can even be expensive to ask for the All level of a dimension if there is no good aggregate to respond to the query.
  • You can disable the All member in a hierarchy by setting IsAggregateable = false on the attribute at the top of the hierarchy.
  • Note that if you disable the All level, you should also set a default member as described in the previous section; if you don't, Analysis Services will choose one for you.
  7. Identifying Attribute Relationships Other Than the Relationships to the Primary Key
  • Bushy attribute relationships: by default, all attributes are related to the primary key, and the attribute relationship diagram represents a "bush" where relationships all stem from the key attribute and end at each of the other attributes.
  • Redefine attribute relationships: You can optimize performance by defining hierarchical relationships supported by the data.
  • Attribute relationships help performance in 3 ways:
    • Cross products between levels (for example, the cross product between Subcategory and Category) in the hierarchy do not need to go through the key attribute, saving CPU time during queries.
    • Aggregations built on attributes can be reused on related attributes, saving resources during processing and for queries.
    • Auto-exists can more efficiently eliminate attribute combinations that do not exist in the data: if the attribute relationship is defined, the Analysis Services engine knows beforehand which category each subcategory belongs to, via indexes built at process time.
  8. Flexible vs. Rigid Relationships: Consider Using Rigid Relationships
  • A flexible attribute relationship (customer-city) is one where members can move around during dimension updates, and
  • A rigid attribute relationship (month-year) is one where the member relationships are guaranteed to be fixed.
  • Processing a flexible attribute relationship is expensive: when a change is detected during processing in a flexible relationship, all indexes for partitions referencing the affected dimension (including the indexes for attributes that are not affected) must be invalidated. This is an expensive operation and may cause Process Update operations to take a very long time. Indexes invalidated by changes in flexible relationships must be rebuilt after a Process Update operation with a Process Index on the affected partitions; this adds even more time to cube processing.
  • Flexible relationships are the default setting. Carefully consider the advantages of rigid relationships and change the default where the design allows it.
  9. Consider Unnatural User Hierarchies
  • There are two types of user hierarchies: natural and unnatural.
  • Unnatural hierarchies have at least two consecutive levels that have no attribute relationships. They are commonly created for drill-down paths of commonly viewed attributes that do not follow any natural hierarchy; an example is a gender-education hierarchy.
  • Unnatural hierarchies are not materialized on disk, and the attributes participating in unnatural hierarchies are not automatically considered as aggregation candidates. Rather, they simply provide users with easy‐to‐use drill‐down paths for commonly viewed attributes that do not have natural relationships.
  • By assembling these attributes into hierarchies, you can also use a variety of MDX navigation functions to easily perform calculations like percent of parent.
  10. Consider Accessing Attributes via Member Properties Only
  • Disable the attribute's hierarchy by setting the AttributeHierarchyEnabled property to False if you only want to access an attribute as a member property.
    • Member properties provide a different mechanism to expose dimension information.
    • For a given attribute, member properties are automatically created for every direct attribute relationship.
    • For the primary key attribute, this means that every attribute that is directly related to the primary key is available as a member property of the primary key attribute.
  • After you verify that the correct relationship is in place, you can disable the attribute’s hierarchy.
  • Disabling the attribute hierarchy can improve performance and decrease cube size, because the attribute will no longer be indexed or aggregated. This can be especially useful for high-cardinality attributes that have a one-to-one relationship with the primary key.
  • High cardinality attributes such as phone numbers and addresses typically do not require slice‐and‐dice analysis.
  • By disabling the hierarchies for these attributes and accessing them via member properties, you can save processing time and reduce cube size, as the sketch below illustrates.
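To make the last two ideas concrete, here is a minimal MDX sketch. It assumes the Adventure Works sample cube, where Yearly Income and Phone are attributes related to the Customer key attribute; the names are illustrative only. The first query reads the typed ValueColumn through MemberValue; the second reads an attribute as a member property of the key attribute, which keeps working even when that attribute's own hierarchy is disabled.

-- Typed access to the ValueColumn via the MemberValue function
with member [Measures].[Income Value] as
[Customer].[Yearly Income].CurrentMember.MemberValue
select { [Measures].[Income Value] } on columns,
[Customer].[Yearly Income].[Yearly Income].Members on rows
from [Adventure Works]
go

-- Reading an attribute as a member property of the key attribute
with member [Measures].[Phone Number] as
[Customer].[Customer].CurrentMember.Properties("Phone")
select { [Measures].[Phone Number] } on columns,
Head([Customer].[Customer].[Customer].Members, 5) on rows
from [Adventure Works]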

 

SSIS #117–What is the Location in the MSOLAP Connection String

December 13, 2014

MSOLAP is a native OLE DB provider for Analysis Services

A common task during ETL is to process the OLAP dimensions and the cube after all the dimension and fact tables are loaded into the data warehouse. Integration Services provides two special control flow tasks for this purpose: the Analysis Services Processing Task and the Analysis Services Execute DDL Task.

Before you can use either of these two tasks, you will need to create an SSIS Connection Manager that knows where the Analysis Services server is. The Connection Manager is just the name of the GUI in Integration Services. Behind the fancy GUI, it's just a connection string that uses the native data provider, the Analysis Services OLE DB provider (MSOLAP for short).

 

pic1

pic2

Location in the connection string causing “connection cannot be made”

In the above screenshot, I provided not only the Server name but also the Location, which I made the same as the server. Testing the connection in the GUI was successful.

After the GUI is completed, the final connection string looks like this:

Provider=MSOLAP.5;Data Source=localhost\MULTI2012;Integrated Security=SSPI;Initial Catalog=SSAS2012DEMO;Location=localhost\MULTI2012

The control flow task Analysis Services Execute DDL Task is usually the more flexible choice for processing Analysis Services objects than the Processing Task. But to make sure that my connection string was correct, I used the Analysis Services Processing Task, which is the GUI version of the DDL task, to test the connection string. I ran into this “connection cannot be made” error.

pic3

It turned out the problem was the Location. Removing the Location from the connection string solved the connection issue.

As a matter of fact, the shortest connection string that can pass the connection test is just this:

Data Source=localhost\MULTI2012

No explanation for Location in the connection string documentation for the MSOLAP and ADOMD.NET providers

I checked the connection string documentation in the MSDN library for both MSOLAP and ADOMD.NET, and couldn't find much information about what the Location is supposed to be.

AdomdConnection.ConnectionString Property

Connection String Properties (Analysis Services)

The first link, for the ADOMD.NET provider, simply states:

Location: Sets the name of the server.

In the second link, for the MSOLAP provider, I couldn't find any mention of Location at all.

Three types of data providers for Analysis Services

There are really three types of data providers for Analysis Services.

Data providers used for Analysis Services connections

Analysis Services provides three data providers for server and data access. All applications connecting to Analysis Services do so using one of these providers. Two of the providers, ADOMD.NET and Analysis Services Management Objects (AMO), are managed data providers. The Analysis Services OLE DB provider (MSOLAP DLL) is a native data provider.

More ways to access Analysis Services in SSIS than the DDL Task and the Processing Task

Check out this book if you want to know more ways than the DDL Task and the Processing Task to access Analysis Services from SSIS.

Microsoft® SQL Server® 2008 Integration Services Problem–Design–Solution

Quote from the book:

 

* Using the control flow object Analysis Services Processing Task is the most straightforward approach, and is discussed in the next section. This allows any SSAS objects (that involve processing) and their processing method to be selected. A few drawbacks are listed in the next section.

* Although the Analysis Services Execute DDL Task may sound as if it only allows objects to be modified, created, or deleted, its functionality goes way beyond Data Definition Language (DDL). This task runs an SSAS XMLA script. XMLA includes the capability not just to run DDL, but also query and process. Therefore, this task is very useful for SSAS processing, because the XMLA can be modified in an SSIS package before it is executed.

* Another approach involves using the Execute Process Task to call the ASCMD executable that comes with SSAS. This executable can run MDX, Data Mining Expressions (DMX), and XMLA, and also contains some command-line switches to make your processing more dynamic.

* The data flow in SSIS includes two destinations: the Dimension Processing destination and the Partition Processing destination. These allow data directly from the pipeline to be pushed into either an SSAS dimension or SSAS partition, respectively. Unlike all the other SSIS approaches discussed here, this approach is the only one where data is pushed directly to SSAS. The other approaches essentially tell SSAS to start pulling the data from a data source. Like the other SSIS approaches, this approach is described later in this chapter.

* Finally, the programming API for SSAS, called Analysis Management Objects (AMO), allows the interface into the SSAS object model and can leverage the processing methods to process any of the SSAS objects described. To use AMO for processing, you must leverage either the Script Task in the control flow, or the Script Component in the data flow.

 

MDX #45–Find all the Calculated Sets and Their Calculation Scripts

November 14, 2014

Now we know how to find all the calculation scripts in a cube by querying the DMV $SYSTEM.MDSCHEMA_MEASURES.

MDX #43–Find a MDX Calculation Script

What about all those calculated sets in the cube?

In the Adventure Works cube, there are quite a few calculated sets scattered around in the Sets folders of some dimensions, as shown in the following screenshot.

pic1

How do we find all of them and their calculation scripts using the DMVs?

This time we can use the DMV $SYSTEM.MDSCHEMA_SETS.

SELECT * FROM    $SYSTEM.MDSCHEMA_SETS

Here is the result. The EXPRESSION field shows the calculation scripts.

pic2

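If SELECT * returns more columns than you need, you can narrow it down. Here is a sketch; the WHERE clause assumes the Adventure Works cube, so substitute your own cube name:

SELECT    CUBE_NAME
,        SET_NAME
,        EXPRESSION
FROM    $SYSTEM.MDSCHEMA_SETS
WHERE    CUBE_NAME = 'Adventure Works'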
Categories: MDX Challenges

MDX #44–How to find all the calculated measures

November 13, 2014

Calculated measures are visually identifiable

If you have access to a cube, it’s quite easy to find all the calculated measures.

The following is a screenshot from the SSMS MDX query editor. The icon next to each measure visually tells you whether it's a regular measure or a calculated measure. The icon with three uneven bars indicates a regular measure, and the one with a calculator and an orange bar (not sure what it represents) on top indicates a calculated measure.

pic1

AllMembers function returns both the regular and calculated measures

I am always interested in knowing how to accomplish the same thing with MDX queries.

It turns out to be pretty simple. The key is the AllMembers function.

By default, MDX excludes calculated members when it resolves set functions. In this blog, I am only examining the special Measures dimension, which has only one single visible hierarchy; as a result, the hierarchy can be referred to either by the dimension name or by the hierarchy name. So both of the following are valid expressions, and both will return regular measures only.

[Measures].[Measures].Members
[Measures].Members

The AllMembers function is very straightforward. It returns what its name suggests: both the regular measures and the calculated measures.

Here are 4 simple statements regarding how to use the AllMembers function to get calculated measures.

  1. Members function only returns regular members, or regular measures on the Measures hierarchy.
  2. AllMembers function returns regular members AND calculated members, or both regular and calculated measures on the Measures hierarchy.
  3. (AllMembers – Members) gives us calculated measures only.
  4. AddCalculatedMembers() function is semantically similar to the AllMembers function.

The following are 4 MDX queries to demonstrate the 4 statements above.

pic2

The following screenshot shows the results.

pic3

Here are the queries in text.

--1. Members function only returns regular members, or regular measures on the Measures hierarchy
SELECT
{[Measures].Members} ON COLUMNS
FROM
[Adventure Works]
go

--2. AllMembers function returns regular members AND calculated members, or calculated measures on the Measures hierarchy
SELECT
{[Measures].AllMembers} ON COLUMNS
FROM
[Adventure Works]
go

--3. AllMembers - Members gives us calculated measures only
SELECT
{ [Measures].AllMembers -
[Measures].Members
} ON COLUMNS
FROM
[Adventure Works]
go

--4. AddCalculatedMembers() function is the same as the AllMembers function
SELECT
AddCalculatedMembers([Measures].Members) -
[Measures].Members
ON COLUMNS
FROM [Adventure Works]

The AllMembers function and the AddCalculatedMembers() function can also be applied to other regular dimensions to get their calculated members.
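For example, this sketch (assuming the Adventure Works cube) returns only the calculated members, if any, defined on the Product Categories user hierarchy:

SELECT
{ [Product].[Product Categories].AllMembers -
[Product].[Product Categories].Members
} ON COLUMNS
FROM
[Adventure Works]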

MDX #43–Find a MDX Calculation Script

November 12, 2014

Finding out MDX calculation scripts is a common task

A co-worker recently asked me what the calculation for a calculated measure is in our reporting cube.

If you have the Analysis Services project in Visual Studio locally, it is easy to find what the calculation script is from the Calculations tab in the cube designer.

But what if you don't have the VS project handy, but you do have access to the cube from SQL Server Management Studio?

EXPRESSION field in $SYSTEM.MDSCHEMA_MEASURES

Here is a simple script you can run to quickly get the calculation script.

This script queries the SSAS Dynamic Management View $SYSTEM.MDSCHEMA_MEASURES. The EXPRESSION field will return the actual calculation script. You will need to run the DMV queries in the MDX query editor, not the SQL query editor.

pic1

Here is the result.

pic2

Here is the query in text.

SELECT    CUBE_NAME
,        MEASURE_UNIQUE_NAME
,        EXPRESSION
,        MEASUREGROUP_NAME
from    $SYSTEM.MDSCHEMA_MEASURES
where    MEASURE_UNIQUE_NAME = '[Measures].[Ratio to Parent Product]'
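If you don't know the measure's unique name up front, you can also list every calculated measure in a cube at once. Here is a sketch; it assumes the Adventure Works cube, and relies on calculated measures reporting a MEASURE_AGGREGATOR value of 127 in this DMV:

SELECT    CUBE_NAME
,        MEASURE_UNIQUE_NAME
,        EXPRESSION
from    $SYSTEM.MDSCHEMA_MEASURES
where    CUBE_NAME = 'Adventure Works'
AND        MEASURE_AGGREGATOR = 127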

Use $SYSTEM.DBSCHEMA_COLUMNS to find all the columns in a DMV

You might ask how I knew what all the columns in this view are. There is another DMV, $SYSTEM.DBSCHEMA_COLUMNS, that you can query to find out all the columns in a DMV view.

pic3

Here is the result.

 

pic4

Here is the query in text.

SELECT *
FROM    $SYSTEM.DBSCHEMA_COLUMNS
WHERE    TABLE_SCHEMA = '$SYSTEM'
AND        TABLE_NAME = 'MDSCHEMA_MEASURES'

$SYSTEM.DISCOVER_SCHEMA_ROWSETS is the only DMV name you need to remember

You might also ask how I knew to use the view $SYSTEM.MDSCHEMA_MEASURES to find the calculation script.

The answer is to just remember one view, $SYSTEM.DISCOVER_SCHEMA_ROWSETS.

SELECT * FROM $SYSTEM.DISCOVER_SCHEMA_ROWSETS

pic5

The $SYSTEM.DISCOVER_SCHEMA_ROWSETS view will show you all the DMV views that you can use to get the metadata about your dimensions and cubes.
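Since SELECT * is fairly wide, a trimmed-down sketch that lists just each rowset's name and description is often all you need:

SELECT    SchemaName
,        Description
FROM    $SYSTEM.DISCOVER_SCHEMA_ROWSETS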

 

DMVs can be very useful for documenting SSAS databases and monitoring usage and activity. To learn more about how to use these DMVs, check out our book “MDX with SSAS 2012 Cookbook”.

Categories: MDX Challenges, SSAS

MDX #42–IsEmpty? or = 0 ? or IS NULL?

November 12, 2014

In SQL, NULL means undefined value

Any SQL programmer can tell you that zero is not NULL, NULL is not zero, and NULL is not even equal to NULL either, because NULL means undefined. One undefined value cannot be equal to another undefined value.

In MDX, NULL means an empty cell

What about MDX? How do we detect zero and NULL, and what does NULL even mean? Since we are leaving the one-dimensional SQL world and stepping into the multi-dimensional cube, NULL has a slightly different meaning. To me, it really means an empty cell. When the Internet Sales is showing as (null) for 11/11/2014, it means that the cell of {[Internet Sales], [11/11/2014]} is an empty cell. If you query the fact table, SELECT Internet Sales FROM factSales WHERE Date = 11/11/2014, it should return no rows.

What if the Internet Sales is showing as 0 (zero) for 11/11/2014? The cell is no longer empty. The fact table should have one row WHERE Date = 11/11/2014, only with an Internet Sales of zero.

Detecting zero and NULL in MDX

How do we detect when Internet Sales is zero or when the cell is totally empty in MDX?

I wrote the following demo MDX code to see the various ways and the results.

I first created two values, a zero value and a NULL value.

pic1

Then I created three new values to detect the zero value, in three different ways:

  • 0 = 0?: is 0 equal to 0? It should be true.
  • 0 Is NULL?: It should be false.
  • 0 IsEmpty?: It should be false.

 

pic2

I also created three new values to detect the NULL value, in three different ways:

  • NULL = 0?: a big question mark here.
  • NULL Is NULL?: It should be false. Remember NULL is not equal to NULL in SQL.
  • NULL IsEmpty?: It should be true. Remember NULL means empty cell in MDX.

pic3

Then I wrote the following MDX query.

pic4

Here is the result.

pic5

Zero is zero, AND NULL is also zero

Do the four results in red surprise you?

  • Zero is zero, AND NULL is also zero.
  • Zero is certainly not empty.
  • NULL IS empty.

It did surprise me. The best explanation I have is that MDX appears to coerce an empty cell to zero in a numeric comparison, which is why NULL = 0 evaluates to true. At this point, I am just taking in this fact.

Here is the code in text.

with
member [measures].[A Zero Value] as 0
member [measures].[A NULL Value] as null

member [measures].[0 = 0?] as
iif ( [measures].[A Zero Value] = 0,
"true",
"false"
)
member [measures].[0 Is NULL?] as
iif ( [measures].[A Zero Value] is null,
"true",
"false"
)
member [measures].[0 IsEmpty?] as
iif ( IsEmpty([Measures].[A Zero Value]) = True,
"true",
"false"
)
member [measures].[NULL = 0?] as
iif ( [measures].[A NULL Value] = 0,
"true",
"false"
)
member [measures].[NULL Is NULL?] as
iif ( [measures].[A NULL Value] is null,
"true",
"false"
)
member [measures].[NULL IsEmpty?] as
iif ( IsEmpty([Measures].[A NULL Value]) = True,
"true",
"false"
)

select { [Measures].[A Zero Value],
[measures].[0 = 0?],
[measures].[0 Is NULL?],
[measures].[0 IsEmpty?],
[Measures].[A NULL Value],
[measures].[NULL = 0?],
[measures].[NULL Is NULL?],
[measures].[NULL IsEmpty?]
} on 0
from    [RADM_REPORTING]

Empty cells, natural or user-defined, can be a powerful way to optimize our MDX queries

Then you might think: since MDX thinks NULL is zero, why do many people set one side of an MDX calculation to NULL? Why can't we all use zero and not mess with NULL? Well, empty cells, natural or user-defined, can be a powerful way to optimize our MDX queries, with the help of either the NONEMPTY() function or the NON EMPTY keyword, which works only on axes. Check out our book MDX with SSAS 2012 Cookbook on this topic.
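As a minimal sketch of the difference (assuming the Adventure Works cube), the NON EMPTY keyword in the first query removes rows whose cells are empty while keeping rows that hold a real zero; the NONEMPTY() function in the second query does similar filtering and, unlike the keyword, can also be used inside calculations:

SELECT
{ [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY [Customer].[Customer Geography].[Country].Members ON ROWS
FROM [Adventure Works]
go

SELECT
{ [Measures].[Internet Sales Amount] } ON COLUMNS,
NONEMPTY( [Customer].[Customer Geography].[Country].Members,
[Measures].[Internet Sales Amount]
) ON ROWS
FROM [Adventure Works]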

Categories: MDX Challenges