Archive

Archive for the ‘DW Interview Qs’ Category

ETL #73–NULL or NOT NULL and SQL Interview Questions

November 25, 2016 Leave a comment

Today is the day after Thanksgiving. There are many things to be thankful so I decided to write a short post today.

The first thing to be thankful is that Tomislav and I have completed the third edition of our MDX book, MDX with Microsoft SQL Server 2016 Analysis Services Cookbook. The book is published by Packt Publishing and has been uploaded to all the major publishing channels.  

Recommend SQL interview questions on deskbright.com

The second thing to be thankful is the enthusiastic audience who have been faithfully reading my posts. I recently received two inquires, which are somehow related. One reader was a bit confused by what I said about the SQL NULL values and what I said about being careful of what we put in the WHERE clause. Another reader is from Deskbright.com, a new online learning platform. Deskbright.com offers a number of free online resources to help people learn business skills — such as SQL. They are wondering if I’d be willing to post a link to their page on SQL interview questions (https://www.deskbright.com/sql/sql-interview-questions/) on my site.

As I am browsing through the SQL interview questions on deskbright.com, I see the #3 question, “Why is this query not returning the expected results?”, and thought this is perfect for the question from the reader I mentioned previously. Instead of overwhelming readers, deskbright.com listed only 7 most common SQL interview questions.

I’d rather not to repeat what they have and would recommend their SQL interview questions to those of you who are still new to SQL or Business Intelligence.

MDX Cook book third edition

The full title of the book is MDX with Microsoft SQL Server 2016 Analysis Services Cookbook. By the time you see this post, the book should be on Amazon, PacktPub.com, and all other online technology book stores and the e-subscription sites.

See also

ETL #72–Your data can mysteriously disappear after a WHERE clause was added (2)

ETL #71–Your data can mysteriously disappear after a WHERE clause was added (1)

What is a surrogate key and where do you use it?

August 4, 2010 Leave a comment

Question:
What is a surrogate key and where do you use it?

Answer:

A surrogate key is a substitution for the natural primary key.

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension table’s primary keys. They can use Informatica sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult. Not only can these natural key values change, indexing on a numerical value is probably better and you could consider creating a surrogate key called, say, LocationId. This would be internal to the system and as far as the client is concerned you may display only the LocationName.

Another benefit you can get from surrogate keys (SID) is for tracking the SCD – Slowly Changing Dimension.

A classical example from Data warehouse IT Toolbox:

On the 1st of January 2002, Employee ‘E1’ belongs to Business Unit ‘BU1’ (that’s what would be in your Employee Dimension). This employee has a turnover allocated to him on the Business Unit ‘BU1’ But on the 2nd of June the Employee ‘E1’ is muted from Business Unit ‘BU1’ to Business Unit ‘BU2.’ All the new turnover have to belong to the new Business Unit ‘BU2’ but the old one should Belong to the Business Unit ‘BU1.’
If you used the natural business key ‘E1’ for your employee within your data warehouse everything would be allocated to Business Unit ‘BU2’ even what actually belongs to ‘BU1.’

If you use surrogate keys, you could create on the 2nd of June a new record for the Employee ‘E1’ in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee ‘E1’ + ‘BU1.’ All new data (after 2nd of June) would take the SID of the employee ‘E1’ + ‘BU2.’

You could consider Slowly Changing Dimension as an enlargement of your natural key: natural key of the Employee was Employee Code ‘E1’ but for you it becomes
Employee Code + Business Unit, i.e. ‘E1’ + ‘BU1’ or ‘E1’ + ‘BU2.’

Categories: DW Interview Qs
%d bloggers like this: