Archive for the ‘DW & BI Interview Qs’ Category

ETL #73–NULL or NOT NULL and SQL Interview Questions

November 25, 2016

Today is the day after Thanksgiving. There are many things to be thankful for, so I decided to write a short post today.

The first thing to be thankful for is that Tomislav and I have completed the third edition of our MDX book, MDX with Microsoft SQL Server 2016 Analysis Services Cookbook. The book is published by Packt Publishing and has been uploaded to all the major publishing channels.

Recommend SQL interview questions on

The second thing to be thankful for is the enthusiastic audience who have been faithfully reading my posts. I recently received two inquiries, which are somewhat related. One reader was a bit confused by what I said about SQL NULL values and about being careful with what we put in the WHERE clause. Another reader is from, a new online learning platform. offers a number of free online resources to help people learn business skills, such as SQL. They are wondering if I’d be willing to post a link to their page on SQL interview questions on my site.

As I was browsing through the SQL interview questions on, I saw question #3, “Why is this query not returning the expected results?”, and thought it was a perfect fit for the question from the reader I mentioned above. Instead of overwhelming readers, listed only the 7 most common SQL interview questions.

I’d rather not repeat what they have, and I recommend their SQL interview questions to those of you who are still new to SQL or Business Intelligence.

MDX Cookbook third edition

The full title of the book is MDX with Microsoft SQL Server 2016 Analysis Services Cookbook. By the time you see this post, the book should be on Amazon,, and all other online technology book stores and the e-subscription sites.

See also

ETL #72–Your data can mysteriously disappear after a WHERE clause was added (2)

ETL #71–Your data can mysteriously disappear after a WHERE clause was added (1)

SQL Interview Q #1 – What is First Normal Form (1NF)?

November 12, 2011

My blog has many practical tips and best practices for SQL/BI developers, but I haven’t focused on interview questions for SQL/BI developers so far. This might change in the future. It’s been a challenge for many people to break into or stay competitive in the SQL/BI profession.

I am very lucky to be able to stay in the profession and also stay in the financial industry. It’s been very rewarding for me to share my experience and knowledge through this blog.

Many recruiters do not understand what exactly a SQL/BI Developer does. One thing they assume we don’t do is design. On the contrary, designing from simple table structure to the entire sub-system for staging and ETL is our daily job.

In this blog, I’ll share with you one simple SQL design interview question and the answer that will set you apart from other candidates.

Interview question: what is First Normal Form (1NF)?

To give an answer that will earn you an A, we first need to relate 1NF to what we do every day. Memorizing answers from hundreds of SQL blogs will not get you very far, because under the pressure of being interviewed by several people, your memory will start to fail you very quickly.

Have you ever created primary keys for your SQL tables? I bet you have. But have you ever asked yourself why we need to create primary keys? Or have you ever asked yourself a question with an even bigger scope: how do we efficiently organize data in a database?

Here are the simple answers to the above questions:

1. Normalization is the process of efficiently organizing data in a database.
2. There are two goals of the normalization process.
3. One goal is to eliminate redundant data.
4. Another goal is to ensure data dependencies make sense (only storing related data in a table).
5. First normal form (1NF) is the most basic rule for an organized database.
6. In a DBMS, the 1NF principle is put into practice by creating a primary key for each SQL table.

Now we know that 1NF is implemented through the primary key in a DBMS. With the above answers, you’ve already earned an A. With the additions below, you will surely get a solid A+.

7. When creating a PK for a table, we want to accomplish two things.
8. One is to eliminate duplicate columns from the same table.
9. The other is to create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
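To make points 8 and 9 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer_flat, customer, customer_phone) are made up for illustration and are not from any real system. It starts with a table that repeats a phone column and has no primary key, then splits it into two tables where every row is identified by a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Not in 1NF: the phone column is repeated (phone1, phone2) and no key identifies a row.
conn.execute("CREATE TABLE customer_flat (name TEXT, phone1 TEXT, phone2 TEXT)")
conn.execute("INSERT INTO customer_flat VALUES ('Ann', '555-0100', '555-0101')")
conn.execute("INSERT INTO customer_flat VALUES ('Bob', '555-0200', NULL)")

# 1NF: a separate table per group of related data, each row identified by a primary key.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE customer_phone (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        phone       TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone)
    )
""")

conn.execute("INSERT INTO customer (name) SELECT name FROM customer_flat")

# Turn the repeated columns into rows, one phone number per row.
conn.execute("""
    INSERT INTO customer_phone (customer_id, phone)
    SELECT c.customer_id, f.phone1
    FROM customer_flat f JOIN customer c ON c.name = f.name
    WHERE f.phone1 IS NOT NULL
    UNION ALL
    SELECT c.customer_id, f.phone2
    FROM customer_flat f JOIN customer c ON c.name = f.name
    WHERE f.phone2 IS NOT NULL
""")

phone_rows = conn.execute("""
    SELECT c.name, p.phone
    FROM customer c JOIN customer_phone p ON p.customer_id = c.customer_id
    ORDER BY c.name, p.phone
""").fetchall()
print(phone_rows)
```

In an interview, walking through a small before-and-after like this is usually more convincing than reciting the definition.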

Categories: SQL Interview Qs

SSIS – Generate Package ID

October 6, 2010

Problem: when you use a package template, or copy and paste a package in Solution Explorer to clone it, the newly created package has the same ID as its parent. This can be a problem in a production environment where multiple packages with the same PackageID log to the same database. If something goes wrong, I will not be able to trace which package is having the issue, since the PackageID is how I map back to the package’s name.

Solution: the solution is simple. We can fix it by generating a new PackageID in BIDS.

In the package Properties window, find ID, click the dropdown arrow on the right, and click <Generate New ID>.


Categories: SSIS Interview Qs

What is a surrogate key and where do you use it?

August 4, 2010

A surrogate key is a substitute for the natural primary key.

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables’ primary keys. The surrogate key values can come from an Informatica Sequence Generator, an Oracle sequence, or a SQL Server IDENTITY column.

It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, which makes updates more difficult. Beyond that, indexing on a numeric value is generally more efficient than indexing on a longer text value. For example, you could create a surrogate key called, say, LocationId. It would be internal to the system, and as far as the client is concerned you might display only the LocationName.

Another benefit of surrogate keys (SIDs) is tracking Slowly Changing Dimensions (SCDs).

A classic example from Data warehouse IT Toolbox:

On the 1st of January 2002, Employee ‘E1’ belongs to Business Unit ‘BU1’ (that’s what would be in your Employee Dimension). This employee has turnover allocated to him in Business Unit ‘BU1.’ But on the 2nd of June, Employee ‘E1’ is moved from Business Unit ‘BU1’ to Business Unit ‘BU2.’ All new turnover has to belong to the new Business Unit ‘BU2,’ but the old turnover should still belong to Business Unit ‘BU1.’
If you used the natural business key ‘E1’ for your employee within your data warehouse, everything would be allocated to Business Unit ‘BU2,’ even what actually belongs to ‘BU1.’

If you use surrogate keys, you could create on the 2nd of June a new record for the Employee ‘E1’ in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee ‘E1’ + ‘BU1.’ All new data (after 2nd of June) would take the SID of the employee ‘E1’ + ‘BU2.’

You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural key of the Employee was Employee Code ‘E1,’ but for you it becomes Employee Code + Business Unit, i.e. ‘E1’ + ‘BU1’ or ‘E1’ + ‘BU2.’
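The E1 example can be sketched with Python's built-in sqlite3 module. This is only an illustration under my own assumptions: the table names (dim_employee, fact_turnover), the valid_from/valid_to columns, the amounts, and the two helper functions are all hypothetical. The point is that each fact row stores the surrogate key that was current when the turnover happened, so old turnover stays with ‘BU1’ after the move.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_employee (
        employee_sid  INTEGER PRIMARY KEY,  -- surrogate key
        employee_code TEXT NOT NULL,        -- natural (business) key
        business_unit TEXT NOT NULL,
        valid_from    TEXT NOT NULL,
        valid_to      TEXT                  -- NULL marks the current row
    );
    CREATE TABLE fact_turnover (
        employee_sid INTEGER NOT NULL REFERENCES dim_employee(employee_sid),
        sale_date    TEXT NOT NULL,
        amount       REAL NOT NULL
    );
""")


def current_sid(code):
    """Return the surrogate key of the current dimension row for an employee."""
    row = conn.execute(
        "SELECT employee_sid FROM dim_employee "
        "WHERE employee_code = ? AND valid_to IS NULL",
        (code,),
    ).fetchone()
    return row[0]


def move_employee(code, new_bu, change_date):
    """Close the current row and add a new one with a new surrogate key."""
    conn.execute(
        "UPDATE dim_employee SET valid_to = ? "
        "WHERE employee_code = ? AND valid_to IS NULL",
        (change_date, code),
    )
    conn.execute(
        "INSERT INTO dim_employee (employee_code, business_unit, valid_from) "
        "VALUES (?, ?, ?)",
        (code, new_bu, change_date),
    )


conn.execute(
    "INSERT INTO dim_employee (employee_code, business_unit, valid_from) "
    "VALUES ('E1', 'BU1', '2002-01-01')"
)
conn.execute("INSERT INTO fact_turnover VALUES (?, '2002-03-15', 100.0)", (current_sid("E1"),))

move_employee("E1", "BU2", "2002-06-02")
conn.execute("INSERT INTO fact_turnover VALUES (?, '2002-07-01', 250.0)", (current_sid("E1"),))

turnover_by_bu = conn.execute("""
    SELECT d.business_unit, SUM(f.amount)
    FROM fact_turnover f JOIN dim_employee d ON d.employee_sid = f.employee_sid
    GROUP BY d.business_unit
    ORDER BY d.business_unit
""").fetchall()
print(turnover_by_bu)
```

Had the fact table stored the natural key ‘E1’ and the dimension kept only one row per employee, the same query would have put all 350 under ‘BU2.’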

Categories: DW Interview Qs

SQL Technical Check Questions

May 30, 2010

Here it is. SQL_Check_Sherry Google Docs 5/29/2010.

Categories: DW & BI Interview Qs

SQL – Proveit Test (SQL Server 2005 for Developers)

April 23, 2010

Hi Peter,

I did take the test last night and scored 62%. I had to remind myself that my score is still slightly above the national average, which is 50%, to still feel proud of myself.
It was a tough test, as you said, in the sense that it does not test your SQL skills; rather, it tests how much you know about (or have used) the new enhancements in version 2005. I don’t regret taking the test, even if it means that the recruiter will not talk to me again. At least now I know how much I don’t know about those enhancements.
Thanks for keeping an eye out there for me.

Sherry Li

From: Friend Peter

Subject: ProveIt test
Date: Fri, 23 Apr 2010 08:25:38 -0700

Hi Sherry,

Sorry I missed your IM last night. I’ve had to take those before. Sometimes they use proveit and sometimes they use brainbench. I find the tests are somewhat similar to the Microsoft certification exams in that there are some very easy questions, and some hard ones which are only hard because they are asking questions about features that you may have never used. For instance, I’ve done very little with XML, and I remember there being several questions about using XML clauses in SQL queries. I had to guess on those. Some companies (staffing companies and/or end client companies) request these tests as a way to further screen their candidates, but it only happens to me about 10% of the time.

I will look out for SQL opportunities for you at (). 


SSIS Interview Questions

April 15, 2010

Here are some SSIS related Interview Questions with answers. 

1. What is the control flow?

Answer: In SSIS a workflow is called a control-flow. A control-flow links together our modular data-flows as a series of operations in order to achieve a desired result.

A control flow consists of one or more tasks and containers that execute when the package runs.

To control order or define the conditions for running the next task or container in the package control flow, you use precedence constraints to connect the tasks and containers in a package.

A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow.

SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: containers that provide structures in packages, tasks that provide functionality, and precedence constraints that connect the executables, containers, and tasks into an ordered control flow.

2. What is a data flow?

Answer: A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations.

Before you can add a data flow to a package, the package control flow must include a Data Flow task. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package.

SQL Server 2005 Integration Services (SSIS) provides three different types of data flow components: sources, transformations, and destinations. Sources extract data from data stores such as tables and views in relational databases, files, and Analysis Services databases. Transformations modify, summarize, and clean data. Destinations load data into data stores or create in-memory datasets.

3. How do you do error handling in SSIS?

Answer: When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.

For example, a data conversion fails because a column contains a string instead of a number, an insertion into a database column fails because the data is a date and the column has a numeric data type, or an expression fails to evaluate because a column value is zero, resulting in a mathematical operation that is not valid.

Errors typically fall into one of the following categories:
1) Data conversion errors, which occur if a conversion results in the loss of significant digits, the loss of insignificant digits, or the truncation of strings. Data conversion errors also occur if the requested conversion is not supported.
2) Expression evaluation errors, which occur if expressions that are evaluated at run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.
3) Lookup errors, which occur if a lookup operation fails to locate a match in the lookup table.

Many data flow components support error outputs, which let you control how the component handles row-level errors in both incoming and outgoing data. You specify how the component behaves when truncation or an error occurs by setting options on individual columns in the input or output.

For example, you can specify that the component should fail if customer name data is truncated, but ignore errors on another column that contains less important data.
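The idea behind an error output can be sketched in plain Python. This is an analogy only, not SSIS code: the transform function, the on_error parameter, and the sample rows are all invented for illustration. It models two of the choices a component offers for a failing row, failing the whole component or redirecting the row to an error output, and leaves out the option to ignore the failure.

```python
# Hypothetical lookup table: month name -> month code.
MONTH_CODES = {"January": 1, "February": 2, "March": 3}


def transform(rows, on_error="redirect"):
    """Split rows into a normal output and an error output.

    on_error="fail"     -> stop at the first bad row (the component fails)
    on_error="redirect" -> send the bad row to the error output and keep going
    """
    good, errors = [], []
    for row in rows:
        try:
            code = MONTH_CODES[row["month"]]  # lookup error -> KeyError
            amount = float(row["amount"])     # data conversion error -> ValueError
            good.append({"month_code": code, "amount": amount})
        except (KeyError, ValueError) as exc:
            if on_error == "fail":
                raise
            # Keep the original row plus the kind of error for later review.
            errors.append({**row, "error": type(exc).__name__})
    return good, errors


rows = [
    {"month": "January", "amount": "10.5"},
    {"month": "Smarch", "amount": "3"},    # no match in the lookup table
    {"month": "March", "amount": "n/a"},   # cannot be converted to a number
]
good_rows, error_rows = transform(rows)
print(good_rows)
print(error_rows)
```

In a real package the error rows would typically flow to their own destination, such as an error table, so they can be reviewed and reprocessed.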

4. How do you do logging in SSIS?

Answer: SSIS includes logging features that write log entries when run-time events occur and can also write custom messages.

Integration Services supports a diverse set of log providers, and gives you the ability to create custom log providers. The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files.

Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not.

To customize the logging of an event or custom message, Integration Services provides a schema of commonly logged information to include in log entries. The Integration Services log schema defines the information that you can log. You can select elements from the log schema for each log entry.

To enable logging in a package:
1). In Business Intelligence Development Studio, open the Integration Services project that contains the package you want.
2). On the SSIS menu, click Logging.
3). Select a log provider in the Provider type list, and then click Add.

5. How do you deploy SSIS packages?

Answer: Integration Services (SSIS) makes it simple to deploy packages to any computer.

There are two steps in the package deployment process:
1). The first step is to build the Integration Services project to create a package deployment utility.
2). The second step is to copy the deployment folder that was created when you built the Integration Services project to the target computer, and then run the Package Installation Wizard to install the packages.

6. How do you schedule SSIS packages to run on the fly?
7. How do you run a stored procedure and get data?

8. A scenario: You want to load a text file into a database table, but during the upload you want to change a column called months (January, Feb, etc.) to a code (1, 2, 3, ...). This code can be read from another database table called months. After converting the data, upload the file. If there are any errors, write them to an error table. Then read all the errors from the database, create a file, and mail it to the supervisor.
How would you accomplish this task in SSIS?

9. What are variables and what is variable scope?

Answer: Variables store values that an SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions.

Integration Services supports two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services. You can create as many user-defined variables as a package requires, but you cannot create additional system variables.

Scope: A variable is created within the scope of a package or within the scope of a container, task, or event handler in the package. Because the package container is at the top of the container hierarchy, variables with package scope function like global variables and can be used by all containers in the package. Similarly, variables defined within the scope of a container such as a For Loop container can be used by all tasks or containers within the For Loop container.
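The scope rule can be pictured as a lookup that walks up the container hierarchy. The sketch below is a plain Python model, not the SSIS object model: the Container class, its resolve method, and the variable names BatchId and Counter are hypothetical.

```python
class Container:
    """A package, container, or task that can hold variables (illustrative model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.variables = {}

    def resolve(self, var_name):
        """Look for the variable here first, then in each enclosing container."""
        scope = self
        while scope is not None:
            if var_name in scope.variables:
                return scope.variables[var_name]
            scope = scope.parent
        raise KeyError(f"{var_name} is not in scope for {self.name}")


package = Container("Package")
for_loop = Container("For Loop Container", parent=package)
task_in_loop = Container("Execute SQL Task", parent=for_loop)
task_outside = Container("Send Mail Task", parent=package)

package.variables["BatchId"] = 42   # package scope: visible everywhere
for_loop.variables["Counter"] = 0   # loop scope: visible only inside the loop

print(task_in_loop.resolve("BatchId"))  # found at package level
print(task_in_loop.resolve("Counter"))  # found at For Loop level
```

A task outside the loop, such as task_outside here, cannot see Counter, which is the behavior the scope paragraph describes.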

Question 1 – True or False

Using a checkpoint file in SSIS is just like issuing the CHECKPOINT command against the relational engine. It commits all of the data to the database.

Answer: False. SSIS provides a Checkpoint capability which allows a package to restart at the point of failure.

Question 2 – Can you explain what the Import\Export tool does and the basic steps in the wizard?

Answer: The Import\Export tool is accessible via BIDS or by executing the dtswizard command.

The tool identifies a data source and a destination to move data within one database, between instances, or even from a database to a file (or vice versa).

Question 3 – What are the command line tools to execute SQL Server Integration Services packages?

Answer: DTEXECUI – When this command line tool is run, a user interface is loaded in order to configure each of the applicable parameters to execute an SSIS package.

DTEXEC – This is a pure command line tool where all of the needed switches must be passed into the command for successful execution of the SSIS package.
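As a small illustration of calling DTEXEC from a script, here is a sketch in Python. It assumes dtexec is on the PATH and uses only the /File switch, which points at a package stored in the file system. The package path in the usage note and both function names are hypothetical.

```python
import subprocess


def build_dtexec_command(package_path):
    """Build the argument list for running a file-system package with dtexec."""
    return ["dtexec", "/File", package_path]


def run_package(package_path):
    """Run the package and return dtexec's exit code (0 means success)."""
    completed = subprocess.run(build_dtexec_command(package_path))
    return completed.returncode


# Only builds and prints the command; nothing is executed here.
print(build_dtexec_command(r"C:\packages\LoadCustomers.dtsx"))
```

Calling run_package(r"C:\packages\LoadCustomers.dtsx") on a machine with SSIS installed would execute the package. Any further switches your environment needs would have to be added to the list explicitly, which is exactly the point of the answer above.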

Question 4 – Can you explain the SQL Server Integration Services functionality in Management Studio?

Answer: You have the ability to do the following:

Log in to the SQL Server Integration Services instance
View the SSIS log
View the packages that are currently running on that instance
Browse the packages stored in MSDB or the file system
Import or export packages
Delete packages
Run packages

Question 5 – Can you name some of the core SSIS components in the Business Intelligence Development Studio you work with on a regular basis when building an SSIS package?

Answer: Connection Managers
Control Flow
Data Flow
Event Handlers
Variables window
Toolbox window
Output window
Package Configurations

Question Difficulty = Moderate

Question 1 – True or False: SSIS has a default means to log all records updated, deleted or inserted on a per table basis.

Answer: False, but a custom solution can be built to meet these needs.

Question 2 – What is a breakpoint in SSIS? How is it set up? How do you disable it?

Answer: A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package.

Ten unique break conditions exist for each breakpoint.

Breakpoints are set up in BIDS. In BIDS, navigate to the control flow interface. Right click on the object where you want to set the breakpoint and select the ‘Edit Breakpoints…’ option.

Question 3 – Can you name 5 or more of the native SSIS connection managers?


1) OLEDB connection – Used to connect to any data source requiring an OLEDB connection (e.g., SQL Server 2005)
2) Flat file connection – Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
3) ADO.Net connection – Uses the .Net Provider to make a connection to SQL Server 2005 or other connection exposed through managed code (like C#) in a custom task
4) Analysis Services connection – Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
5) File connection – Used to reference a file or folder. The options are to either use or create a file or folder

Question 4 – How do you eliminate quotes from being uploaded from a flat file to SQL Server?

Answer: In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included.
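To see what a text qualifier does, here is a sketch using Python's csv module rather than SSIS itself. The sample data is made up. The quotechar argument plays the role of the Text qualifier field: with it set, the quotes are stripped and a comma inside a quoted value is kept as data. Without it, the quotes are loaded as part of the value and the embedded comma splits the column.

```python
import csv
import io

# Hypothetical flat file content with double quotes around every value.
raw = '"id","name","city"\n"1","Smith, John","Boston"\n'

# With a text qualifier: quotes are removed and "Smith, John" stays one value.
qualified_rows = list(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))

# Without a text qualifier: quotes are treated as ordinary characters.
unqualified_rows = list(csv.reader(io.StringIO(raw), delimiter=",", quoting=csv.QUOTE_NONE))

print(qualified_rows[1])
print(unqualified_rows[1])
```

The second print shows why previewing the data matters: the row ends up with four columns instead of three, and every value still carries its quotes.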

Additional information: How to strip out double quotes from an import file in SQL Server Integration Services

Question 5 – Can you name 5 or more of the main SSIS tool box widgets and their functionality?

For Loop Container
Foreach Loop Container
Sequence Container
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task
Bulk Insert Task
Data Flow Task
Data Mining Query Task
Execute DTS 2000 Package Task
Execute Package Task
Execute Process Task
Execute SQL Task

Question Difficulty = Difficult

Question 1 – Can you explain one approach to deploy an SSIS package?

Answer: One option is to build a deployment manifest file in BIDS, then copy the directory to the applicable SQL Server then work through the steps of the package installation wizard.

A second option is to use the dtutil utility to copy, move, delete, or verify the existence of an SSIS package.

A third option is to log in to SQL Server Integration Services via SQL Server Management Studio, navigate to the ‘Stored Packages’ folder, then right click on one of the child folders or an SSIS package to access the ‘Import Packages…’ or ‘Export Packages…’ option.

A fourth option in BIDS is to navigate to File | Save Copy of Package and complete the interface.

Question 2 – Can you explain how to set up a checkpoint file in SSIS?

Answer: The following items need to be configured on the properties tab for the SSIS package:

CheckpointFileName – Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it’s a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage – Determines if/how checkpoints are used. Choose from these options: Never (default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.
SaveCheckpoints – Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.
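The restart behavior of the CheckpointUsage settings can be modeled in a few lines of Python. This is a simplified sketch, not how SSIS stores its checkpoint file: the JSON format, the run_package function, and the three toy tasks are assumptions for illustration, and it behaves as if SaveCheckpoints were set to True.

```python
import json
import os
import tempfile


def run_package(tasks, checkpoint_path, usage="IfExists"):
    """Run tasks in order, skipping those recorded in the checkpoint file.

    usage mirrors CheckpointUsage: "Never", "IfExists", or "Always".
    Returns the names of the tasks executed in this run.
    """
    completed = []
    if usage != "Never":
        if os.path.exists(checkpoint_path):
            with open(checkpoint_path) as fh:
                completed = json.load(fh)
        elif usage == "Always":
            raise FileNotFoundError(checkpoint_path)

    executed = []
    for name, action in tasks:
        if name in completed:
            continue  # finished in an earlier run, so restart after it
        action()      # on failure the exception propagates and the file stays
        completed.append(name)
        executed.append(name)
        if usage != "Never":
            with open(checkpoint_path, "w") as fh:
                json.dump(completed, fh)

    # A successful run no longer needs its checkpoint file.
    if usage != "Never" and os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed


state = {"fail_load": True}


def extract():
    pass


def load():
    if state["fail_load"]:
        raise RuntimeError("load failed")


def report():
    pass


tasks = [("extract", extract), ("load", load), ("report", report)]
checkpoint_path = os.path.join(tempfile.mkdtemp(), "package.chk")

try:
    run_package(tasks, checkpoint_path)  # first run fails at "load"
except RuntimeError:
    pass

state["fail_load"] = False               # the problem has been fixed
second_run = run_package(tasks, checkpoint_path)
print(second_run)
```

The second run skips extract and picks up at load, which is the "restart at the point of failure" behavior of IfExists.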

Question 3 – Can you explain different options for dynamic configurations in SSIS?

Answer: Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables

Question 4 – How do you upgrade an SSIS Package?

Answer: Depending on the complexity of the package, one of two techniques is typically used:
1) Recode the package based on the functionality in SQL Server DTS
2) Use the Migrate DTS 2000 Package wizard in BIDS then recode any portion of the package that is not accurate

Here is good link:
