There are a couple of good comments from readers recently, such as this one:
….You can not say that truncating a target table is a bad practice in general…
I totally agree with the above statement. In fact, truncating a target table is used in many of my ETL processes. Many of the processes I designed do incremental loading only. So the reader is correct: how we design our ETL process really depends on the characteristics of the data we are loading. It becomes a bad practice when we use the wrong design for the wrong data. In the particular example I gave in ETL #70 – The worst possible way to truncate your target table, truncating the target table was indeed a bad practice.
OK, enough about such a serious topic. Let’s move on to something lighter. I recently moved to a different group within the same company. I’ve been in the same group for quite a while, and I feel pretty sad about leaving my co-workers and friends. My friend Hamid Y composed a poem for me. I am “publishing” the poem here with Hamid’s consent.
I’ve been a SQL/BI developer for so long, yet I still cannot explain to my family what exactly I do. I recently told an engineer’s wife that I work for a bank. She immediately started consulting me about her most frustrating banking questions, such as whether she can join a credit union to avoid fees.
I am not sure whether an X-ray machine or a sewing machine will help me answer her question, but at least it’s on the lighter side of our daily SQL/BI work. Here it goes:
I wrote this little poem for you…hope you like it..
You’re one of the smartest people I’ve seen
Pumping out SQL code and arithmetic mean
You found program bugs like an X-ray machine
And you saved this company more money than a teller machine
You kept this department together like a sewing machine
And making it run like a pinball machine
So now that you’re leaving our team
I’ll try not to make a scene
Just know that you’re loved and well-respected
By everybody in this department you affected
I am already missing the people I’ve been working with day in and day out. Thanks, friends.
I hope that by now you are convinced that you need to learn at least one ETL tool to stay competitive in the SQL/BI profession. I also hope that Microsoft SSIS is your choice.
I’ve worked with two types of people with very different views on ETL tools.
One type is very skeptical about using an ETL tool over hand-coded SQL. They are even more skeptical when they see me using stored procedures in an SSIS package. Their argument: if everything can be done in a stored procedure, why use both stored procedures and SSIS? Wouldn’t it be simpler to just stick to one tool?
Another type tends to completely abandon TSQL code in favor of data flow tasks only, such as Union All, Merge, Sort, Lookup, Slowly Changing Dimension, etc.
To the first group of friends: I hope they have by now been convinced that an ETL tool is the way to go for ETL work.
Some high level benefits of SSIS:
- visual representation of work
- control flow (including multiple execution paths and conditional execution paths)
- data transformations from source to target
- event logging
- source and target do not need to be on the same server or same DBMS
- audit details
- package configurations for secure and easy deployment
- memory management
In this blog, I’d like to share with you one very important lesson I’ve learned: never be afraid to leverage TSQL in SSIS packages.
Why use the Sort data flow task when you can sort the data with an ORDER BY in TSQL?
Why use the Merge task when you can write TSQL code to JOIN tables?
Why bring over too much from the source when you can write TSQL with a proper WHERE clause to limit it to only the data you need?
Why use a Derived Column task when you can write TSQL code to derive the column?
Why use a data flow at all when you can write a stored procedure that encapsulates all the business logic and execute it from an Execute SQL Task?
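To make the questions above concrete, here is a sketch of a single TSQL query doing the work of several data flow tasks at once. The table and column names (dbo.Customer, dbo.SalesOrder, etc.) are made up for illustration:

```sql
-- One query replaces four data flow components:
-- JOIN instead of Merge, WHERE instead of over-extracting,
-- an expression instead of Derived Column, ORDER BY instead of Sort.
SELECT  c.CustomerID,
        c.FirstName + ' ' + c.LastName AS FullName,   -- derived column
        o.OrderDate,
        o.OrderAmount
FROM    dbo.Customer AS c
JOIN    dbo.SalesOrder AS o
        ON o.CustomerID = c.CustomerID                -- join in the engine, not in memory
WHERE   o.OrderDate >= '2012-01-01'                   -- filter at the source
ORDER BY o.OrderDate;                                 -- sort in the engine
```

Put this query in an OLE DB Source (or wrap it in a stored procedure called from an Execute SQL Task), and the SQL Server engine does all the heavy lifting before a single row enters the SSIS pipeline.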
Set-based INSERT and UPDATE
This is not just about leveraging the SQL Server engine’s computing power; there is another very good, and very simple, reason: prefer set-based INSERT and UPDATE over row-by-row in-memory operations.
If you are working with large amounts of data, writing set-based TSQL to do the INSERT and UPDATE can quickly become your only option. In that case, landing the changes in a staging table will be your first step.
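Here is a minimal sketch of the staging pattern. The table names (dbo.CustomerStage as the staging table, dbo.Customer as the target) are assumptions for illustration:

```sql
-- Assuming the changes have already been landed in dbo.CustomerStage,
-- apply them to the target with two set-based statements.

-- 1) Set-based UPDATE of rows that already exist in the target
UPDATE  t
SET     t.Email     = s.Email,
        t.UpdatedOn = GETDATE()
FROM    dbo.Customer AS t
JOIN    dbo.CustomerStage AS s
        ON s.CustomerID = t.CustomerID;

-- 2) Set-based INSERT of rows that are new
INSERT INTO dbo.Customer (CustomerID, Email, UpdatedOn)
SELECT  s.CustomerID, s.Email, GETDATE()
FROM    dbo.CustomerStage AS s
WHERE   NOT EXISTS (SELECT 1
                    FROM   dbo.Customer AS t
                    WHERE  t.CustomerID = s.CustomerID);
```

Two statements, no matter whether the staging table holds a hundred rows or ten million, instead of one round trip per row through an in-memory pipeline.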
A reader once asked me if using SCD (Slowly Changing Dimension) is the only way to load dimensions. I hope you have drawn your own conclusion by now. No, it is not the only way. It’s not always the best way either.
My blog has many practical tips and best practices for SQL/BI developers, but I haven’t focused on interview questions for SQL/BI developers so far. This might change in the future. It’s been a challenge for many people to break into or stay competitive in the SQL/BI profession.
I am very lucky to be able to stay in the profession and also stay in the financial industry. It’s been very rewarding for me to share my experience and knowledge through this blog.
Many recruiters do not understand what exactly a SQL/BI Developer does. One thing they assume we don’t do is design. On the contrary, designing everything from a simple table structure to the entire staging and ETL sub-system is our daily job.
In this blog, I’ll share with you one simple SQL design interview question and the answer that will set you apart from other candidates.
Interview question: what is First Normal Form (1NF)?
To give an answer that will earn you an A, we first need to relate 1NF to what we do every day. Memorising answers from hundreds of SQL blogs will not get you very far, because under the pressure of being interviewed by several people, your memory will soon start to suffer.
Have you ever created primary keys for your SQL tables? I bet you have. But have you ever asked yourself why we need to create primary keys? Or have you ever asked a question with an even bigger scope: how do we efficiently organize data in a database?
Here are the simple answers to the above questions:
1. Normalization is the process of efficiently organizing data in a database.
2. There are two goals of the normalization process.
3. One goal is to eliminate redundant data.
4. Another goal is to ensure data dependencies make sense (only storing related data in a table).
5. First normal form (1NF) happens to be the very basic rule for an organized database.
6. The implementation of the 1NF principle in a DBMS is to create a primary key for a SQL table.
Now we know 1NF is implemented as a primary key in a DBMS. With the above answers, you’ve already earned an A. With the additions below, you will surely get a solid A+.
7. When creating a PK for a table, we want to accomplish two things.
8. One is to eliminate duplicative columns from the same table.
9. Another one is to create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
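Points 8 and 9 can be shown in a few lines of DDL. This is a sketch with made-up table and column names, assuming a customer can have several phone numbers:

```sql
-- Before 1NF: duplicative columns in one table
-- CustomerID | Name | Phone1 | Phone2 | Phone3

-- After 1NF: the repeating columns are eliminated, and the related
-- data moves to its own table, each row identified by a primary key.
CREATE TABLE dbo.Customer
(
    CustomerID int          NOT NULL PRIMARY KEY,
    Name       varchar(100) NOT NULL
);

CREATE TABLE dbo.CustomerPhone
(
    CustomerID int         NOT NULL REFERENCES dbo.Customer (CustomerID),
    Phone      varchar(20) NOT NULL,
    PRIMARY KEY (CustomerID, Phone)   -- a set of columns uniquely identifying each row
);
```

Walking an interviewer through a small example like this shows you understand the why behind 1NF, not just a memorised definition.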