Archive

Posts Tagged ‘T-SQL’

SQL Contractor #50 – What kind of SQL God (or Goddess) are you?

April 15, 2011

I recently read an article on SQLCentralServer.com, The Job Posting – Do I really have to be the SQL God? The author was passing on some advice on how to interpret various SQL job postings.

Do you need to be the SQL God/Goddess? As a SQL consultant/contractor, you absolutely do, if you want to be a respected SQL professional and command a higher rate. But the question is what kind of SQL god/goddess.

The author reviewed a concept that I thought was worth mentioning.

For SQL Server, there’s 5 general areas of work: Administration, T-SQL Development, ETL, Reporting, and Architecture.  In most of these cases, you end up blending a few of these areas together, but a solid, 40 hr/week position is going to concentrate on one of these and incorporate a little bit of some of the others.  For example, reporting will need some T-SQL and some ETL.  Architecture requires a basic understanding of the other four.  ETL will probably require a little architecture to build staging structures and some T-SQL to optimize some tasks.

Five general areas of SQL Server work, blending a few of these areas together. I will say that I agree with this concept. The longer I work on the SQL Server platform, the more blended my work has become. That's a good thing for me as a contractor, because it means that I've been down many different paths and gained experience in a wide range of areas. It's also a good thing when it comes to job interviews.

However, I always struggle with the question of "what title do I put on my resume?" When I was a full-time employee, I accepted the job title the company gave me. Although essentially I've always been in SQL development, the job titles I received varied so much that people would think they were completely different jobs. Here are some of the job titles I had over the years: Data Analyst, Database Architect, Technical Advisor, Database Analyst, System Developer.

Now that I am on my own, I no longer have a job title (that's not entirely true: "Contingent worker" is usually attached to my email in the Global Address List), until resume time comes around. What should I put in that blank space underneath my name? It's been six years, and the situation is getting worse. I start to panic whenever I even think about that blank space.

Fortunately the companies that hired me as a “Contingent worker” really don’t care about that blank space on my resume.

I’ll take that back. I never left that space blank. Here are some of the words I put in that space:

SQL Developer, Data Analyst, Database Analyst, ETL Developer

Maybe next time, I’ll consider leaving that space blank.

Why is truncating a target table a bad practice in ETL?

March 28, 2011

I've done ETL design and development both in integration projects and in reporting environments.

In master data integration projects, it's not hard to distinguish the staging area from the target area. In the staging area, truncating tables with old data and loading new data is a pretty common practice. I have not seen tables truncated in the target area. This is a good thing. It means that developers understand the role of staging, and understand that loading data into the target area needs to be carefully designed, or "reconciled", as a consultant called it. Depending on the business rules, we will end up either creating new records in the target, updating existing records, or deleting obsolete records.
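To make "reconciled" concrete, here is a minimal sketch of that kind of load in T-SQL using a MERGE statement (available since SQL Server 2008). The table and column names are hypothetical:

    -- Reconcile the staging table into the target table.
    -- dbo.StagingCustomer and dbo.TargetCustomer are made-up names.
    MERGE dbo.TargetCustomer AS tgt
    USING dbo.StagingCustomer AS src
        ON tgt.CustomerID = src.CustomerID
    WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
        UPDATE SET CustomerName = src.CustomerName        -- update existing records
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerID, CustomerName)
        VALUES (src.CustomerID, src.CustomerName)         -- create new records
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;                                           -- delete obsolete records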

In the reporting environment, however, developers have a very different attitude and a very different understanding of staging vs. target.

Below is a very typical architecture I've seen so far. The flaw in this design is that there is no target area, only staging. When developers truncate the staging table, the report shows empty data between the time the table is truncated and the time the new data finishes loading. Or even worse, the report hangs when the load takes a long time to finish.

[Diagram: data source → staging table (truncated and reloaded) → report reads the staging table directly]
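To make the gap concrete, here is a minimal sketch of the kind of refresh job that causes it (the table names are hypothetical):

    TRUNCATE TABLE dbo.SalesReportData;   -- from this moment, the report sees an empty table

    INSERT INTO dbo.SalesReportData (OrderDate, Amount)
    SELECT OrderDate, Amount
    FROM dbo.ExtractedOrders;             -- until this finishes, the report returns no rows (or blocks on the load)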

You might argue that it's an unlikely coincidence for users to run the report at the exact time the data is in the middle of refreshing. In my experience, if I hear about it even once from the business users, the chance is not small, because the possibility of it happening again is always there.

Well, the above is not the only type of architectural design I've seen. Here is another one. There is no physical staging area in this design. The report accesses the data source directly with T-SQL through OPENQUERY. The developer argued that this type of design eliminates the need to maintain a staging area on the SQL Server. That part is true. But it does not eliminate the emails and phone calls we get from frustrated business users when the report takes forever to load the data. Reporting Services routinely times out when trying to retrieve a large amount of data via OPENQUERY.

[Diagram: report queries the remote data source directly via OPENQUERY; no staging or target tables]
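For reference, this is the general shape of such a report query; the linked server and table names are made up:

    -- The report dataset pulls from the linked server on every run;
    -- a slow remote query here means a slow (or timed-out) report.
    SELECT OrderDate, Amount
    FROM OPENQUERY(REMOTE_SRV,
                   'SELECT OrderDate, Amount FROM SourceDb.dbo.Orders');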

The third type of architectural design avoids many of the problems of long-running reports or reports with empty data. By separating the target from staging, and not truncating the target tables, the reports will always look good.

[Diagram: data source → staging (truncated and reloaded) → target (reconciled, never truncated) → report reads the target]

Back to the title of this post: why is truncating a target table a bad practice in ETL? I have to confess that I was one of those developers who made this mistake, until I received enough emails and phone calls from our business users.

There is more to discuss on how to move data from staging to target. Stay tuned.

Which one to choose – Foreach Loop Container or WHILE Loop in T-SQL

March 27, 2011

I've written a post, SSIS – Use "Foreach Loop Container" and pass a variable to an OLE DB Source in a Data Flow. I have to confess that I don't use the Foreach Loop container very often. Actually, I've only used it once in my entire SQL development life. I know how wonderful it is, and how much other developers like it. Deep down, I am a SQL developer. I still resort to T-SQL whenever I feel a bit dizzy thinking about SSIS packages.

I start to feel dizzy when I need to loop through the Foreach Loop container more than 10 times; I think that's my arbitrary limit. Looping through 10 days means connecting to the data source 10 times. I bet Integration Services does some tricks with connection pooling. Still, I don't quite like it, especially when the data volume is large.

Here is what I would rather do. I'd bring all 10 days' worth of data (or even more, depending on the project) from my data source in a single pass, then use a WHILE loop in T-SQL to process the data for each day.
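A minimal sketch of that pattern; the table names and date range are hypothetical:

    -- Assume all 10 days of data have already been landed in dbo.StagingOrders.
    DECLARE @ProcessDate date = '20110301';   -- first day in the batch
    DECLARE @EndDate     date = '20110310';   -- last day in the batch

    WHILE @ProcessDate <= @EndDate
    BEGIN
        -- Process one day's worth of rows, e.g. summarize into a daily table.
        INSERT INTO dbo.DailyOrderSummary (OrderDate, TotalAmount)
        SELECT OrderDate, SUM(Amount)
        FROM dbo.StagingOrders
        WHERE OrderDate = @ProcessDate
        GROUP BY OrderDate;

        SET @ProcessDate = DATEADD(day, 1, @ProcessDate);
    END;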

You might wonder what would drive me to process data for each day in a WHILE loop. That will be the topic for my next post.
