All you need is Google Internet to do a Data Analyst’s job

My current boss asked me to have a chat with somebody whom he is considering for a Data Analyst position.

The candidate told me that he had DBA experience from a few years back and had done some SQL work through ODBC.

I knew right away that he had not done any analysis work using SQL as a query tool. He had never heard of the ETL (extract, transform, load) process used in data integration, data migration, and data warehouse projects.

To report back to my boss, I had to ask him a few questions about data analysis, SQL and ETL. I started by explaining the ETL process, the tasks involved, the tools we would use, and so on. Then I tried to explain what data profiling means and gave him two specific examples. Finally, I asked him how he would use SQL as a query tool to “discover” patterns in the data.
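To give a feel for what “using SQL to discover data patterns” can look like, here is a minimal data-profiling sketch. It is not one of the two examples I gave the candidate; the `customers` table, its columns, and the helper names `profile_column` and `value_frequencies` are hypothetical, and it uses Python's built-in `sqlite3` module only so it runs anywhere. Each question about a column is answered by one query over the whole table, with no row-by-row loop.

```python
import sqlite3

# Hypothetical sample table; in practice you would query the source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT, zip TEXT);
INSERT INTO customers (state, zip) VALUES
  ('TX', '75001'), ('tx', '75002'), ('TX', NULL),
  ('CA', '9021'),  (NULL, '94105'), ('CA', '94105');
""")

def profile_column(conn, table, column):
    """Summarize one column with a single set-based query."""
    # Names are interpolated for brevity; only pass trusted table/column names.
    sql = f"""
        SELECT COUNT(*)                 AS total_rows,
               COUNT({column})          AS non_null,
               COUNT(DISTINCT {column}) AS distinct_values,
               MIN(LENGTH({column}))    AS min_len,
               MAX(LENGTH({column}))    AS max_len
        FROM {table}
    """
    names = ("total_rows", "non_null", "distinct_values", "min_len", "max_len")
    return dict(zip(names, conn.execute(sql).fetchone()))

def value_frequencies(conn, table, column):
    """Count how often each value occurs, most common first."""
    sql = f"""
        SELECT {column}, COUNT(*) AS n
        FROM {table}
        GROUP BY {column}
        ORDER BY n DESC, {column}
    """
    return conn.execute(sql).fetchall()

print(profile_column(conn, "customers", "zip"))
print(value_frequencies(conn, "customers", "state"))
```

On this made-up data, the minimum length of 4 flags the malformed ZIP code, and the frequency list shows “TX” and “tx” as separate values plus a NULL state: the kind of pattern you look for before writing any transformation rules.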

My good intentions seemed to irritate him so much that he told me, “I would not sit here babbling about SQL with you. I can do the job and find answers.” I asked him how he would find the answers. His answer: “Google internet.”

The next day he sent me links to two websites: one titled “SQL Interview Questions and Answers,” and a discussion forum where people were suggesting cursors for “merging rows in a SQL query.” He also told me what he would do to “merge table to get IP address”:

… what I might also have done would be to (1) create a new record using our rules, (2) delete the other two records, then (3) record the reason for the change in the meta-data / system log. –Either that, or update + commit the new record with the new IP, then log + delete the other one? Depending upon how the row has been indexed, it might not be too bad of a hit … ?
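For contrast, here is a minimal set-based sketch of that kind of merge, assuming (since the actual table was never described) a hypothetical `host_ip` table with several rows per host where only the newest row should survive. Instead of creating, deleting, and logging records one at a time, one statement logs every superseded row and a second deletes them all, inside a single transaction.

```python
import sqlite3

# Hypothetical table: several rows per host; keep only the newest one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE host_ip (
    id         INTEGER PRIMARY KEY,
    host       TEXT NOT NULL,
    ip         TEXT NOT NULL,
    updated_at TEXT NOT NULL  -- ISO-8601 text, so string order = time order
);
INSERT INTO host_ip (host, ip, updated_at) VALUES
  ('web01', '10.0.0.5',  '2020-01-10'),
  ('web01', '10.0.0.9',  '2020-03-02'),
  ('db01',  '10.0.1.20', '2020-02-14'),
  ('db01',  '10.0.1.21', '2020-02-01');
CREATE TABLE merge_log (id INTEGER, host TEXT, ip TEXT, reason TEXT);
""")

# A row is obsolete when a newer row exists for the same host.
OBSOLETE = """
    EXISTS (SELECT 1 FROM host_ip AS newer
            WHERE newer.host = host_ip.host
              AND newer.updated_at > host_ip.updated_at)
"""

def merge_duplicates(conn):
    """Log and delete every superseded row with two set-based statements."""
    with conn:  # one transaction: the log insert and the delete commit together
        conn.execute(
            "INSERT INTO merge_log (id, host, ip, reason) "
            "SELECT id, host, ip, 'superseded by newer row' "
            "FROM host_ip WHERE " + OBSOLETE
        )
        conn.execute("DELETE FROM host_ip WHERE " + OBSOLETE)
    return conn.execute("SELECT host, ip FROM host_ip ORDER BY host").fetchall()

print(merge_duplicates(conn))
```

The same two statements work whether there are four rows or four million, and no cursor is involved. Note that two rows with an identical `updated_at` for the same host would both be kept; a real rule would need a tie-breaker.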

I sent him a very polite email:

Thanks for digging into the problem.

One mystery about SQL is that it’s a set-based query (or analytic) tool, not primarily for record-by-record processing as you would do in C# via the ODBC layer.

This concept is a little hard to grasp until you start working with the set-based functions in SQL, such as max/min, row_number, identity, etc. You will see that these functions are applied to the entire record set without your writing code to loop through each record.

This concept leads to another mystery about data analysis: looking for data patterns in a given record set rather than in each individual record.

To apply business logic to a record set, we don’t look at each record individually; we apply the logic to the entire set, then narrow down to the subset we need.

If you keep this concept in mind and get some practice, you will be an expert soon.
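Here is a small sketch of the “apply the logic to the entire set, then narrow down” idea from the email, using row_number. The `orders` table and the `top_orders_per_customer` helper are hypothetical, and window functions need SQLite 3.25 or later (recent Python builds bundle a newer version). The inner query ranks every order within its customer in one pass; the outer query then keeps only the ranks we want.

```python
import sqlite3

# Hypothetical orders table; ROW_NUMBER() requires SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders (customer, amount) VALUES
  ('alice', 120.0), ('alice', 75.5), ('alice', 300.0),
  ('bob',    40.0), ('bob',   410.0),
  ('carol',  15.0);
""")

def top_orders_per_customer(conn, n):
    """Rank all orders per customer in one pass, then keep ranks <= n."""
    sql = """
        SELECT customer, amount
        FROM (
            SELECT customer, amount,
                   ROW_NUMBER() OVER (PARTITION BY customer
                                      ORDER BY amount DESC) AS rn
            FROM orders            -- step 1: logic applied to the whole set
        )
        WHERE rn <= ?              -- step 2: narrow down to the subset we need
        ORDER BY customer, amount DESC
    """
    return conn.execute(sql, (n,)).fetchall()

print(top_orders_per_customer(conn, 2))
```

If tied amounts should both count as the top order, RANK() could replace ROW_NUMBER(), which breaks ties arbitrarily.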
