Archive for November, 2010

ETL Developer

November 22, 2010

Adding to the job description.

Functional Description:

The extract, transform, and load (ETL) developer creates jobs that move data from system to system. Source and target data will often reside in enterprise-class databases, such as SQL Server, but may also include file-based sources. Jobs created by the ETL developer will undergo a QA and DBA evaluation prior to moving to production. The ETL developer will be responsible for scheduling, monitoring, and responding to issues resulting from ETL job execution. Corporate reporting and security standards must be followed. The ETL developer will also be responsible for generating data quality metrics and overall summaries of target data statistics, including their publication to metadata consumers.

The ETL developer will assist in SSIS package creation, SQL script creation, stored procedure development, and report development in support of decision support analysts. Other non-central but important tasks include dimensional modeling within a star schema, working with the DBA team, and general data maintenance, reporting, and database-oriented tasks.

Requirements:

  • ETL job scheduling using SQL Server 2005 Agent.
  • Creation of data flows into different targets based on quality and consistency metrics.
  • SQL coding with SQL Server Management Studio.
  • Creation of T-SQL stored procedures.
  • Creation of reports using SQL Server 2005 Reporting Services.
  • Creation of consistent documentation for all development efforts.
  • Facilitate database object code promotion using source control and communication with the DBA team.
  • Work with business users to develop information delivery requirements.
  • Work with outside vendors to manage import and export of proprietary information in a secure manner.
  • Evaluate products to support IT and Decision Support missions.
  • Accomplish all tasks while meeting information security requirements.

Essential Qualifications:

  • Minimum of five years of general ETL experience using industry-standard tools (e.g., Informatica, DataStage, SQL Server Integration Services).
  • Minimum of two years of ETL development experience using SQL Server 2005 Integration Services.
  • Minimum of one year of .NET programming experience (can be based on CLR development in SQL Server).
  • Demonstrable experience in enterprise application integration planning and execution.
  • Demonstrable experience defining and meeting data quality parameters.
  • Intimate familiarity with database concepts, including normalization, indexing, and physical and logical modeling.
  • Intimate familiarity with data warehousing concepts, designs, and industry-standard practices.
  • Familiarity with project management controls and techniques.

Additional items:

  • Familiarity with the Kimball data warehousing methods, techniques, and standards.
  • Successfully created

Categories: Uncategorized

SSIS – How to change the default SQL Agent login

November 12, 2010

Challenge:
In our production environment, we have a dedicated ETL server, while the target database is on a different server.

    Server 1 = ETL server
    Server 2 = production DB server

How do SSIS jobs on server 1 access the target database on server 2?

As an ETL developer and data analyst, I don’t do this every day. So I am writing down my notes here.

Goal:
1) Need to run a SQL Agent job from server 1 to process data on server 2.
2) Do not want to use a SQL login to access server 2.
3) Need a Windows Authentication account (Integrated Security) when accessing server 2.

Solution:

  1. Create a domain account SVC-SQLAgentAccount in domain myDM (where server 1 and server 2 reside).
  2. Add account SVC-SQLAgentAccount as a login on both server 1 and server 2.
  3. Grant account SVC-SQLAgentAccount the appropriate roles and access on both server 1 and server 2 (see the T-SQL sketch after these steps).
  4. Stop the SQL Agent service and change the service’s login from LocalSystem to myDM\SVC-SQLAgentAccount.
  5. Restart the SQL Agent service.
  6. If the service fails to start, it probably means myDM\SVC-SQLAgentAccount doesn’t have the “Log on as a service” right on the box.
  7. For the job on server 1, set the data source to use Integrated Security:
    Data Source=Server2;Initial Catalog=myTargetDB;Provider=SQLNCLI.1;Integrated Security=SSPI;Auto Translate=False;
  8. After the job runs successfully, the job step history should show

Executed as user: myDM\SVC-SQLAgentAccount

not

Executed as user: server1\SYSTEM
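
For steps 2 and 3, here is a minimal T-SQL sketch; the myTargetDB database name and the db_datareader/db_datawriter roles are just examples, so grant only what your job actually needs:

    -- On each server, create a Windows login for the domain account
    CREATE LOGIN [myDM\SVC-SQLAgentAccount] FROM WINDOWS;

    -- On server 2, map the login to a user in the target database
    USE myTargetDB;
    CREATE USER [myDM\SVC-SQLAgentAccount] FOR LOGIN [myDM\SVC-SQLAgentAccount];

    -- Grant the roles the job needs (example roles shown)
    EXEC sp_addrolemember 'db_datareader', 'myDM\SVC-SQLAgentAccount';
    EXEC sp_addrolemember 'db_datawriter', 'myDM\SVC-SQLAgentAccount';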

Categories: SSIS Best Practices

T-SQL – Concatenate row values in Transact-SQL

November 9, 2010

Here is a very good blog post about how to concatenate row values in Transact-SQL.

I recently needed to create a report for mainframe directors to show all the mainframe assets each director is linked to.

When the number of items is small and known beforehand, the most common approach is to use GROUP BY with a CASE statement and the MAX function, as in the sketch below.
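
A minimal sketch of that fixed-column approach, assuming a #temp table with DirectorID and DeviceName columns and at most three devices per director:

    SELECT DirectorID,
           MAX(CASE WHEN rn = 1 THEN DeviceName END) AS Device1,
           MAX(CASE WHEN rn = 2 THEN DeviceName END) AS Device2,
           MAX(CASE WHEN rn = 3 THEN DeviceName END) AS Device3
    FROM (SELECT DirectorID, DeviceName,
                 ROW_NUMBER() OVER (PARTITION BY DirectorID
                                    ORDER BY DeviceName) AS rn
          FROM #temp) AS t
    GROUP BY DirectorID;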

When the number of items is unknown, concatenating row values can be very tricky. Fortunately, T-SQL provides some elegant options. Using a recursive CTE is one way, but my favorite is the FOR XML clause with PATH mode, which is available even in SQL Server 2005.

In this example, the #temp table contains the DirectorID and the DeviceName of each asset that the director is linked to. One director can be linked to several mainframe assets.
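
A sketch of the FOR XML PATH query; the STUFF call strips the leading comma-space separator:

    SELECT t1.DirectorID,
           STUFF((SELECT ', ' + t2.DeviceName
                  FROM #temp AS t2
                  WHERE t2.DirectorID = t1.DirectorID
                  ORDER BY t2.DeviceName
                  FOR XML PATH('')), 1, 2, '') AS DeviceNames
    FROM #temp AS t1
    GROUP BY t1.DirectorID;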

The above query will show one row per DirectorID, with each mainframe asset device name being concatenated together and separated by a comma.

Categories: SQL Challenge

SSIS – Staging in Relational Environment or in Raw Files?

November 4, 2010

If you’ve ever used SQL Server Integration Services (SSIS), you will have come across raw files. Microsoft introduced them with SSIS as a mechanism for storing data on the local file system. The design of raw files enables data to be written very quickly but, unfortunately, to achieve this performance Microsoft has used a proprietary binary storage format.

Raw files are proprietary sources and destinations in the SSIS data flow that are accessible only through SSIS. They are binary files that carry all of the file’s metadata (data types and column names) in the file header. Because the format is proprietary to SSIS and needs no conversion, raw files are the fastest source and destination available.

When to use raw files?

I’ve read about using raw files for storing errors. One of the big benefits of working with raw files is the ease with which you can add them to your packages. The raw file destination requires you to specify only the name of the file to use and the columns you want to store. If you use a text file destination, you have to build the structure of the file and ensure it matches your data flow. If you use an OLE DB destination, you also have to have a database and a table created to store the data.

Their ease of configuration makes raw files ideal for storing errors in your data flows. All you need to do is direct your error flows to a raw file destination, specify a filename, and select the columns you want to store. If errors do occur, they will be stored in the raw files. Of course, you will need SSIS to be able to interrogate the errors stored in these raw files.

Should you use raw files for all your staging needs in the ETL process? I doubt that’s a smart choice for the following reasons.

  • Debugging is difficult. You have to rely on the built-in debugging tools in SSIS.
  • Interrogating the data is almost impossible. A few blogs suggest extracting data from sources and staging it in raw files. That is fine only if you will never need to interrogate the data.

I am mostly for staging in a relational environment, because I can profile the data easily and interrogate it easily during troubleshooting.

But I do see three areas where staging in raw files can be a good choice:

  • Storing errors, as described above
  • Staging for temporary needs, where you will not need to interrogate the staged data
  • Breaking up a big extract into more practical files for later loading

Categories: SSIS Best Practices

SSIS – Using system variables in Send Mail Task

November 4, 2010

The Send Mail Task is not the most flexible way to send out emails from an SSIS package. See my previous blog post “SSIS – Auditing Strategy Through Package Event Handlers” for how to use an XML Task and a Script Task to send out HTML emails.

But there are also situations where we just want to use the straightforward Send Mail Task to send out a notification if an error occurs.

Goal

Send out an email notification when an error occurs in an SSIS package. The email needs to include the error description and the error source.

Solution

Step 1: To capture the error, create a Send Mail Task in the OnError event handler at the package level.

Step 2: In the Send Mail Task, try not to hard-code anything; instead, use the Expressions window to create expressions for the task properties.

I’ve created expressions for four basic properties of the Send Mail Task: FromLine, MessageSource, Subject, and ToLine.

The Subject property captures the machine name, which tells you which environment the error came from. The MessageSource property captures the error description and the error source. Finally, the ToLine property is set to a user variable that holds the email address you want to send the notification to.
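
For example, the Subject expression might look something like this; the wording is just an example, but @[System::MachineName] is a real system variable:

    "SSIS error on " + @[System::MachineName]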

Because this task is in the OnError event handler, you will only receive the email if your package fails.

If you are like me and do not write SSIS expressions every day, the syntax can be tricky. Here is the syntax for the MessageSource property:

"Error Desc: " + @[System::ErrorDescription] + " Source Name: " + @[System::SourceName]

Categories: SSIS Best Practices