Home > ETL Best Practices, SSIS Best Practices > Metadata – What Is It (4)? – Extract Metadata from SSIS Packages

Metadata – What Is It (4)? – Extract Metadata from SSIS Packages

In my last blog Metadata – What Is It (3)? – ETL Package/Task Metadata, I put down my notes about the naming conventions I am using for tasks/components/transformations in my current ETL design project.

Once I settled down on the naming conventions, 50% of my metadata task is completed. Another 50% of the effort is to actually extract the metadata I need from my SSIS packages.

There are some discussions on Internet about how to do this. There are commercial products out there too. But for my purpose (document my packages really), a commercial product will be an over kill. Luckily, the DTSX file is just a XML file. I can use simple XQUERY functions to extract what I need out of the dtsx file.

Goal

Extract task names from a SSIS package. Remember all my tasks are “encoded” with my naming conventions. So the name of the task will reveal 50% of the purpose of the task. I’ll complete rest of the metadata manually in a spreadsheet.

Solution

I am a SQL developer, so naturally I want to manipulate the dtsx file in T-SQL.

Before I can manipulate the dtsx file (or any xml file) in T-SQL, I need to load the file into SQL. I can do it in a couple of ways (I am using SQL 2008 for this purpose).

1. I can use OPENROWSET to BULK load the dtsx file into a variable of data type XML

image

DECLARE @PackageXML XML
SET @PackageXML =
(    SELECT    *
    FROM    OPENROWSET
        (    BULK ‘C:\etlEAMPC_EXT_ExtractRawDataSW.dtsx’, SINGLE_CLOB  ) AS xmlData )
SELECT @PackageXML

The result looks like this in SQL Management Studio:

image

SQL Management Studio is one of the best xml viewers. So click on the xml link, the xml file is opened in a new tab, and it looks like this.

image

2. I can use OPENROWSET to BULK load the dtsx file into a SQL table

image

USE tempdb
GO

IF EXISTS (select * from sys.tables where name = N’tblPackageXML’)
        DROP TABLE tblPackageXML;

GO

CREATE TABLE tblPackageXML(
        PackagePath    varchar(900)    NOT NULL PRIMARY KEY
,       PackageXML    XML             NOT NULL
);

INSERT INTO tblPackageXML (PackagePath, PackageXML)
SELECT ‘C:\etlEAMPC_EXT_ExtractRawDataSW.dtsx’, xmlData
FROM  (    SELECT    *
        FROM    OPENROWSET 
        (    BULK ‘C:\etlEAMPC_EXT_ExtractRawDataSW.dtsx’, SINGLE_CLOB)  AS xmlData  ) AS PackageXML (xmlData)
GO

SELECT * FROM tblPackageXML

The result looks like this in SQL Management Studio:

image

3. I can use the xp_cmdshell SQL system procedure to recursively look for all dtsx files on my hard drive, and load them all into a SQL table. I only have 5 SSIS packages, so the first 2 ways are enough for me.
4. Now I can use XQuery to retrieve exactly what I need from the PackageXML column on tblPackageXML

Here I am using the Query function to retrieve only the task names. There are a few things you need to understand in this script.

  • The declare part is needed because the the XML is a typed column (in other words, there is a schema DTS registered for this column). Once DTS namespace is declared, all elements need to be prefixed with DTS:.
  • // indicate that this is the root element. What is inside [@….] are the expressions I am telling Query function what kind of attributes I need or do not need. Here I am telling the Query function to look for (or not look for) under attribute ExecutableType.
  • /DTS:Property indicates that the element I am looking for under DTS:Executable is DTS:Property.
  • Again [@DTS:Name=”ObjectName”] indicates that the only attribute I need is Name=”ObjectName”. 

image

Here is the result from the above query:

image

If I click on XML link, I see this. Notice that now I only see the task names.

image

5. Now I need to create a new table with the extracted data from the XML column. I’ll use the Nodes method to create a new table tblTask where each Task Name becomes a row in that table.

Along with the Nodes function, I need to use the CROSS APPLY statement with the above temp table #task to achieve this.

image

The result looks like this. Notice that each package now expanded to multiple rows with each row representing one task in the package.

image

6. Finally I need to convert XML data type to nvarchar and use REPLACE function to strip off the characters I don’t need.

image

Result:

image 

7. Now I can copy the above result to a spreadsheet and complete the rest of my metadata document manually there.

Continue to my next blog “Metadata – What Is It (5)?”

  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: