Home > ETL Best Practices > Metadata – What Is It (3)? – ETL Package/Task Metadata

Metadata – What Is It (3)? – ETL Package/Task Metadata

From my last blog: Metadata – What Is It (2)? – ETL-Generated Metadata

image

In this blog, I’ll put down my notes on the ETL Package/Task Metadata.

ETL Task Metadata

What is exactly ETL task metadata? I define ETL task as the smallest unit in my SSIS package. What does each task do? Each task is really a source-to-target mapping that contains programming code. This can be very cryptic to business users or nontechnical end users. However, source-to-target mapping at task level is crucial for understanding the true data lineage of the data in the data management system. When credibility of the data is in question or its integrity needs to be proven, source-to-target mapping becomes the most important document.

SSIS Task and Component Naming

Before I set out to get task metadata, I want to get my naming convention straight. There are many proposals about the naming in for SSIS tasks and components. Most proposals are based on the naming of tasks and components in the Integration Services. Here are some popular examples.

image

I am personally not in favor of this kind of naming. The visual clue in the SSIS designer is enough to tell me if it’s a sequence container or it’s a data flow task. The abbreviations in the proposed prefixes also lose the meanings to me very soon.

In the data warehouse world, it’s popular to use Dim (Dimension) or Fact as either prefix or suffix to name  SSIS tasks and components.

I particularly like some of the naming proposals on the ETL Toolkit book. On a high level, ETL tasks generally fall into one of three categories, extraction, staging, and target.

image

In my ETL process, I introduced two more categories, AUDIT and VARIABLE.

  • AUDIT. AUD_<table name>. It indicates that the task’s purpose is to create audit entry.
  • VARIABLE. VAR_<variable name>. This is very specific to implementation in SSIS.

Here are a few examples.

  • A package. etlEAMPC_STG_CleanPCAssetSources.dtsx. STG indicates that this package does not touch the target, and it stays in the staging area only.
  • A task. image
  • A sequence container. image

Transformation Naming

Transformations are the components of an ETL task. The ETL Toolkit book recommended
the following naming conventions while building your ETL transformations:

image

image

In my ETL process, in addition to the prefix, I would add the stored procedure name if I am using a stored procedure for transformation.

Continue to my next blog “Metadata – What Is It (4)? – ETL Package/Task Metadata”

Categories: ETL Best Practices

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: