Metadata – What Is It (3)? – ETL Package/Task Metadata
From my last blog: Metadata – What Is It (2)? – ETL-Generated Metadata
In this blog, I’ll put down my notes on the ETL Package/Task Metadata.
ETL Task Metadata
What is exactly ETL task metadata? I define ETL task as the smallest unit in my SSIS package. What does each task do? Each task is really a source-to-target mapping that contains programming code. This can be very cryptic to business users or nontechnical end users. However, source-to-target mapping at task level is crucial for understanding the true data lineage of the data in the data management system. When credibility of the data is in question or its integrity needs to be proven, source-to-target mapping becomes the most important document.
SSIS Task and Component Naming
Before I set out to get task metadata, I want to get my naming convention straight. There are many proposals about the naming in for SSIS tasks and components. Most proposals are based on the naming of tasks and components in the Integration Services. Here are some popular examples.
I am personally not in favor of this kind of naming. The visual clue in the SSIS designer is enough to tell me if it’s a sequence container or it’s a data flow task. The abbreviations in the proposed prefixes also lose the meanings to me very soon.
In the data warehouse world, it’s popular to use Dim (Dimension) or Fact as either prefix or suffix to name SSIS tasks and components.
I particularly like some of the naming proposals on the ETL Toolkit book. On a high level, ETL tasks generally fall into one of three categories, extraction, staging, and target.
In my ETL process, I introduced two more categories, AUDIT and VARIABLE.
- AUDIT. AUD_<table name>. It indicates that the task’s purpose is to create audit entry.
- VARIABLE. VAR_<variable name>. This is very specific to implementation in SSIS.
Here are a few examples.
- A package. etlEAMPC_STG_CleanPCAssetSources.dtsx. STG indicates that this package does not touch the target, and it stays in the staging area only.
- A task.
- A sequence container.
Transformations are the components of an ETL task. The ETL Toolkit book recommended
the following naming conventions while building your ETL transformations:
In my ETL process, in addition to the prefix, I would add the stored procedure name if I am using a stored procedure for transformation.
Continue to my next blog “Metadata – What Is It (4)? – ETL Package/Task Metadata”