Home > ETL Best Practices > ETL – Two Simultaneous Threads, Planning & Design thread and Data Flow thread

ETL – Two Simultaneous Threads, Planning & Design thread and Data Flow thread

It is suggested in the ETL Toolkit book that ETL team needs to work simultaneously on two thread: the Planning & Design thread and the Data Flow thread.

image

image

1) Planning & Design thread

The first step in the Planning & Design thread is accounting for all the requirements and realities. These include:

  • Business needs
  • Data profiling and other data-source realities
  • Compliance requirements
  • Security requirements
  • Data integration
  • Data latency
  • Archiving and lineage
  • End user delivery interfaces
  • Available development skills
  • Available management skills
  • Legacy licenses

The second step in the first thread is the architecture step. These decisions include:
Hand-coded versus ETL vendor tool

  • Batch versus streaming data flow
  • Horizontal versus vertical task dependency
  • Scheduler automation
  • Exception handling
  • Quality handling
  • Recovery and restart
  • Metadata
  • Security

The third step in the Planning & Design thread is system implementation.This step includes:

  • Hardware
  • Software
  • Coding practices
  • Documentation practices
  • Specific quality checks

The final step sounds like administration, but the design of the test and
release procedures is as important as the more tangible designs of the preceding
two steps. Test and release includes the design of the:

  • Development systems
  • Test systems
  • Production systems
  • Handoff procedures
  • Update propagation approach
  • System snapshoting and rollback procedures
  • Performance tuning

2) Data Flow thread

The first extract step includes:

  • Reading source-data models
  • Connecting to and accessing data
  • Scheduling the source system, intercepting notifications and daemons
  • Capturing changed data
  • Staging the extracted data to disk

The clean step involves:

  • Enforcing column properties
  • Enforcing structure
  • Enforcing data and value rules
  • Enforcing complex business rules
  • Building a metadata foundation to describe data quality
  • Staging the cleaned data to disk

This step is followed closely by the conform step, which includes:

  • Conforming business labels (in dimensions)
  • Conforming business metrics and performance indicators (in fact
    tables)
  • Deduplicating
  • Householding
  • Internationalizing
  • Staging the conformed data to disk

Data delivery from the ETL system includes:

  • Loading flat and snowflaked dimensions
  • Generating time dimensions
  • Loading degenerate dimensions
  • Loading subdimensions
  • Loading types 1, 2, and 3 slowly changing dimensions
  • Conforming dimensions and conforming facts
  • Handling late-arriving dimensions and late-arriving facts
  • Loading multi-valued dimensions
  • Loading ragged hierarchy dimensions
  • Loading text facts in dimensions
  • Running the surrogate key pipeline for fact tables
  • Loading three fundamental fact table grains
  • Loading and updating aggregations
  • Staging the delivered data to disk

The basic four-step data flow is overseen by the operations step. Operations includes:

  • Scheduling
  • Job execution
  • Exception handling
  • Recovery and restart
  • Quality checking
  • Release
  • Support
Categories: ETL Best Practices
  1. jamel FEKI
    January 6, 2011 at 11:37 am

    I would like to receive new issues

    Like

    • Sherry Li 李雨
      January 21, 2011 at 3:21 pm

      Jamel, I’ve added an Email Subscription widget on my blog. If you’d like, you can sign up with your email to you will receive my new posts. Thanks for your interest.

      Like

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: