Home > ETL Best Practices > Change data capture (CDC) in ETL Process

Change data capture (CDC) in ETL Process

Google “Change data capture” you will get many hits, most of which are about the new CDC features that are new in SQL Server 2008, or in Oracle’s data warehouse solutions.

What about us who must design a manual CDC process for a daily incremental loading where the various data sources do not have the CDC features enabled? I guess the good news is that the team has long ago decided on a daily batch loading strategy rather than an Instantaneous real-time ETL process. 

So, if we are not going to use triggers on source tables or rely on log scanning on source databases, what are our options in terms of the methodology of capturing change data in data sources?

Some suggestions, see below, from Wikipedia. I think the CDC process really consists of two steps, the Extract and Reconciliation steps.

image

1) Step 1: Source Data Extraction – The above discussion, I believe, is really just about Extraction. I extract the source data into a physical storage where the data auditing trace must be maintained. I used two pieces of data in the auditing trace, i.e. the Last Update timestamp from the data sources, and the timestamp of the extraction.

2) Step 2: Data Reconciliation – This is done through the natural key from the data sources, comparing against the initial historical data. I will also need to maintain the data auditing, compliance tracking in this step.

Categories: ETL Best Practices
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: