
ETL Toolkit– Error Event Table and Audit Dimension

Chapter 4, “Cleaning and Conforming,” of Kimball’s The Data Warehouse ETL Toolkit runs about 50 pages, so it is not an extremely long chapter. Here is a warning from the beginning of the chapter.

Please stay with us in this chapter. It is enormously important. This chapter
makes a serious effort to provide specific techniques and structure for an
often amorphous topic. The chapter is long, and you should probably read
it twice, but we think it will reward you with useful guidance for building
the data cleaning and conforming steps of your ETL system.

I have to confess that I started it twice but never finished the entire chapter either time. That doesn’t mean I totally ignored the data quality issue in my ETL design. Although I didn’t finish the chapter, I focused on these three sections:


Without the effort and hard work of the data profiling phase (or, as I call it, the data discovery phase), cleansing and auditing are not going anywhere. Here is an example from the IT Asset Management system in my previous project.

Network auto-discovery software is installed on servers. Multiple such tools can be installed on the same server, collecting duplicate or conflicting data. Only after a lot of hard work on data profiling were we able to create a strategy for resolving the duplicate or conflicting data.

Looking at the error event table schema suggested in the ETL Toolkit, it doesn’t take long to come up with a screen type of “Duplicate Composite Key” for my data exception table. I called it a data exception table instead of an error event table.

By the way, the schema suggested there is a generic one. I’d think you can design your table any way you want to fit your specific project.
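To make the idea concrete, here is a minimal sketch of a data exception (error event) table together with a “Duplicate Composite Key” screen, using SQLite for illustration. The table and column names are my own assumptions for this example, not the exact schema from the ETL Toolkit, and the staging data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A simplified data exception (error event) table; the column names
# here are illustrative, not the exact schema from the book.
cur.execute("""
    CREATE TABLE data_exception (
        exception_id  INTEGER PRIMARY KEY,
        batch_id      INTEGER,
        screen_type   TEXT,    -- e.g. 'Duplicate Composite Key'
        source_table  TEXT,
        source_key    TEXT,    -- the offending composite key value
        logged_at     TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

# Staging data from two auto-discovery tools: (server, tool) is the
# composite key, and server SRV01 was reported twice with conflicts.
cur.execute("CREATE TABLE stg_asset (server TEXT, tool TEXT, cpu_count INTEGER)")
cur.executemany("INSERT INTO stg_asset VALUES (?, ?, ?)", [
    ("SRV01", "toolA", 4),
    ("SRV01", "toolA", 8),   # conflicting duplicate
    ("SRV02", "toolB", 2),
])

# The 'Duplicate Composite Key' screen: log one exception row per
# duplicated key instead of failing the whole load.
cur.execute("""
    INSERT INTO data_exception (batch_id, screen_type, source_table, source_key)
    SELECT 42, 'Duplicate Composite Key', 'stg_asset', server || '|' || tool
    FROM stg_asset
    GROUP BY server, tool
    HAVING COUNT(*) > 1
""")

for row in cur.execute("SELECT screen_type, source_key FROM data_exception"):
    print(row)   # ('Duplicate Composite Key', 'SRV01|toolA')
```

The same screen translates directly into a T-SQL INSERT … SELECT against a staging table; only the duplicated key lands in the exception table, so the load itself can continue.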


As for the #2 cleaning deliverable, the audit dimension, my understanding is that it is just a summary of your data exception screens, with a score for each exception category. I’ve designed an audit table that summarizes all the important measures for data exceptions and also normal row counts, but I’ve never created scores for them. I guess I really didn’t know how to score them.
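One simple way to attach a score is to derive it from the exception rate per category. The sketch below summarizes an exception table into an audit table with one row per batch and screen type; the scoring rule (1 minus the exception rate) is my own assumption for illustration, not something prescribed by the book:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Exception rows logged by the screens during one load; illustrative schema.
cur.execute("CREATE TABLE data_exception (batch_id INTEGER, screen_type TEXT)")
cur.executemany("INSERT INTO data_exception VALUES (?, ?)", [
    (42, "Duplicate Composite Key"),
    (42, "Duplicate Composite Key"),
    (42, "Missing Required Field"),
])

# Audit table: one row per batch and exception category, plus a
# score = 1 - exceptions / rows_loaded (an assumed scoring rule).
ROWS_LOADED = 1000
cur.execute("""
    CREATE TABLE audit_dim (
        batch_id      INTEGER,
        screen_type   TEXT,
        error_count   INTEGER,
        quality_score REAL
    )
""")
cur.execute("""
    INSERT INTO audit_dim
    SELECT batch_id, screen_type, COUNT(*),
           ROUND(1.0 - COUNT(*) * 1.0 / ?, 4)
    FROM data_exception
    GROUP BY batch_id, screen_type
""", (ROWS_LOADED,))

for row in cur.execute("SELECT * FROM audit_dim ORDER BY screen_type"):
    print(row)
# (42, 'Duplicate Composite Key', 2, 0.998)
# (42, 'Missing Required Field', 1, 0.999)
```

Whatever formula you pick, the point is that the audit dimension stays a pure summary of the exception table, so it can be rebuilt from the logged events at any time.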


  1. Mark Wojciechowicz
    March 28, 2011 at 1:59 am

    Hi Sherry,
    This post may have been at my prompting, from my question a few days ago regarding this chapter of the Kimball ETL Toolkit, so thank you for outlining these thoughts. In practical application, I am wondering whether you apply the ETL screens with some dynamic SQL, some logic in SSIS, or perhaps a script task?

    Also, for creating the Audit dimension, what challenges have you faced in summing up the count of issues with data? That is, if I have a fact table and a related dimension and each have a quality problem related to a specific fact row and dimension key, how difficult has it been to associate all the errors to that particular fact row?

    When you load new fact rows that are related to the dimension key above, do they always get an error count for that data quality issue? When a new quality issue arises for that dimension key, does that mean new rows with that dimension key are getting an increased error count?

    Thank you for your insight,


    • Sherry Li 李雨
      March 29, 2011 at 6:19 am

      Hi Mark,

      For implementing most of the data exception screens, I simply used T-SQL scripts to write the records into my exception table, called from appropriate places inside the SSIS packages. For simple row counts, I would use the Row Count transformation in a data flow. I prefer not to write any dynamic SQL.

      For the audit dimension, I created a separate SSIS package that reads from my exception event table, summarizes it, records the results in the audit table, and creates an email message in HTML format to send to myself every day.
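As a rough sketch of that last step, here is how the daily HTML summary might be assembled; the audit rows, addresses, and subject line are placeholders of my own, and the actual send would depend on your SMTP setup (in SSIS this would typically be a Send Mail or script task instead):

```python
from email.mime.text import MIMEText

# Illustrative daily summary rows read from the audit table:
# (screen_type, error_count)
audit_rows = [
    ("Duplicate Composite Key", 2),
    ("Missing Required Field", 1),
]

# Build a small HTML table for the daily data quality email.
cells = "".join(
    f"<tr><td>{screen}</td><td>{count}</td></tr>" for screen, count in audit_rows
)
html = (
    "<html><body><h3>Daily ETL Data Quality Summary</h3>"
    "<table border='1'><tr><th>Screen</th><th>Errors</th></tr>"
    f"{cells}</table></body></html>"
)

msg = MIMEText(html, "html")
msg["Subject"] = "Daily ETL audit summary"      # placeholder subject
msg["From"] = "etl@example.com"                 # placeholder addresses
msg["To"] = "me@example.com"

# Sending is environment specific; with a local SMTP relay it would be:
# import smtplib
# with smtplib.SMTP("localhost") as s:
#     s.send_message(msg)
print(msg["Subject"])
```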

      The answer to your question about different dimension keys and how to count them is really project specific. In my previous project, I logged many data exceptions, but I started with a very minimal auditing report. I would discuss with my supervisor which issues were the most important. So my advice would be to start small, don’t overwhelm yourself, and grow your data quality checking and auditing gradually.

      Hope this helps.


