Productivity Reporting #3 – Back fill transactional data or not? Slowly Changing Dimension (SCD)?
Fact tables that involve employee productivity (which drives the Incentive program) are mostly transactional. The word transactional is not exactly accurate in our world. The granularity in many of data elements is per day, meaning that certain data elements will only store the last value from each day, rather than every changed values (transactional) throughout the day.
This “per day” approach also coincides with our batch ETL process, which is also per day.
However, business has constantly asked us to back fill those per-day transactional data with data that were entered many days later. Here are some scenarios:
1. Managers have forgotten to assign team members to a task until the tasks have been completed.
2. Team members have forgotten to check certain indicators until days after the decisions have been made.
As a developer, we are always happy to comply with business requirements, with a few occasional exceptions. I think this should be one of the exceptions.
Here is the reason. If the transactional data can be updated many days later, imagine what would happen to the record counts. Your record counts (for certain measures) on 5/20 will be slightly different when you count it again one 5/25, if you have back filled the transactional data between 5/20 and 5/25.
Does this issue sound familiar to you? What about the famous Slowly Changing Dimension (SCD) issue in star schema ETL process? Do they have anything in common?
I think this is the same issue as the SCD issue. I need to confess that when I read about the SCD issues, it really didn’t make a lot of sense to me until I have a real business case.
A real business case also helped me to realize that there is no right or wrong solution for the SCD issue, and the solution largely depends on what you can work out with the business.