A Few Big Steps in the ETL Process Development
Actually the first step would be to read the following documents:
1) Business Request Specification (BRSs): these are very high level documents, which help to identify various data sources and the scope for the ETL process. In addition, these BRSs also defines, typically,
- primary contact(s) and business analyst(s) for each data source
- data elements and also functionalities that the master data management system needs to map out and replicate
- whether the data elements and also functionalities are from upstream or for the downstream systems
- reporting requirements
2) Ideally I should have access to all the upstream data sources now by working with the technical contact for each data source. A typical serve access matrix document should looks like this.
3) A Global BRS Field Listing document: this document should help to narrow down the scope of the ETL process by identifying only the fields required in the BRSs. A typical Global BRS Field Listing document should looks like this.
4) Field Mapping documents: these documents should further narrow down the scope of the ETL process.
5) Field Value Translation documents: these documents are field value standardization documents. What fields need to be standardized (i.e. from a defined list) and how the translation should be done need to be defined early during the ETL process development.
A typical translation table looks like this.
6) Filtering: the translation table can also serve as a filter, by adding an Exclude indicator. Very often, filters that are based on statuses, or item types need to be defined.
Other important items during the ETL process development include: the unique key(s) from the source systems, how non-standard values should be captured through an exception process.