Home > ETL Best Practices > ETL: Daily Refreshing Rate Analysis – Volume manageable?

ETL: Daily Refreshing Rate Analysis – Volume manageable?

After the historical loading of the assets into the Asset Manager, I did an initial analysis to see how much volume need to be refreshed daily. The rate of 80% seemed too high for assets that come from the network auto-discovery tools. So I did one more analysis, excluding certain attributes. The 10% refreshing rate seems to be more reasonable and more manageable now.

The software installation data still needs to be analyzed.

ETL developer’s note:

  • There are several candidates for the comparison key. The last modify date, the sequential number (primary key) from the source, or the natural key. I picked the natural key, i.e. the Device Name. The last modify date from source doe snot give me options to choose which attributes to compare. The sequential number that serves as the primary key from the source is not a good option either. Devices in the network can be dropped from the auto-discovery source, and come back with a different sequential number.
  • Once the decision is made to use the natural key for comparison, the SQL function Checksum comes to the picture naturally. I’ve created a calculated column colChecksum for the tables I need to compare. See details in the next blog.

From: Sherry
Sent: Friday, August 27, 2010 12:32 PM
To: B
Subject: Computer Hardware Daily Refreshing Rate Analysis

My initial analysis included all the attributes, and the refreshing rate seemed very high.

I did one more analysis. Here are my findings so far. The numbers seems to be manageable now. But now I am not clear why some of the attributes would change on a daily basis, such as Serial Number, CPU Core Number, etc.

Analysis Basis:

  • Marimba auto-discovery data;
  • data are cleansed before the analysis (duplicates are removed, key values are standardized, only needed attributes are retrieved etc.);
  • previous date – 8/26; 133,471 assets
  • current date – 8/27; 133,549 assets
  • Device Name is used as the comparison key
  • Certain attributes are excluded in the comparison, such as last logon date, IP address, disk free space etc.
  • Attributes included in the comparison is listed below

Findings:

  • The current date has 247 new assets, less than .2%
  • 169 assets are no longer reporting since the previous date
  • 13,031, about 10% assets have some of the following attributes changed since the previous date. Some attributes contribute to the changes more than others. The breakout is listed below.
    •             Mac
    •             StandardPCBrand
    •             StandardPCModel
    •             SerialNo
    •             DomainName
    •             TcpIpDomain
    •             DefaultGateway
    •             DNS
    •             IPMask
    •             WINS
    •             StandardPCType
    •             StandardPCCategory
    •             StandardPCStatus
    •             OS
    •             OSEdition
    •             OSVersion
    •             OSServiceLevel
    •             OSBuildNumber
    •             OSDirectory
    •             OSLocale
    •             OSInstallDate
    •             BIOS
    •             BIOSSource
    •             BIOSAssetTag
    •             BIOSDate
    •             StandardCPUBrand
    •             StandardCPUType
    •             CPUSpeed
    •             CPUCores
    •             CPUProcessor
    •             CPULogicalProcessor
    •             Memory
    •             DiskStorageMB
    •             username
    •             Logonuser

Changes Breakout by Attributes

  •       DefaultGateway: 3197
  •       Logonuser: 1483
  •       username: 1264
  •       CPUSpeed: 1834 (might have to do with the rounding)
  •       IPMask: 1150
  •       WINS: 2893
  •       DNS: 130
  •       DomainName: 126
  •       TcpIpDomain: 23
  •       OS: 35
  •       BIOS: 22
  •       SerialNo: 34 (not sure why SerialNo would change if it’s still the same PC)
  •       PCModel: 26
  •       TcpIpDomain: 23
  •       CPUBrand: 32
  •       CPUType: 47
  •       CPUCores: 6
Categories: ETL Best Practices
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: