ETL Process Outside of HP Asset Manager API or any Software API – The Last Id Dilemma
I bet anybody who has developed an ETL process outside of the target software API has encountered the following issues at some point. In the past, I’ve worked on Sage SalesLogix, and this time, HP Asset Manager. HP AM has it’s own data mapping/loading tool (I am not sure if I want to call it an ETL tool), called Connect-It. Here is an introduction of Connect-It. Ok, it’s an EAI type integration platform.
Here are some of the issues:
1) Performance issue with the connect-It tool: according to HP, we can expect a rate of 10 minutes for loading 5k records. If so, loading 200k records (this is how many assets I need to load initially) will take about 7 hours, and loading 2.5M records (this is how many software installations I need to load) will take 3 and a half days. I don’t think anybody can convince me that the performance is reasonable.
2) Due to the performance issue, we’ve decided to design an ETL process outside of the API, and PK values will be generated outside of the API, and the final data loading will be through a SQL push.
3) This avoids the performance issue that is inherent in the software API, but creates another issue, the Last ID issue. Apparently HP Asset Manager has kept an internal Last ID, which is now out of sync with the actual last used ID. Creating a new record in the AM client will generate a conflicting key error.
We’ve talked about possible solutions, and so far, none looks good.
1) Can we find a way to increase the internal Last ID through some API functions? This seems to be a dead end so far, because the Last Id doesn’t seem to be exposed anywhere, not in any API functions, not within the Connect-It tool either.
2) Can we find a way to increase the internal Last ID through SQL? This also seems to be a dead end so far. None of the tables, stored procedures or functions expose that mysterious Last Id.
3) That leaves us the only option, that is, to use Connect-It to generate the PK values AND also load the data. My boss is aware of this issue. So I guess he can make the final call.
4) As a temporary solution, one of my co-workers suggested that we can start from a high enough ID, rather than the lowest possible seed value, and use it as the seed value to generate all our PK values outside of the API. For example, with an internal Last ID of 400K, we can use 9,000,000 as the seed value for the PK values. It will take a while for AM to reach from 400,000 to 9,000,000 when the conflicting key error will eventually occur. I would take this approach for now for the UAT phase, hoping that this will buy us some time to work out a permanent solution.
I’d love to hear from other people who had dealt with this kind of Last Id issue in a commercial software.