August 12, 2013
I have been guilty of severely underestimating the time required to integrate software applications, sometimes by an order of magnitude. This post lists the questions that I would have asked, in hindsight, to flesh out integration estimates: the more line items an estimate contains, the higher, and consequently the more accurate, it tends to be.
These questions originated when developing a greenfield application that communicated with an existing system. There are additional considerations when integrating multiple greenfield or multiple existing systems, but many of the questions in this post still apply. The questions below refer to the greenfield application as “our application”, and the existing application as the “external system”.
- Is the business on the latest version of the external system?
- Is the vendor scheduled to end support for the business's version? If so, when?
- How frequently does the version change?
- Has the business already considered upgrading? If not, upgrading before the integration begins can prevent re-work.
- Is up-to-date, sufficiently detailed technical documentation available for the external system?
- Has a system architect been identified who knows how data flows through the external system and what integration capabilities the system has? Finding the expert can take time.
- Is there a test instance of the external system we can test against?
- Have we been granted sufficient permissions to test against this instance?
- Have we verified our access?
- Is sample data from the external system available?
- Is a sufficiently large set of sample data available to provide complete coverage of all variations that could affect the integration?
- Is the sample data accurate? Specifically, is the sample data human generated, or is it generated by the external system in the same way the system will generate data that will be used by the integration?
- What is the quality level of the data in the external system?
- Does the external system implement validation to ensure data quality? If so, which validation methods does the system use?
- Database constraints?
- Application validation? Do not assume that application-level validation rules are reflected in the data store, since application validation rules can change over time.
- How will our application correlate records with the external system? What unique identifiers will be used?
- Are these identifiers truly unique in the external system, or are there duplicates?
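One cheap way to answer the uniqueness question is to scan a sample extract for duplicates before writing any mapping code. A minimal sketch in Python, using a hypothetical `customer_id` field:

```python
from collections import Counter

def find_duplicate_ids(records, key):
    """Return identifier values that appear more than once in the extract."""
    counts = Counter(record[key] for record in records)
    return [value for value, count in counts.items() if count > 1]

# Hypothetical sample extract from the external system:
sample = [
    {"customer_id": "C-100", "name": "Acme"},
    {"customer_id": "C-101", "name": "Globex"},
    {"customer_id": "C-100", "name": "Acme Corp"},  # same id, different name
]
print(find_duplicate_ids(sample, "customer_id"))  # ['C-100']
```

Finding even one duplicate early is valuable: it changes the correlation strategy before any mapping code depends on the identifier being unique.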
- Does the external system support pushing data to our application when events occur?
- On what interval – real time? Hourly? Daily?
- Is the frequency sufficient for the needs of the integration?
- Will we be responsible for writing code to extract data from the external system’s data store? This introduces significant risk.
- Has the business granted us needed access to the data store?
- How many distinct entities (types of data) will be integrated?
- Will all entities be transmitted in the same manner?
- How many distinct business scenarios will use these entities? Will each scenario require its own mapping logic?
- What are the temporal requirements of the integration?
- Real time?
- Semi real-time (< 1 minute)?
- On an interval or schedule?
- What interval configurability is required?
- What scheduling flexibility is required?
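If the answer is interval-based, the interval should be a configuration value rather than a hard-coded constant. A minimal polling-loop sketch in Python; the function names and parameters are illustrative, not from any particular library:

```python
import time

def poll(fetch, process, interval_seconds, max_cycles=None):
    """Pull records from the external system on a fixed interval.

    `fetch` returns new records; `process` handles each one. `max_cycles`
    bounds the loop for testing; a production loop would also need
    shutdown handling, error handling, and possibly backoff or jitter.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for record in fetch():
            process(record)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
    return cycles
```

Each of those "would also need" items is another line in the estimate, which is exactly the point of asking the question up front.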
- Will the integration be required to store any data?
- Do the systems have any different or conflicting data constraints?
- What error handling and reporting features are needed?
- To whom does our application need to display failed records and errors? (select all that apply)
- Business users?
- The external system vendor?
- How do errors need to be communicated to the client? (select all that apply)
- Errors in log files?
- Notifications via email, etc.?
- A work-list of bad records for business users to review and take action on? Which actions?
- Will the integration be responsible for storing integrated data in a way that will enable re-submission of failed transactions?
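If the answer is yes, the integration needs something like a dead-letter store. A minimal in-memory sketch in Python; a real implementation would persist failures durably (a database or durable queue) so they survive restarts:

```python
import time

class FailedTransactionStore:
    """Minimal in-memory dead-letter store for failed transactions."""

    def __init__(self):
        self._failed = []

    def record_failure(self, payload, error):
        self._failed.append({
            "payload": payload,
            "error": str(error),
            "failed_at": time.time(),
        })

    def resubmit_all(self, send):
        """Retry every stored failure; keep those that fail again."""
        still_failing = []
        for entry in self._failed:
            try:
                send(entry["payload"])
            except Exception as exc:
                entry["error"] = str(exc)
                still_failing.append(entry)
        self._failed = still_failing
        return len(self._failed)
```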
- What throughput is required of the integration?
- What peak data volume does the integration need to support?
- What is the maximum individual transfer size that needs to be supported?
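When the external system caps transfer size, large payloads have to be batched. A small illustrative helper in Python:

```python
def chunked(items, max_batch_size):
    """Split a large transfer into batches under the external system's limit."""
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]

print(list(chunked(list(range(10)), 4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The splitting itself is trivial; the estimate-inflating work is handling a batch that partially succeeds, which loops back to the error-handling questions above.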
- What security methods are required?
- Will our application authenticate against the external system? What authentication mechanism will be used?
- Will our application have to store credentials?
- Does our application need to store credentials in configuration files? Do those configuration files need to be encrypted?
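One common alternative to storing credentials in (possibly encrypted) configuration files is reading them from the environment at startup. A sketch in Python, using hypothetical variable names; a dedicated secrets manager may be more appropriate than either approach:

```python
import os

def load_credentials(prefix="EXT_SYSTEM"):
    """Read external-system credentials from environment variables.

    The EXT_SYSTEM_USER / EXT_SYSTEM_PASSWORD names are hypothetical.
    Failing fast when they are missing avoids a half-configured deploy.
    """
    user = os.environ.get(prefix + "_USER")
    password = os.environ.get(prefix + "_PASSWORD")
    if not user or not password:
        raise RuntimeError(
            "Set %s_USER and %s_PASSWORD before starting" % (prefix, prefix))
    return user, password
```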
- Is there any sensitive data involved in the integration (SSNs, birth dates, personal health information, personal financial information, etc.)? This introduces security and liability concerns.
- For testing, will we have access to real data?
- How will we secure the data in our test environment?
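One common mitigation is masking sensitive fields before data reaches the test environment. An illustrative SSN-masking sketch in Python; the pattern and placeholder format are assumptions, not a complete de-identification scheme:

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_ssns(text):
    """Replace SSN-like values, keeping only the last four digits."""
    return SSN_PATTERN.sub(lambda match: "***-**-" + match.group()[-4:], text)

print(mask_ssns("Patient SSN: 123-45-6789"))  # Patient SSN: ***-**-6789
```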
- How will health-monitoring be performed once the integration goes live?
- Is any support documentation required? Are instructions for locating and managing failed records needed?
- Will the business require our assistance to verify the integration during user-acceptance testing?
This post is intended as a starting point, spurring the imagination toward the range of possible considerations when estimating software-integration effort. I still strongly recommend including an additional buffer in the estimate, since Hofstadter's Law certainly applies.
Advice: Don’t re-invent the wheel. Integration services like NServiceBus (simple, low-cost, yet powerful), BizTalk (complex, high-cost), and others include out-of-the-box capabilities for publish-subscribe, asynchronous messaging, automatic retries, failed-message re-submission, and health monitoring.