Two Factor Integration
Two factor integration is a design principle inspired by two factor authentication. The basic concept is the same: use a second, independent system to validate the first. The general idea is to get priorities straight. An integration that moves data around with no way of knowing whether that data is actually valid is worse than useless; it is dangerous and can drive significant risk into the business.
Integration developers often overlook the importance of the data because of their technology focus. When I ask developers how they know if the data in a target system is valid, they universally respond with assurances about various error handling and logging systems. And when I ask about data errors or omissions that occur in the endpoints either upstream or downstream of the integration technology, they reply that this is "outside of their scope" or "how can they know if data is actually correct?" or "data validation is an application and governance issue". Occasionally I hear from those with deeper experience that integration platforms are "stateless" and what I am asking about requires "state-fulness". When I ask how they respond to a system outage or disruption that de-synchronizes a large volume of records, their answer is "manually...but this doesn't happen very often". When it does, it's a lot of work to fix.
Business users have a singular focus too, but it is very different from that of the developers they rely on to build and maintain their data integrations. Business users care about the data, period. They want to know: "is it all correct?", "is any of it out of sync?", "when was the last time it was synced?", "how can I be sure it is in sync?", and so on. Business users know from direct experience that data is super slippery stuff and that it will de-sync all by itself without a watchful eye on it at all times. Moreover, they know that developers don't have effective technical solutions either. When I ask Salesforce admins, for example, how they know if the data is in sync, they tell me that their data teams perform manual match-and-compare operations on a daily basis and then notify the technical team. This creates ongoing and expensive administrative overhead that drives questions about the true viability of cloud-based applications that need shared data.
Somewhat unexpectedly, most integration vendors don't have much to offer either. They tend to focus on data quality solutions at the recordset level, ensuring semantic consistency or validity against a standard such as postal codes or telephone numbers. Webinars boasting the capability to compress and move big data from point to point abound, but when the question of validation arises, there is a general acknowledgement that more needs to be done to address this issue.
Some database vendors have understood this problem for quite some time and now offer a variety of approaches for matching and comparing data across nodes in distributed databases. But API-based cloud applications do not typically expose their databases, so pure database solutions are not an option. Moreover, the data models in a distributed application environment will not be the same, and this drives an additional requirement for transformations during match-and-compare operations.
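To make the match-and-compare idea concrete, here is a minimal sketch in Python. It assumes the records have already been pulled from each application through its API, and that each side has a shared business key. Every name in it (the field names, crm_to_canonical, erp_to_canonical, match_and_compare) is hypothetical and chosen only for illustration; it is not any vendor's API. Each side's records are first transformed into a common canonical shape, then fingerprinted, and the fingerprints are compared key by key.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, List, Mapping

Record = Mapping[str, object]


def fingerprint(canonical: Record) -> str:
    """Hash a canonical record so two records can be compared cheaply."""
    payload = json.dumps(canonical, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def index_by_key(
    records: Iterable[Record],
    to_canonical: Callable[[Record], Record],
    key_field: str,
) -> Dict[str, str]:
    """Transform each record to the canonical shape and map key -> fingerprint."""
    index: Dict[str, str] = {}
    for record in records:
        canonical = to_canonical(record)
        index[str(canonical[key_field])] = fingerprint(canonical)
    return index


def match_and_compare(
    source: Dict[str, str], target: Dict[str, str]
) -> Dict[str, List[str]]:
    """Report keys that are missing on either side or whose content differs."""
    source_keys, target_keys = set(source), set(target)
    return {
        "missing_in_target": sorted(source_keys - target_keys),
        "missing_in_source": sorted(target_keys - source_keys),
        "mismatched": sorted(
            k for k in source_keys & target_keys if source[k] != target[k]
        ),
    }


# Hypothetical transformations: the two applications model a customer differently.
def crm_to_canonical(r: Record) -> Record:
    return {
        "id": r["Id"],
        "email": str(r["Email"]).strip().lower(),
        "name": str(r["Name"]).strip(),
    }


def erp_to_canonical(r: Record) -> Record:
    return {
        "id": r["customer_id"],
        "email": str(r["email_addr"]).strip().lower(),
        "name": f'{r["first"]} {r["last"]}'.strip(),
    }


# Made-up sample data standing in for records fetched from each endpoint.
crm_records = [
    {"Id": "001", "Email": "Ann@Example.com", "Name": "Ann Lee"},
    {"Id": "002", "Email": "bob@example.com", "Name": "Bob Ray"},
    {"Id": "003", "Email": "cy@example.com", "Name": "Cy Orr"},
]
erp_records = [
    {"customer_id": "001", "email_addr": "ann@example.com", "first": "Ann", "last": "Lee"},
    {"customer_id": "002", "email_addr": "bob@old.example.com", "first": "Bob", "last": "Ray"},
    {"customer_id": "004", "email_addr": "di@example.com", "first": "Di", "last": "Fox"},
]

report = match_and_compare(
    index_by_key(crm_records, crm_to_canonical, "id"),
    index_by_key(erp_records, erp_to_canonical, "id"),
)
print(report)
```

With this sample data, record 001 matches once both sides are normalized, 002 is flagged as mismatched, 003 is missing in the target, and 004 is missing in the source. Comparing fingerprints rather than raw fields keeps the comparison independent of either application's data model, at the cost of not saying which field differs; a real implementation would likely report field-level differences as well.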
The good news is that, despite the many challenges, it is possible to build cloud application integrations that deliver a high level of data confidence with a correspondingly lower level of administrative overhead by prioritizing data ahead of technology. The trick is that the architects need to consider all of the elements in the system, including the people, data, processes, and technology, and how they best fit together to deliver the desired performance outcomes.