The next generation of data architectures – where next?
Chief architect at IOTICS addresses how a new generation of data architectures has evolved to overcome former limitations
Part 2 – Where next for data architectures
In my previous article I looked at the state of play for current data architectures: where the frustrations lie, and why those architectures are failing to address some new and emergent challenges. This time I’ll address how a new generation of data architectures has evolved to overcome former limitations and meet the challenges of today and tomorrow.
Evolving data architecture to tackle new enterprise challenges
First, a quick recap: organisations face new challenges. Current data architectures look inwards, and the “enterprise” is the boundary delimiting the data architecture. The new challenges concern applications that need to share data across organisational boundaries, which I call “shared problems”. Solving shared problems requires sharing knowledge safely. What characteristics should data architectures have to tackle such challenges?
Data centric / data fabric
I have already discussed the approach proposed by the proponents of the data centric revolution.
Conceptually, the data centric / data fabric model seems the most appropriate, not only because it promotes separation between the application layer and the data layer (“applications are ephemeral, data stays”, in the words of the proponents of the “data centric revolution”), but also because the infrastructure and architecture supporting the fabric enable data to be accessed efficiently for processing and reuse.
Carrying the fabric analogy further, applications are, in essence, ways to untangle and comb the data threads in the fabric to make new clothes, and value is created by allowing users to fill up their wardrobes.
Furthermore, if applications are decoupled from the underlying data fabric, it’s far more efficient to ship applications across boundaries and provide interoperability without the need to share data.
Enable autonomous interoperability
Automation is key to scaling successfully, and automation is achieved by enabling autonomous interoperability. In practice, this means that data should be automatically processable by software without manual intervention.
Mark Wharton, IOTICS co-founder, has written a post on why he hates APIs. The gist of his argument is that, as long as systems that need to share data require developers to integrate APIs, scaling is limited by the ability to code point-to-point integrations.
Autonomous interoperability can be enabled by adopting semantics to model data and sharing those models, with bonus points if the models are machine-processable.
Semantic data modelling techniques have been around for a long time; this comparison paper dates from 1988. They have seen a resurgence in the past few years with the advent of the Semantic Web and, more recently, of graph databases.
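To make the scaling argument concrete, here is a back-of-the-envelope sketch (our illustration, not taken from Mark Wharton’s post). It assumes the worst case where every pair of n systems needs its own bespoke integration, and compares that with each system mapping once to a shared model.

```python
def point_to_point_integrations(n: int) -> int:
    """Bespoke integrations needed if every pair of n systems is wired together."""
    return n * (n - 1) // 2


def shared_model_mappings(n: int) -> int:
    """Mappings needed if each of n systems maps once to a common, shared model."""
    return n


# Quadratic versus linear growth as the number of participating systems rises.
for n in (5, 20, 100):
    print(n, point_to_point_integrations(n), shared_model_mappings(n))
```

With 100 systems the pairwise worst case is 4,950 integrations against 100 mappings, which is the gap that shared, machine-processable models aim to close.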
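A minimal sketch of what shared, machine-processable models buy, assuming two producers that name their fields differently but each publish a mapping from their own fields to a common vocabulary. The vocabulary term `ex:Temperature`, the unit codes and the mapping format are hypothetical; a real deployment would more likely use RDF and an ontology than Python dicts. The point is that the consumer contains no producer-specific code.

```python
# Hypothetical shared vocabulary term; in practice this would be an ontology IRI.
TEMPERATURE = "ex:Temperature"

# Each producer publishes raw records plus a mapping from its own field names
# to (shared concept, unit). The mapping is the machine-processable model.
producers = {
    "plant_a": {
        "records": [{"temp_c": 21.5}, {"temp_c": 22.0}],
        "mapping": {"temp_c": (TEMPERATURE, "C")},
    },
    "plant_b": {
        "records": [{"t_fahrenheit": 70.7}],
        "mapping": {"t_fahrenheit": (TEMPERATURE, "F")},
    },
}


def to_celsius(value: float, unit: str) -> float:
    """Normalise a temperature reading to degrees Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown unit: {unit}")


def find_by_concept(concept: str) -> list[tuple[str, float]]:
    """Generic consumer: finds values by shared concept, with no per-producer code."""
    results = []
    for name, producer in producers.items():
        for record in producer["records"]:
            for field_name, value in record.items():
                term, unit = producer["mapping"].get(field_name, (None, None))
                if term == concept:
                    results.append((name, round(to_celsius(value, unit), 2)))
    return results


print(find_by_concept(TEMPERATURE))
```

Adding a third producer only requires it to publish its own mapping; `find_by_concept` does not change.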
Separation between data and metadata
Effective metadata management is fundamental to enabling efficient access to data. Without a solid approach to it, data can’t really be findable, accessible, interoperable and ultimately reusable (as promoted by the FAIR data principles).
Data and metadata should sit in separate planes to allow diversification of security policies and governance.
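One way to picture separate planes, as a sketch under stated assumptions: metadata (what an asset is, where it lives) and data (its readings) are held in separately guarded stores, so metadata can be discoverable by a wider audience than the data itself. The `Plane` class, the party names and the reader-set policy are hypothetical simplifications of real governance rules.

```python
from dataclasses import dataclass, field


@dataclass
class Plane:
    """A store with its own access policy: the set of parties allowed to read it."""
    readers: set[str]
    entries: dict[str, dict] = field(default_factory=dict)

    def read(self, party: str, key: str) -> dict:
        if party not in self.readers:
            raise PermissionError(f"{party} may not read this plane")
        return self.entries[key]


# Metadata is discoverable by partners; raw data stays with the owner and an auditor.
metadata_plane = Plane(readers={"owner", "partner", "auditor"})
data_plane = Plane(readers={"owner", "auditor"})

metadata_plane.entries["pump-7"] = {"type": "Pump", "site": "north"}
data_plane.entries["pump-7"] = {"flow_lpm": 120}

print(metadata_plane.read("partner", "pump-7"))  # the partner can discover the asset
try:
    data_plane.read("partner", "pump-7")         # but cannot read its data
except PermissionError as err:
    print(err)
```

Because each plane carries its own policy, tightening access to data does not require hiding the asset from discovery, and vice versa.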
On the one hand, centralisation creates gains by optimising infrastructure utilisation and enabling economies of scale; on the other hand, it creates bottlenecks and single points of failure (in trust or in infrastructure). This matters when privacy, security and trust issues are prominent for the enterprise or consortium.
The data architecture needs to support a hybrid model: centralised within security contexts, to leverage economies of scale, but decentralised overall, to enable the formation of consortia where each party can self-regulate and self-manage without interference from other parties and without having to rely on a third party to guarantee service.
What constitutes a security context depends on the use case; examples could be “the enterprise”, “the regulated department”, etc.
One can imagine the architecture built as a set of overlays (much like the Internet): every “security context” is a node in a network of decentralised, intercommunicating services.
Decentralisation of infrastructure and data fabric: fostering creation of consortia with applications connected to security contexts.
Each node has two interfaces: an internal interface for systems to exchange data with each other, and a public interface that links nodes so that remote systems can interoperate securely and selectively.
Where data can’t be shared outside the security context, vetted applications can be “allowed in” to interoperate with other systems.
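The node structure described above can be sketched as follows. This is a minimal illustration, not an IOTICS API: it assumes a node is named after its security context and keeps internal data for local systems, a set of keys exposed on its public interface, a set of linked remote contexts, and a set of vetted applications allowed to run inside. All class, method and party names are hypothetical.

```python
from collections.abc import Callable


class SecurityContextNode:
    """Hypothetical node in the overlay: one security context with two interfaces."""

    def __init__(self, context: str):
        self.context = context
        self.internal_data: dict[str, object] = {}  # data exchanged by local systems
        self.shareable: set[str] = set()            # keys exposed on the public interface
        self.linked: set[str] = set()               # remote contexts allowed to call in
        self.vetted_apps: set[str] = set()          # applications allowed to run inside

    # Internal interface: systems inside the context exchange data freely.
    def put(self, key: str, value: object) -> None:
        self.internal_data[key] = value

    def get(self, key: str) -> object:
        return self.internal_data[key]

    # Public interface: only linked contexts, and only keys marked as shareable.
    def link(self, remote: "SecurityContextNode") -> None:
        self.linked.add(remote.context)

    def remote_get(self, requester: str, key: str) -> object:
        if requester not in self.linked or key not in self.shareable:
            raise PermissionError(f"{requester} cannot read {key!r} from {self.context}")
        return self.internal_data[key]

    # When data cannot leave the context, a vetted application is brought to the data.
    def run_app(self, app_name: str, app: Callable[[dict], object]) -> object:
        if app_name not in self.vetted_apps:
            raise PermissionError(f"{app_name} is not vetted for {self.context}")
        return app(self.internal_data)


hospital = SecurityContextNode("hospital")
research = SecurityContextNode("research-lab")
hospital.put("bed_occupancy", 0.87)
hospital.put("patient_records", ["record-1", "record-2"])
hospital.shareable.add("bed_occupancy")
hospital.link(research)
hospital.vetted_apps.add("count_records")

print(hospital.remote_get("research-lab", "bed_occupancy"))  # shared selectively
print(hospital.run_app("count_records", lambda data: len(data["patient_records"])))
```

Links here are one-directional: `hospital.link(research)` lets the research lab call the hospital’s public interface, not the other way round. Whether links should be symmetric is a design choice this sketch leaves open.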
Decentralised identity and stewardship/governance
Identity should be decentralised so that it’s possible to identify data uniquely and globally. The W3C Decentralized Identifiers (DID) specification provides the mechanics for implementing identity in a decentralised fashion, and it is already supported by many providers.
Governance and, more generally, data stewardship should be decentralised too. Especially in the case of consortia, governance benefits from being localised and reduced in scope. Agreeing on a data model for an enterprise, or on access control policies for parts of the (meta)data, should be left within the security context of the decentralised consortium.
Many projects are under way to address this problem. For example, in the UK, within the National Digital Twin programme, the IMF looks into providing a framework for information management; other examples in different areas are Project 13 and the Virtual Energy System.
The ambition is for the architecture to support the integration of heterogeneous data sources, policies and governance frameworks by overlaying technology that, on the one hand, doesn’t get in the way of delivering value locally and, on the other, enables the formation of ecosystems where parties exchange value incrementally using concepts and schemas that can evolve over time.
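A DID is a URI of the form `did:<method>:<method-specific-id>`. The sketch below checks that shape with a regular expression adapted from the ABNF in the W3C DID Core specification. It is simplified: it ignores DID URL parts such as paths, queries and fragments, and it does not resolve the DID to a DID document, which is method-specific. The identifier `did:example:123456789abcdefghi` is the placeholder used in the specification itself.

```python
import re

# Adapted from the DID Core ABNF:
#   did                = "did:" method-name ":" method-specific-id
#   method-name        = 1*( lowercase letter / digit )
#   method-specific-id = *( *idchar ":" ) 1*idchar
#   idchar             = ALPHA / DIGIT / "." / "-" / "_" / pct-encoded
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"
_DID = re.compile(rf"did:(?P<method>[a-z0-9]+):(?P<msid>(?:{_IDCHAR}*:)*{_IDCHAR}+)")


def parse_did(did: str) -> tuple[str, str]:
    """Split a DID into (method, method-specific-id), or raise ValueError."""
    match = _DID.fullmatch(did)
    if match is None:
        raise ValueError(f"not a valid DID: {did!r}")
    return match.group("method"), match.group("msid")


print(parse_did("did:example:123456789abcdefghi"))
```

Because the method name is part of the identifier, any party can tell which method to use for resolution without consulting a central registry of identities.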
Same shape, different size
By decentralising and enabling selective data sharing over a common semantic model of data, it’s possible to construct ecosystems of heterogeneous parties that cooperate and compete on a level playing field. Parties join and leave the ecosystem independently, choosing selectively when to share and with whom.
The data – insight – action loop can then work at a local or global scale.
Data-insight-action loop working at a local or global level for heterogeneous parties
Virtualisation and an asset-focused view of the data
The notion of “sharing data” is somewhat misleading. Sharing data is of limited use if the model underpinning that data isn’t available or if the data is out of context. In data silos, where there’s only one context (the application or service) and producers and consumers share the same model, raw data is more than enough.
But for broader use cases, meaningful data sharing implies some understanding of the domain and context within which the data is provided. In other words, raw data may not be enough because on its own it can’t be used to derive knowledge: data is produced by real assets (real as in meaningful and relevant for the producer), so there must be a way to link it back to the real asset when it’s consumed.
The architecture should indeed support the ability to virtualise real assets into one or more digital assets.
The asset itself is the model, and the context is a snapshot of the status of that asset as seen by the application. A consumer can understand the data being shared when it’s linkable to an asset that is meaningful in the domain. This also implies that the asset should be accessible and uniquely identifiable. Consequently, limited access to the asset’s data may provide only limited knowledge of the context.
Bringing it all together, each one of these virtual digital assets is the combination of:
- Unique and global identity
- Semantically modelled metadata describing the underlying real asset
- Data, providing a view of the current status of the asset, or historical data sets
- Access control rules and governance policies linked to the asset determining who can access what part of the asset
In other words, this approach makes the real assets FAIR by means of their digital twins.
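The four ingredients listed above can be sketched as a single structure. This is a minimal illustration, not the IOTICSpace data model: identity is a DID-style string, metadata is a set of (predicate, object) pairs where `rdf:type` is the standard RDF term but the `ex:` terms are hypothetical, and access control is a hypothetical pair of reader sets, one for the metadata and one per data reading. The sketch also assumes identity is always visible, in line with the asset being uniquely identifiable.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalTwin:
    """Hypothetical virtual digital asset: identity, metadata, data and access rules."""
    identity: str                      # unique, global identifier (DID-style)
    metadata: set[tuple[str, str]]     # (predicate, object) pairs describing the asset
    data: dict[str, float] = field(default_factory=dict)     # current status readings
    metadata_readers: set[str] = field(default_factory=set)  # who may see the metadata
    data_readers: dict[str, set[str]] = field(default_factory=dict)  # reading -> parties

    def view(self, party: str) -> dict:
        """Return only the parts of the twin that this party is allowed to see."""
        visible: dict = {"identity": self.identity}  # assumed always resolvable
        if party in self.metadata_readers:
            visible["metadata"] = sorted(self.metadata)
        readable = {
            name: value
            for name, value in self.data.items()
            if party in self.data_readers.get(name, set())
        }
        if readable:
            visible["data"] = readable
        return visible


pump = DigitalTwin(
    identity="did:example:pump-7",
    metadata={("rdf:type", "ex:Pump"), ("ex:locatedAt", "ex:NorthSite")},
    data={"flow_lpm": 120.0, "vibration_mm_s": 2.3},
    metadata_readers={"operator", "maintainer"},
    data_readers={"flow_lpm": {"operator"}, "vibration_mm_s": {"operator", "maintainer"}},
)

print(pump.view("maintainer"))  # metadata and vibration, but not flow
print(pump.view("visitor"))     # identity only
```

The same twin yields different views for different parties, which is how limited access to an asset’s data translates into limited knowledge of its context.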
A data fabric providing access to an asset-focused view of data, making assets FAIR
State-of-the-art data architectures are geared towards supporting analytics use cases and are not well equipped to enable interoperability across boundaries.
Progress has been made in increasing the efficiency of how data is captured, governed and consumed, but current models fail to respond to the new challenges: the need to increase business velocity and to share data and insights across boundaries to enable collaboration and competition.
A shift in mindset is required, where data and assets are at the core of the enterprise without being locked into technologies or vendor-specific applications. Data architecture needs to leverage rich semantic data modelling, decentralisation of the infrastructure and a new model of trust. With these characteristics, the enterprise is inherently evolvable over time, irrespective of the underlying implementation of its various components.
At IOTICS we encourage this mindset shift and we enable it with IOTICSpace, a concrete implementation of the next generation data architecture described in this post. If you’re interested in exploring how to evolve your architecture for the challenges of the future, get in touch.