The Muddle Without A Middle

How embracing decentralisation is the best way to approach digital ecosystems

Educational 16 Aug 2021 by Mark Wharton

“Eppur si muove” is a phrase attributed to Galileo Galilei. “And still it moves”. He had just finished being tried as a heretic for daring to suggest that the earth was not at the centre of the universe but did, in fact, go around the sun – not the other way around. That his theory explained a lot of “troublesome” observations, such as the planets suddenly reversing their courses in the sky, was lost on the powers that be. The powers wanted the earth to be the centre.

What is it with centralisation? It seems to be a natural human tendency, but the world we live in is almost never centralised. It might have local centres (we’ll come to that), but the earth is not the centre of the universe. The sun might be the centre of the solar system (a local centre), but it’s not the centre of the galaxy. The black hole at the centre of a galaxy (there’s another one) is not the centre of the universe either, and who knows where the centre of the universe is, because what we can see is only the observable universe owing to the speed of light and um… Physics.

I’ll cut and paste a paragraph from my previous post “The muddle in the middle” (It saves writing new material…)

“In an RTOS and in a DDD application there’s a strong sense of domain – i.e. there’s one model. An RTOS application has to be built so it can be loaded on to, say, a washing machine; a DDD application will typically run as one executable. In these single models everything can know about everything else. But how does this work when the model domain is chopped up into heterogeneous parts, distributed across system, department, enterprise or geographic boundaries? That messy, real-world muddle where we started.”

In some of these single-domain cases, it’s possible to know everything about everything: data formats, entities, types and the meaning of words… but only in some use cases. Even in the same organisation, different names are used for the same thing, meanings are unclear and sources of data are dispersed. The common solution to this problem is to centralise: install a data lake and dump all the data in there. Sorted! Perhaps, but what if you want to collaborate with others upstream and downstream in the value chain? Would you want to allow customers and suppliers access to your data lake? Not to mention the problems that consumers have in understanding the data in the lake.

My solution? Don’t store it. Shout it into the void. Wait, what? Bear with me…

Those of you with even a cursory acquaintance with me will know that I’m a strong advocate of Semantic Digital Twins that stream status updates via feeds. Read my other posts on the advantages of the Digital Twin model regarding the FAIR principles. So when I say “shout it into the void” what I mean is create a digital twin and share your updates. If nobody finds your twin or follows its feeds, so what? Maybe they will, in the future. If you’re going along with the shouting/void paradigm, who are you to say what, when or how they’ll do anything with your twin’s (meta)data?

There are a few important things implicit in the previous paragraph: 1. there’s somewhere in the void to “put” your twin; 2. once you’ve put your twin in the “somewhere”, other people can find it; and 3. your twin has some agency in deciding who or what can find it and follow it (who can hear the shouting).
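The shouting, and the agency in point 3, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Twin` class, its `allowed` set and the `follow`/`publish` methods are hypothetical names, not the API of any real digital twin platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

Update = Dict[str, object]


@dataclass
class Twin:
    """Hypothetical twin: holds metadata and shouts updates to its followers."""
    twin_id: str
    metadata: Dict[str, str]
    allowed: Set[str] = field(default_factory=set)  # who may find/follow this twin
    public: bool = False                            # True: anyone may follow
    _followers: Dict[str, Callable[[Update], None]] = field(
        default_factory=dict, init=False, repr=False
    )

    def visible_to(self, who: str) -> bool:
        """The twin's 'agency': it decides who can hear the shouting."""
        return self.public or who in self.allowed

    def follow(self, who: str, callback: Callable[[Update], None]) -> bool:
        """Register a follower if the twin allows it; return whether it worked."""
        if not self.visible_to(who):
            return False
        self._followers[who] = callback
        return True

    def publish(self, update: Update) -> int:
        """Shout an update; return how many followers heard it (possibly zero)."""
        for callback in self._followers.values():
            callback(update)
        return len(self._followers)
```

A twin with no followers can still publish: `publish` simply returns 0, which is the “so what?” case. A follower in `allowed` receives every update through its callback, and anyone else is turned away by `follow`.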

Let’s look at the spectrum of possible implementations of this paradigm. At one end there’s a central database of all twins and a hub you connect to in order to get their data. At the other end, each twin is one node connected to a massively decentralised network of twins and is responsible only for itself.

I find both ends of the spectrum problematic. The central end is bad because of the “central” database and central hub. Who’s going to run it? How will it be funded? What if it goes down? Isn’t it going to get a bit large, given the projections for the number of potential sources of data in the world? The opposite end isn’t that much better, as it exposes the poor twins to potential DoS attacks and makes searching the most extreme example of federated search I can imagine. The twin will have to respond to all searches itself. The twin would also have to respond to requests for data itself. It would have to remember which other twins were following it itself. How would it find time to do what it was supposed to do, you know, like being a virtual pumping station?

Is there somewhere along the spectrum that is a good compromise? The folks at GO FAIR say “As distributed as possible, as centralised as necessary”. What’s that you say? “Local centralisation”? I was wondering when we were going to get back to that.

Why don’t we bunch some twins together and get something to “do the admin” for them? Cluster them locally and let the cluster join a peer-to-peer network of clusters, for example on a distributed hash table (DHT). Let the cluster “do the admin” of registering twins, answering search requests, sharing data to other clusters, handling access control, leaving the twins to get on with being pumping stations, or whatever. Aha! Now that’s not a bad idea…
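Here’s a rough sketch of a cluster “doing the admin”. Everything in it is an assumption for illustration: `Cluster`, `TwinRecord`, `register` and `search` are hypothetical names, and a plain list of peers stands in for the peer-to-peer network (a real DHT would route lookups rather than ask every peer).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TwinRecord:
    """What the cluster remembers about a twin on the twin's behalf."""
    twin_id: str
    metadata: Dict[str, str]
    allowed: Set[str] = field(default_factory=set)  # empty set means public


class Cluster:
    """Hypothetical cluster: registers twins, answers searches, checks access."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.twins: Dict[str, TwinRecord] = {}
        self.peers: List["Cluster"] = []  # stand-in for a DHT of clusters

    def register(self, record: TwinRecord) -> None:
        self.twins[record.twin_id] = record

    def local_search(self, requester: str, **criteria: str) -> List[str]:
        """Match metadata for local twins the requester is allowed to find."""
        hits = []
        for rec in self.twins.values():
            if rec.allowed and requester not in rec.allowed:
                continue  # access control handled here, not by the twin
            if all(rec.metadata.get(k) == v for k, v in criteria.items()):
                hits.append(f"{self.name}/{rec.twin_id}")
        return hits

    def search(self, requester: str, **criteria: str) -> List[str]:
        """Answer locally, then ask each peer cluster (one hop only)."""
        results = self.local_search(requester, **criteria)
        for peer in self.peers:
            results.extend(peer.local_search(requester, **criteria))
        return sorted(results)


edinburgh, glasgow = Cluster("edinburgh"), Cluster("glasgow")
edinburgh.peers.append(glasgow)
glasgow.peers.append(edinburgh)
edinburgh.register(TwinRecord("pump-1", {"type": "pump"}))
glasgow.register(TwinRecord("pump-7", {"type": "pump"}, allowed={"alice"}))

print(edinburgh.search("alice", type="pump"))  # both pumps
print(edinburgh.search("bob", type="pump"))    # only the public one
```

The twins never see a search request: the cluster answers for them, so the search is federated across a handful of clusters rather than across every twin.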

The concept of local centralisation into clusters provides a lot of other benefits in addition to the “admin”. It scales nicely, for example: horizontally by adding more clusters; vertically by having a bigger cluster. Its decentralised nature adapts to different topologies: cluster on the edge; cluster in the cloud – both on the same network. Clustering naturally allows for sharding – by ownership (my stuff; your stuff); by function (pumps, pipes, tanks); by geography (twins in Edinburgh; twins in Glasgow). Clusters can also be grouped into consortium ecosystems to allow enterprises to collaborate.
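Sharding by ownership, function or geography amounts to picking which piece of a twin’s metadata decides its cluster. A minimal sketch, assuming twins are plain dictionaries with made-up `owner`, `function` and `city` keys:

```python
from collections import defaultdict
from typing import Dict, Iterable, List

TwinInfo = Dict[str, str]


def shard(twins: Iterable[TwinInfo], key: str) -> Dict[str, List[str]]:
    """Group twin ids into clusters named after the value of `key`."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for twin in twins:
        clusters[twin[key]].append(twin["id"])
    return dict(clusters)


# Hypothetical twins, purely for illustration.
twins = [
    {"id": "pump-1", "owner": "me", "function": "pump", "city": "Edinburgh"},
    {"id": "tank-1", "owner": "me", "function": "tank", "city": "Glasgow"},
    {"id": "pump-2", "owner": "you", "function": "pump", "city": "Glasgow"},
]

by_owner = shard(twins, "owner")        # my stuff; your stuff
by_function = shard(twins, "function")  # pumps, tanks
by_city = shard(twins, "city")          # Edinburgh; Glasgow
```

The same three twins end up in different clusters depending on the key, and nothing about the twins themselves has to change.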

Hold on! OK, now the void doesn’t look so much like a void, but what about the “shouting” bit? How will someone/something else understand what my twin(s) are about? The answer is in the semantics. You knew I was going to say that, but it’s true. The semantic web technologies were designed to be decentralised. The clue, as ever, is in the name: semantic web.

But what about cross-domain understanding? (I hear you cry). Different domains will have different concepts. How can they interrelate? Semantics are good at disambiguation: they solve the “sending goldfish into battle” problem by disambiguating a military:tank from an aquarium:tank. But what about the thornier problem of the same thing being thought of in more than one way? Here’s such a thorny problem: a Train.

A train is surprisingly difficult to define. What passengers think of as a “train” – as in “I’m on the train” – is not how the Train Operating Companies (TOCs) see it (they probably think of it as a collection of locomotive units and rolling stock), and their view is distinct from that of the Network Providers (they probably think of it as a “service”, e.g. the 14:56 from CBG to KGX). In our decentralised cluster model, the digital twin of the passenger version would live in one cluster and be related semantically to the TOC’s twin in their cluster which, in turn, would be linked to another twin in the Network cluster.

All these clusters might share the abstract concept of “Train” (let’s call this abs:Train), and each domain’s version can be defined as a subclass of it using semantics like:

passenger:Train rdfs:subClassOf abs:Train.

Other similar concepts in the different domains (clusters) can be linked semantically by things like owl:equivalentClass (for classes) and owl:sameAs (for individuals). You can read the following semantic “triple”:

passenger:Train owl:equivalentClass nwp:Service.

…as: What passengers call “trains” is the same as what Network Providers call “services”. (Apologies to experts in semantics and the rail industry – I’m just trying to make a point).
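To show what a cluster could do with triples like these, here’s a toy sketch in plain Python (no RDF library, and nowhere near a real reasoner). It relates the two classes with owl:equivalentClass, which is the OWL property intended for classes. The `toc:` prefix and the function names are my own assumptions for the example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
EQUIVALENT = "owl:equivalentClass"

TRIPLES: Set[Triple] = {
    ("passenger:Train", SUBCLASS, "abs:Train"),
    ("toc:Train", SUBCLASS, "abs:Train"),  # hypothetical TOC class
    ("passenger:Train", EQUIVALENT, "nwp:Service"),
}


def equivalents(cls: str, triples: Set[Triple]) -> Set[str]:
    """Classes linked to `cls` by owl:equivalentClass (either way), plus `cls`."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != EQUIVALENT:
                continue
            if s == current:
                other = o
            elif o == current:
                other = s
            else:
                continue
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen


def superclasses(cls: str, triples: Set[Triple]) -> Set[str]:
    """Superclasses reached via rdfs:subClassOf, carried across equivalences."""
    found: Set[str] = set()
    frontier = list(equivalents(cls, triples))
    visited = set(frontier)
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS:
                for eq in equivalents(o, triples):
                    found.add(eq)
                    if eq not in visited:
                        visited.add(eq)
                        frontier.append(eq)
    return found
```

Nobody said directly that nwp:Service is a kind of abs:Train, but `superclasses("nwp:Service", TRIPLES)` finds it by going through the equivalence with passenger:Train. That’s the cross-domain link doing its job.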

 

Phew! Let’s try to bring this all together. (Apologies to Dr Seuss.) We started by saying that the world is a muddle so we should accept the complexity and “model the muddle”. Then we thought that the model had a centre, so we were “modelling the muddle in the middle”. It turns out that the “middle” depends on your point of view. So now we have “modelling my muddle in my middle” but allowing it to interoperate with your model of your muddle in your middle. (Coming soon! The decentralised alternative to data lakes – “The puddle in the model of the muddle in the middle”)

What we’re really talking about are network overlays. You can overlay a network with clusters for technical reasons, with semantics for domain interoperability, and with dataflows at run time. All of these overlays have one thing in common: there’s no need for a middle. And still it moves…
