Digital Twins – FAIRest of them all?

How FAIR principles enable Digital Twins to operate in ecosystems

Educational 21 Jun 2021 by Mark Wharton

There are times in your life when other people put into words exactly what you’re thinking but haven’t been able to express clearly yourself. For me, the FAIR data principles set out, in just four words, one of the core ideas behind what we had already been building for a few years before the principles were first published in March 2016. A consortium of scientists and organizations laid out the four main tenets of FAIR in the journal Scientific Data – that data should be:

  1. Findable
  2. Accessible
  3. Interoperable
  4. Reusable

Their goal was to have scientific datasets able to be found and used by machines with minimal human intervention.

What relevance does this have now, five years later, in the seemingly unrelated field of Digital Twins? How can these principles help to create collaborative, data-sharing ecosystems up and down supply chains and in complex operational landscapes such as transport, and enable servitisation models in multi-party ownership scenarios such as utilities?

Digital Twin is a term that has been abused, but the most widely accepted interpretation of the term is now the digital data twin, where the twin is a virtual representation of the data of something in the real world*. It’s this version of the digital twin that we will discuss in this article.

One of the first uses of the digital-twin approach is to make a “single pane of glass” abstraction of an asset. In reality, the data for the asset will probably be stored in incompatible ways in disparate systems for multiple, unrelated purposes. Route access to all this data through the twin, and the twin becomes the single point where authorised people, applications – and even other twins – can go to find out about the asset and its current state, and possibly to subscribe to its updates. A big advantage of this model is that, as the owner of the asset, you are in control of what goes in the “pane of glass” and who’s allowed to look at it. You choose what to share and with whom.

The single-pane approach is fine for purpose-built applications that have knowledge about the twin’s data built into their logic. For example, they may know that the dimensions of the asset are recorded in the twin in centimetres and the weight in kilograms. This built-in knowledge has three main disadvantages:

  1. It has to be programmed by people.
  2. It’s then fixed to that type of twin.
  3. It’s difficult for the parties with whom you want to share the data to understand it.

I’d say that this approach only partially addresses the problem. It deals with the multiple-sources part but moves the interpretation of the data into application logic, and if that logic has to live with the parties with whom you’re sharing, that’s a big problem.

Let’s imagine the application was data-centric, in that it reacted to data and metadata (e.g., in our previous example, the weight data would “say” it was in kilograms). The application would use the metadata to interpret the data and respond accordingly. Does this sound far-fetched? Well, no: web browsers already work this way, and we all use them hundreds of times a day, every day.
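To make the data-centric idea concrete, here is a minimal Python sketch in which each value carries its own unit and the application interprets it from that metadata alone. The conversion table, the field names (`amount`, `unit`) and the example twins are all assumptions made for illustration, not part of any real twin format.

```python
# Hypothetical conversion table: unit name -> (canonical unit, multiplier).
TO_CANONICAL = {
    "kilogram": ("kilogram", 1.0),
    "gram": ("kilogram", 0.001),
    "pound": ("kilogram", 0.45359237),
    "metre": ("metre", 1.0),
    "centimetre": ("metre", 0.01),
    "inch": ("metre", 0.0254),
}


def interpret(value):
    """Convert one self-describing value to its canonical unit.

    The function knows nothing about the twin it came from; it relies
    only on the metadata ("unit") travelling with the data ("amount").
    """
    unit = value["unit"]
    if unit not in TO_CANONICAL:
        raise ValueError(f"unknown unit: {unit}")
    canonical_unit, factor = TO_CANONICAL[unit]
    return value["amount"] * factor, canonical_unit


# Two made-up twins that record weight in different units.
pallet_twin = {"weight": {"amount": 250.0, "unit": "kilogram"}}
crate_twin = {"weight": {"amount": 500.0, "unit": "gram"}}

pallet_kg, _ = interpret(pallet_twin["weight"])
crate_kg, _ = interpret(crate_twin["weight"])
total_kg = pallet_kg + crate_kg
```

Because the unit travels with the value, the same `interpret` function works for any twin, and nothing about kilograms or grams has to be hard-coded per twin type.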

Imagine how the “browser” model would work with digital twin data… You could search for twins and choose which to show and the application would react to their metadata and display their data in the best way. You could show data from more than one twin and compare them. You could write code snippets to do this automatically. You could even take data from multiple twins, run synthesising algorithms on it and publish the results as more twins.

Wait! (SFX vinyl record scrape noise). Search for twins? Choose twins? React to their metadata? How could this be possible? Well it would be, if digital twins were made FAIR. Let’s look at those key verbs one by one:

Search implies that the twins were made findable by their creator.

Choose implies that the twins are accessible to me. (Or not – if I’m not authorised)

React and Compare imply that the data I received was understandable to me, hence interoperable.

Code snippets, synthesis and algorithms all imply that the data can be used for purposes other than the one for which it was originally created – hence reusable.
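The four verbs can be sketched as a toy in-memory registry. This is only an illustration under stated assumptions: the `Twin` and `Registry` classes, their method names and the pump data are hypothetical and do not describe any particular product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Twin:
    twin_id: str
    tags: set        # searchable metadata (findable)
    allowed: set     # parties the owner chooses to share with (accessible)
    data: dict = field(default_factory=dict)  # self-describing values (interoperable)


class Registry:
    def __init__(self):
        self._twins = {}

    def publish(self, twin):
        self._twins[twin.twin_id] = twin

    def search(self, tag):
        """Findable: return the ids of twins whose metadata carries the tag."""
        return sorted(t.twin_id for t in self._twins.values() if tag in t.tags)

    def read(self, twin_id, requester):
        """Accessible: hand over data only to parties the owner authorised."""
        twin = self._twins[twin_id]
        if requester not in twin.allowed:
            raise PermissionError(f"{requester} may not read {twin_id}")
        return twin.data


registry = Registry()
registry.publish(Twin("pump-1", {"pump"}, {"alice"},
                      {"flow": {"amount": 12.0, "unit": "litre/second"}}))
registry.publish(Twin("pump-2", {"pump"}, {"alice", "bob"},
                      {"flow": {"amount": 8.0, "unit": "litre/second"}}))

# Search, then choose (subject to authorisation).
found = registry.search("pump")
flows = [registry.read(twin_id, "alice")["flow"] for twin_id in found]

# React and compare: check the units from the metadata before combining.
units = {flow["unit"] for flow in flows}
if len(units) != 1:
    raise ValueError(f"cannot combine mixed units: {units}")

# Reuse: synthesise a new value and publish it as another twin.
total_flow = sum(flow["amount"] for flow in flows)
registry.publish(Twin("pump-station", {"station"}, {"alice"},
                      {"flow": {"amount": total_flow, "unit": units.pop()}}))
```

In this sketch, a request from `"bob"` for `pump-1` raises `PermissionError`, which is the "or not" half of accessibility: the owner decides who may look.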

How do you start to make data FAIR? With metadata – data about the data. The folks at GO FAIR like to write it as “(meta)data” as they believe data and metadata are so intertwined. In the internet world, HTML is used to “mark up” data to tell browsers how to render it. Tags like <table>, <tr>, <td>, etc. tell the browser that this is tabular data and that it should be rendered as such. The FAIR principles don’t stipulate what method is to be used to specify metadata, but they do say that:

“I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation”

There aren’t too many of these languages. RDF and the Semantic Web technologies are the de facto standard, so it makes sense to use them for this job.
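As a rough illustration of what RDF metadata for the earlier weight example might look like, the following Python sketch builds three triples and writes them out in N-Triples, one of the W3C serialisations of RDF. The `example.org` namespaces and the property names (`hasWeight`, `numericValue`, `unit`) are invented for this sketch; a real deployment would more likely use an established vocabulary and an RDF library rather than string formatting.

```python
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/twin/"        # hypothetical namespace for twins
VOCAB = "http://example.org/vocab#"    # hypothetical vocabulary


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def decimal_literal(number):
    """Format a typed literal such as "250.0"^^<...#decimal>."""
    return f'"{number}"^^<{XSD_DECIMAL}>'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


weight = iri(EX + "pallet-42/weight")
triples = [
    # The twin has a weight...
    (iri(EX + "pallet-42"), iri(VOCAB + "hasWeight"), weight),
    # ...whose numeric value is 250.0...
    (weight, iri(VOCAB + "numericValue"), decimal_literal("250.0")),
    # ...and whose unit is itself an identifier a machine can look up.
    (weight, iri(VOCAB + "unit"), iri(VOCAB + "kilogram")),
]

document = to_ntriples(triples)
```

The point of the exercise is the third triple: the unit is not a comment in someone's code but a statement in the data, so any consumer that understands the vocabulary can interpret the value.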

The digital twin browser application we imagined above is not the end of the story. The originators of the FAIR principles had autonomous machine interoperability as one of their goals. Let’s apply this thinking to digital twins in an ecosystem. Twins’ agents (the behavioural part of a twin) could search for twins near or related to them, interact with their data and then perhaps drop the connection once they had moved on. For example, the twin of a train could search for nearby twins of pollen count data as it was “moving”, because pollen clogs the filters, but only when the engine is running. Twins of the engines on the train would know when they were running and update their metadata to reflect that they had been affected. Service engineers could then look at the twins of the engines in the train to see which ones’ filters needed to be changed.
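The train scenario can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the class names, the flat x/y coordinates in kilometres, the 5 km search radius, the pollen figures and the rule that exposure is simply the average nearby pollen count added once per step.

```python
import math
from dataclasses import dataclass


@dataclass
class PollenTwin:
    twin_id: str
    x_km: float
    y_km: float
    pollen_count: float   # illustrative figure, e.g. grains per cubic metre


@dataclass
class EngineTwin:
    twin_id: str
    running: bool
    pollen_exposure: float = 0.0   # metadata kept up to date by the agent


def nearby(pollen_twins, x_km, y_km, radius_km):
    """Find pollen twins within radius_km of the train's current position."""
    return [p for p in pollen_twins
            if math.hypot(p.x_km - x_km, p.y_km - y_km) <= radius_km]


def step(engines, pollen_twins, x_km, y_km, radius_km=5.0):
    """One agent step: only running engines accumulate pollen exposure."""
    local = nearby(pollen_twins, x_km, y_km, radius_km)
    if not local:
        return  # nothing relevant nearby, so no connection is kept
    level = sum(p.pollen_count for p in local) / len(local)
    for engine in engines:
        if engine.running:
            engine.pollen_exposure += level


def needs_filter_change(engines, threshold):
    """What a service engineer would ask: which engines are over the limit?"""
    return [e.twin_id for e in engines if e.pollen_exposure >= threshold]


pollen = [PollenTwin("pollen-a", 0.0, 0.0, 100.0),
          PollenTwin("pollen-b", 50.0, 0.0, 300.0)]
engines = [EngineTwin("engine-1", running=True),
           EngineTwin("engine-2", running=False)]

# The train "moves" along the track, searching afresh at each position.
for x_km in (0.0, 25.0, 50.0):
    step(engines, pollen, x_km, 0.0)

due = needs_filter_change(engines, threshold=350.0)
```

The train's agent never needs to know in advance which pollen twins exist; it finds them by location, uses their data while it is relevant and moves on. Only the engine that was running accumulates exposure, so only that engine shows up for a filter change.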

One last thing about FAIR that only struck me recently is that each new letter builds on the foundations of the previous. You can’t reuse if you can’t interoperate; you can’t interoperate if the data is inaccessible; you can’t try to access data if you can’t find it.

Saying “The FAIR principles are for scientific datasets” is like saying “Amazon is for books”. Amazon started with books but branched out into everything. Five years ago, the prescient instigators of the FAIR principles must have had a good idea that the principles are applicable to very many things – including Digital Twin ecosystems. The best principles work like that. They give you yardsticks and guidance but don’t limit where you apply them.


* For example, the definitions used by AMRC, TechUK & CDBB, and the MOD.

