FAIR data interactions and digital twins

Mark Wharton, IOTICS’ co-founder & inventor, looks at how the ‘if’ statement could change the world

Educational 14 Oct 2021 by Mark Wharton

In plays and films, books and music there is often a key moment where everything in the story comes together. Software and data engineers know these moments: after days of work you finally have your code and your data together, and you can write and run the ‘if’ statement. An ‘if’ statement such as: if river.level > X and rainfall.forecast > Y, then…

When it comes to rainfall and rivers, the ‘then’ part could involve millions of pounds of damage, weeks of transport disruption and possible loss of life.

The ‘if’ statement is a kind of data interaction. A computer algorithm brings two pieces of data together so they can be compared, and some insight gained. But what those pieces of data are and how they get to the “if” statement is more complex than you might think.
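As a minimal sketch, the flood ‘if’ statement might look like this in Python. The thresholds stand in for X and Y and are purely illustrative, and the names FloodInputs and flood_risk are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FloodInputs:
    river_level: float        # the river.level value from the text
    rainfall_forecast: float  # the rainfall.forecast value from the text


# Hypothetical thresholds standing in for X and Y; not real guidance.
RIVER_LEVEL_THRESHOLD = 2.5
RAINFALL_THRESHOLD = 30.0


def flood_risk(inputs: FloodInputs) -> bool:
    """Bring the two pieces of data together and compare them."""
    return (
        inputs.river_level > RIVER_LEVEL_THRESHOLD
        and inputs.rainfall_forecast > RAINFALL_THRESHOLD
    )


if flood_risk(FloodInputs(river_level=3.1, rainfall_forecast=42.0)):
    print("then... raise a flood warning")
```

The comparison itself is one line. Everything that follows is about getting the right two values into it.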

There’s no search engine for data: not publicly and rarely within enterprises. There are attempts at searchability such as data.gov.uk, but they are intended for people, not algorithms. It’s said that data scientists spend at least 50% of their time looking for data rather than looking at data. This epic waste of time arises because data is hidden, deliberately or unintentionally, in silos, in datasets, behind APIs or in program-unfriendly formats such as PDF. Data hidden like this is not findable by machines, but what if it were?

The challenges of access and interoperability are, invariably, linked. A computer may be able to find some data, but can’t understand it. To interoperate with data, it would help if the data had some metadata – indicating that the river level was measured in metres, or that rainfall was measured in millimetres, for example. We are now able to Find, Access and Interoperate, and the data interaction in the ‘if’ statement is Re-using that data for a new purpose.
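To illustrate why unit metadata matters, here is a small sketch in which each reading carries its unit, so an algorithm can normalise values before comparing them. The Reading type, the in_metres helper and the conversion table are assumptions made for this example, not part of any particular standard or product.

```python
from dataclasses import dataclass

# Conversion factors to metres for the units this sketch knows about.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}


@dataclass(frozen=True)
class Reading:
    value: float
    unit: str      # metadata: the unit of measurement
    quantity: str  # metadata: what was measured


def in_metres(reading: Reading) -> float:
    """Convert a length reading to metres using its unit metadata."""
    if reading.unit not in TO_METRES:
        raise ValueError(f"unknown unit: {reading.unit!r}")
    return reading.value * TO_METRES[reading.unit]


river = Reading(value=2.9, unit="m", quantity="river level")
rain = Reading(value=42.0, unit="mm", quantity="rainfall forecast")
print(in_metres(river), in_metres(rain))
```

Without the unit field, 2.9 and 42.0 are just numbers and the algorithm has no safe way to compare them.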

These four aspects are the building blocks of the FAIR data principles, conceived by a consortium of leading scientists and organisations to ensure that scientific datasets could be found and used by machines, with minimal human intervention. FAIR stands for Findable, Accessible, Interoperable and Reusable – and it’s going mainstream.

In a FAIR world, computers can find and understand data, but we still can’t program them with that “if” statement when the data is in large datasets. In our flooding scenario, what our algorithm also needs is the river level at a specific location and the rainfall forecast at a different location, probably well upstream from the place where the flood is likely to occur. So, even if our algorithm can find the right dataset, it still needs to know how to run a query against the dataset to find the data it wants.
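The extra query step can be sketched as follows. The dataset rows, station names and the latest helper are all hypothetical. The point is only that the algorithm must know the dataset’s structure well enough to pick out one location’s most recent value.

```python
from datetime import datetime

# Hypothetical dataset rows; station names and values are illustrative.
DATASET = [
    {"station": "town-bridge", "kind": "river_level_m",
     "time": datetime(2021, 10, 12, 6, 0), "value": 2.4},
    {"station": "town-bridge", "kind": "river_level_m",
     "time": datetime(2021, 10, 12, 9, 0), "value": 2.9},
    {"station": "upstream-hills", "kind": "rainfall_forecast_mm",
     "time": datetime(2021, 10, 12, 9, 0), "value": 42.0},
    {"station": "coast-pier", "kind": "river_level_m",
     "time": datetime(2021, 10, 12, 9, 0), "value": 0.8},
]


def latest(dataset, station, kind):
    """Return the most recent value for one station and kind, or None."""
    rows = [r for r in dataset if r["station"] == station and r["kind"] == kind]
    if not rows:
        return None
    return max(rows, key=lambda r: r["time"])["value"]


river_level = latest(DATASET, "town-bridge", "river_level_m")
rainfall = latest(DATASET, "upstream-hills", "rainfall_forecast_mm")
```

Every dataset will have its own field names and query conventions, which is exactly the burden the paragraph above describes.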

The granularity of the data is important – channelling the right data at the right time from source to consumer. That’s where digital twins come in. Digital twins are a virtualisation of an asset’s data, and the asset itself is a useful level of granularity here. An algorithm needs to choose the appropriate rainfall forecasts and the river levels it requires. Metadata about the assets beyond their location provides essential context. Knowing who operates an asset and who maintains it would help the algorithm to assign weight to the readings if some operators’ data proved more reliable and accurate than others. Provenance, meaning evidence that the data really comes from that twin and that the twin really is the one operated by the Environment Agency, for example, builds trust in the output of the algorithm. The exchange of metadata between twins to establish trust and access is our second data interaction.
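One way to picture weighting by operator is a weighted average across twins at the same location. This is a sketch under stated assumptions: the Twin fields, the reliability scores and the operator names other than the Environment Agency are invented for illustration, and a real system would derive weights from evidence rather than a fixed table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Twin:
    twin_id: str
    operator: str        # metadata: who operates the asset
    location: str        # metadata: where the asset is
    latest_value: float  # most recent reading shared by the twin


# Hypothetical reliability scores between 0 and 1, keyed by operator.
OPERATOR_WEIGHT = {"Environment Agency": 1.0, "Volunteer network": 0.5}


def weighted_level(twins, default_weight=0.25):
    """Average the twins' readings, weighted by operator reliability."""
    total_weight = 0.0
    weighted_sum = 0.0
    for twin in twins:
        weight = OPERATOR_WEIGHT.get(twin.operator, default_weight)
        weighted_sum += weight * twin.latest_value
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no weighted readings available")
    return weighted_sum / total_weight


twins = [
    Twin("gauge-1", "Environment Agency", "town-bridge", 3.0),
    Twin("gauge-2", "Volunteer network", "town-bridge", 2.4),
]
level = weighted_level(twins)
```

Here the more trusted operator’s reading pulls the result towards 3.0 rather than the plain midpoint of 2.7.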

The final step to get to the ‘if’ statements that can change the world is timeliness. Homeowners won’t appreciate being told on Wednesday that a flood would occur on Tuesday when their houses are already knee-deep in muddy water. The data needs to flow between the twins and the algorithm as close to real time as possible so that the predictions are available in a timely way. This is not just important in our flooding scenario; it’s important in business, where latency between something happening and the business reacting to it can, and frequently does, cost millions.
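Timeliness can be made explicit in code by refusing to act on stale readings. The 15-minute limit and the is_fresh helper below are assumptions for illustration. The right limit depends on the river and the application.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness limit; tune per application.
MAX_AGE = timedelta(minutes=15)


def is_fresh(observed_at, now, max_age=MAX_AGE):
    """True if the reading is not from the future and not older than max_age."""
    age = now - observed_at
    return timedelta(0) <= age <= max_age


now = datetime(2021, 10, 13, 9, 0, tzinfo=timezone.utc)
tuesday_reading = datetime(2021, 10, 12, 9, 0, tzinfo=timezone.utc)
recent_reading = now - timedelta(minutes=5)
```

A prediction built on the day-old reading would arrive too late to help, so the algorithm should either wait for fresh data or flag the result as out of date.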

We have reached a point where we have an algorithm running, exchanging data with digital twins. But what does the algorithm do in the ‘then’ part of the ‘if’ statement? Create a warning on a dashboard? Update a database? Send an email? What if it could share the data back with other digital twins, or create new twins of the likely flood locations and have them share into a growing ecosystem of cooperative twins?

What if the algorithm itself has its own digital twin? That would bring greater security and simplify the model: everything is a twin. The twin of the algorithm interacts with the twins of the data sources. Data interactions are twin interactions, and twin interactions are the exchange of data and metadata between twins. If data and twins of anything, anywhere could securely interact and cooperate – what transformation could we achieve?
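The “everything is a twin” idea can be sketched with a tiny in-memory model. This is not the IOTICS API. The Twin and FloodAlgorithmTwin classes, the follow and share methods, the twin identifiers and the thresholds are all hypothetical, and the sketch leaves out security and access control entirely.

```python
from typing import Callable, Dict, List


class Twin:
    """A toy twin that shares values with whoever follows it."""

    def __init__(self, twin_id: str, metadata: Dict[str, str]):
        self.twin_id = twin_id
        self.metadata = metadata
        self._followers: List[Callable[[str, float], None]] = []

    def follow(self, callback: Callable[[str, float], None]) -> None:
        self._followers.append(callback)

    def share(self, value: float) -> None:
        for callback in self._followers:
            callback(self.twin_id, value)


class FloodAlgorithmTwin(Twin):
    """The algorithm as a twin: it receives data and shares warnings."""

    def __init__(self, river_threshold_m: float, rain_threshold_mm: float):
        super().__init__("flood-algorithm", {"type": "algorithm"})
        self.river_threshold_m = river_threshold_m
        self.rain_threshold_mm = rain_threshold_mm
        self.latest: Dict[str, float] = {}

    def receive(self, source_id: str, value: float) -> None:
        self.latest[source_id] = value
        river = self.latest.get("river-level")
        rain = self.latest.get("rainfall-forecast")
        if river is None or rain is None:
            return  # still waiting for one of the two inputs
        if river > self.river_threshold_m and rain > self.rain_threshold_mm:
            self.share(1.0)  # 1.0 signals a flood warning to followers


river = Twin("river-level", {"unit": "m"})
rain = Twin("rainfall-forecast", {"unit": "mm"})
algorithm = FloodAlgorithmTwin(river_threshold_m=2.5, rain_threshold_mm=30.0)
river.follow(algorithm.receive)
rain.follow(algorithm.receive)
```

Because the algorithm is itself a twin, anything that can follow a river gauge can follow the flood warning in exactly the same way.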

