FAIR Digital Objects
The global integrated data space will be populated by standardized, autonomous, and persistent entities that contain the information needed about different kinds of digital objects (data, metadata, documents, software, semantic assertions, etc.) to enable both humans and machines to Find, Access, Interoperate with, and Reuse (FAIR) these digital objects in highly efficient and cost-effective ways. These entities are independent of continuously changing technologies and of the many different ways in which data are, and in future will be, organized and structured. In addition, they have built-in mechanisms to support data sovereignty. All of this will help to manage data in a more sustainable and secure way.
These entities are called FAIR Digital Objects (FDOs). The Digital Object Interface Protocol (DOIP) is an existing, minimal, unifying mechanism that supports interaction with FDOs; its effect can be compared to that of the TCP/IP protocol for the Internet. The systematic introduction of FDOs into daily practice will lead, step by step, to optimized management and use of digital data without the need to replace existing large software systems such as repositories. Many of the basic specifications for FDOs already exist and have been in use for some time. The FDO Forum is currently extending these specifications to make FDOs machine actionable, and the German Institute for Standardisation (DIN) is engaged in turning these specifications into standards.
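The core idea of an FDO can be illustrated as a small structured record. The following sketch is purely illustrative: the field names, the PID strings, and the repository URL are assumptions for the example, not part of the FDO or DOIP specifications. It shows the minimal ingredients the text describes: a persistent identifier (PID), a reference to a type definition, metadata, and a pointer to the bit sequence itself.

```python
from dataclasses import dataclass

# Illustrative sketch only; field names and identifiers are assumptions,
# not the FDO specification. An FDO bundles a persistent identifier,
# a type, metadata, and a reference to the bits it describes.
@dataclass
class FDORecord:
    pid: str             # persistent identifier, e.g. a Handle-style PID
    type_id: str         # PID of the type definition governing this object
    metadata: dict       # key/value attributes describing the object
    bitstream_ref: str   # location of the actual bit sequence

# Hypothetical example record.
record = FDORecord(
    pid="21.T11148/example-0001",
    type_id="21.T11148/dataset",
    metadata={"title": "Sea surface temperatures", "license": "CC-BY-4.0"},
    bitstream_ref="https://repo.example.org/objects/0001",
)
print(record.pid, record.type_id)
```

Because every field except the bit sequence is machine-readable and resolvable via PIDs, a client that has never seen this object before can still decide, from the type alone, how to process it.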
Larger and larger amounts of data are generated every day, and the complexity of the data space is increasing exponentially. Surveys estimate that about 80 percent of the effort in data-driven projects is already spent on data wrangling rather than on analysis and understanding, and the situation is getting worse. There is therefore an urgent need to change practices drastically.
Internet pioneer George Strawn expects that by 2030 we will have built one global integrated data space, which will efficiently support those data-driven projects in industry, research, and public services that require data to be seamlessly integrated from different sources. His prediction derives from a historical perspective and the belief that the intensifying discussion about data management standards to overcome the heterogeneity of data will converge quickly. Since the 1950s, computers have made their way into all areas of life, but they were as isolated as the data stored on them. In the 1990s the slogan “the network is the computer” became popular as the many isolated computers were integrated into a virtual computer space thanks to the unifying TCP/IP standard. Subsequently, additional protocols such as HTTP enabled access to globally distributed documents and other data, but not yet in a way that allowed different types of information to be effectively recombined and reused.
An extension of these existing mechanisms to a virtually integrated data space is urgently needed and will be developed. It is now time to address the challenge of efficiently reusing relevant digital data in the pursuit of knowledge. This must include suitable rights management solutions, using mechanisms such as blockchains to securely manage FDO transactions. Globally harmonized data infrastructures based on FDO standards will allow the extraction of new knowledge from large data collections, supporting efforts to maintain a stable society, a healthy natural environment, and a flourishing data economy. In particular, FDOs will:
• Improve the sustainability of the societal and scientific memory and simplify data management by a transparent structuring of the continuously expanding data space into modular hierarchies in a unified space of machine-actionable objects.
• Manage the increasing complexity of the data space by using persistent and inherently typed relations between digital objects, thereby opening a path to structuring the emerging knowledge clusters and their numerous relations.
• Open the path towards automatic processing of huge amounts of data by associating procedures with types of data.
• Support efficient and persistent public, administrative, and analytic workflows, and the documentation of data provenance, by systematically applied mechanisms, thus achieving a higher degree of transparency and ease of use for citizens.
• Increase the trust of users in data sovereignty by built-in mechanisms to increase data security.
• Enable safe storage of critical legal documents such as contracts, certificates, and verdicts in FDO repositories, documented in registries and recorded in transaction registers.
• Categorize data semantically. Semantic metrics, defined as higher-order data and context correlations between the agent and the data, will quantify qualitative properties via physical measurements. This will enable true semantic search, as opposed to filtering on combined keywords only.
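Two of the mechanisms above, typed relations between digital objects and procedures associated with data types, can be sketched together. This is a minimal illustration under assumed names (the type and relation PIDs, the registry functions, and the `validate` operation are all hypothetical, not part of any FDO specification): operations are registered per type, so a client can dispatch processing automatically from an object's type, and relations are stored as typed triples.

```python
# Illustrative sketch; all identifiers are assumptions, not the FDO spec.
# Registry mapping a type PID to the operations defined for that type.
operations_by_type = {}

def register_operation(type_id, name, fn):
    """Associate a named procedure with a data type."""
    operations_by_type.setdefault(type_id, {})[name] = fn

def invoke(obj, name):
    """Dispatch an operation based on the object's type, not its content."""
    return operations_by_type[obj["type_id"]][name](obj)

# A typed relation is a triple: (subject PID, relation-type PID, object PID).
relations = []

def relate(subject_pid, relation_type, object_pid):
    relations.append((subject_pid, relation_type, object_pid))

# Hypothetical "dataset" type with a validation operation.
register_operation("type/dataset", "validate",
                   lambda obj: "title" in obj["metadata"])

dataset = {"pid": "pid/001", "type_id": "type/dataset",
           "metadata": {"title": "Example"}}
relate("pid/001", "relation/derivedFrom", "pid/000")

print(invoke(dataset, "validate"))  # dispatches via the type registry
```

Because the operation is looked up through the type rather than hard-coded per object, huge collections of heterogeneous objects can be processed automatically, and the typed triples make the relations between objects themselves machine-queryable.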