Advancing Ecosystem-Based Management through Hybrid Blue-Grey Infrastructures in Marine and Coastal Areas

What Are FAIR Data and Why They Matter

The FAIR data principles (Findable, Accessible, Interoperable, Reusable) are essential for maximizing the utility and impact of research data. The TRANSEATION project exemplifies these principles by integrating blockchain technology to enhance data management and security, fostering collaboration and accelerating scientific discovery across Europe.

By Juan Carlos Sanz González | Marine Scientist | Ocean Data Scientist

30 January, 2025

The Data Value

In today’s world, data is becoming an increasingly valuable asset. As technology advances and the digital landscape expands, the importance of data in driving innovation and decision-making continues to soar. However, the true potential of data can only be unleashed when it is accessible and usable by a wide range of stakeholders. This is where the principles of FAIR data come into play. First introduced in 2016, these principles advocate for data to be Findable, Accessible, Interoperable and Reusable, ensuring that data sets can deliver maximum benefit[1].

The implementation of FAIR data principles has been strongly established in Europe through the Horizon Europe framework. This program mandates the application of FAIR principles to all research projects it funds, aiming to foster an environment where data sharing and collaboration are simplified across borders. This alignment with FAIR principles demonstrates Europe’s commitment to advancing science and innovation responsibly and efficiently, setting a standard for research practices globally.

What Are FAIR Data?

FAIR is an acronym representing a set of principles for data sharing, described by Wilkinson et al. (2016)[1]. These principles are crucial for making data effectively shareable by both humans and machines, ensuring that data are Findable, Accessible, Interoperable and Reusable.

Findable: Data should be easy to locate by both people and computers, which requires describing each dataset with metadata. The more detailed and complete the metadata, the clearer the context of the data becomes, leading to more precise interpretation of the information and better findability, citation and reuse. Data should therefore be described with a detailed set of metadata in an online repository (archive, catalog, data portal, etc.) so others can find it. In addition, persistent unique identifiers (such as DOIs) are essential, as they unambiguously identify your data and support its citation.
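
As an illustration of what such metadata might look like, the following sketch builds a minimal, DataCite-style metadata record for a hypothetical dataset; the title, creator, keywords and DOI are all placeholders, not actual TRANSEATION identifiers.

```python
import json

# Minimal, DataCite-style metadata record for a hypothetical dataset.
# Every value is a placeholder; a real record would follow the schema
# required by the chosen repository (Zenodo, PANGAEA, SEANOE, ...).
metadata = {
    "identifier": {"identifier": "10.1234/example.doi", "identifierType": "DOI"},
    "titles": [{"title": "Hourly temperature and salinity at a hybrid blue-grey pilot site"}],
    "creators": [{"name": "Example Researcher", "affiliation": "Example Institute"}],
    "publicationYear": 2025,
    "subjects": [{"subject": "sea water temperature"}, {"subject": "salinity"}],
    "descriptions": [{"description": "Example dataset description.", "descriptionType": "Abstract"}],
}

# Serialising the record to JSON makes it easy to deposit alongside the data,
# so both humans and harvesting services can discover the dataset.
print(json.dumps(metadata, indent=2))
```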

Accessible: Data should be easy to access, with a well-defined license and clear access conditions, at the level of both the metadata and the actual data content. Your (meta)data should be accessible to everyone through the online repository. Not all data needs to be openly available, but where it is, it should be retrievable through standard protocols (such as HTTPS) and easily downloadable without the need for specialized tools.
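
A minimal sketch of that kind of access, assuming the file sits behind a plain HTTPS endpoint (the URL below is a placeholder): retrieving openly available data should require nothing more than a standard web request.

```python
import requests

# Placeholder URL: any repository that serves files over plain HTTPS will do.
DATA_URL = "https://example-repository.org/datasets/pilot-site-temperature.csv"

response = requests.get(DATA_URL, timeout=30)
response.raise_for_status()  # fail loudly if the resource is not accessible

# Save the file locally; no specialised protocol or client is needed.
with open("pilot-site-temperature.csv", "wb") as handle:
    handle.write(response.content)
```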

Interoperable: Data should be ready to be combined with other datasets, by both humans and computer systems. The (meta)data should be readable and understandable by everyone, and the data should be compatible with other applications, which requires the use of standard formats, a common language, controlled vocabularies, keywords, etc.
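
For marine and coastal observations, interoperability often means self-describing, community-standard formats such as CF-compliant NetCDF. The sketch below, with invented values and assuming xarray and a NetCDF backend are installed, shows how CF-style attribute names act as the controlled vocabulary that lets other tools combine the data automatically.

```python
import numpy as np
import xarray as xr

# Invented example values for a single mooring; the point is the self-describing
# format and the CF-style attribute names, not the numbers themselves.
ds = xr.Dataset(
    data_vars={"sea_water_temperature": ("time", np.array([14.2, 14.5, 14.1]))},
    coords={"time": np.array(
        ["2025-01-01T00:00", "2025-01-01T01:00", "2025-01-01T02:00"],
        dtype="datetime64[ns]",
    )},
)

# CF conventions supply the controlled vocabulary (standard_name, units)
# that lets other applications interpret and merge this dataset automatically.
ds["sea_water_temperature"].attrs = {
    "standard_name": "sea_water_temperature",
    "units": "degree_Celsius",
}
ds.attrs = {"Conventions": "CF-1.8", "title": "Example mooring temperature series"}

ds.to_netcdf("example_mooring.nc")  # NetCDF is an open, widely supported format
```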

Reusable: Data should be ready to be used in future research and to be further processed using computational methods. The conditions for data re-use should be clearly stated, and data provenance should be fully documented. In addition, ensure your data is understandable by providing detailed documentation that clarifies how to interpret the data correctly: describe the content, outline the processes applied to the data, include any additional necessary information, cover the licensing details transparently, and explain the methodology behind the data's creation.
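
One lightweight way to state those re-use conditions in machine-readable form is a Frictionless-style data package descriptor shipped next to the data; in the sketch below the license, provenance notes and file names are purely illustrative.

```python
import json

# Illustrative data package descriptor: licensing, provenance and processing
# notes live in one small file alongside the dataset it describes.
descriptor = {
    "name": "pilot-site-temperature",
    "licenses": [{"name": "CC-BY-4.0", "path": "https://creativecommons.org/licenses/by/4.0/"}],
    "sources": [{"title": "CTD casts at the pilot site, raw instrument files"}],
    "resources": [{"path": "pilot-site-temperature.csv", "format": "csv"}],
    "description": "Hourly temperatures; spikes removed with a 3-sigma filter (see processing notes).",
}

with open("datapackage.json", "w") as handle:
    json.dump(descriptor, handle, indent=2)
```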

Why Do They Matter?

It is worth asking what individual scientists gain from making their data FAIR, since aligning data with the FAIR principles demands more effort than simply sharing it among close colleagues.

One advantage is that by making your data available to the broader scientific community, you enable further scientific advances: other researchers can incorporate your data into their studies and build on your discoveries. However, in today’s era of massive data production, your data could quickly become lost unless it is structured so that computers can locate it through web services based on human-defined search parameters. Data also risks becoming obsolete unless it is standardized and annotated clearly enough for anyone to understand its context and meaning, how it was produced and processed, and how it can be reused. Adhering to the FAIR principles helps ensure your data remains useful beyond the specific research or experiment that generated it, thereby amplifying the potential scientific output society can achieve.

Another benefit is that if your data is reused, it will likely be cited, along with any associated publications. Crafting a data paper that details how your data were generated and quality checked, and linking this paper to your dataset, can lead to citations for both your data and the paper, thus boosting your overall citation count. This is particularly beneficial since citations are a key metric of scientific influence.

Lastly, there is a significant push within many European countries – and increasingly worldwide – to adopt FAIR data practices. Access to research funding is often contingent on demonstrating a robust data management plan, with a growing emphasis on how your data adheres to FAIR and open access principles.

Figure 2. Data FAIR diagram. Source: ARDC (https://ardc.edu.au)

In essence, making your data FAIR can offer several key benefits:

  • Maximizing the utility and potential of your data assets.
  • Increasing the visibility, impact and citation rate of your research.
  • Enhancing the reproducibility and reliability of your findings.
  • Aligning with international standards and approaches.
  • Fostering new collaborations, not just with other researchers but also with industry, policymakers and broader societal groups.

Who Is Responsible for Making Data FAIR?

The responsibility for adhering to the FAIR data principles is a collaborative effort that involves both the originators of the data and the custodians of data repositories. This multifaceted responsibility ensures that data are optimally organized and maintained to be Findable, Accessible, Interoperable and Reusable.

Firstly, data creators – scientists conducting experiments, field sampling and data analysis – are tasked with several critical responsibilities. They must enrich their data with standardized, comprehensive and precise metadata. Additionally, it is essential for them to publish their data in formats that are common within their community and are non-proprietary, ensuring that the data are accessible and interpretable by a broad audience. They are also responsible for providing detailed provenance information which enhances the data’s utility and trustworthiness. Once these steps are completed, data creators must then make both the data and the accompanying metadata available on an accessible online platform.
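
As a hedged illustration of that final deposit step, the sketch below uses the Zenodo REST API as one concrete example of a general-purpose repository; the access token, file name and metadata are placeholders, and any real deposit should follow the repository's current API documentation.

```python
import requests

# Placeholder token: a real deposit requires a valid Zenodo personal access token.
TOKEN = "YOUR-ZENODO-TOKEN"
BASE = "https://zenodo.org/api/deposit/depositions"

# 1) Create an empty deposition.
dep = requests.post(BASE, params={"access_token": TOKEN}, json={}).json()

# 2) Upload the data file into the deposition's file bucket.
with open("pilot-site-temperature.csv", "rb") as handle:
    requests.put(
        f"{dep['links']['bucket']}/pilot-site-temperature.csv",
        data=handle,
        params={"access_token": TOKEN},
    )

# 3) Attach descriptive metadata (provenance notes and community-standard,
#    non-proprietary formats belong with the files and the description).
metadata = {"metadata": {
    "title": "Pilot-site temperature series (example)",
    "upload_type": "dataset",
    "description": "Example deposit; see datapackage.json for provenance.",
    "creators": [{"name": "Example Researcher"}],
}}
requests.put(f"{BASE}/{dep['id']}", params={"access_token": TOKEN}, json=metadata)

# 4) Publishing mints a DOI, making the record findable and citable.
requests.post(f"{BASE}/{dep['id']}/actions/publish", params={"access_token": TOKEN})
```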

Secondly, the online resources – archives, catalogues and portals – are responsible for ensuring that data and metadata submitted by creators can be seamlessly integrated into their systems. These platforms must guarantee that both data and metadata are machine-accessible, facilitating efficient retrieval and use by automated systems and researchers alike. By fulfilling these roles, both data creators and repository administrators play integral parts in advancing the global research community’s ability to harness the full potential of data assets. This cooperative approach not only promotes the dissemination of knowledge but also significantly enhances the reproducibility and transparency of scientific research.

Figure 3. TRANSEATION Data Sharing Platform

The data sharing platform is actively being integrated with ERDDAP, a well-established scientific data platform that significantly enhances the accessibility and usability of research data across various fields. This integration is aimed at transforming how data is stored, accessed and utilized within the scientific community. ERDDAP provides a robust framework for data management, which facilitates collaborative research efforts, fosters data transparency and supports the replication of scientific studies. By linking our platform to ERDDAP, we ensure that all data handled are not only aligned with the FAIR principles but also seamlessly accessible to a broader scientific audience, thereby accelerating scientific discovery and innovation.
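
Once a dataset is served through ERDDAP, it becomes reachable through predictable, self-describing URLs. The sketch below assumes a hypothetical server address and dataset ID; only the general tabledap URL pattern is taken from ERDDAP's documented behaviour.

```python
import io

import pandas as pd
import requests

SERVER = "https://example-erddap-server.org/erddap"  # placeholder server address
DATASET_ID = "transeation_pilot_temperature"          # placeholder dataset ID

# ERDDAP's tabledap service builds URLs as {server}/tabledap/{datasetID}.{format}?{query},
# where the query lists the requested variables followed by constraints.
url = (
    f"{SERVER}/tabledap/{DATASET_ID}.csv"
    "?time,sea_water_temperature"
    "&time>=2025-01-01T00:00:00Z&time<=2025-01-31T23:59:59Z"
)

response = requests.get(url, timeout=60)
response.raise_for_status()

# ERDDAP CSV responses include a units row under the header, hence skiprows=[1].
df = pd.read_csv(io.StringIO(response.text), skiprows=[1], parse_dates=["time"])
print(df.head())
```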

Moreover, the blockchain-enabled platform is tailored to support the diverse needs of the various work packages within the TRANSEATION project. Each dataset’s unique code links directly to specific tasks and deliverables, facilitating efficient project management and seamless integration across different research activities. This approach not only streamlines workflow but also fosters synergy among disparate research teams, enhancing collaborative efforts and ensuring that all project components are interconnected.
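
The internal mechanics of the blockchain layer are beyond the scope of this post, but as a minimal sketch of how a dataset could receive a unique, tamper-evident code tied to the project structure, the example below computes a SHA-256 content hash and pairs it with illustrative work-package and deliverable labels (WP3 and D3.2 are placeholders, not actual TRANSEATION identifiers).

```python
import hashlib
import json

def dataset_fingerprint(path: str, work_package: str, deliverable: str) -> dict:
    """Link a dataset file to a project task via a content hash.

    The SHA-256 digest changes whenever the file changes, so the record is
    tamper-evident and can serve as a unique code for the dataset.
    """
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            sha256.update(chunk)
    return {
        "file": path,
        "sha256": sha256.hexdigest(),
        "work_package": work_package,
        "deliverable": deliverable,
    }

record = dataset_fingerprint("pilot-site-temperature.csv", "WP3", "D3.2")
print(json.dumps(record, indent=2))
```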

[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

Horizon Europe | Nature-Based Solutions | TRANSEATION project

#TranseationProject #EuFunded #NatureBasedSolutions #AquacultureInfrastructures #Horizon2020 #CTN