Services offered by Data Terra
Data Terra main services
The Data Terra research infrastructure offers services relating to Earth system data. Its objective is to deliver services that are interoperable and inter-disciplinary at all levels.
1) FAIR data discovery, access and stewardship services
Data Terra’s mission is to harmonize data services across Earth system domains and extend their field of application. Services will be structured around the following main elements: a shared catalogue detailing collections, data and associated services; vocabularies and ontologies guaranteeing that data can be reused; a federation of data warehouses; assignment of permanent identifiers allowing datasets to be cited unambiguously; and a statistics service evaluating data use. Systematic use of machine-actionable data management plans (DMPs) will improve data stewardship, notably by anticipating requests for resources early enough for governance structures to make the necessary decisions.
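As an illustration of how machine-actionable DMPs could help anticipate resource requests, the sketch below sums the storage declared in a set of DMP records per year. The record layout (`project`, `storage_tb_by_year`) and the function name are hypothetical and chosen only for this example; they do not reflect Data Terra's actual DMP schema.

```python
from collections import defaultdict

def forecast_storage(dmps):
    """Sum declared storage needs (in TB) per year across DMP records.

    Each record is a hypothetical, simplified DMP: a dict with a
    'storage_tb_by_year' mapping of year -> terabytes requested.
    """
    totals = defaultdict(float)
    for dmp in dmps:
        for year, terabytes in dmp["storage_tb_by_year"].items():
            totals[year] += terabytes
    # Return plain dict ordered by year for easier reading.
    return dict(sorted(totals.items()))

# Illustrative records only, not real project data.
example_dmps = [
    {"project": "project-a", "storage_tb_by_year": {2025: 10, 2026: 25}},
    {"project": "project-b", "storage_tb_by_year": {2026: 5}},
]
forecast = forecast_storage(example_dmps)
```

A governance body could compare such a per-year total against available capacity before the need arises.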
2) Data exploitation and routine production services
These services, operated by the data hubs, chiefly aim to convert observations and measurements regularly into derived data and products for a range of scientific applications, up to and including environmental indicators. They interface with and support observation infrastructures under jointly established DMPs. The services will deliver sets and series of derived data and products that are structured and harmonized in terms of description (metadata), format and quality, and that may serve as a reference for all users.
3) On-demand analytics and processing services
To analyse and process large volumes of diverse data remotely, with the computing power this requires, host platforms called Earth System Analytics Labs (ESALs) and Virtual Research Environments (VREs) will be developed close to where the data are stored. These ESALs and VREs will support predefined processing tools that can be configured as needed (e.g. geographic and temporal areas of interest) and run in sequence, with results that can be analysed and even backed up. They will also offer a web interface to guide the implementation of algorithms for, among others, geostatistical analysis, modelling, image analysis and processing, self-learning methods (classification, machine learning), cartographic data representation (previsualization) and environmental genetics processing. ESALs provide a programming interface aimed at users with programming skills, while VREs offer a graphical interface for defining workflows without necessarily having to code.
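The idea of predefined tools that are configured with an area of interest and then run in sequence can be sketched as follows. This is a minimal illustration, not the ESAL or VRE programming interface: the names `AreaOfInterest`, `Observation`, `select`, `scale` and `run_chain` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List

@dataclass(frozen=True)
class Observation:
    lon: float
    lat: float
    day: date
    value: float

@dataclass(frozen=True)
class AreaOfInterest:
    """Hypothetical geographic bounding box plus time window."""
    west: float
    south: float
    east: float
    north: float
    start: date
    end: date

    def contains(self, obs: Observation) -> bool:
        return (self.west <= obs.lon <= self.east
                and self.south <= obs.lat <= self.north
                and self.start <= obs.day <= self.end)

Step = Callable[[List[Observation]], List[Observation]]

def select(aoi: AreaOfInterest) -> Step:
    """Configure a step that keeps observations inside the area of interest."""
    return lambda obs: [o for o in obs if aoi.contains(o)]

def scale(factor: float) -> Step:
    """Configure a step that multiplies every value by a constant factor."""
    return lambda obs: [Observation(o.lon, o.lat, o.day, o.value * factor)
                        for o in obs]

def run_chain(obs: List[Observation], steps: List[Step]) -> List[Observation]:
    """Apply the configured steps one after another."""
    for step in steps:
        obs = step(obs)
    return obs

# Illustrative data: one point inside the box and period, one outside.
aoi = AreaOfInterest(-5.0, 41.0, 10.0, 51.5, date(2023, 1, 1), date(2023, 12, 31))
data = [
    Observation(2.3, 48.9, date(2023, 6, 1), 10.0),
    Observation(20.0, 48.9, date(2023, 6, 1), 7.0),
]
result = run_chain(data, [select(aoi), scale(0.5)])
```

Keeping each tool as a configured function makes the sequence itself a plain list, which a graphical workflow editor or a script could assemble equally well.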
4) Tier2 HPC data centres
Data Terra’s IT infrastructure is built around eight Tier2 HPC data and processing centres combining computing and storage capacity dedicated to hosting and exploiting large volumes of data. These data and services centres (DSCs) are attached to the hubs and may be managed by Data Terra’s institutional partners (CNES, Ifremer, BRGM), subject-matter mesocentres (ESPRI/IPSL, S-CAPAD/IPGP, ICARE/University of Lille, IGN), pooled regional mesocentres (GRICAD, Unistra) hosting DSC activities, or national centres (CINES, IDRIS).
Cross-cutting services
Cross-cutting services leverage generic data services available in France, such as long-term archiving, consolidating and adapting them where necessary. Such services offer:
- Access to data via standard web services (INSPIRE, OGC, CEOS, etc.)
- Data description (metadata models, shared vocabularies, aligned ontologies)
- A unified vision of distributed storage capacity via an iRODS grid
- The ability to share processing across DSCs, providing interoperability through abstraction of the data access layer and technologies such as containers, OpenStack or Kubernetes (currently being studied by Data Terra and space agencies within the framework of CEOS/WGISS)
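As an example of access through standard web services, the sketch below builds an OGC WMS 1.3.0 GetMap request URL for a bounding box and an optional time value. The endpoint and layer name are placeholders, not actual Data Terra services, and the helper `build_getmap_url` is hypothetical. `CRS:84` is used so that the bounding box is given in longitude/latitude order.

```python
from urllib.parse import urlencode

def build_getmap_url(endpoint, layer, bbox, width, height, time=None):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time is not None:
        params["TIME"] = time  # optional temporal dimension
    return f"{endpoint}?{urlencode(params)}"

# Placeholder endpoint and layer, for illustration only.
url = build_getmap_url(
    "https://example.org/wms",
    "example_layer",
    (-5.0, 41.0, 10.0, 51.5),
    800,
    600,
    time="2023-06-01",
)
```

The function only assembles the URL; sending the request and handling the returned image are left to the caller.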
GAIA Data project
The GAIA Data project is dedicated to developing and implementing such operational services.
Data Terra will build on this project, which was selected under the EquipEx+ call for expressions of interest of the PIA3 future investments programme. Graded A+ and bringing together 21 partners, the project will be funded by the French research agency ANR to develop a distributed data and services infrastructure/platform supported by its own DSCs.