Connecting data in an intelligent way: a semantic data model developed in the BigDataGrapes project

Data management in the vine and wine sector is a major challenge. Large amounts of heterogeneous data are collected at several stages of the process: in the field, at harvest, during winemaking, during sensory analysis, and in the lab. The data come in diverse formats and are gathered by different teams, which makes them difficult to link together.

Semantic data linking: the use of ontologies

In this context, semantic data linking is a way to integrate highly heterogeneous data, and the Semantic Web provides the means to do it. The goal of the Semantic Web is to make Internet data machine-readable. It is based on ontologies, which are formal specifications that provide shareable and reusable knowledge representations. An ontology contains descriptions of the concepts and properties in a domain, the relationships between concepts, constraints on how those relationships can be used, and individuals as members of concepts. It can also contain links that declare relationships between concepts from multiple ontologies. Using ontologies enables a common understanding of information, makes domain assumptions explicit, and ensures future interoperability with other data.
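To make these components concrete, here is a minimal sketch that represents a tiny ontology as subject-predicate-object triples in plain Python. All `ex:` names (such as `ex:Wine`, `ex:Plot` and `ex:producedFrom`) are hypothetical and chosen only for illustration; they are not taken from any published ontology, and a real project would use an RDF library and a triple store rather than a Python set.

```python
# Toy ontology as a set of (subject, predicate, object) triples.
# Every "ex:" term is a hypothetical name used for illustration only.
TRIPLES = {
    # Concepts and relationships between concepts (a class hierarchy)
    ("ex:RedWine", "rdfs:subClassOf", "ex:Wine"),
    ("ex:Wine", "rdfs:subClassOf", "ex:Product"),
    # A property with constraints on how it can be used
    ("ex:producedFrom", "rdfs:domain", "ex:Wine"),
    ("ex:producedFrom", "rdfs:range", "ex:Plot"),
    # Individuals as members of concepts
    ("ex:wine_42", "rdf:type", "ex:RedWine"),
    ("ex:plot_7", "rdf:type", "ex:Plot"),
    ("ex:wine_42", "ex:producedFrom", "ex:plot_7"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(individual, triples):
    """Return the asserted and inherited classes of an individual."""
    result = set()
    for s, p, o in triples:
        if s == individual and p == "rdf:type":
            result |= superclasses(o, triples)
    return result


print(sorted(types_of("ex:wine_42", TRIPLES)))
```

Because the hierarchy is stated explicitly, a machine can work out that `ex:wine_42` is also a wine and a product, even though only `ex:RedWine` was asserted.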

Semantic data linking in the BigDataGrapes project

As part of the BigDataGrapes project, INRAE (MISTEA unit) and ONTOTEXT, both project partners, are collaborating on a data integration pipeline that covers all data from the vine to the wine and produces a knowledge graph. To this end, the teams are working on a harmonized semantic data model. The collected data are stored in an RDF database for knowledge graphs.
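The idea behind such a pipeline can be sketched as follows. This is not the project's actual pipeline: the input data, the column names and all `ex:` terms are invented for illustration, and plain Python tuples stand in for an RDF database. The point it shows is that two sources in different formats become linkable once both are converted to triples that use the same identifier for the same plot.

```python
import csv
import io

# Hypothetical field data exported as CSV by one team.
FIELD_CSV = """plot_id,variety,harvest_date
P7,Syrah,2019-09-12
P8,Grenache,2019-09-15
"""

# Hypothetical lab results kept as records by another team.
LAB_RECORDS = [
    {"sample": "S1", "plot": "P7", "sugar_g_per_l": 215.0},
    {"sample": "S2", "plot": "P8", "sugar_g_per_l": 228.5},
]


def plot_uri(plot_id):
    """Mint the same identifier for a plot, whatever the source."""
    return f"ex:plot/{plot_id}"


def field_to_triples(csv_text):
    """Convert the CSV rows into triples about plots."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        plot = plot_uri(row["plot_id"])
        triples.add((plot, "rdf:type", "ex:Plot"))
        triples.add((plot, "ex:variety", row["variety"]))
        triples.add((plot, "ex:harvestDate", row["harvest_date"]))
    return triples


def lab_to_triples(records):
    """Convert the lab records into triples about samples."""
    triples = set()
    for rec in records:
        sample = f"ex:sample/{rec['sample']}"
        triples.add((sample, "rdf:type", "ex:Sample"))
        triples.add((sample, "ex:takenFrom", plot_uri(rec["plot"])))
        triples.add((sample, "ex:sugarGramsPerLitre", rec["sugar_g_per_l"]))
    return triples


# The knowledge graph is the union of the triples from both sources.
graph = field_to_triples(FIELD_CSV) | lab_to_triples(LAB_RECORDS)


def sugar_by_variety(graph):
    """Join lab samples to field data through the shared plot identifier."""
    result = {}
    for sample, predicate, plot in graph:
        if predicate != "ex:takenFrom":
            continue
        variety = next(o for s, p, o in graph if s == plot and p == "ex:variety")
        sugar = next(
            o for s, p, o in graph
            if s == sample and p == "ex:sugarGramsPerLitre"
        )
        result[variety] = sugar
    return result


print(sugar_by_variety(graph))
```

Neither source contains both the grape variety and the sugar content; the answer only becomes available once the two are merged into one graph.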

To do this, we use several ontologies. The first is a domain ontology, the Agri-Food Experiment Ontology (AFEO), which is composed of 136 concepts covering various viticulture practices as well as winemaking operations and products (Buche et al., 2017). It is used to represent domain concepts and the different links (processes) between them.

Measured observations (the data values collected at each stage from vine to wine) are modeled with QB (the RDF Data Cube Vocabulary), which represents multidimensional data, and SOSA (Sensor, Observation, Sample, and Actuator), an ontology for describing sensors and their observations. These two ontologies are combined with QUDT (Quantities, Units, Dimensions and Data Types Ontologies). Specific vocabularies needed by other project partners are also used in the data model. With these main ontologies, we obtain a flexible and structured data model that can represent and link all data and steps from the vine to the wine.
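As an illustration of how these vocabularies can fit together, the sketch below describes a single measurement using terms that exist in SOSA (`sosa:Observation`, `sosa:hasFeatureOfInterest`, `sosa:observedProperty`, `sosa:madeBySensor`, `sosa:resultTime`, `sosa:hasResult`), QB (`qb:Observation`, `qb:dataSet`) and QUDT (`qudt:QuantityValue`, `qudt:numericValue`, `qudt:unit`, `unit:DEG_C`). How the project's data model actually combines them is not detailed here, so this arrangement is an assumption; the `ex:` identifiers and the values are invented, and plain Python tuples again stand in for an RDF store.

```python
# One measured observation described with SOSA, QB and QUDT terms.
# The "ex:" identifiers and the values are hypothetical.
OBS = "ex:obs/1"
RESULT = "ex:obs/1/result"

GRAPH = {
    (OBS, "rdf:type", "sosa:Observation"),
    (OBS, "rdf:type", "qb:Observation"),
    (OBS, "qb:dataSet", "ex:dataset/fermentation-2019"),
    (OBS, "sosa:hasFeatureOfInterest", "ex:tank/12"),
    (OBS, "sosa:observedProperty", "ex:property/temperature"),
    (OBS, "sosa:madeBySensor", "ex:sensor/probe-3"),
    (OBS, "sosa:resultTime", "2019-09-20T08:00:00Z"),
    (OBS, "sosa:hasResult", RESULT),
    # The result carries the value and its unit, following QUDT.
    (RESULT, "rdf:type", "qudt:QuantityValue"),
    (RESULT, "qudt:numericValue", 24.5),
    (RESULT, "qudt:unit", "unit:DEG_C"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def readings(graph, observed_property):
    """Return (feature, value, unit) for each observation of a property."""
    rows = []
    for obs, _, _ in match(graph, p="sosa:observedProperty", o=observed_property):
        feature = match(graph, s=obs, p="sosa:hasFeatureOfInterest")[0][2]
        result = match(graph, s=obs, p="sosa:hasResult")[0][2]
        value = match(graph, s=result, p="qudt:numericValue")[0][2]
        unit = match(graph, s=result, p="qudt:unit")[0][2]
        rows.append((feature, value, unit))
    return rows


print(readings(GRAPH, "ex:property/temperature"))
```

The same `match` helper answers questions that were not planned when the data were modeled, for example `match(GRAPH, p="sosa:madeBySensor")` to list which sensor produced each observation. In an RDF database, the equivalent would be a SPARQL query.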

Conclusion

As an output of the BigDataGrapes project, the new data model will be published and made accessible. By applying domain knowledge to data, the model serves as a working example for the wine sector and will promote data sharing and reuse, helping to discover knowledge and helping companies become more competitive. From a research point of view, ontology-guided data integration gives researchers the information they need to address broader research questions and to know which ontologies are recommended for representing their data in the vine and wine sector. It also makes data more accessible and makes it possible to query the data with questions that were not anticipated when the data were modeled.

References

Buche, P., Charnomordic, B., Muljarto, A.R., Neveu, P., Salmon, J.M., Tireau, A. A generic ontological network for Agri-food experiment integration–Application to viticulture and winemaking. Computers and electronics in agriculture, 2017, vol. 140, p. 433-442.

Alexiev, V. RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016.

Edited by Coraline DAMASIO and Arnaud CHARLEROY, INRAE, France