Episode 2: How to select the right data for effective predictive analytics

Predictive analytics is a promising asset for building a robust, proactive strategy that helps prevent food safety incidents, which can damage brand trust and reputation. The technology can also ease a food company’s decision-making by reducing uncertainty.

Fully understanding how predictive analytics works, and how it can help food safety leaders in practice, is still a puzzle. Our latest short video series aims to change that.

BigDataGrapes remains committed to its fundamental goal: helping European companies in the vine, wine, and natural cosmetics sectors use, exploit, and benefit from big data tools. As part of that plan, the project continues its short video series with the release of the 3rd video, titled "How to select the right data subsets".

In this episode, Giannis Stoitsis, CTO and partner at Agroknow (the coordinating partner of the BigDataGrapes project), explains how to select the right data subsets in order to address the questions raised in our previous episode.

Is it possible to predict, with high confidence, how many food safety incidents we are going to have for each ingredient category in 2020?

There are some data selection decisions to be made:

  • Are we interested in the global picture, using a very large global dataset to build a generic food safety predictor?
  • Or do we need a data subset that is only associated with specific geographical regions of interest?
  • Furthermore, should we split the data by the product category it refers to, removing irrelevant product recalls and rejections so that they do not influence the prediction?
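The selection decisions above can be illustrated with a minimal Python sketch. The records, field names (`region`, `category`, `type`), and the `select_subset` helper are all hypothetical and invented for illustration; they are not taken from the BigDataGrapes datasets or tools.

```python
# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"year": 2018, "region": "EU", "category": "nuts", "type": "recall"},
    {"year": 2018, "region": "US", "category": "dairy", "type": "recall"},
    {"year": 2019, "region": "EU", "category": "dairy", "type": "border rejection"},
    {"year": 2019, "region": "EU", "category": "nuts", "type": "recall"},
    {"year": 2019, "region": "ASIA", "category": "nuts", "type": "recall"},
]


def select_subset(records, regions=None, categories=None):
    """Keep only records matching the requested regions and categories.

    Passing None for a filter keeps every value, which corresponds to
    the "global picture" option.
    """
    return [
        r for r in records
        if (regions is None or r["region"] in regions)
        and (categories is None or r["category"] in categories)
    ]


# Global dataset: no filtering at all.
global_view = select_subset(incidents)

# Regional, category-specific subset: EU incidents involving nuts only.
eu_nuts = select_subset(incidents, regions={"EU"}, categories={"nuts"})
```

Which subset is the right one depends on the question being asked; the same helper can produce several candidate subsets so that their predictions can be compared.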

So, get ready for some serious data preparation, splitting and re-combination work.

Prediction models are typically evaluated by splitting the real, historical data into training and testing subsets. The algorithm is fitted (its parameters are estimated) using the training data, and its predictive capability is then evaluated on the testing data.
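As a rough sketch of this idea, the Python example below splits hypothetical yearly incident counts chronologically, fits a simple least-squares trend line on the training years, and measures the error on the held-out years. The numbers are made up, and the linear trend is only a stand-in baseline; it is not the algorithm used in the project.

```python
from statistics import mean

# Hypothetical yearly incident counts for one ingredient category.
yearly_counts = {2014: 41, 2015: 43, 2016: 49, 2017: 51, 2018: 57, 2019: 58}


def split_by_year(counts, first_test_year):
    """Chronological split: earlier years for training, later years for testing."""
    train = {y: c for y, c in counts.items() if y < first_test_year}
    test = {y: c for y, c in counts.items() if y >= first_test_year}
    return train, test


def fit_linear_trend(train):
    """Fit a least-squares line through (year, count); return (slope, intercept)."""
    years = list(train)
    x_mean = mean(years)
    y_mean = mean(train.values())
    numerator = sum((x - x_mean) * (train[x] - y_mean) for x in years)
    denominator = sum((x - x_mean) ** 2 for x in years)
    slope = numerator / denominator
    return slope, y_mean - slope * x_mean


def mean_absolute_error(model, test):
    """Average absolute gap between predicted and actual counts on the test years."""
    slope, intercept = model
    return mean(abs(slope * year + intercept - count) for year, count in test.items())


train, test = split_by_year(yearly_counts, first_test_year=2018)
model = fit_linear_trend(train)
mae = mean_absolute_error(model, test)
```

A chronological split is used here, rather than a random one, on the assumption that the goal is to forecast future years from past ones; evaluating on years the model has never seen mimics that situation.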

How do we generate reliable predictions?

There are several data preparation and combination techniques that help ensure the model can generate reliable predictions. Data problems, such as missing or inconsistent data, then come into play. Again, there are plenty of techniques for addressing issues such as missing data values.
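Two common clean-up steps can be sketched in Python: normalizing inconsistent labels and filling missing values. The records and the `clean` helper below are hypothetical, and median imputation is just one of many possible techniques, chosen here for simplicity. The sketch assumes every category has at least one observed value.

```python
from statistics import median

# Hypothetical raw records with inconsistent labels and a missing value (None).
raw = [
    {"category": "Nuts ", "incidents": 12},
    {"category": "nuts", "incidents": None},
    {"category": "DAIRY", "incidents": 7},
    {"category": "dairy", "incidents": 9},
    {"category": "nuts", "incidents": 16},
]


def normalize_category(label):
    """Trim whitespace and lowercase so 'Nuts ' and 'nuts' become one category."""
    return label.strip().lower()


def clean(records):
    """Return cleaned copies of the records; the input list is left unchanged."""
    cleaned = [{**r, "category": normalize_category(r["category"])} for r in records]

    # Collect the observed (non-missing) values per category.
    observed = {}
    for r in cleaned:
        if r["incidents"] is not None:
            observed.setdefault(r["category"], []).append(r["incidents"])

    # Fill each gap with the median of its category and flag the record,
    # so imputed values can be told apart from real observations later.
    fill = {category: median(values) for category, values in observed.items()}
    for r in cleaned:
        if r["incidents"] is None:
            r["incidents"] = fill[r["category"]]
            r["imputed"] = True
    return cleaned


cleaned = clean(raw)
```

Flagging imputed records is a design choice rather than a requirement: it keeps the option open to exclude or down-weight filled-in values when the model is evaluated.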

The takeaway is that some serious data processing and management need to take place before we can deploy algorithms.

How are the AI algorithms used over all these data combinations to deliver a reliable and efficient food safety prediction service? Find out in our next video!

This short video series is part of the BigDataGrapes project. The project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement number 780751.