In the past decade, the amount of data that is produced in the food production chain has increased significantly. The generated data ranges from safety and product specifications of production sources and data generated during the production process to consumer data that is obtained from consumer feedback and market evaluation.
The growth of data is driven by cost-effective improvements in sensor technology and miniaturisation. The cost of determining the genomic sequences of microorganisms, for example, has decreased by a factor of 10,000 in the past decade. In addition, consumer data are increasingly being generated through social media, with advanced algorithms that structure these data into measures of consumer liking and product perception.
There is also a trend in the academic world towards open research and data exchange. The EU has recently decided to reserve €2 billion for making data from EU-sponsored research open to the general public according to the FAIR principles: data should be findable, accessible, interoperable and reusable. Furthermore, more and more scientific journals are adopting an open-access publishing model, in which the results of cutting-edge scientific research become immediately available to the scientific community.
Alongside the increase in data production and sharing, the costs of data storage, internet bandwidth and computing power are decreasing fast.
These developments mean that large amounts of data can now be obtained and analysed at affordable cost. Efficient use of this wealth of data in research and product development can provide a competitive advantage for food companies. Three key drivers determine success in this arena. First, data must be stored and cleaned so that it can be read and interpreted by computers; it has been estimated that companies may leave up to 60% of their internal data unused because of improper annotation and storage. Second, a clear definition of the research hypothesis or business question is essential. Especially when a large number of data sets are available, analysing the data without a clear objective yields many potentially interesting observations but no clear path for selecting the ones that will lead to actionable knowledge. Third, an efficient prediction-validation cycle is necessary so that models can be constantly updated and refined by critically analysing the gaps between model-derived predictions and experimentally obtained data.
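The prediction-validation cycle can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the proportional model, the function names `fit_slope` and `prediction_validation_cycle`, the tolerance and the numbers. It only shows the general pattern of predicting, measuring the gap against new experiments, and refitting when the gap is too large.

```python
def fit_slope(points):
    """Least-squares slope for the toy model y = slope * x."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)


def prediction_validation_cycle(initial_points, experiments, tolerance):
    """Compare predictions with each new batch and refit when the gap is large."""
    data = list(initial_points)
    slope = fit_slope(data)
    history = []
    for batch in experiments:
        # Mean absolute gap between model prediction and measured value.
        gap = sum(abs(y - slope * x) for x, y in batch) / len(batch)
        data.extend(batch)  # validated measurements are always kept
        refitted = gap > tolerance
        if refitted:
            slope = fit_slope(data)  # update the model with all data so far
        history.append((gap, refitted))
    return slope, history


# Made-up (x, y) measurements, for illustration only.
initial = [(1.0, 2.0), (2.0, 4.0)]
batches = [[(3.0, 6.0)], [(4.0, 10.0)]]
slope, history = prediction_validation_cycle(initial, batches, tolerance=0.5)
```

In this toy run the first batch agrees with the model and leaves it unchanged, while the second batch exceeds the tolerance and triggers a refit on all data collected so far.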
To tackle these issues, we at Nizo are currently applying data analytics to open data access and to the optimisation of food processing.
Because more and more data is available in the public domain in an open-access format, Nizo joined the Odex4All (Open Data Exchange for All) consortium. The Odex4All consortium aims to develop novel algorithms to efficiently integrate and analyse data in over 50 life science databases. By applying these algorithms to combined data on human diseases, nutrition and the metabolic capacity of the microorganisms in the human microbiome, novel health benefits of these microorganisms were discovered.
In safety and quality control, in-line, real-time analysis of chemical residues and genomic sequences can contribute immediately to the quality assurance of food sources before they enter the production process. With proteomics techniques it is now possible to analyse the composition of protein hydrolysates at high resolution. Information on the thousands of small peptides making up these hydrolysates yields product-specific fingerprints that can reveal the production sources, proteolytic enzymes and temperature regimens that were used to produce these hydrolysates. In addition, real-time sequencing of the genomes of organisms present in food sources, products and ingredients, combined with high-performance computing clusters that analyse these data, can generate immediate information on the presence of pathogenic or food-spoiling bacteria in these food sources.
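As a rough illustration of how a peptide fingerprint could be matched against known products, the Python sketch below compares sets of peptide identifiers by their Jaccard overlap. This is one simple similarity measure chosen for the example, not necessarily the approach used in practice; the peptide strings, reference names and the functions `jaccard` and `best_match` are all hypothetical placeholders.

```python
def jaccard(a, b):
    """Share of peptides common to both sets (0.0 = disjoint, 1.0 = identical)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def best_match(sample, references):
    """Return the name of the reference fingerprint most similar to the sample."""
    return max(references, key=lambda name: jaccard(sample, references[name]))


# Placeholder fingerprints: real ones would hold thousands of measured peptides.
references = {
    "source A / enzyme X": {"PEPA", "PEPB", "PEPC", "PEPD"},
    "source B / enzyme Y": {"PEPC", "PEPE", "PEPF", "PEPG"},
}
sample = {"PEPA", "PEPB", "PEPD", "PEPH"}
match = best_match(sample, references)
```

Here the sample shares three of its four peptides with the first reference and none with the second, so the first reference is returned as the closest fingerprint.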
In processing, raw food sources undergo multiple processing phases. The processing conditions in each of these phases greatly influence the quality of the final product. Systematic monitoring of a large number of processing parameters, and relating them to the characteristics of the final product, provides numerous leads for process improvement, whether in cost reduction, more consistent product quality, or both. For the analysis of these processing data, mixed models in which mechanistic knowledge and kinetic details of the underlying reactions are combined with machine learning approaches have proven to be very successful.
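A minimal Python sketch of such a combined model is given below, under stated assumptions. The mechanistic part is first-order kinetics with an Arrhenius temperature dependence; the data-driven part is reduced to a straight-line fit on the residuals against one extra process parameter (here pH), standing in for a real machine learning method. The kinetic constants, the function names and the data are illustrative and not taken from any actual process.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def mechanistic_retention(temp_k, time_s, k_ref=0.01, ea=80_000.0, t_ref=353.15):
    """Fraction of a component remaining after heating (first-order, Arrhenius).

    k_ref, ea and t_ref are illustrative values, not measured constants.
    """
    k = k_ref * math.exp(-ea / R * (1.0 / temp_k - 1.0 / t_ref))
    return math.exp(-k * time_s)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (the data-driven part)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_hybrid(records):
    """records: (temp_k, time_s, ph, measured_retention) tuples."""
    # Learn only what the mechanistic model leaves unexplained.
    residuals = [m - mechanistic_retention(t, s) for t, s, _, m in records]
    a, b = fit_line([ph for _, _, ph, _ in records], residuals)

    def predict(temp_k, time_s, ph):
        return mechanistic_retention(temp_k, time_s) + a + b * ph

    return predict


# Synthetic data: mechanistic value plus a pH effect the kinetics do not capture.
conditions = [(343.15, 20.0, 6.2), (353.15, 30.0, 6.5), (363.15, 40.0, 6.9)]
records = [(t, s, ph, mechanistic_retention(t, s) + 0.05 * (ph - 6.5))
           for t, s, ph in conditions]
predict = fit_hybrid(records)
estimate = predict(353.15, 30.0, 6.8)
```

Fitting the data-driven part to the residuals rather than to the raw measurements keeps the known kinetics in charge of the overall behaviour, so the learned correction only has to capture the remaining deviation.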
Finally, to be successful in this field and to reach the next level in cost-effectively developing innovative, safe and healthy products, it is an absolute requirement to create teams in which data scientists work closely together with mathematicians, statistical experts and domain experts.
© FoodBev Media Ltd 2024