Journal article

Improving hazard characterization in microbial risk assessment using next generation sequencing data and machine learning: Predicting clinical outcomes in shigatoxigenic Escherichia coli

In International Journal of Food Microbiology — 2019, Volume 292, pp. 72-82

By Njage, Patrick Murigu Kamau¹,²; Leekitcharoenphon, Pimlapas¹,²; Hald, Tine¹,²

From

National Food Institute, Technical University of Denmark¹

Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark²

Abstract

The ever-decreasing cost and increasing throughput of next generation sequencing (NGS) techniques have resulted in a rapid increase in the availability of NGS data. Such data have the potential for rapid, reproducible and highly discriminative characterization of pathogens. This provides an opportunity in microbial risk assessment to account for variations in survivability and virulence among strains.

A major challenge towards such attempts remains the highly dimensional nature of genomic data versus the number of isolates. Machine learning-based (ML) predictive risk modelling provides a solution to this "curse of dimensionality" while accounting for individual effects that are dependent on interactions with other genetic and environmental factors.

This pilot study explores the potential of ML in the prediction of health endpoints resulting from shigatoxigenic E. coli (STEC) infection. Accessory genes in amino acid sequences were used as model input to predict and differentiate health outcomes in STEC infections including diarrhea, bloody diarrhea, hemolytic uremic syndrome and their combinations.

Outcome severity was also distinguished by hospitalization. A matrix of percent similarity between accessory genes and the E. coli genomes was generated and subsequently used as input for ML. The performances of the ML algorithms random forest, support vector machine (radial and linear kernel), gradient boosting, and logit boost were compared.
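The input matrix described above can be pictured as one row per genome and one column per accessory gene. The abstract does not say how percent similarity was computed, so the following is only a toy sketch: it assumes pre-aligned, equal-length amino acid strings and uses position-wise identity as a stand-in. The gene names, isolate names, sequences and the helper `percent_identity` are all hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical reference accessory-gene products (toy amino acid strings).
genes = {"geneA": "MKTAY", "geneB": "MLSRQ"}

# Hypothetical best-matching region for each gene in each isolate genome.
isolates = {
    "isolate1": {"geneA": "MKTAY", "geneB": "MLSRA"},
    "isolate2": {"geneA": "MKSAF", "geneB": "MLSRQ"},
}

# Rows are isolates, columns are genes in sorted order; this is the
# feature matrix a classifier would receive.
gene_names = sorted(genes)
matrix = {
    iso: [percent_identity(genes[g], hits[g]) for g in gene_names]
    for iso, hits in isolates.items()
}

for iso, row in matrix.items():
    print(iso, row)
```

Each row of such a matrix would be paired with the recorded clinical outcome of the isolate when training the classifiers.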

Logit boost was the best model, showing an outcome prediction accuracy of 0.75 (95% CI: 0.60, 0.86), an excellent or substantial performance (Kappa = 0.72). Important genetic predictors of riskier STEC clinical outcomes included proteins involved in initial attachment to the host cell, persistence of plasmids or genomic islands, conjugative plasmid transfer and formation of sex pili, regulation of locus of enterocyte effacement expression, post-translational acetylation of proteins, facilitation of the rearrangement or deletion of sections within the pathogenic islands, and transport of macromolecules across the cell envelope.
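As background on the two figures reported above: accuracy is the fraction of correct predictions, and Cohen's kappa rescales that fraction by the agreement expected by chance. A minimal sketch follows, using made-up counts for three hypothetical outcome classes (not data from the study); the helper name `accuracy_and_kappa` is an illustrative assumption.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of cases with true class i
    that were predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up counts for three hypothetical outcome classes.
toy = [
    [10, 2, 0],
    [1, 8, 3],
    [0, 2, 14],
]
acc, kappa = accuracy_and_kappa(toy)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Because chance agreement is subtracted, kappa can never exceed accuracy, and the gap between the two narrows as the number of classes grows and the classes become more balanced.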

Further studies are proposed on the proteins with undefined or unclear functionality. One protein family in particular predicted HUS outcome. Toxin-antitoxin systems are potential stress adaptation markers which may mediate environmental persistence of strains in diverse sources. We foresee the application of the ML approach to the set-up of real-time online analysis of whole genome sequence data to estimate the human health risk at the population or strain level.

The ML approach is envisaged to support the prediction of more specific types of STEC clinical endpoints by inputting isolate sequence data.

Language:	English
Year:	2019
Pages:	72-82
ISSN:	1879-3460 and 0168-1605
Types:	Journal article
DOI:	10.1016/j.ijfoodmicro.2018.11.016
ORCIDs:	Njage, Patrick Murigu Kamau; Leekitcharoenphon, Pimlapas; and Hald, Tine

Keywords

Hazard characterization; Hazard identification; Infection outcome; Logit boost; Risk characterization; SDG 3 - Good Health and Well-being; STEC; Whole genome sequencing
