Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity

Bioinformatics and Genomics


Introduction

Materials and Methods

Dataset

Motivation and problem definition

  • Sub-Challenge 1 (SC-1): Prediction of viral shedding, i.e., whether or not the individual is infected. This is a binary outcome used to evaluate how well infection is predicted. The aim is to identify predictors of infection.

  • Sub-Challenge 2 (SC-2): Prediction of the symptomatic response to exposure, i.e., whether or not the subject will become symptomatic after exposure. The aim is to identify predictors of severe symptoms.

  • Sub-Challenge 3 (SC-3): Continuous-valued prediction of the symptom score. Since the discrete-valued symptom score is calculated using the Jackson score, this task is the direct prediction of the log-transformed Jackson score. The aim is to identify predictors of severe symptoms.
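
The three sub-challenges amount to two binary classification targets and one regression target per subject. The following is a minimal sketch of how these targets could be derived from per-subject labels. The `Subject` record, its field names, and the `sc_targets` helper are hypothetical, and the use of log(score + 1) is an assumption made here so that a score of zero stays defined; the exact transform is not stated in this section.

```python
import math
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical per-subject record; field names are illustrative only."""
    subject_id: str
    shedding: bool      # viral shedding observed (SC-1 label)
    symptomatic: bool   # became symptomatic after exposure (SC-2 label)
    jackson_score: int  # discrete-valued symptom score (basis of SC-3)


def sc_targets(subject: Subject) -> dict:
    """Build the three prediction targets for one subject."""
    return {
        "sc1": int(subject.shedding),      # binary: infected or not
        "sc2": int(subject.symptomatic),   # binary: symptomatic or not
        # Assumption: log(score + 1), so that a score of 0 maps to 0.0.
        "sc3": math.log(subject.jackson_score + 1),
    }


if __name__ == "__main__":
    example = Subject("S01", shedding=True, symptomatic=False, jackson_score=3)
    print(sc_targets(example))
```

Keeping the three targets in one structure makes it easy to train a classifier on `sc1` and `sc2` and a regressor on `sc3` using the same feature matrix.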

Prediction algorithms

Hyper-parameter optimization

Data preprocessing

Single time point and experiment (STPE) approach

Average of features (AF) approach

Virus merge (VM) approach

Feature selection

Results

Discussion

Conclusion

Supplemental Information

Hyper-parameter space of the machine learning algorithms.

DOI: 10.7717/peerj.15552/supp-1

Z-test significance comparison between our best results and the DREAM Challenge results.

DOI: 10.7717/peerj.15552/supp-2

Selected common genes from different experiments.

DOI: 10.7717/peerj.15552/supp-3

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Yunus Emre Işık conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Zafer Aydın conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The data are available at GEO: GSE73072. The matrix files (SOFT, Minimal, etc.) were generated using only the gene information. In our experiments, we used probes rather than genes. Therefore, the data should be regenerated using the correct CDF (annotation file) and the raw .CEL files, which are available on GEO.

Detailed information about the regeneration, together with the regenerated dataset, is available on GitHub: https://github.com/yeisik/respiratory_infection_prediction.

Funding

The authors received no funding for this work.

