Interpreting and integrating big data in the life sciences
- Subject Areas
- Computational Biology, Genomics, Science and Medical Education, Computational Science
- Keywords
- omics, NGS, big data, computational algorithms, command line interface
- Copyright
- © 2019 Mangul
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2019. Interpreting and integrating big data in the life sciences. PeerJ Preprints 7:e27603v2 https://doi.org/10.7287/peerj.preprints.27603v2
Abstract
Recent advances in omics technologies have led to the broad applicability of computational techniques across various domains of life science and medical research. These technologies provide an unprecedented opportunity to collect omics data from hundreds of thousands of individuals and to study gene-disease associations without relying on prior assumptions about the biology of the trait. Despite the many advantages of modern omics technologies, interpreting the big data they produce requires advanced computational algorithms. Below I outline key challenges that biomedical researchers face when interpreting and integrating big omics data. I discuss the reproducibility of big data analysis in the life sciences and review current practices in reproducible research. Finally, I explain the skills that biomedical researchers need to acquire in order to analyze big omics data independently.
Author Comment
We have updated the manuscript as follows:
- We have updated figures 1 and 2
- We have added a definition of big data
- We have discussed which datasets qualify as large datasets