The impact of Docker containers on the performance of genomic pipelines

Comparative Bioinformatics, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Barcelona, Spain
Universitat Pompeu Fabra (UPF), Barcelona, Spain
Bioinformatics Research, National Marrow Donor Program, Minneapolis, Minnesota, United States
DOI
10.7287/peerj.preprints.1171v2
Subject Areas
Bioinformatics, Computational Biology, Genomics, Computational Science
Keywords
workflow, pipelines, docker, virtualisation, Bioinformatics
Copyright
© 2015 Di Tommaso et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Di Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. 2015. The impact of Docker containers on the performance of genomic pipelines. PeerJ PrePrints 3:e1171v2

Abstract

Genomic pipelines consist of several pieces of third-party software and, because of their experimental nature, frequent changes and updates are commonly necessary, thus raising serious distribution and reproducibility issues. Docker container technology offers an ideal solution, as it allows the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines portably across a wide range of computing platforms. The question that arises is thus to what extent the use of Docker containers might affect the performance of these pipelines. Here we address this question and conclude that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.

Author Comment

In this work we present the results of a benchmark we ran on genomic pipelines using Docker container technology, in order to assess the impact of container virtualization on the performance of bioinformatics tools and data analysis workflows. In the second version of this preprint, a minor error in Table 1 was fixed (the slowdown value for the first experiment is 0.999 instead of 1.001). This work will be submitted to PeerJ for review in the near future.