Efficient "pythonic" access to FASTA files using pyfaidx

Center for Computational Genomics, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
SoftGenetics, LLC., State College, PA, USA
Eccles Institute of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, USA
DOI
10.7287/peerj.preprints.970v1
Subject Areas
Bioinformatics, Computational Biology
Keywords
fasta, python, bioinformatics, api
Copyright
© 2015 Shirley et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Shirley MD, Ma Z, Pedersen BS, Wheelan SJ. 2015. Efficient "pythonic" access to FASTA files using pyfaidx. PeerJ PrePrints 3:e970v1

Abstract

The pyfaidx Python module provides memory- and time-efficient indexing, subsetting, and in-place modification of subsequences of FASTA files. Its classes expose a dictionary interface: sequences in an indexed FASTA file are accessed by header name and then sliced by position, without reading the full file into memory. pyfaidx includes an extensive test suite to ensure correct and reproducible behavior. A command-line program (faidx) is provided as an alternative interface; it offers significantly enhanced functionality while maintaining full index file compatibility with samtools. The pyfaidx module is installable from PyPI (https://pypi.python.org/pypi/pyfaidx), and development versions can be found on GitHub (https://github.com/mdshw5/pyfaidx).
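To illustrate why an index allows slicing without loading the whole file, the following is a minimal standard-library sketch of the samtools-style index idea: for each sequence, record its length, the byte offset of its first base, the bases per line, and the bytes per line, then seek directly to the requested bases. This is not pyfaidx's implementation. The names build_index, fetch, and write_demo_fasta are hypothetical, and the sketch assumes a well-formed FASTA file with uniform line lengths within each record.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan the file once; map name -> (length, offset, line_bases, line_width)."""
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The sequence name is the first word of the header line.
                name = line[1:].split()[0].decode("ascii")
                # handle.tell() is now the byte offset of the first base.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2] = bases      # bases per full line
                    entry[3] = len(line)  # bytes per line, newline included
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end), 0-based and half-open like a Python slice."""
    length, offset, line_bases, line_width = index[name]
    start = max(0, start)
    end = min(end, length)
    if start >= end:
        return ""
    # Convert base coordinates to byte coordinates, skipping newline bytes.
    first = offset + (start // line_bases) * line_width + start % line_bases
    last = offset + (end // line_bases) * line_width + end % line_bases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


def write_demo_fasta():
    """Write a tiny two-record FASTA file (hypothetical data); return its path."""
    data = b">chr1 demo\nACGTACGTAC\nGTACGTACGT\nACGT\n>chr2\nTTTTGGGG\nCC\n"
    handle = tempfile.NamedTemporaryFile(suffix=".fa", delete=False)
    with handle:
        handle.write(data)
    return handle.name


if __name__ == "__main__":
    path = write_demo_fasta()
    idx = build_index(path)
    print(idx["chr1"])                      # (24, 11, 10, 11)
    print(fetch(path, idx, "chr1", 8, 13))  # ACGTA
    os.remove(path)
```

Only the bytes covering the requested range are read, so the cost of a slice depends on its length rather than on the size of the file. With pyfaidx itself, the corresponding access would look roughly like Fasta(path)["chr1"][8:13], following the dictionary-then-slice interface described in the abstract.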

Author Comment

This will be submitted to PeerJ for review.