Efficient "pythonic" access to FASTA files using pyfaidx
- Subject Areas: Bioinformatics, Computational Biology
- Keywords: fasta, python, bioinformatics, api
- © 2015 Shirley et al.
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Efficient "pythonic" access to FASTA files using pyfaidx. PeerJ PrePrints 3:e970v1 https://doi.org/10.7287/peerj.preprints.970v1
The pyfaidx Python module provides memory- and time-efficient indexing, subsetting, and in-place modification of subsequences of FASTA files. Its classes expose a dictionary interface: sequences in an indexed FASTA file are accessed by header name and then sliced by position, without reading the full file into memory. pyfaidx includes an extensive test suite to ensure correct and reproducible behavior. A command-line program (faidx) is also provided as an alternative interface; it offers significantly enhanced functionality while maintaining full index file compatibility with samtools. The pyfaidx module is installable from PyPI (https://pypi.python.org/pypi/pyfaidx), and development versions can be found on GitHub (https://github.com/mdshw5/pyfaidx).
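The idea behind indexed access can be illustrated with a short, standard-library-only sketch. This is not pyfaidx's implementation: the function names `build_index` and `fetch` are hypothetical, and the code assumes every sequence line (except possibly the last) has the same length. The index holds the fields of a samtools-style `.fai` record (sequence length, byte offset of the first base, bases per line, bytes per line), which is enough to translate a base range into a byte range and read only those bytes.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan a FASTA file once and return
    {name: (length, offset, line_bases, line_bytes)}.

    Hypothetical helper; assumes uniform line lengths within each record.
    """
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The sequence name is the first word of the header line.
                name = line[1:].split()[0].decode()
                # handle.tell() is now the byte offset of the first base.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2] = bases      # bases per line
                    entry[3] = len(line)  # bytes per line, newline included
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) of one sequence,
    reading only the bytes that cover the requested range."""
    length, offset, line_bases, line_bytes = index[name]
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(i):
        # Full lines before base i, plus the position within its line.
        return offset + (i // line_bases) * line_bytes + i % line_bases

    first = byte_pos(start)
    last = byte_pos(end - 1) + 1
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    # Drop the newline bytes that fall inside the range.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Small demonstration on a temporary two-record FASTA file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.fa")
with open(demo_path, "wb") as out:
    out.write(b">chr1 test\nACGTACGTAC\nGTACGTACGT\nACG\n>chr2\nTTTTGGGGCC\nAA\n")

demo_index = build_index(demo_path)
region = fetch(demo_path, demo_index, "chr1", 8, 13)
print(demo_index)
print(region)
```

Only the scan that builds the index touches the whole file; each later `fetch` call seeks directly to the requested bytes, which is what keeps memory use independent of sequence length.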
This will be submitted to PeerJ for review.