Efficient "pythonic" access to FASTA files using pyfaidx

Matthew D Shirley; Zhaorong Ma; Brent S Pedersen; Sarah J Wheelan

doi:10.7287/peerj.preprints.970v1

Matthew D Shirley (1, 2), Zhaorong Ma (3), Brent S Pedersen (4), Sarah J Wheelan (1, 2)

1 Center for Computational Genomics, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA

2 Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland, USA

3 SoftGenetics, LLC., State College, PA, USA

4 Eccles Institute of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, USA

Published: 2015-04-08
Accepted: 2015-04-08

Subject Areas: Bioinformatics, Computational Biology
Keywords: fasta, python, bioinformatics, api

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Shirley MD, Ma Z, Pedersen BS, Wheelan SJ. 2015. Efficient "pythonic" access to FASTA files using pyfaidx. PeerJ PrePrints 3:e970v1 https://doi.org/10.7287/peerj.preprints.970v1

Abstract

The pyfaidx Python module provides memory- and time-efficient indexing, subsetting, and in-place modification of subsequences of FASTA files. pyfaidx provides Python classes that expose a dictionary interface: sequences in an indexed FASTA file can be accessed by their header name and then sliced by position without reading the full file into memory. pyfaidx includes an extensive test suite to ensure correct and reproducible behavior. A command-line program (faidx) is also provided as an alternative interface; it offers significantly enhanced functionality while maintaining full index file compatibility with samtools. The pyfaidx module is installable from PyPI (https://pypi.python.org/pypi/pyfaidx), and development versions are available on GitHub (https://github.com/mdshw5/pyfaidx).
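Slicing a sequence by position without loading the whole file works because a samtools-compatible .fai index records, for each sequence, its length, the byte offset of its first base, the number of bases per line, and the number of bytes per line. From these four values, the byte position of any base can be computed and read with a single seek. The standard-library Python sketch below illustrates that lookup behind a small dictionary-style interface. It is a hypothetical simplification, not pyfaidx's implementation: the names `build_index`, `fetch`, `IndexedFasta`, and `Record` are invented for illustration. The sketch also assumes uniform line lengths within each record, no blank lines, 0-based half-open coordinates, and an index that is kept in memory rather than written to a .fai file.

```python
import os
import tempfile


def build_index(path):
    """Build .fai-style records: name -> (length, offset, linebases, linewidth).

    Assumes every line of a record except the last holds the same number
    of bases (the layout samtools faidx requires) and that there are no blank lines.
    """
    index = {}
    name = None
    pos = 0  # byte offset of the current line within the file
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                # The sequence name is the first whitespace-delimited header token.
                name = line[1:].split()[0].decode()
                index[name] = [0, pos + len(line), 0, 0]
            elif name is not None:
                rec = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if rec[2] == 0:  # first sequence line defines the line layout
                    rec[2] = bases
                    rec[3] = len(line)
                rec[0] += bases
            pos += len(line)
    return {k: tuple(v) for k, v in index.items()}


def fetch(path, index, name, start, end):
    """Return bases [start, end) of `name`, reading only the bytes that span them."""
    length, offset, linebases, linewidth = index[name]
    start, end = max(0, start), min(end, length)
    if start >= end:
        return ""

    def byte_pos(i):
        # Skip whole lines (bases plus newline bytes), then step within the line.
        return offset + (i // linebases) * linewidth + i % linebases

    with open(path, "rb") as fh:
        fh.seek(byte_pos(start))
        raw = fh.read(byte_pos(end - 1) - byte_pos(start) + 1)
    # Drop the line terminators that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


class Record:
    """One sequence; slicing triggers a targeted read from disk."""

    def __init__(self, fasta, name):
        self.fasta, self.name = fasta, name

    def __len__(self):
        return self.fasta.index[self.name][0]

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("this sketch supports only contiguous slices")
        start, stop, _ = key.indices(len(self))  # resolves None and negative bounds
        return fetch(self.fasta.path, self.fasta.index, self.name, start, stop)


class IndexedFasta:
    """Hypothetical dict-like wrapper: fasta[name][start:end]."""

    def __init__(self, path):
        self.path = path
        self.index = build_index(path)

    def keys(self):
        return self.index.keys()

    def __getitem__(self, name):
        if name not in self.index:
            raise KeyError(name)
        return Record(self, name)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">chr1 demo\nACGTACGTAC\nGTACGTACGT\nAAAA\n>chr2\nTTTTGGGG\nCC\n")
    fa = IndexedFasta(tmp.name)
    print(fa["chr1"][8:14])  # spans a line break; prints ACGTAC
    os.remove(tmp.name)
```

Only the bytes covering the requested interval are read, so the cost of a slice depends on its length rather than on the size of the file. With pyfaidx itself, the corresponding access is written as `Fasta("file.fa")["chr1"][8:14]`; there the slice returns a `Sequence` object rather than a plain string.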

Author Comment

This will be submitted to PeerJ for review.