AMAS: a fast tool for alignment manipulation and computing of summary statistics

Department of Entomology and Nematology, UC Davis, Davis, United States
DOI: 10.7287/peerj.preprints.1355v1
Subject Areas: Computational Biology, Genomics, Statistics
Keywords: phylogenomics, bioinformatics, alignment properties, concatenation
Copyright: © 2015 Borowiec
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article: Borowiec ML. 2015. AMAS: a fast tool for alignment manipulation and computing of summary statistics. PeerJ PrePrints 3:e1355v1

Abstract

The amount of data used in phylogenetics has grown explosively in recent years, and many phylogenies are now inferred from hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each locus in addition to multiple analyses of subsets of genes or of concatenated sequences. Computationally efficient tools are needed for handling thousands of single-locus alignments or large concatenated alignments and for computing their properties. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines sequence manipulation capabilities with a function that calculates basic statistics. The manipulation functions include conversion among popular formats, concatenation, site extraction and splitting according to a predefined partitioning scheme, and creation of replicate data sets. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. In the computing-time benchmarks performed for this study, it concatenated and summarized alignments faster than other popular tools. AMAS is a Python 3 program that relies solely on Python’s core modules. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/
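Several of the statistics listed above have standard definitions that are easy to state in code. The sketch below computes some of them for a DNA alignment held in memory; it is an illustration under stated assumptions, not AMAS's own implementation. The function name `summarize_dna`, the dictionary input (taxon name to aligned sequence), the treatment of `-`, `?` and `N` as undetermined characters, and the choice to ignore all other non-ACGT symbols (such as IUPAC ambiguity codes) when scoring sites are all assumptions made here. A site is counted as variable if it has at least two distinct nucleotides, and as parsimony informative if at least two nucleotides each occur in at least two taxa.

```python
from collections import Counter

# Assumed character sets for this sketch; AMAS may treat gaps, missing data
# and IUPAC ambiguity codes differently.
MISSING_DNA = set("-?N")
STATES_DNA = set("ACGT")


def summarize_dna(alignment):
    """Hypothetical helper: basic summary statistics for a DNA alignment.

    `alignment` maps taxon names to aligned sequences of equal length.
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")

    n_taxa = len(seqs)
    cells = n_taxa * length                    # total matrix cells
    counts = Counter("".join(seqs))            # counts of every character
    undetermined = sum(counts[c] for c in MISSING_DNA)
    acgt = sum(counts[c] for c in STATES_DNA)  # unambiguous nucleotides
    gc = counts["G"] + counts["C"]

    variable = informative = 0
    for column in zip(*seqs):
        # Tally only unambiguous nucleotides at this site
        states = Counter(c for c in column if c in STATES_DNA)
        if len(states) >= 2:
            variable += 1
            # Informative: at least two states, each present in >= 2 taxa
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1

    return {
        "taxa": n_taxa,
        "length": length,
        "cells": cells,
        "undetermined": undetermined,
        "missing_percent": 100 * undetermined / cells,
        # GC and AT contents computed over unambiguous nucleotides only
        "gc_content": gc / acgt if acgt else 0.0,
        "at_content": (acgt - gc) / acgt if acgt else 0.0,
        "variable_sites": variable,
        "variable_proportion": variable / length,
        "informative_sites": informative,
        "informative_proportion": informative / length,
        "char_counts": dict(counts),
    }


aln = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACTTAC-A",
    "taxon4": "ACTTACN?",
}
stats = summarize_dna(aln)
print(stats["variable_sites"], stats["informative_sites"])
```

For this toy alignment the sketch reports 2 variable sites, of which only the third column (G, G, T, T) is parsimony informative; the last column (T, A, A, ?) is variable but uninformative because T occurs only once. Scoring each site on its determined characters only is a common convention, but other tools may handle gaps and ambiguity codes differently.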

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Computing times for all benchmarks performed for this study

DOI: 10.7287/peerj.preprints.1355v1/supp-1