Using a fast clustering method for viral segment lineage determination, applied to the H9 influenza hemagglutinin.

Department of Biomedical Sciences, University of Westminster, London, United Kingdom
Pirbright Institute, Pirbright, Woking, United Kingdom
Avian Influenza Group, Pirbright Institute, Pirbright, Woking, United Kingdom
DOI
10.7287/peerj.preprints.3166v1
Subject Areas
Bioinformatics, Virology
Keywords
viral lineages, clustering, influenza, H9, hemagglutinin
Copyright
© 2017 Dalby et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Dalby A, Tinworth L, Sealy J, Iqbal M. 2017. Using a fast clustering method for viral segment lineage determination, applied to the H9 influenza hemagglutinin. PeerJ Preprints 5:e3166v1

Abstract

Lineage determination is an important part of the analysis of viral sequence data. Previously this has depended on phylogenetic analysis to identify distinct clades within phylogenetic trees. This approach is time-consuming and depends on a set of empirical rules for clade identification. An alternative is clustering, which is commonly used to identify operational taxonomic units in next-generation sequencing data. In this paper we use clustering to rapidly identify viral segment lineages and clades without the need for tree construction.
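
As a rough illustration of how identity-threshold clustering can group sequences without building a tree, the following minimal sketch applies a greedy centroid scheme to a handful of made-up, pre-aligned toy sequences. The function names, the 0.9 threshold and the sequences themselves are assumptions chosen for illustration; this is not the software or the parameters used in the paper.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Greedy centroid clustering (illustrative only).

    Each sequence joins the first existing centroid it matches at or above
    the identity threshold; otherwise it becomes a new centroid.
    """
    clusters = {}  # centroid name -> list of member names
    for name, seq in seqs.items():
        for centroid in clusters:
            if identity(seq, seqs[centroid]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            # No centroid was close enough, so this sequence founds a cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical toy data, not real H9 hemagglutinin sequences.
toy = {
    "seqA": "ATGGAAACAATATCACTAAT",
    "seqB": "ATGGAAACAATATCACTGAT",  # one mismatch relative to seqA
    "seqC": "ATGGAGGCAGTTTCGCTAAT",  # five mismatches relative to seqA
    "seqD": "ATGGAGGCAGTTTCGCTTAT",  # one mismatch relative to seqC
}

clusters = greedy_cluster(toy, threshold=0.9)
for centroid, members in clusters.items():
    print(centroid, members)
```

In this sketch each resulting cluster stands in for a candidate lineage. The outcome of a greedy scheme depends on input order and on the chosen threshold, so both would need to be set with care on real data.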

Author Comment

This paper was submitted to Virus Genes in July 2017. It presents a clear example of the widespread heterogeneity of influenza subtypes within the H9 hemagglutinin phylogenetic tree. This disagrees with the conventional wisdom in tree building, where only a single subtype is considered, and calls into question the WHO guidelines for the H5 nomenclature, which assume homogeneity of subtypes within clades.