GenHap: A novel computational method based on genetic algorithms for haplotype assembly

Department of Informatics, Systems and Communication, University of Milan - Bicocca, Milan, Italy
SYSBIO.IT Centre of Systems Biology, Milan, Italy
Institute of Molecular Bioimaging and Physiology, Italian National Research Council, Cefalù (PA), Italy
Institute of Biomedical Technologies, Italian National Research Council, Segrate (MI), Italy
Department of Human and Social Sciences, University of Bergamo, Bergamo, Italy
Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
DOI
10.7287/peerj.preprints.3246v1
Subject Areas
Bioinformatics, Computational Biology, Artificial Intelligence, Distributed and Parallel Computing, Optimization Theory and Computation
Keywords
Haplotyping, Genetic Algorithms, Evolutionary Strategy, Combinatorial Optimization, Next Generation Sequencing, Chromosome Conformation Capture, Bioinformatics, Computational Biology, High Performance Computing
Copyright
© 2017 Tangherloni et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Tangherloni A, Spolaor S, Rundo L, Nobile MS, Merelli I, Cazzaniga P, Besozzi D, Mauri G, Liò P. 2017. GenHap: A novel computational method based on genetic algorithms for haplotype assembly. PeerJ Preprints 5:e3246v1

Abstract

Haplotyping is the process of inferring the full haplotype of a cell; it consists of assigning every heterozygous Single Nucleotide Polymorphism (SNP) to exactly one of the two chromosomes. In this work, we propose GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms (GAs). Our approach can efficiently solve large instances of the weighted Minimum Error Correction (wMEC) problem, yielding optimal solutions by means of a global search process. wMEC consists of computing the two haplotypes that partition the sequencing reads into two unambiguous sets with the least number of corrections to the SNP values. Since wMEC has been proven to be NP-hard, we tackle it with GAs, a population-based optimization strategy that mimics Darwinian processes. In GAs, a population of randomly generated individuals undergoes a selection mechanism and is modified by genetic operators. Each individual takes part in the selection process according to a quality measure (i.e., the fitness value), following Darwin’s “survival of the fittest” principle.
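To make the wMEC objective and the GA loop concrete, the following is a minimal Python sketch and not GenHap's actual implementation. It assumes that reads are strings over `0`, `1` and `-` (SNP not covered), that each SNP call carries an integer weight, and that an individual is a binary vector assigning each read to one of the two sets. The names `wmec_cost` and `genetic_search`, the tournament selection, the single-point crossover, the elitism step and all parameter values are illustrative assumptions; the abstract does not specify the operators GenHap uses.

```python
import random


def wmec_cost(reads, weights, assignment):
    """Weighted corrections needed to make each of the two read sets conflict-free.

    reads      : list of equal-length strings over '0', '1', '-' ('-' = not covered)
    weights    : list of lists of ints, same shape as reads (assumed confidence values)
    assignment : list of 0/1 values, one per read (the GA individual)
    """
    n_snps = len(reads[0])
    cost = 0
    for side in (0, 1):
        for j in range(n_snps):
            # Total weight supporting allele 0 and allele 1 in this set at SNP j.
            support = {"0": 0, "1": 0}
            for read, wts, side_of_read in zip(reads, weights, assignment):
                if side_of_read == side and read[j] in support:
                    support[read[j]] += wts[j]
            # The consensus takes the heavier allele; the lighter one is corrected.
            cost += min(support["0"], support["1"])
    return cost


def genetic_search(reads, weights, pop_size=30, generations=50,
                   mutation_rate=0.05, seed=0):
    """Toy GA minimising wmec_cost (hypothetical operators, see lead-in)."""
    rng = random.Random(seed)
    n_reads = len(reads)  # must be at least 2 for the crossover below

    def fitness(individual):
        return wmec_cost(reads, weights, individual)

    # Randomly generated initial population.
    population = [[rng.randint(0, 1) for _ in range(n_reads)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)

    for _ in range(generations):
        offspring = [best[:]]  # elitism: the best individual always survives
        while len(offspring) < pop_size:
            # Tournament selection of size 2 for each parent.
            parent_a = min(rng.sample(population, 2), key=fitness)
            parent_b = min(rng.sample(population, 2), key=fitness)
            # Single-point crossover.
            cut = rng.randint(1, n_reads - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        population = offspring
        best = min(population, key=fitness)

    return best, fitness(best)
```

Calling `genetic_search(reads, weights)` returns the best read partition found and its wMEC cost. Because of the elitism step, the best cost never increases from one generation to the next, but a GA of this kind gives no guarantee of reaching the global optimum.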

Our preliminary experimental results show that GenHap is able to achieve correct solutions in short running times. Moreover, this approach can be used to compute haplotypes in organisms with different ploidy. The proposed evolutionary technique has the advantage that it could be extended with a multi-objective fitness function that takes into account additional insights, such as the methylation patterns of the different chromosomes or the gene proximity in maps obtained through Chromosome Conformation Capture (3C) experiments.

Author Comment

This is an abstract that has been accepted for the NETTAB 2017 Workshop.