GenHap: A novel computational method based on genetic algorithms for haplotype assembly
- Published
- Accepted
- Subject Areas
- Bioinformatics, Computational Biology, Artificial Intelligence, Distributed and Parallel Computing, Optimization Theory and Computation
- Keywords
- Haplotyping, Genetic Algorithms, Evolutionary Strategy, Combinatorial Optimization, Next Generation Sequencing, Chromosome Conformation Capture, Bioinformatics, Computational Biology, High Performance Computing
- Copyright
- © 2017 Tangherloni et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. GenHap: A novel computational method based on genetic algorithms for haplotype assembly. PeerJ Preprints 5:e3246v1 https://doi.org/10.7287/peerj.preprints.3246v1
Abstract
The process of inferring the full haplotype of a cell is known as haplotyping, which consists of assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. In this work, we propose a novel computational method for haplotype assembly based on Genetic Algorithms (GAs), named GenHap. Our approach can efficiently solve large instances of the weighted Minimum Error Correction (wMEC) problem, yielding optimal solutions by means of a global search process. wMEC consists of computing the two haplotypes that partition the sequencing reads into two unambiguous sets with the least number of corrections to the SNP values. Since wMEC was proven to be NP-hard, we tackle it by exploiting GAs, a population-based optimization strategy that mimics Darwinian processes. In GAs, a population of randomly generated individuals undergoes a selection mechanism and is modified by genetic operators. Each individual takes part in the selection process according to a quality measure (i.e., the fitness value), inspired by Darwin’s “survival of the fittest” principle.
Our preliminary experimental results show that GenHap is able to achieve correct solutions in short running times. Moreover, this approach can be used to compute haplotypes in organisms with different ploidy. The proposed evolutionary technique has the advantage that it can be formulated and extended using a multi-objective fitness function that takes into account additional insights, such as the methylation patterns of the different chromosomes or the gene proximity in maps obtained through Chromosome Conformation Capture (3C) experiments.
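To make the wMEC objective and the GA loop described above more concrete, the following is a minimal Python sketch, not the GenHap implementation. It assumes reads are strings over `0`, `1` and `-` (no coverage), that each read carries per-position confidence weights, and that an individual encodes the assignment of each read to one of the two sets. The function names, the toy data, the tournament selection, the single-point crossover, the bit-flip mutation and all parameter values are illustrative assumptions that are not taken from the abstract.

```python
import random

def wmec_cost(reads, weights, partition):
    """Weighted cost of the corrections needed to make each of the two
    read sets agree at every SNP position (lower is better)."""
    n_snps = len(reads[0])
    total = 0.0
    for side in (0, 1):
        for j in range(n_snps):
            w = {"0": 0.0, "1": 0.0}
            for read, wts, p in zip(reads, weights, partition):
                if p == side and read[j] != "-":
                    w[read[j]] += wts[j]
            # Correcting the lighter allele is the cheapest way to agree.
            total += min(w["0"], w["1"])
    return total

def genetic_search(reads, weights, pop_size=30, generations=50,
                   mutation_rate=0.05, seed=0):
    """Toy GA over read bipartitions (hypothetical operators and settings).
    Requires at least two reads."""
    rng = random.Random(seed)
    n = len(reads)

    def fitness(ind):
        return wmec_cost(reads, weights, ind)

    # Randomly generated initial population of 0/1 assignment vectors.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry over the best individual
        while len(new_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            # Single-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=fitness)
    return best, fitness(best)

# Toy instance: error-free reads drawn from haplotypes 0101 and 1010.
READS = ["0101", "1010", "01--", "--10", "-101", "101-"]
WEIGHTS = [[1.0] * 4 for _ in READS]

if __name__ == "__main__":
    partition, cost = genetic_search(READS, WEIGHTS)
    print(partition, cost)
```

In this sketch the fitness of an individual is simply its wMEC cost, so a multi-objective extension of the kind mentioned above would replace `fitness` with a function that combines several such terms.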
Author Comment
This is an abstract that has been accepted for the NETTAB 2017 Workshop.