Comparison of genome sequences via projection extractor upon virtual mixer

Department of Mathematics, School of Information and Network Engineering, Anhui Science and Technology University, Bengbu, Anhui, People's Republic of China
DOI
10.7287/peerj.preprints.27333v1
Subject Areas
Bioinformatics, Computational Biology, Genomics, Mathematical Biology, Data Mining and Machine Learning
Keywords
k-mer, Virtual mixer, Projection extractor, Independent component analysis, Similarity analysis, Sequences comparison
Copyright
© 2018 Yu et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Yu H, Zhang Y, Fang W. 2018. Comparison of genome sequences via projection extractor upon virtual mixer. PeerJ Preprints 6:e27333v1

Abstract

To compare multiple genome sequences, we transform each primary genome sequence into a corresponding k-mer-based vector. Following the principle of independent component analysis (ICA), this operation can be regarded as mixing multiple source genomic signals via several sensors, which yields mixed vectors of equal length from genome sequences of different lengths. However, the mixing is performed by counting k-mer frequencies rather than by physical sensor hardware, so we call this preprocessing operation the virtual mixer (VM). Using an ICA-based transformation, we project all the vectors onto their independent components and obtain coefficient-based feature vectors through the projection extractor (PE), which has been proved to preserve distances. We then apply the proposed VMPE model to three representative real genome sequence datasets to test the efficiency of the model. The comparative analysis indicates that the proposed VMPE model performs well in similarity analysis.
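
The k-mer counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name kmer_frequency_vector, the choice of k, the ACGT alphabet, the skipping of windows with ambiguous bases, and the normalization to relative frequencies are all assumptions made for illustration. It only shows how sequences of different lengths map to vectors of the same length (4**k); the ICA-based projection is not covered.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def kmer_frequency_vector(sequence, k=3):
    """Map a DNA sequence of any length to a vector of 4**k k-mer frequencies.

    Hypothetical helper for illustration only; normalizing counts to
    relative frequencies is an assumption, not taken from the paper.
    """
    # Enumerate all possible k-mers in a fixed (lexicographic) order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = sequence.upper()
    # Slide a window of width k over the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:  # skip windows with ambiguous bases such as N
            counts[pos] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


# Two sequences of different lengths yield vectors of equal length.
short_vec = kmer_frequency_vector("ACGTACGTAC", k=2)
long_vec = kmer_frequency_vector("ACGTTGCANACGGTTACCGATGCA", k=2)
print(len(short_vec), len(long_vec))  # prints "16 16"
```

Stacking such vectors row by row gives a matrix with one row per genome, which is the kind of equal-length input an ICA routine expects.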

Author Comment

This is a submission to PeerJ for review.