Comparison of genome sequences via projection extractor upon virtual mixer
Abstract
To compare multiple genome sequences, we transform each primary genome sequence into a corresponding k-mer-based vector. In terms of independent component analysis (ICA), this operation can be regarded as mixing multiple source genomic signals through several sensors, which yields equal-length mixed vectors from genome sequences of different lengths. However, the mixing is performed by counting k-mer frequencies rather than by real sensor hardware, so we call this preprocessing operation a virtual mixer (VM). Using an ICA-based transformation, we project all the vectors onto their independent components and take the resulting coefficients as feature vectors; this step, the projection extractor (PE), has been proven to preserve distances. We then applied the proposed VMPE model to three representative real genome sequence datasets to test its efficiency. The comparative analysis indicates that the proposed VMPE model performs well in similarity analysis.
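The virtual mixer step described above can be illustrated with a minimal sketch. It assumes a plain A/C/G/T alphabet, overlapping k-mer windows, and normalization to relative frequencies; the function name `kmer_frequency_vector` and these choices are illustrative assumptions, not the authors' implementation. The ICA-based projection extractor is not shown.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def kmer_frequency_vector(sequence, k=3):
    """Return relative frequencies of all 4**k k-mers in lexicographic order.

    Hypothetical helper: the vector length depends only on k,
    not on the length of the input sequence.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing symbols outside ALPHABET are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Two sequences of different lengths map to vectors of the same length.
vec_short = kmer_frequency_vector("ACGTACGTAC", k=2)
vec_long = kmer_frequency_vector("ACGT" * 50 + "GGCC", k=2)
```

Because both vectors have 4**k entries, sequences of any length can be stacked into a single matrix, which is the form an ICA routine would expect as input.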
Cite this as
2018. Comparison of genome sequences via projection extractor upon virtual mixer. PeerJ Preprints 6:e27333v1 https://doi.org/10.7287/peerj.preprints.27333v1
Author comment
This is a submission to PeerJ for review.
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Hongjie Yu conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.
Yuan-Ting Zhang performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables.
Wei Fang performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables.
Funding
This work was supported by grants from the Anhui Provincial Natural Science Foundation (No. 1508085MC55) and the Key Project of the Education Department of Anhui Province (No. KJ2013A076). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.