Detection and variability analyses of CRISPR-like loci in the H. pylori genome
- Published
- Accepted
- Subject Areas
- Bioinformatics, Evolutionary Studies, Genetics, Genomics
- Keywords
- variability CRISPR-like, VlpC gene, phylogenetic marker, Helicobacter pylori
- Copyright
- © 2018 Garcia-Zea et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. Detection and variability analyses of CRISPR-like loci in the H. pylori genome. PeerJ Preprints 6:e27196v1 https://doi.org/10.7287/peerj.preprints.27196v1
Abstract
Helicobacter pylori is a human pathogenic bacterium with high genomic plasticity. Although a functional CRISPR-Cas system has not been found in its genome, CRISPR-like loci have recently been identified. In this work, 53 genomes from different geographical areas are searched for this type of structure, and its variability is analyzed. We confirm the presence, in all genomes, of a locus previously described in the VlpC gene, and we characterize new CRISPR-like loci in other genomic locations. By studying the variability and gene location of these loci, we discuss the evolution and possible roles of these sequences. Additionally, we demonstrate the usefulness of this type of sequence as a phylogenetic marker that groups the different strains by geographical area.
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Analysis by groups of the CRISPR-like loci inserted in the VlpC gene
Alignment performed with MUSCLE, using NC_000915 as the reference genome (first line). DR: direct repeat sequence. Pairwise % identity in the gene: 92%; in the CRISPR-like loci: 86%.
Analysis by groups of the CRISPR-like loci inserted in the VlpC gene
Alignment performed with MUSCLE, using strain J99 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 92%; in the CRISPR-like loci: 79%. Two large deletions affect spacer number one and DR number two. Spacer number two is also completely deleted.
Analysis by groups of the CRISPR-like loci inserted in the VlpC gene
Alignment performed with MUSCLE, using strain J99 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 91%; in the CRISPR-like loci: 77%. A duplication of DR number one and of spacer number one is observed, which appears as a deletion in the sequence used as reference.
Analysis by groups of the CRISPR-like loci inserted in the VlpC gene
Alignment performed with MUSCLE, using strain J99 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 90%; in the CRISPR-like loci: 85%. Spacers number one and number two show deletions at different points.
Analysis by groups of the CRISPR-like loci inserted in the VlpC gene
Alignment performed with MUSCLE, using strain J99 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 90%; in the CRISPR-like loci: 54%. Two duplications of DR number two and of spacer number two are observed in the Puno135 strain.
Analysis by groups of the CRISPR-like loci inserted in the VlpC gene
Alignment performed with MUSCLE, using strain J99 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 94%; in the CRISPR-like loci: 85%. Deletions are observed at different points along the CRISPR-like loci.
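The "pairwise % identity" values quoted in these captions can be illustrated with a minimal Python sketch. The captions do not state how gaps were treated, so the convention here is an assumption: columns where both sequences carry a gap are skipped, and a gap opposite a residue counts as a mismatch. The function names `pairwise_identity` and `mean_pairwise_identity` are hypothetical and do not correspond to any tool used in the study.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two rows of the same alignment.

    Assumed convention: columns that are gaps in both rows are ignored;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(aligned: list[str]) -> float:
    """Average percent identity over all pairs of aligned rows."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

Applied separately to the gene columns and to the CRISPR-like columns of an alignment, a calculation of this kind would yield two figures comparable to those reported in each caption, provided the same gap convention is used.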
Alignment of the sequences obtained from blastn for the CRISPR-like loci detected in Shi417 and Shi112 strains
The color scale in the alignment indicates the degree of variation in both the gene (hypothetical protein) and its CRISPR-like loci: darker shading indicates higher pairwise % identity, lighter shading lower pairwise % identity. Alignment performed with MUSCLE, using strains Shi417 and Shi112 as reference genomes (first and second lines). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 72%; in the CRISPR-like loci: 56%. Strains Aklavik86, Aklavik117 and P12 showed a truncated 5' region.
Phylogenetic tree constructed with the CRISPR-like loci detected in the gene coding for a hypothetical protein
A phylogeographic differentiation of CRISPR-like loci is observed. Analysis executed with the MEGA7 software. The evolutionary distance scale is 0.02 (Jukes-Cantor model). (A) Group of African and European geographical origin. (B) Amerind geographic group.
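For readers unfamiliar with the Jukes-Cantor model named in the scale bar, the distance it assigns to a pair of aligned sequences is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The sketch below is a minimal illustration, not a reproduction of the MEGA7 computation. The pairwise-deletion treatment of gaps is an assumption, and the function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: any column with a gap in either sequence is skipped
    (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0.0 <= p < 0.75:
        # The correction is undefined once p reaches the saturation value 3/4.
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

A scale bar of 0.02 therefore corresponds to roughly two substitutions per hundred sites after correction for multiple hits.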
Analysis of strains Shi470 and BM012A
(A) The alignment with Mauve revealed that the Poly-E rich protein gene lies in a region near the breakpoint of an inversion that affects these strains. (B) Alignment performed with MUSCLE. DR: direct repeat sequence. A continuous line indicates the presence of gaps. The differences observed in the alignment can be explained by the number of DR sequences and spacers in which the strains differ.
Alignment and blastn for the CRISPR-like loci detected in Shi470 and BM012A strains
The color scale in the alignment indicates the degree of variation in both the Poly-E rich protein gene and its CRISPR-like loci: darker shading indicates higher pairwise % identity, lighter shading lower pairwise % identity. Alignment performed with MUSCLE, using strain Shi470 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 80%; in the CRISPR-like loci: 60%. The alignment revealed an average location of the CRISPR-like loci, which show high variability.
Phylogenetic tree constructed with the CRISPR-like loci detected in the Poly-E rich protein gene
Phylogenetic tree constructed with the 35 CRISPR-like loci detected in the gene coding for a Poly-E rich protein. A phylogeographic differentiation of CRISPR-like loci is observed. Analysis executed with the MEGA7 software. The evolutionary distance scale is 0.02 (Jukes-Cantor model). (A) Group of African and European geographical origin. (B) Asian geographic group. (C) Amerind group.
Alignment and blastn for the CRISPR1-like loci detected in SJM180 strain
Alignment performed with MUSCLE, using strain SJM180 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 85%; in the CRISPR-like loci: 63%. The alignment revealed an average location of the CRISPR-like loci. The alignment shows degeneration of the CRISPR1-like loci, while the 5' and 3' regions show a high degree of variability.
Phylogenetic tree constructed with the CRISPR1-like loci detected in the SJM180 strain
Phylogenetic tree constructed with the 39 CRISPR1-like loci detected in the SJM180 strain with the gene coding for a hypothetical protein. A phylogeographic differentiation of CRISPR-like loci is not observed. Analysis executed with the MEGA7 software. The evolutionary distance scale is 0.01 (Jukes-Cantor model).
Alignment and blastn for the CRISPR3-like loci detected in SJM180 strain
Alignment performed with MUSCLE, using strain SJM180 as the reference genome (first line). DR: direct repeat sequence. A continuous line indicates the presence of gaps. Pairwise % identity in the gene: 84%; in the CRISPR-like loci: 61%. The alignment revealed an average location of the CRISPR-like loci. The alignment shows degeneration of the CRISPR1-like loci, while the 5' and 3' regions show a high degree of variability.
Phylogenetic tree constructed with the CRISPR3-like loci detected in the SJM180 strain
Phylogenetic tree constructed with the 39 CRISPR3-like loci detected in the SJM180 strain with the gene coding for a hypothetical protein. A phylogeographic differentiation of CRISPR-like loci is not observed. Analysis executed with the MEGA7 software. The evolutionary distance scale is 0.01 (Jukes-Cantor model).
Summary table of spacer and repeat sequences
Characteristics of the consensus direct repeat sequences (DRs) and of the spacer sequences of the 22 CRISPR-like loci identified with CRISPRFinder
CRISPRTarget analysis
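The DR and spacer layout summarized in this table can be pictured as an array of the form DR, spacer, DR, spacer, DR. The following minimal Python sketch recovers the spacers from a locus given a consensus DR. It assumes exact, non-overlapping matches to the consensus, which is a simplification: CRISPRFinder itself tolerates mismatches and applies its own criteria. The function name `split_array` and the toy sequences are hypothetical.

```python
def split_array(locus: str, dr: str) -> tuple[list[int], list[str]]:
    """Locate exact copies of a consensus DR and return the spacers between them.

    Returns the 0-based start positions of each DR copy and the list of
    spacer sequences lying between consecutive copies.
    """
    if not dr:
        raise ValueError("consensus DR must not be empty")
    positions = []
    start = locus.find(dr)
    while start != -1:
        positions.append(start)
        # Resume the search after this copy so that matches do not overlap.
        start = locus.find(dr, start + len(dr))
    spacers = [locus[p + len(dr):q] for p, q in zip(positions, positions[1:])]
    return positions, spacers


# Toy example (made-up sequences): three DR copies flanking two spacers.
toy_locus = "TT" + "GATC" + "AAAAA" + "GATC" + "CCCCC" + "GATC" + "GG"
dr_positions, toy_spacers = split_array(toy_locus, "GATC")
```

Counting the entries of the two returned lists gives the number of DRs and spacers per locus, the quantities in which the strains described above differ.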
Analysis of the spacers identified by CRISPRFinder to determine their similarity with foreign genetic elements.
Blast cDNA for the VlpC gene
Blast analysis with the cDNA for the VlpC gene. The analysis showed that this gene was expressed in 50 of the 52 strains that carry it; it was not expressed only in the South Africa20 and Shi470 strains. The E-value threshold used was 10e-5.
Identification of cas domains
The E-value threshold used for the hmmscan searches was 10e-5. E-value: domain reliability; c-Evalue: reliability for this particular domain; acc: average probability of the aligned residues, a measure of alignment reliability from 0 to 1, where 1.00 indicates that the alignment is completely reliable.
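As an illustration of how such a threshold might be applied, the sketch below filters rows of an hmmscan per-domain table (the `--domtblout` format) by full-sequence E-value and keeps the c-Evalue and acc fields described above. The column positions follow the HMMER domain table layout. The sample rows, the domain names and the choice to filter on the full-sequence E-value are assumptions made for illustration only. Note that the literal 10e-5 evaluates to 1e-4.

```python
def filter_domains(lines, max_evalue=10e-5):
    """Keep domtblout rows whose full-sequence E-value is <= max_evalue."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        # 22 whitespace-separated fields, then a free-text description.
        f = line.split(None, 22)
        if len(f) < 22:
            continue  # malformed row
        evalue = float(f[6])
        if evalue <= max_evalue:
            hits.append({
                "target": f[0],           # domain (HMM) name
                "query": f[3],            # query protein name
                "evalue": evalue,         # full-sequence E-value
                "c_evalue": float(f[11]), # conditional E-value for this domain
                "acc": float(f[21]),      # mean posterior probability, 0 to 1
            })
    return hits


# Hypothetical rows; names and numbers are invented for the example.
SAMPLE = """\
# target name  accession  tlen  query name ...
DomA PF00000.1 282 prot1 - 300 1.2e-30 100.5 0.1 1 1 3.0e-33 1.5e-30 100.2 0.1 2 280 5 290 4 295 0.95 made-up domain A
DomB PF00001.1 150 prot2 - 200 0.02 12.0 0.0 1 1 0.001 0.03 11.5 0.0 1 40 10 50 8 55 0.70 made-up domain B
"""
kept = filter_domains(SAMPLE.splitlines())
```

With the threshold as written, only the first hypothetical row is retained; the second fails because its E-value of 0.02 exceeds 1e-4.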