Identification of high-efficiency 3’GG gRNA motifs in indexed FASTA files with ngg2
- Subject Areas: Bioinformatics
- Keywords: gRNA, motif discovery, python, open-source, CRISPR/Cas9, 3'GG
- Copyright: © 2015 Roberson
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article: 2015. Identification of high-efficiency 3’GG gRNA motifs in indexed FASTA files with ngg2. PeerJ PrePrints 3:e969v1 https://doi.org/10.7287/peerj.preprints.969v1
Abstract
CRISPR/Cas9 is emerging as one of the most widely used methods of genome modification in organisms ranging from bacteria to human cells. However, editing efficiency varies tremendously from site to site. A recent report identified a novel motif, called the 3’GG motif, that substantially increased the efficiency of editing at all sites tested. The authors of that report also noted that previously published gRNAs with high editing efficiency carried this motif. I designed a Python command-line tool, ngg2, to identify 3’GG gRNA sites in indexed FASTA files. As a proof of concept, I screened for these motifs in six reference genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I identified more than 24 million single-match 3’GG motifs in these genomes. More than 87% of all protein-coding genes in the six reference genomes had at least one overlapping unique 3’GG gRNA site. In particular, more than 96% of mouse and 99% of human protein-coding genes have at least one unique, overlapping 3’GG gRNA site. These sites can be used as a starting point in gRNA design, and ngg2 makes it possible to identify high-efficiency editing sites in non-model species.
Author Comment
This is an early version of a manuscript describing a short Python script that identifies gRNA motifs of the form NGGNGG, which are associated with increased editing efficiency. This preprint will be submitted for review at PeerJ, and I welcome any comments that would improve the utility and clarity of both the manuscript and the script.
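To make the NGGNGG pattern concrete, the following is a minimal sketch of how such sites could be located on both strands of a sequence. It is not the ngg2 source code. It assumes a 20-nt guide whose last two bases are GG, followed by an NGG PAM (that is, N18-GG-NGG), and it scans a plain in-memory string rather than an indexed FASTA file. The names `find_3gg_sites` and `unique_sites` are hypothetical and used only for illustration.

```python
import re
from collections import Counter

# Assumption: 20-nt guide ending in GG, followed by an NGG PAM (N18-GG-NGG).
# Lookahead patterns let overlapping candidate sites be reported.
SENSE = re.compile(r"(?=([ACGT]{18}GG)([ACGT]GG))")
# The same site on the reverse strand appears as CCN-CC-N18 on the forward strand.
ANTISENSE = re.compile(r"(?=(CC[ACGT])(CC[ACGT]{18}))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_3gg_sites(seq):
    """Return sorted (start, end, strand, guide, pam) tuples.

    Coordinates are 0-based, half-open, and span the 23-nt site on the
    forward strand. Guide and PAM are always written 5' to 3' as the
    gRNA would read them.
    """
    seq = seq.upper()
    sites = []
    for m in SENSE.finditer(seq):
        sites.append((m.start(), m.start() + 23, "+", m.group(1), m.group(2)))
    for m in ANTISENSE.finditer(seq):
        guide = reverse_complement(m.group(2))
        pam = reverse_complement(m.group(1))
        sites.append((m.start(), m.start() + 23, "-", guide, pam))
    return sorted(sites)


def unique_sites(sites):
    """Keep only sites whose guide+PAM sequence occurs exactly once."""
    counts = Counter(guide + pam for *_, guide, pam in sites)
    return [s for s in sites if counts[s[3] + s[4]] == 1]
```

Calling `find_3gg_sites` on each chromosome sequence and passing the combined list to `unique_sites` would approximate the single-match filtering mentioned in the abstract, although exact-duplicate counting is a simplification of genome-wide uniqueness screening.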