Motif clustering with implications for transcription factor interactions
- Published
- Accepted
- Subject Areas
- Bioinformatics, Computational Biology, Molecular Biology, Computational Science
- Keywords
- motif, ChIP-seq, de Bruijn sequence, motif similarity, clustering, network
- Copyright
- © 2015 Grau et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Motif clustering with implications for transcription factor interactions. PeerJ PrePrints 3:e1302v1 https://doi.org/10.7287/peerj.preprints.1302v1
Abstract
High-throughput data such as ChIP-seq data measure the binding of transcription factors (TFs) or other proteins to DNA and have become a widespread data source for de novo motif discovery. Often, several ChIP-seq data sets study the same TF under different conditions, yielding several potentially redundant motifs, which calls for identifying and clustering similar motifs. Here, we propose a refined measure of motif similarity based on the correlation between score profiles on de Bruijn sequences. We demonstrate the utility of the proposed measure in benchmark studies on artificial motifs and on motifs discovered from ENCODE ChIP-seq data. We use this measure to cluster motifs discovered from 757 different ENCODE ChIP-seq data sets for 166 TFs and RNA polymerase II and III. Based on this clustering, we derive a TF interaction network that reflects many known TF-TF interactions but also reveals novel putative interaction partners.
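The idea of comparing motifs through score profiles on a de Bruijn sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, the order of the de Bruijn sequence, or how shifts and strands are handled. The sketch assumes log-probability scores from simple position weight matrices, a cyclic de Bruijn sequence whose order equals the longer motif width, and the maximum Pearson correlation over relative shifts and both strands. All function names, including the helper `pwm_from_consensus`, are hypothetical.

```python
import math

ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def de_bruijn(order, alphabet=ALPHABET):
    """Cyclic de Bruijn sequence: every k-mer of length `order` occurs once."""
    k = len(alphabet)
    a = [0] * (k * order)
    out = []
    def db(t, p):
        # Standard recursive construction from concatenated Lyndon words.
        if t > order:
            if order % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return "".join(alphabet[i] for i in out)

def pwm_from_consensus(consensus, p=0.85):
    """Hypothetical helper: toy PWM with probability p on the consensus base."""
    rest = (1.0 - p) / 3.0
    return [{b: (p if b == c else rest) for b in ALPHABET} for c in consensus]

def reverse_complement(pwm):
    """Reverse the column order and swap complementary bases."""
    return [{COMPLEMENT[b]: q for b, q in col.items()} for col in reversed(pwm)]

def score_profile(pwm, seq):
    """Log-probability score of the motif at every start position (cyclic)."""
    n = len(seq)
    return [sum(math.log(col[seq[(i + j) % n]]) for j, col in enumerate(pwm))
            for i in range(n)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    vx = sum((u - mx) ** 2 for u in x)
    vy = sum((v - my) ** 2 for v in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def motif_similarity(pwm_a, pwm_b):
    """Maximum profile correlation over relative shifts and both strands."""
    order = max(len(pwm_a), len(pwm_b))  # assumption: order = longer width
    seq = de_bruijn(order)
    prof_a = score_profile(pwm_a, seq)
    best = -1.0
    for candidate in (pwm_b, reverse_complement(pwm_b)):
        prof_b = score_profile(candidate, seq)
        for shift in range(-(order - 1), order):
            rotated = prof_b[shift:] + prof_b[:shift]  # cyclic shift
            best = max(best, pearson(prof_a, rotated))
    return best

if __name__ == "__main__":
    gata = pwm_from_consensus("GATAA")
    other = pwm_from_consensus("CCGCG")
    print(motif_similarity(gata, reverse_complement(gata)))
    print(motif_similarity(gata, other))
```

Because a de Bruijn sequence of order k contains every k-mer exactly once, the score profile covers all possible binding sites of that length in a compact sequence, so the correlation does not depend on the composition of any particular genomic data set. A pairwise similarity of this kind could then feed any standard clustering procedure.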
Author Comment
This work has been presented at the German Conference on Bioinformatics 2015.