NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

A peer-reviewed article of this Preprint also exists.

Supplemental Information

Process of merging individual MACS-called peaks

The cartoon shows how individual MACS-called peaks are merged to find the approximate locations of time-persistent ER-α binding. MACS-detected, time-varying peaks from the [0], 5, ..., 320 min time points (the 0 min point is optional and excluded by default) that co-occur at least twice across time points are merged by a union operation to give the approximate consensus location of a single binding site. Peaks that occur only once are discarded.

DOI: 10.7287/peerj.preprints.3093v1/supp-1
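
The merging step described in this caption can be illustrated with a minimal Python sketch. It is not taken from the authors' code: the function name `merge_persistent_peaks`, the `(start, end, time_point)` tuple layout and the reading of "co-occur" as an overlap of at least one base pair are all assumptions made for illustration, and the sketch handles one chromosome at a time.

```python
from typing import Iterable, List, Tuple

# Hypothetical peak representation: half-open interval [start, end) on one
# chromosome, plus the time point (in minutes) at which MACS called it.
Peak = Tuple[int, int, int]


def merge_persistent_peaks(
    peaks: Iterable[Peak], min_time_points: int = 2
) -> List[Tuple[int, int]]:
    """Union overlapping peaks and keep unions seen at >= min_time_points.

    Assumption: peaks "co-occur" when their intervals overlap; chains of
    overlapping peaks are merged into a single union interval.
    """
    merged: List[Tuple[int, int]] = []
    cluster_start = cluster_end = None
    cluster_times = set()

    for start, end, time_point in sorted(peaks):
        if cluster_end is not None and start < cluster_end:
            # Peak overlaps the current cluster: extend the union.
            cluster_end = max(cluster_end, end)
            cluster_times.add(time_point)
        else:
            # Close the previous cluster; discard single occurrences.
            if cluster_end is not None and len(cluster_times) >= min_time_points:
                merged.append((cluster_start, cluster_end))
            cluster_start, cluster_end = start, end
            cluster_times = {time_point}

    if cluster_end is not None and len(cluster_times) >= min_time_points:
        merged.append((cluster_start, cluster_end))
    return merged


example_peaks = [
    (100, 200, 5), (150, 260, 10),                     # overlap at two time points
    (500, 600, 20),                                    # single occurrence
    (900, 950, 40), (940, 1000, 80), (990, 1100, 160)  # chained overlaps
]
consensus = merge_persistent_peaks(example_peaks)
```

In this sketch two overlapping peaks from the same time point do not count as persistent, because only distinct time points are counted.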

Joint clustering of PolII and ER with AP

The figure shows the clustering of the joint time course of Pol II and ER-α at enhancers with Affinity Propagation. The clustering involves only those time series that individually have a sum of at least 200 tags across all time points.

DOI: 10.7287/peerj.preprints.3093v1/supp-2
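
The tag-count filter mentioned in this caption might look like the following Python sketch. The function names and the list-of-lists layout are hypothetical, and "individually" is interpreted here as requiring both the Pol II series and the ER-α series of an enhancer to reach the threshold; the caption does not settle that point. The clustering itself is not shown.

```python
def select_enhancers(pol2, er, min_tags=200):
    """Return indices of enhancers passing the tag filter.

    pol2, er: lists of per-enhancer time series (tag counts per time point).
    Assumption: an enhancer is kept only if BOTH of its series sum to at
    least min_tags across all time points.
    """
    keep = []
    for index, (pol2_series, er_series) in enumerate(zip(pol2, er)):
        if sum(pol2_series) >= min_tags and sum(er_series) >= min_tags:
            keep.append(index)
    return keep


def joint_time_course(pol2, er, keep):
    """Concatenate the Pol II and ER-alpha series of each retained enhancer."""
    return [list(pol2[i]) + list(er[i]) for i in keep]


pol2_counts = [[50, 80, 90], [10, 20, 30], [100, 100, 100]]
er_counts = [[70, 70, 70], [300, 300, 300], [20, 30, 40]]
kept = select_enhancers(pol2_counts, er_counts)
joint = joint_time_course(pol2_counts, er_counts, kept)
```

The rows of `joint` are what a clustering routine, such as an Affinity Propagation implementation, would then receive as input.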

Histograms of positive and negative features between genes (300bp upstream + gene) and enhancers - all chromosomes

Panels (a, b, c, d) show the positive (green) and negative (yellow) distributions of correlations between the time series of 300bp-upstream-extended-gene regions and enhancer bodies for ER-α, PolII, H2AZ and H3K4me3, collected across all 23 chromosomes. Panel (e) shows the distribution of genomic distances between the centres of distal enhancers and the 300bp-upstream-shifted TSS of genes. The set of positive and negative pairs was constructed using 300bp-upstream-extended-genes and distal enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-3
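
The two kinds of feature in this caption, a correlation between a gene time series and an enhancer time series and a genomic distance from the enhancer centre to the upstream-shifted TSS, can be sketched in Python as follows. The caption does not name the correlation measure, so Pearson correlation is an assumption here, as are the function names, the strand-aware shift and the coordinate conventions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length time series (assumed measure)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def shifted_tss(tss, strand, upstream=300):
    """Move the TSS upstream; upstream means lower coordinates on '+'."""
    return tss - upstream if strand == "+" else tss + upstream


def enhancer_gene_distance(enh_start, enh_end, tss, strand, upstream=300):
    """Distance from the enhancer centre to the upstream-shifted TSS."""
    centre = (enh_start + enh_end) // 2
    return abs(centre - shifted_tss(tss, strand, upstream))


gene_series = [1.0, 2.0, 3.0]
enhancer_series = [2.0, 4.0, 6.0]
corr = pearson(gene_series, enhancer_series)
dist_plus = enhancer_gene_distance(10000, 10400, 50000, "+")
dist_minus = enhancer_gene_distance(10000, 10400, 50000, "-")
```

Setting `upstream=1500` gives the variant used in the 1500bp captions.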

Histograms of positive and negative features between genes (1500bp upstream + gene) and enhancers - odd chromosomes

Panels (a, b, c, d) show the positive (green) and negative (yellow) distributions of correlations between the time series of 300bp-upstream-extended-gene regions and enhancer bodies for ER-α, PolII, H2AZ and H3K4me3, collected across all odd chromosomes. Panel (e) shows the distribution of genomic distances between the centres of distal enhancers and the 1500bp-upstream-shifted TSS of genes. The set of positive and negative pairs was constructed using 1500bp-upstream-extended-genes and distal enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-4

Performance of the enhancer-gene (300bp upstream + gene) model on odd chromosomes - all combinations

The figure compares the performance of the NB model on odd chromosomes (training data), measured by Precision-TPR and MAP scores. The Precision-TPR curves show the accuracy of the predictions with the highest 10%, 20% and 30% of scores, i.e. posterior probabilities. The second and third rows stratify the predictions at each threshold into those that fall within domains and those that involve inter-domain contacts. The set of positive and negative pairs for the first model was constructed using 300bp-upstream-extended-genes and distal enhancers. The correlation-based attributes of the two models were estimated using signals (time series) aggregated over 300bp-upstream-extended-genes and distal enhancer bodies. The separation-based feature was estimated from the 300bp-upstream-shifted TSS to the centres of the ER-α enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-5
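
The evaluation at the highest-scoring 10%, 20% and 30% of predictions can be illustrated with a short Python sketch. The function name, the rounding rule for the cut-off and the toy scores below are assumptions for illustration and are not results from the paper.

```python
def precision_tpr_at_top(scores, labels, fractions=(0.1, 0.2, 0.3)):
    """Precision and TPR among the top-scoring fraction of predictions.

    scores: posterior probabilities for candidate enhancer-gene pairs.
    labels: 1 for a true (positive) pair, 0 for a negative pair.
    Assumption: the cut-off is the rounded fraction of all pairs, at least 1.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(labels)
    results = []
    for fraction in fractions:
        k = max(1, int(round(fraction * len(scores))))
        true_positives = sum(labels[i] for i in order[:k])
        precision = true_positives / k
        tpr = true_positives / total_positives if total_positives else 0.0
        results.append((fraction, precision, tpr))
    return results


# Toy data, invented purely to exercise the function.
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
summary = precision_tpr_at_top(toy_scores, toy_labels)
```

The same function could be applied separately to within-domain and inter-domain pairs to mirror the stratified rows of the figure.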

Performance of the enhancer-gene (300bp upstream + gene) model on even chromosomes - all combinations

The figure compares the performance of the NB model on even chromosomes (test data), measured by Precision-TPR and MAP scores. The Precision-TPR curves show the accuracy of the predictions with the highest 10%, 20% and 30% of scores, i.e. posterior probabilities. The second and third rows stratify the predictions at each threshold into those that fall within domains and those that involve inter-domain contacts. The set of positive and negative pairs for the first model was constructed using 300bp-upstream-extended-genes and distal enhancers. The correlation-based attributes of the two models were estimated using signals (time series) aggregated over 300bp-upstream-extended-genes and distal enhancer bodies. The separation-based feature was estimated from the 300bp-upstream-shifted TSS to the centres of the ER-α enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-6

Performance of the enhancer-gene (1500bp upstream + gene) model on odd chromosomes - all combinations

The figure compares the performance of the NB model on odd chromosomes (training data), measured by Precision-TPR and MAP scores. The Precision-TPR curves show the accuracy of the predictions with the highest 10%, 20% and 30% of scores, i.e. posterior probabilities. The second and third rows stratify the predictions at each threshold into those that fall within domains and those that involve inter-domain contacts. The set of positive and negative pairs for the first model was constructed using 1500bp-upstream-extended-genes and distal enhancers. The correlation-based attributes of the two models were estimated using signals (time series) aggregated over 300bp-upstream-extended-genes and distal enhancer bodies. The separation-based feature was estimated from the 1500bp-upstream-shifted TSS to the centres of the ER-α enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-7

Performance of the enhancer-gene (1500bp upstream + gene) model on even chromosomes - all combinations

The figure compares the performance of the NB model on even chromosomes (test data), measured by Precision-TPR and MAP scores. The Precision-TPR curves show the accuracy of the predictions with the highest 10%, 20% and 30% of scores, i.e. posterior probabilities. The second and third rows stratify the predictions at each threshold into those that fall within domains and those that involve inter-domain contacts. The set of positive and negative pairs for the first model was constructed using 1500bp-upstream-extended-genes and distal enhancers. The correlation-based attributes of the two models were estimated using signals (time series) aggregated over 300bp-upstream-extended-genes and distal enhancer bodies. The separation-based feature was estimated from the 1500bp-upstream-shifted TSS to the centres of the ER-α enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-8

Performance on odd chromosomes for alternative MACS parametrisation: (1e-07, λ local off) vs (1e-05, λ local on)

The first column of the figure shows the performance of the NB model on all odd chromosomes. The model was trained on the stringent, time-persistent, merged MACS-called peaks (i.e. distal ER-α bindings) from the scan with a p-value of 1e-07 and the local control switched off, in which case the search is done with λ BG. The second column shows the performance under the alternative peak calling with a p-value of 1e-05 (the MACS default), no control and the local control flag on. The set of positive and negative pairs for the first model was constructed using 300bp-upstream-extended-genes and distal enhancers. The correlation-based attributes of the model were estimated using pairs of 300bp-upstream-extended-genes and enhancers (merged distal MACS-called peaks). The separation-based feature was estimated from the 300bp-upstream-shifted TSS to the centres of the ER-α enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-9

Performance on even chromosomes for alternative MACS parametrisation: (1e-07, λ local off) vs (1e-05, λ local on)

The first column of the figure shows the performance of the NB model on all even chromosomes. The model was trained on the stringent, time-persistent, merged MACS-called peaks (i.e. distal ER-α bindings) from the scan with a p-value of 1e-07 and the local control switched off, in which case the search is done with λ BG. The second column shows the performance under the alternative peak calling with a p-value of 1e-05 (the MACS default), no control and the local control flag on. The set of positive and negative pairs for the first model was constructed using 300bp-upstream-extended-genes and distal enhancers. The correlation-based attributes of the model were estimated using pairs of 300bp-upstream-extended-genes and enhancers (merged distal MACS-called peaks). The separation-based feature was estimated from the 300bp-upstream-shifted TSS to the centres of the ER-α enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-10

Histograms of positive and negative features between genes (300bp upstream + gene) and enhancers (lambda on, 10e-05) - odd chromosomes

Panels (a, b, c, d) show the positive (green) and negative (yellow) distributions of correlations between the time series of 300bp-upstream-extended-gene regions and enhancer bodies (MACS: λ local on, p-value 10e-05) for ER-α, PolII, H2AZ and H3K4me3, collected across all odd chromosomes. Panel (e) shows the distribution of genomic distances between the centres of distal enhancers and the 300bp-upstream-shifted TSS of genes. The set of positive and negative pairs was constructed using 300bp-upstream-extended-genes and distal enhancers.

DOI: 10.7287/peerj.preprints.3093v1/supp-11

Complete list of predictions of the model

DOI: 10.7287/peerj.preprints.3093v1/supp-12

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Tomasz Dzida analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper, wrote code implementing the computational method.

Mudassar Iqbal analyzed the data, prepared figures and/or tables, reviewed drafts of the paper.

Iryna Charapitsa performed the experiments, reviewed drafts of the paper.

George Reid conceived and designed the experiments, reviewed drafts of the paper.

Henk Stunnenberg conceived and designed the experiments, reviewed drafts of the paper.

Filomena Matarese performed the experiments, reviewed drafts of the paper.

Antti Honkela conceived and designed the experiments, reviewed drafts of the paper.

Magnus Rattray conceived and designed the experiments, wrote the paper, reviewed drafts of the paper.

Data Deposition

The following information was supplied regarding data availability:

Code: https://github.com/ManchesterBioinference/EP_Bayes

Data available from GEO: accession GSM2467201

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2467201

Funding

The work was funded by the European ERASysBio+ Initiative Project Systems Approach to Gene Regulation Biology Through Nuclear Receptors (SYNERGY) (Biotechnology and Biological Sciences Research Council Grant BB/I004769/2 (to M.R.), Academy of Finland Grant 135311 (to A.H.), and Bundesministerium für Bildung und Forschung Grants ERASysBio+ P#134 (to G.R.) and 0315715B (to K.G.)). M.R. and K.G. were further supported by the European Union Seventh Framework Programme Project RADIANT (Rapid Development and Distribution of Statistical Tools for High-Throughput Sequencing Data) (Grant 305626), and A.H. was further supported by Academy of Finland Grants 252845, 259440, and 251170. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

