"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

A peer-reviewed article of this Preprint also exists.

Supplemental Information

The consensus map of NMF and K-means methods run on the HSC vs. MPP1 dataset

The columns and rows are samples. The brightness indicates how confidently the method assigns a pair of samples to the same group.

DOI: 10.7287/peerj.preprints.1839v2/supp-1

Comparison of clustering methods on the mouse dendritic cell scRNA-Seq data

(A) t-SNE two-dimensional scatter plots. Colors indicate the most favorable labeling that can be assigned to the clustering result generated by each method. The correctly and incorrectly labeled samples are marked by a dot (•) and a cross (x), respectively. (B) Rand measures of the compared methods, before and after t-SNE. The Rand measure ranges from 0 to 1, where a higher value indicates greater clustering accuracy.

DOI: 10.7287/peerj.preprints.1839v2/supp-2
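
As a rough illustration of the Rand measure reported in panel B, the sketch below assumes the plain (unadjusted) Rand index: the fraction of sample pairs on which the reference labeling and the clustering agree about whether the two samples belong together. The function name `rand_index` and the toy labels are hypothetical and are not taken from the study's code or data.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Plain (unadjusted) Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("at least two samples are required")
    agreements = 0
    for i, j in pairs:
        same_in_truth = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both labelings group it the
        # same way: together in both, or apart in both.
        if same_in_truth == same_in_pred:
            agreements += 1
    return agreements / len(pairs)


# Toy example (hypothetical labels, not data from the study).
truth = ["HSC", "HSC", "MPP1", "MPP1"]
perfect = [1, 1, 0, 0]   # same grouping as truth, different label names
mixed = [0, 1, 0, 1]     # splits both true groups

print(rand_index(truth, perfect))
print(rand_index(truth, mixed))
```

Because only pairwise co-membership is compared, the cluster names themselves do not matter, which is why a clustering can score 1 even when its labels differ from the reference labels.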

PCA plot of the mouse epithelial cell data set

The groups that are most difficult to separate (E14.5 vs. E16.5) are circled.

DOI: 10.7287/peerj.preprints.1839v2/supp-3

Characteristics of important gene calling

(A) The kernel density estimation (KDE) plot showing the frequency of log expression values of “important genes” that separate E14.5 vs. E16.5, as detected by the various methods in comparison. (B) KDE plot of how frequently genes appear across the 71 jackknife runs. For a given x-value (frequency), a higher y-value (density) means that a higher percentage of genes appear at around this frequency among the 71 runs. The blue block is the top 500 genes selected by NMF and the red block is all the genes in the filtered data used by NMF.

DOI: 10.7287/peerj.preprints.1839v2/supp-4

The heatmap of the characteristic genes (E14.5 vs. E16.5) shared pairwise among the various methods

The dendrogram at the bottom shows the hierarchical clustering results using a distance defined as the inverse of the number of overlapping genes.

DOI: 10.7287/peerj.preprints.1839v2/supp-5
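
The distance behind the dendrogram, the inverse of the number of overlapping genes, can be sketched as follows. This is a minimal illustration only: the method names, the gene identifiers, and the choice to treat zero overlap as an infinite distance are all assumptions, since the caption does not say how that case was handled.

```python
from itertools import combinations


def overlap_distance(genes_a, genes_b):
    """Distance between two gene lists: 1 / (number of shared genes)."""
    shared = len(set(genes_a) & set(genes_b))
    # Assumption: with no shared genes the distance is treated as infinite.
    return float("inf") if shared == 0 else 1.0 / shared


# Hypothetical gene lists called by three placeholder methods.
called_genes = {
    "method_A": {"g1", "g2", "g3", "g4"},
    "method_B": {"g2", "g3", "g4", "g5"},
    "method_C": {"g4", "g9"},
}

# Pairwise distances between methods, keyed by the pair of method names.
distances = {
    (a, b): overlap_distance(called_genes[a], called_genes[b])
    for a, b in combinations(sorted(called_genes), 2)
}

for pair, dist in distances.items():
    print(pair, dist)
```

Methods that share more genes end up closer together, so a hierarchical clustering on these distances would merge them first.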

Using NMF to identify subpopulations in a single glioblastoma tumor from Patient MGH31

(A) The consensus heat map generated from NMF. The two subpopulation clusters appear as two distinct red squares, labeled 1 and 2. The brightness indicates the confidence level of the two subpopulations. (B) The PCA plot of scRNA-Seq samples from patient MGH31; the discovered subpopulations are colored red and blue. (C) The results of KEGG/BioCarta pathway enrichment analysis. The line of significance is shown; pathways to its right have an FDR less than 0.05. (D) The protein interaction diagram of the KEGG pathway “Pathogenic E. coli infection”. The proteins encoded by the genes detected by NMF are highlighted in yellow, with the gene names marked below.

DOI: 10.7287/peerj.preprints.1839v2/supp-6

The FPKM table for the HSC vs. MPP1 scRNA-Seq dataset

DOI: 10.7287/peerj.preprints.1839v2/supp-7

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Xun Zhu performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, and reviewed drafts of the paper.

Travers Ching analyzed the data and reviewed drafts of the paper.

Xinghua Pan contributed reagents/materials/analysis tools and reviewed drafts of the paper.

Sherman Weissman contributed reagents/materials/analysis tools and reviewed drafts of the paper.

Lana Garmire conceived and designed the experiments, analyzed the data, wrote the paper, and reviewed drafts of the paper.

Data Deposition

The following information was supplied regarding data availability:

The raw data has been supplied as a Supplemental Dataset.


Funding

This research was supported by grants K01ES025434 awarded by NIEHS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative, P20 COBRE GM103457 awarded by NIH/NIGMS, and Medical Research Grant 14ADVC-64566 from Hawaii Community Foundation to L.X. Garmire. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
