Consensus maps of the NMF and K-means methods run on the HSC vs. MPP1 dataset
Rows and columns are samples. Brightness indicates the method's confidence in assigning a pair of samples to the same group.
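As an illustration of how such a map can be built, the sketch below assumes the common definition of a consensus matrix: the fraction of repeated clustering runs in which two samples receive the same label. The function name `consensus_matrix` and the toy label vectors are hypothetical and are not taken from the study.

```python
def consensus_matrix(runs):
    """Return an n x n matrix whose (i, j) entry is the fraction of runs
    in which samples i and j were assigned the same cluster label."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


# Toy input (hypothetical): four clustering runs over four samples.
# Cluster label values are arbitrary; only co-membership matters.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
consensus = consensus_matrix(runs)
```

Entries near 1 correspond to the bright cells of the map, and entries near 0 to the dark cells.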
Comparison of clustering methods on the mouse dendritic cell scRNA-Seq data
(A) Two-dimensional t-SNE scatter plots. Colors indicate the most favorable labeling that can be assigned to the clustering result generated by each method. Correctly and incorrectly labeled samples are marked by a dot (•) and a cross (x), respectively. (B) Rand measures of the compared methods, before and after t-SNE. The Rand measure ranges from 0 to 1, where a higher value indicates greater clustering accuracy.
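For reference, a minimal sketch of the Rand measure, assuming it is the standard (unadjusted) Rand index: the fraction of sample pairs on which two labelings agree about whether the pair belongs together. The function name `rand_index` and the example labels are illustrative only.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs that are grouped consistently:
    together in both labelings, or apart in both labelings."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Identical partitions with swapped label names still score 1.0.
perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true group agrees on only 2 of 6 pairs.
poor = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the measure is computed over pairs, it does not depend on how the cluster labels are named, which is why the captioned plots can assign the most favorable labeling to each method.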
PCA plot of the mouse epithelial cell data set
The groups that are most difficult to separate (E14.5 vs. E16.5) are circled.
Characteristics of important-gene calling
(A) Kernel density estimation (KDE) plot showing the distribution of log expression values of the “important genes” that separate E14.5 vs. E16.5, as detected by each of the compared methods. (B) KDE plot of how frequently genes appear across the 71 jackknife runs. For a given x-value (frequency), a higher y-value (density) means that a larger percentage of genes appear at around that frequency among the 71 runs. The blue block is the top 500 genes selected by NMF, and the red block is all genes in the filtered data used by NMF.
Heatmap of the characteristic genes (E14.5 vs. E16.5) shared pairwise by the various methods
The dendrogram at the bottom shows the hierarchical clustering result, using a distance defined as the inverse of the number of overlapping genes.
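The distance described above can be sketched as follows. The method names, gene identifiers, and the function `inverse_overlap_distance` are hypothetical placeholders; treating a pair with no shared genes as infinitely distant is an assumption, since the caption does not say how that case is handled.

```python
def inverse_overlap_distance(gene_sets):
    """Pairwise distance between methods: 1 / (number of shared genes).
    The diagonal is 0; pairs with no overlap get infinity (an assumption)."""
    names = list(gene_sets)
    dist = {}
    for a in names:
        for b in names:
            if a == b:
                dist[a, b] = 0.0
            else:
                shared = len(gene_sets[a] & gene_sets[b])
                dist[a, b] = 1.0 / shared if shared else float("inf")
    return dist


# Placeholder gene lists for three hypothetical methods.
gene_sets = {
    "method_A": {"g1", "g2", "g3", "g4"},
    "method_B": {"g3", "g4", "g5"},
    "method_C": {"g4", "g6"},
}
dist = inverse_overlap_distance(gene_sets)
```

Methods that share more genes end up closer together. A matrix of these distances could then be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` to draw a dendrogram, provided infinite entries are first replaced by a large finite value.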
Using NMF to identify subpopulations in a single glioblastoma tumor from Patient MGH31
(A) The consensus heat map generated from NMF. The two subpopulation clusters appear as the two distinct red squares, labeled 1 and 2. Brightness indicates the confidence of assignment to the two subpopulations. (B) PCA plot of scRNA-Seq samples from patient MGH31; the discovered subpopulations are colored red and blue. (C) Results of the KEGG/BioCarta pathway enrichment analysis. The line of significance is shown; to its right, the FDR is less than 0.05. (D) Protein interaction diagram of the KEGG pathway “Pathogenic E. coli infection”. The proteins encoded by the genes detected by NMF are highlighted in yellow, with the gene names marked below.