This is an example of clustering, marker identification and pseudotime inference on a set of skin cells.

Load and Preprocess Data

A matrix of counts with rows as genes and columns as cells must be provided. Gene and cell ids, and [optionally] a vector of labels e.g. cell types are provided in separate text files. Here we load example data from NCBI GEO (GSE67602) originally published in Joost et al 2016.

Remove Spike-in RNA

Apply number of features and exclusion threshold

Compute the Similarity Matrix

Run L2R2 on the processed data, outputting the number of iterations and value of the objective.

Compute distances on the similarity matrix. If the truncated graph has multiple components, join them to improve inter-cluster distance estimation.

Cluster the Cells

Infer cluster number using spectra of the ensemble Laplacian.

In this analysis, we obtain bounds on the number of clusters, where the lower bound is equal to the number of zero eigenvalues and the upper bound is the index of the eigenvalue preceding the largest eigengap. We take this value as the cluster number here.

Factor the similarity matrix such that W = HxH’, using the computed cluster number for the rank of H.

Assign cluster labels to the cells.

Visualize the Clusters

Cells are labeled by cluster membership and plotted on low-dimensional embedding (t-SNE) of the similarity matrix.

As described in Joost et al (2016), Subpopulations including Basal, IFE-DI, IFE-DII, IFE-KI, IFE-KII were identified by standard techniques.

Find cluster marker genes.

Previously, we clustered the cells using L2R2/Sym-NMF, now we analyze the markers of the clusters. We look at the normalized expression values of all genes using pre-selection parameters similar to the pre-clustering selection parameters. The top six markers selected as described in Wang et. al. As expected, markers Krt14, Krt10, and Lor are inferred for clusters overlapping with IHC-based Basal, Differentiated (IFE-DI, IFE-DII), and Keratinized (IFE-KI, IFE-KII) cell annotations.

Pseudotime Visualization

Compute the directional minimum spanning tree on the cluster-cluster graph for visualization.

cluster_predecessors <- GetPredecessors(cluster_ptime$cluster_mst, cluster_ptime$root_cluster)
cluster_dtree <- GetDominatorTree(cluster_predecessors, cluster_ptime$graph_cluster)
PlotLineage(cluster_dtree)

Compute the path-length vector from root, and plot the cells in the previously computed low-dimensional embedding.

Plot Gene Expression Levels in Cells Grouped by Cluster

Violin Plot of Krt14 expression. Note strong expression of the basal compartment:

Violin Plot of Krt10 expression. Note uniformly high expression in the IFE-DI/DII compartments:

Violin Plot of Lor expression. Note strong expression in the terminally differentiated IFE-KI/KII compartments:

Analyze Signaling Mediated by a Ligand-Receptor Family

In order to analyze signaling, we can load in tables which describe a pathay in detail. There are two tables needed, one that specifies all unique ligand receptor combinations and another that specifies receptor/target/up|down combinations. Example tables for the TgfB pathway are provided in extdata/ and loaded here for the current skin data set. Using the ImportPathway function, an inner join over the two tables tabulates all ligand/receptor/target/direction 4-tuples. We then compute the probability of intercellular interaction based on this pathway.

receptor ligand target direction
Tgfbr1 Tgfb1 Zeb2 up
Tgfbr1 Tgfb1 Smad2 up
Tgfbr1 Tgfb1 Wnt4 up
Tgfbr1 Tgfb1 Wnt11 up
Tgfbr1 Tgfb1 Bmp7 up
Tgfbr1 Tgfb1 Sox9 up
Tgfbr1 Tgfb1 Notch1 up
Tgfbr1 Tgfb1 Dnmt3b down
Tgfbr1 Tgfb1 Dnmt1 down
Tgfbr1 Tgfb2 Zeb2 up
Tgfbr1 Tgfb2 Smad2 up
Tgfbr1 Tgfb2 Wnt4 up
Tgfbr1 Tgfb2 Wnt11 up
Tgfbr1 Tgfb2 Bmp7 up
Tgfbr1 Tgfb2 Sox9 up
Tgfbr1 Tgfb2 Notch1 up
Tgfbr1 Tgfb2 Dnmt3b down
Tgfbr1 Tgfb2 Dnmt1 down
Tgfbr2 Tgfb1 Smad2 up
Tgfbr2 Tgfb1 Wnt4 up
Tgfbr2 Tgfb1 Zeb2 up
Tgfbr2 Tgfb1 Sox9 up
Tgfbr2 Tgfb1 Notch1 up
Tgfbr2 Tgfb1 Wnt11 up
Tgfbr2 Tgfb1 Bmp7 up
Tgfbr2 Tgfb1 Dnmt1 down
Tgfbr2 Tgfb1 Dnmt3b down
Tgfbr2 Tgfb2 Smad2 up
Tgfbr2 Tgfb2 Wnt4 up
Tgfbr2 Tgfb2 Zeb2 up
Tgfbr2 Tgfb2 Sox9 up
Tgfbr2 Tgfb2 Notch1 up
Tgfbr2 Tgfb2 Wnt11 up
Tgfbr2 Tgfb2 Bmp7 up
Tgfbr2 Tgfb2 Dnmt1 down
Tgfbr2 Tgfb2 Dnmt3b down

Circos Plot of Intercellular Network

Now plot the previous result as a circos plot on the top five ligand-producing and top five receptor-bearing cells from every cluster. The upper hemisphere of the plot shows receptor-bearing cells. The chords of the plot are colored by the ligand-producing cells in the lower hemisphere.

Circos Plot of Intercluster Network

Directional plot of cluster-to-cluster signaling. The root of the directional chord is raised.

Heatmap of Signaling Marker Expression

Visualize the expression of ligand/receptor and up/down target expression across clusters