Scientific questions

How can I identify dead cells

Dead cells generally have high mitochondrial or ribosomal protein content. Using SCHNAPPs those can be identified before and after removing of such genes from the data set.

before removing genes

Set before-Filter variable

Set before-Filter variable

The regular expression in the image above is used to count UMIs in the original non-normalized data before any filters are applied. Then we have this count for the cells that have not been removed and can use this in a 2D plot:

Before filter applied

Before filter applied

Here, we show the UMI count on the x-axis and the before filter on the y-axis. We also devide the UMI count by the before filter column, which emphasises the effect and allows us to clearly identify dying cells. In this case the dying cells fall into clusters 1, 2, and 3. So we can select those clusters and copy/paste the cells in the “cells to be removed field”.

Select cells for copy/paste

Select cells for copy/paste

remove cells

remove cells

Don’t forget to click on apply changes!

Identify cells from violin plot

The cells in the violin plot are not selectable (for the moment). In order to identify those cells one has to use the 2D plotting function, preferably under Co-expression - Selected

Can MARS-Seq data be analyzed?

Yes, additional annotations can be introduced either directly into a SingleCellExperiment object that is loaded in the app as colData, or using a CSV file with corresponding information.

Use

colData(scEx) <- myAdditionalInformation

to set the relevant information.

How to normalize by ribosomal proteins?

Once you have identified the gene names you want to use, this list of genes can be used under Parameters - Normalization - gene_norm:

normalize by ribosomal proteins

normalize by ribosomal proteins

You can get this list by temporarily setting the regular expression for selection of genes to be removed. Then click apply changes.

regular expression for selecting ribosomal/mitochondrial genes

regular expression for selecting ribosomal/mitochondrial genes

Then, in the lower area of the same screen the genes that have been removed are shown:

table of removed genes

table of removed genes

What is the best representation to use to compare gene expression level (UMI/gene) ?

Gene/UMI expression can be visualized in different plots:

  • The panel plot gives an overview of which cells express different genes projected in 2 dimensions. The dimensions can be chosen freely. This allows for a general overview of the regions in the PCA/tSNE/UMAP space where differences are expressed. It can be difficult to compare the expression of individual cells.
Panel plot

Panel plot

  • Using the parameters UmiCountPerGenes and UmiCountPerGenes2 allow plotting the expression of one or more genes (i.e. the sum of all UMIs) that are set with these variables. These two parameters can be found under X/Y when selecting the projections to be used for the X/Y axes. With these parameters, the expression of two sets of genes can be compared and the information of individual cells can be retained. Zooming and coloring allows drilling down and identifying individual cells. Using “more options” these can then be annotated. Following such annotation, this information can be used in other plots as a projection. Those projections are currently binary, i.e. have the values False and True or 0 and 1, indicating if a cell belongs to a group or not.

The figure below shows the combined expression of marker genes for B and T cells in an PBMC experiment. MS4A1 is an mRNA marker for B-cells, whereas cd8a is expressed in T-cells. Note that the first gene name is “B-cells”. This will allow you to annotate the selection. It will give rise to a warning, but that will disappear after a few seconds.

2D plot CD8 vs CD4 cell expression

2D plot CD8 vs CD4 cell expression

How to evaluate if the clusters are biologically relevant?

Please set the speed of the video to your liking. It be too fast… ;)

How to identify cells in a cluster of a Tsne plot and list them ?

Using the 2D plot, select tsne1 and tsne2. Using the mouse select the cells of interest. Check “more options”, then check “show cell names”.

selecting cells

selecting cells

What is the best representation to use to compare gene expression level

Gene/UMI expression can be visualized in different plots:

The panel plot gives an overview which cells express different genes projected in 2 dimensions. The dimensions can be chosen freely. This allows for a general overview of the regions in the PCA/tSNE/UMAP space where differences are expressed. It can be difficult to compare the expression of individual cells.

Using 2D plots, the parameters UmiCountPerGenes and UmiCountPerGenes2 allow plotting the expression of one or more genes (i.e. the sum of all UMIs) that are set with these variables. These two parameters can be found under X/Y when selecting the projections to be used for the X/Y axes. With these parameters, the expression of two sets of genes can be compared and the information of individual cells can be retained. Zooming and coloring allows drilling down and identifying individual cells. Using “more options” these can then be annotated. Following such annotation, this information can be used in other plots as a projection. Those projections are currently binary, i.e. have the values False and True or 0 and 1, indicating if a cell belongs to a group or not.

Can the analysis be performed on a subset of selected cells and how?

Analysis can be narrowed to a selection of cells copying cell names in the tab “Cell selection - Cells to keep, remove others”. The number of cells will be updated in the summary statistics on the sidebar and clicking on “apply changes” will apply cell selection.

What are the possible trajectory inferences approaches that could be used ?

Contributions for trajectory inference already available are SCORPIUS, ElPiGraph.R, and DCA (see GitHub)[https://github.com/baj12/SCHNAPPsContributions]

Can new packages be implemented ?

The software architecture of the application makes it easy for the bioinformatician to integrate new visualizations and analyses. Please see https://github.com/baj12/SCHNAPPsContributions

Common problems

Nothing happens!!!!!

Patience, please. There are quite a few occasions when the user has to wait. Specifically, when opening the app on a fresh start. There are some 100 packages or so to be loaded. This takes time.

Other places that take long waiting times are after loading data (then most calculations will be initialized), changing the parameters for normalization, PCA, … (everything that touches the normalized data and projections).

UMAP is notoriously slow, that is why there is to check box to activate it.

Please see if the processors on your computer related to R are active, this usually means that you should wait.

There is no clean way stopping a running task other than killing it.

Reports take even longer. Wait times of an hour or more are not uncommon. Then after a screen for saving the file appears and Ok is pressed the same time will be used again to redo the report. I haven’t found a way yet to overcome this. Any ideas are welcome.

New windows pop up

When saving files or generating a report Shiny opens a new window and doesn’t close it. You can safely close it after the operation is completed.