Dead cells generally have high mitochondrial or ribosomal protein content. Using SCHNAPPs those can be identified before and after removing of such genes from the data set.
The regular expression in the image above is used to count UMIs in the original non-normalized data before any filters are applied. Then we have this count for the cells that have not been removed and can use this in a 2D plot:
Here, we show the UMI count on the x-axis and the before filter on the y-axis. We also devide the UMI count by the before filter column, which emphasises the effect and allows us to clearly identify dying cells. In this case the dying cells fall into clusters 1, 2, and 3. So we can select those clusters and copy/paste the cells in the “cells to be removed field”.
Don’t forget to click on apply changes!
The cells in the violin plot are not selectable (for the moment). In order to identify those cells one has to use the 2D plotting function, preferably under Co-expression - Selected
Yes, additional annotations can be introduced either directly into a SingleCellExperiment object that is loaded in the app as colData, or using a CSV file with corresponding information.
Use
colData(scEx) <- myAdditionalInformation
to set the relevant information.
Once you have identified the gene names you want to use, this list of genes can be used under Parameters - Normalization - gene_norm:
You can get this list by temporarily setting the regular expression for selection of genes to be removed. Then click apply changes.
Then, in the lower area of the same screen the genes that have been removed are shown:
Gene/UMI expression can be visualized in different plots:
The figure below shows the combined expression of marker genes for B and T cells in an PBMC experiment. MS4A1 is an mRNA marker for B-cells, whereas cd8a is expressed in T-cells. Note that the first gene name is “B-cells”. This will allow you to annotate the selection. It will give rise to a warning, but that will disappear after a few seconds.
Please set the speed of the video to your liking. It be too fast… ;)
Using the 2D plot, select tsne1 and tsne2. Using the mouse select the cells of interest. Check “more options”, then check “show cell names”.
Gene/UMI expression can be visualized in different plots:
The panel plot gives an overview which cells express different genes projected in 2 dimensions. The dimensions can be chosen freely. This allows for a general overview of the regions in the PCA/tSNE/UMAP space where differences are expressed. It can be difficult to compare the expression of individual cells.
Using 2D plots, the parameters UmiCountPerGenes and UmiCountPerGenes2 allow plotting the expression of one or more genes (i.e. the sum of all UMIs) that are set with these variables. These two parameters can be found under X/Y when selecting the projections to be used for the X/Y axes. With these parameters, the expression of two sets of genes can be compared and the information of individual cells can be retained. Zooming and coloring allows drilling down and identifying individual cells. Using “more options” these can then be annotated. Following such annotation, this information can be used in other plots as a projection. Those projections are currently binary, i.e. have the values False and True or 0 and 1, indicating if a cell belongs to a group or not.
Analysis can be narrowed to a selection of cells copying cell names in the tab “Cell selection - Cells to keep, remove others”. The number of cells will be updated in the summary statistics on the sidebar and clicking on “apply changes” will apply cell selection.
Contributions for trajectory inference already available are SCORPIUS, ElPiGraph.R, and DCA (see GitHub)[https://github.com/baj12/SCHNAPPsContributions]
The software architecture of the application makes it easy for the bioinformatician to integrate new visualizations and analyses. Please see https://github.com/baj12/SCHNAPPsContributions
Patience, please. There are quite a few occasions when the user has to wait. Specifically, when opening the app on a fresh start. There are some 100 packages or so to be loaded. This takes time.
Other places that take long waiting times are after loading data (then most calculations will be initialized), changing the parameters for normalization, PCA, … (everything that touches the normalized data and projections).
UMAP is notoriously slow, that is why there is to check box to activate it.
Please see if the processors on your computer related to R are active, this usually means that you should wait.
There is no clean way stopping a running task other than killing it.
Reports take even longer. Wait times of an hour or more are not uncommon. Then after a screen for saving the file appears and Ok is pressed the same time will be used again to redo the report. I haven’t found a way yet to overcome this. Any ideas are welcome.