Single-cell analysis (v0.0.1)

Overview 🎯

This tool identifies differentially expressed genes (DEGs) in single-cell RNA-seq data by comparing experimental conditions. It uses robust pseudobulk analysis and Wilcoxon test methods to accurately detect DEGs across different cell types, ensuring reliable biological insights. _images/overview.png

Inputs 📥

As input for this method a metadata file and a folder with single-cell count data should be provided. A cell-typing model is mandatory if cell types are not provided in the metadata file. All files should be uploaded as generic files.

  • Metadata File (CSV) Sample metadata should be provided in the format of a CSV file. The obligatory columns include:

    • Sample identifiers.

    • An experimental group column named group or condition with the values ‘control’ or ‘experiment’.

    • An optional cell_type column for annotations. If missing, the tool will perform automatic cell typing.

_images/metadata_example.png

  • Folder with Single-cell RNA Sequencing Data The tool accepts two formats:

    • 10X Genomics MTX Format: Requires three files per sample, sharing a common prefix.

      barcodes.tsv.gz features.tsv.gz matrix.mtx.gz

    • CSV Format: A matrix with genes as rows and cells as columns (or vice-versa). The first row/column must contain identifiers.

  • Thresholds for up/down regulated genes A floating pointer number which is used for filtering up and down regulated genes

  • Model for Cell-Typing This is a required input only if your metadata file lacks a cell_type column. Select a pre-trained CellTypist model to automatically annotate cell types.


Workflow ⚙️

The tool follows a sequential workflow from data loading to analysis.

  • Data Loading: Reads the input single-cell data (MTX or CSV) and metadata file.

  • Data Preparation: A quality control pipeline filters low-quality genes and cells, calculates QC metrics (mitochondrial/ribosomal percentages), performs doublet detection with Scrublet, normalizes counts, and log-transforms the data.

  • Cell Typing (Optional): If the cell_type column is missing, the tool uses the selected CellTypist model to predict cell types.

  • Pseudobulk Aggregation: Gene counts are aggregated for all cells belonging to the same sample and cell type, creating pseudobulk profiles.

  • Differential Expression: For each cell type, the tool performs DEG analysis.

    • Primary Method: Uses PyDESeq2 on pseudobulk profiles to compare ‘experiment’ vs ‘control’.

    • Fallback Method: If PyDESeq2 fails (e.g., due to low sample counts), it defaults to Scanpy’s rank_genes_groups (Wilcoxon test) on the original single-cell data.

  • Output Generation: The final DEG results are saved into separate CSV files for each cell type. The tool also generates a filtered csv file and a report.csv file which contains found cell types, number of cells corresponding to each type, and the type of analysis which was used.


Outputs 📤

The tool returns a single folder containing the analysis results.

Inside the zip file there is one CSV file per cell type. The filenames indicate the cell type and the analysis method used (e.g., T_cells_pseudobulk.csv or B_cells_rank_groups.csv). The tool also generates a filtered csv file and a report.csv file which contains found cell types, number of cells corresponding to each type, and the type of analysis which was used.

_images/output_example.png

Each CSV file includes:

  • Gene names

  • Log2 fold changes

  • P-values

  • Adjusted p-values