Getting Started
This chapter introduces the preprocessing steps for RNA-Seq expression data from different public repositories. It documents how to download data and metadata, and how to set up a conda environment or Docker container to run the scripts, which are written in Bash, Nextflow, Python and R.
Features
Download RNA-Seq expression data from repositories
Convert BAM to FASTQ and use nf-core/rnaseq
Download metadata from TCGA, ICGC, GTEx, SRA
Extract metadata into a table in csv format
Merge TPM values from nf-core/rnaseq/stringTieFPKM
Merge raw featureCounts from nf-core/rnaseq/featureCounts
Dimensionality reduction with PCA, t-SNE and UMAP
Batch correction in R with ComBat, ComBat-seq, removeBatchEffect
Supervised classification with machine learning: LinearSVM, SVM, RandomForest, MultiLayerPerceptron
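To illustrate the merge step listed above, here is a minimal sketch of combining per-sample featureCounts tables into one gene-by-sample matrix. It is not the project's own merge script. It assumes the standard single-sample featureCounts layout (an optional `#` comment line, a header row starting with `Geneid`, and the count in the last column). The function names, sample IDs and gene IDs are hypothetical, and filling missing genes with 0 is a choice made for this example.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse one featureCounts table into {gene_id: count}."""
    counts: Dict[str, int] = {}
    # Skip the '#' comment line that featureCounts writes at the top.
    data_lines = (line for line in lines if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if not row or row[0] == "Geneid":  # skip blank lines and the header
            continue
        counts[row[0]] = int(row[-1])  # assumption: count is the last column
    return counts


def merge_counts(samples: Dict[str, Dict[str, int]]) -> List[List[str]]:
    """Build a gene x sample table; genes missing from a sample get 0."""
    sample_ids = sorted(samples)
    genes = sorted({gene for counts in samples.values() for gene in counts})
    table = [["Geneid"] + sample_ids]
    for gene in genes:
        table.append([gene] + [str(samples[s].get(gene, 0)) for s in sample_ids])
    return table


# Hypothetical inputs, inlined so the sketch runs without any files.
sample_a = (
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ta.bam\n"
    "ENSG01\t1\t10\t90\t+\t81\t5\n"
    "ENSG02\t1\t100\t190\t-\t91\t0\n"
)
sample_b = (
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tb.bam\n"
    "ENSG01\t1\t10\t90\t+\t81\t3\n"
    "ENSG03\t2\t5\t50\t+\t46\t7\n"
)

merged = merge_counts({
    "A": read_counts(io.StringIO(sample_a)),
    "B": read_counts(io.StringIO(sample_b)),
})

# Write the merged matrix as CSV to an in-memory buffer and print it.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(merged)
print(buffer.getvalue())
```

For real data, `read_counts` can be given an open file handle in place of the `io.StringIO` objects.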
Main Workflow Overview
Prerequisites
See also
Ensure that Nextflow, Docker and Singularity are installed
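One quick way to confirm the prerequisites is to check that each executable is on the `PATH`. The sketch below uses only the Python standard library. The helper name `find_tools` is hypothetical, and the executable names `nextflow`, `docker` and `singularity` are assumed to match the local installation.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


status = find_tools(["nextflow", "docker", "singularity"])
for name, path in status.items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

This only shows that an executable can be found; it does not verify versions or that the Docker daemon is running.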
Setting up the conda environment
Requires
conda
python version 3.8.8
python_scripts/environment.yml
conda env create -f environment.yml
Activate the environment to run the scripts
conda activate python_scripts
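Before running the scripts, it can help to confirm that the expected environment is actually active. The sketch below assumes the environment is named `python_scripts` (as in the `conda activate` command above) and that a Python 3.8 interpreter is expected. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation. The function `check_environment` is a hypothetical helper, not part of the project.

```python
import os
import sys
from typing import List, Mapping, Tuple


def check_environment(
    environ: Mapping[str, str],
    version: Tuple[int, int],
    expected_env: str = "python_scripts",      # name taken from `conda activate`
    expected_python: Tuple[int, int] = (3, 8),  # major.minor of Python 3.8.8
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    if tuple(version) != expected_python:
        found = ".".join(map(str, version))
        wanted = ".".join(map(str, expected_python))
        problems.append(f"Python {found} is running, expected {wanted}.x")
    return problems


# Check the current process and report anything unexpected.
issues = check_environment(os.environ, sys.version_info[:2])
print("environment looks fine" if not issues else "\n".join(issues))
```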
Alternative setup with a Docker container
Requires
python_scripts/Dockerfile
python_scripts/environment.yml
# folder structure within the container
├── app/
│ ├── tools.py
│ ├── ...
├── data/
│ ├── ...
├── Dockerfile
├── environment.yml
└── results/
# copy the code to run in the container to ``app/`` and data to ``data/``
# from within folder containing scripts, Dockerfile and environment.yml
docker build -t <name> .
# <name> is the image tag passed to docker build above
docker run -it --rm -w <work_dir> -v <host_dir>:<container_dir> <name>
# run script from command-line
# note that the conda env is already activated
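The `docker run` invocation above can also be assembled programmatically, which makes it easier to guarantee that the host side of the `-v` bind mount is an absolute path (Docker treats a relative name there as a named volume). This is a minimal sketch. The helper `docker_run_command`, the image tag `rnaseq-scripts`, and the paths `./data`, `/data` and `/app` are all hypothetical placeholders to be replaced with your own values.

```python
import shlex
from pathlib import Path
from typing import List


def docker_run_command(image: str, host_dir: str,
                       container_dir: str, work_dir: str) -> List[str]:
    """Build the argument list for an interactive, self-removing container."""
    # Resolve to an absolute path so Docker performs a bind mount.
    host = str(Path(host_dir).resolve())
    return [
        "docker", "run", "-it", "--rm",
        "-w", work_dir,                    # working directory in the container
        "-v", f"{host}:{container_dir}",   # bind mount host_dir -> container_dir
        image,
    ]


# Hypothetical values for illustration only.
cmd = docker_run_command("rnaseq-scripts", "./data", "/data", "/app")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` from a terminal session; the sketch only prints the command so that it can be inspected first.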