1. Setting up

1.1. Bioinformatics Tools

We will use the following tools throughout the course:

  • fastqc
  • multiqc
  • trimmomatic
  • bwa
  • samtools (version > 1.0)
  • bcftools (version > 1.0)

The easiest way to install all of the above is through conda as follows:

  1. Download and install the latest version of miniconda (the installer below is for 64-bit Linux):
     curl -O -L https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
     bash Miniconda3-latest-Linux-x86_64.sh
  2. Add the required channels to conda so that it knows where to look for packages. Each --add puts the channel at the top of the list, so the last channel added has the highest priority:
     conda config --add channels defaults
     conda config --add channels bioconda
     conda config --add channels conda-forge
  3. Install the required tools:
     conda install -c bioconda fastqc multiqc trimmomatic bwa samtools bcftools
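
Once the install has finished, it can be worth confirming that samtools and bcftools satisfy the "version > 1.0" requirement listed above. The following is a minimal sketch, not part of the course material: version_gt is a hypothetical helper name, and the script assumes that both tools print "<name> <version>" on the first line of their --version output, as the 1.x releases do. A release that does not understand --version yields an empty version string and is reported as too old.

```shell
# version_gt A B: succeed when version A is strictly greater than version B.
# Hypothetical helper: compares dot-separated numeric fields left to right.
version_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            # Missing fields count as 0, so "1.0" and "1.0.0" compare equal.
            xi = x[i] + 0; yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 1   # equal versions are not "greater"
    }'
}

for tool in samtools bcftools; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found on PATH"
        continue
    fi
    # Take the second word of the first line, e.g. "samtools 1.x" gives "1.x".
    ver=$("$tool" --version 2>/dev/null | head -n 1 | awk '{print $2}')
    if version_gt "$ver" 1.0; then
        echo "$tool $ver: OK"
    else
        echo "$tool ${ver:-unknown}: too old, need a version above 1.0"
    fi
done
```

If a tool is reported as missing, check that the conda environment in which it was installed is active in the current shell.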

You might also want to download and install IGV (the Integrative Genomics Viewer), which is useful for visualizing the results of several steps.

1.2. Base Tools

The following tools are most likely already installed on most major operating systems, but you should check that they are available:

  • git
  • curl
  • gunzip
  • java
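
One way to run this check is a short shell loop over the tool names. The sketch below is only an illustration: check_tools is a hypothetical helper name, and it relies on nothing beyond the standard command -v lookup, which reports whether a command can be found on the PATH.

```shell
# check_tools NAME...: report which commands are available on PATH.
# Hypothetical helper; the return status is the number of missing commands.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_tools git curl gunzip java || echo "Install the missing tools before continuing."
```

The same function can be reused for the bioinformatics tools, e.g. check_tools fastqc multiqc trimmomatic bwa samtools bcftools.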

1.3. R / RStudio

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows: Install R by downloading and running the correct installer file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select “Run as administrator” instead of double-clicking). Otherwise, problems may occur later, for example when installing R packages.

Linux: You can download the binary files for your distribution from CRAN, or use your package manager (e.g. on Debian/Ubuntu run sudo apt-get install r-base, and on Fedora run sudo yum install R). Also, please install the RStudio IDE.

1.3.1. Install the required packages

We will install the packages that the course relies on, together with their dependencies and suggested packages, so that anything that might be needed or useful is already available.

# dependencies = TRUE installs Depends, Imports, LinkingTo and Suggests of the listed packages
install.packages(c("tidyverse", "ggplot2", "rafalib", "caret", "ggpubr",
                   "GGally", "ROCR", "Boruta", "party", "earth", "mlbench",
                   "glmnet", "e1071", "randomForest", "neuralnet"),
                 dependencies = TRUE)

Next, install Bioconductor and some Bioconductor-related packages.

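# BiocManager is the current Bioconductor installer (Bioconductor 3.8 and later)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()
BiocManager::install("factoextra")
BiocManager::install("fpc")
BiocManager::install("knitr")
BiocManager::install("kableExtra")
BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg38")
BiocManager::install("DiffBind")
BiocManager::install("MotifDb")
BiocManager::install("methylKit")
BiocManager::install("genomation")
BiocManager::install("ggplot2")
BiocManager::install("TxDb.Mmusculus.UCSC.mm10.knownGene")
BiocManager::install("AnnotationHub")
BiocManager::install("annotatr")
BiocManager::install("bsseq")
BiocManager::install("DSS")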
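
Because some of these installations can fail quietly (for instance when a system library is missing), it may help to confirm afterwards that the packages can actually be found by R. The sketch below is an assumption-laden illustration, not part of the course material: missing_r_packages is a hypothetical helper name, it assumes that Rscript is on the PATH, and the package names passed to it are just a sample of the ones listed above.

```shell
# missing_r_packages NAME...: print the R packages that are not installed.
# Hypothetical helper; returns 2 if Rscript cannot be found on PATH.
missing_r_packages() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found on PATH" >&2
        return 2
    fi
    # Arguments after the -e expressions are visible to R as trailing args.
    Rscript -e 'wanted <- commandArgs(trailingOnly = TRUE)' \
            -e 'missing <- setdiff(wanted, rownames(installed.packages()))' \
            -e 'if (length(missing) > 0) cat(missing, sep = "\n")' \
            "$@"
}

if missing=$(missing_r_packages tidyverse caret DiffBind methylKit); then
    if [ -n "$missing" ]; then
        printf 'Not installed yet:\n%s\n' "$missing"
    else
        echo "All listed R packages are installed."
    fi
fi
```

Any package reported as not installed can be retried individually from within R, where the error messages are easier to read.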

That’s it, all done! You now have all the tools in place.