NGS data analysis


This wiki page is dedicated to the series of trainings that will lead you through the various workflows for the analysis of next generation sequencing data.
Have fun solving the exercises!



Because most of you have used or will use the Illumina platform to generate your data, we will use Illumina data sets in all exercises.

 

Training 1: Introduction to the analysis of NGS data

Periodically repeated Sessions (Janick Mathys)

Slides

Exercises

This training gives you the background knowledge you need to follow the more advanced trainings on variant analysis, RNA-Seq and ChIP-Seq.

Download the data sets for this training:

Now you can try the exercises.

Archive

FAQ

Q&A added during the intro to NGS data analysis

File formats


Training 2: NGS variant analysis

Session of November 2018 (Stéphane Plaisance)

Session of 2018 using GenePattern

Session of 2020 using GenePattern

Training archive

Q&A pages

HowTo Pages related to this training


 

Training 3: RNA-Seq analysis

Bulk RNA-Seq analysis for differential expression

Tools

Install the latest version of R and RStudio. List of R packages used in the training:

  • ggplot2
  • ggrepel
  • gplots
  • pheatmap
  • plyr
  • RColorBrewer
  • reshape2
  • Bioconductor
  • Bioconductor: airway
  • Bioconductor: DESeq2
  • Bioconductor: GenomicAlignments
  • Bioconductor: GenomicFeatures
  • Bioconductor: org.Hs.eg.db
  • Bioconductor: Rsamtools
  • Bioconductor: tximeta
  • Only for Mac users: Bioconductor: Rsubread
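The package list above can also be installed from the command line. The following is a minimal sketch, assuming that R is installed and that Rscript is on your PATH; the CRAN mirror URL and the helper names r_vector and install_expr are illustrative assumptions, not part of the training material. It only prints the R code, so you can inspect it before running it. Rsubread is left out; Mac users can add it to bioc_pkgs.

```shell
# Sketch: build the R code that installs the packages listed above.
# Assumption: Bioconductor packages are installed through BiocManager.

cran_pkgs="ggplot2 ggrepel gplots pheatmap plyr RColorBrewer reshape2"
bioc_pkgs="airway DESeq2 GenomicAlignments GenomicFeatures org.Hs.eg.db Rsamtools tximeta"

# Turn the arguments into an R character vector, e.g. c("a","b")
r_vector() {
  local out="" p
  for p in "$@"; do
    out="${out:+$out,}\"$p\""
  done
  printf 'c(%s)' "$out"
}

# Print the R expression that performs the installation.
# The package lists are left unquoted on purpose so that they split into words.
install_expr() {
  printf 'install.packages(%s, repos="https://cloud.r-project.org"); ' \
    "$(r_vector $cran_pkgs BiocManager)"
  printf 'BiocManager::install(%s)\n' "$(r_vector $bioc_pkgs)"
}

install_expr
# To actually install, pass the expression to R:
#   Rscript -e "$(install_expr)"
```

Printing first and installing second keeps the sketch safe to try on a machine where you do not yet want to change your R library.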

Slides

Exercises

Files

Extra links


Single cell RNA-Seq analysis

Tools

Install the latest version of R and RStudio. List of R packages used in the training:

  • dplyr
  • gridExtra
  • rgl
  • Seurat
  • stringr
  • Bioconductor: scater

Slides

Exercises

Files

  • aggregated data: output of CellRanger aggregate to be used as input of the script for Seurat analysis of aggregated brain data sets

Extra links

Summer school 2018

Scenic

Experimental design

Integration of omics data

What to do after the summer school?

Bulk RNA-Seq - from raw reads to counts:

  • We have two GenePattern servers running that contain all the tools discussed in the training. Send an email to bits@vib.be to get an account.
  • We can provide a snapshot of the server you worked on during the training. You can then make your own server on Google cloud (it's easy starting from a snapshot). You will have to pay for that.

Bulk RNA-Seq - finding DE genes:

  • You can do the R analysis on your own computer: see this section for the list of packages you need to install.
  • We can provide a snapshot of the server you worked on during the training. You can then make your own server on Google cloud (it's easy starting from a snapshot). You will have to pay for that.

Single cell RNA-Seq:

  • You can do the Seurat analysis on your own computer: see this section for the list of packages you need to install.
  • We can provide a snapshot of the server you worked on during the training. You can then make your own server on Google cloud (it's easy starting from a snapshot). You will have to pay for that.
  • In the future you can get support from Niels and Liesbet. Contact scRNAseq@irc.vib-ugent.be for more information.
  • We will check whether Cell Ranger is installed on the KU Leuven VSC (accessible to people from KU Leuven and UHasselt).


A GitHub page has been started where you can post your issues and share them with us. You can reach it at https://github.com/BITS-VIB/Summer_school_2018

  • NGS_data_analysis_tools: a page listing tools that came up during the day and that you may want to install on your computer

Archive

Session of March 20th and 23rd, 2015 (Stéphane Plaisance)

repeated September 25, 2015

Hands-on_introduction_to_NGS_RNASeq_DE_analysis - the pages of the actual training
containing a hands-on workflow of RNA-Seq analysis for differential expression using command line tools.


Creating ENV variables for the training

Create a new file /etc/profile.d/bits.sh as superuser (for example with "sudo nano /etc/profile.d/bits.sh") and paste the following content:

# system wide ENV variables to ease path in training exercises
export SUMMER=/usr/summer
export SOFT=$SUMMER/software
export REFS=$SUMMER/refs
export DATA=/mnt/userdata/$(whoami)

Source (= execute) the file by typing ". /etc/profile.d/bits.sh"

You now have shortcuts (env variables) that can be typed to reach the very long exercise locations as follows:
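If you prefer not to open an editor, the file can also be written with a here-document and then sourced. This is a minimal sketch: PROFILE_FILE is a hypothetical variable introduced here so that you can try it without superuser rights; it defaults to a temporary file rather than /etc/profile.d/bits.sh.

```shell
# Sketch: write the profile script without an editor, then source it.
# PROFILE_FILE is a hypothetical override; by default a temporary file is used.

PROFILE_FILE="${PROFILE_FILE:-$(mktemp)}"

# The quoted 'EOF' keeps $SUMMER and $(whoami) unexpanded inside the file,
# so they are evaluated each time the file is sourced.
cat > "$PROFILE_FILE" <<'EOF'
# system wide ENV variables to ease path in training exercises
export SUMMER=/usr/summer
export SOFT=$SUMMER/software
export REFS=$SUMMER/refs
export DATA=/mnt/userdata/$(whoami)
EOF

# Source (= execute in the current shell) the file
. "$PROFILE_FILE"
echo "$SOFT"   # prints /usr/summer/software
```

To write the real system-wide file, the same here-document can be piped through "sudo tee /etc/profile.d/bits.sh > /dev/null" instead of being redirected with "cat >".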

  • $SUMMER leads to /usr/summer
  • $SOFT leads to $SUMMER/software
  • $REFS leads to $SUMMER/refs
  • $DATA leads to /mnt/userdata/<your username>
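To confirm that the shortcuts work in your session, you can check that each variable is set and points to an existing directory. The function name check_training_env is a hypothetical helper written for this sketch; it is not part of the training servers.

```shell
# Sketch: report whether each training variable is set and points to a directory.

check_training_env() {
  local name value status=0
  for name in SUMMER SOFT REFS DATA; do
    # Look up the value of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"; status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value (directory not found)"; status=1
    else
      echo "$name=$value"
    fi
  done
  return "$status"
}

check_training_env || echo "Some variables need attention"
```

Once all four variables check out, a command such as "cd $REFS" takes you straight to the reference folder.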

edgeR / DESeq2

Exercises
Slides

Archive

Session of January 20th and 27th, 2014 using Galaxy (Joachim Jacob)

Training 4: ChIP-Seq analysis

Introduction

The aims of this session are to:

  • Have an understanding of the nature of ChIP-Seq data
  • Perform a complete analysis workflow including QC, read mapping, visualization in a genome browser and peak-calling
  • Use the GenePattern platform for each step of the workflow and get a sense of the complexity of the task
  • Have an overview of possible downstream analyses
  • Perform a motif analysis with online web programs

This training gives an introduction to ChIP-seq data analysis, covering the processing steps starting from the reads to the peaks. Among all possible downstream analyses, the practical aspect will focus on motif analyses. A particular emphasis will be put on deciding which downstream analyses to perform depending on the biological question. This training does not cover all methods available today. It does not aim at bringing users to a professional NGS analyst level but provides enough information to allow biologists to understand what DNA sequencing involves in practice and to communicate with NGS experts for more in-depth needs.

For this training, we will use a dataset produced by Myers et al. [1] on factors involved in the regulation of gene expression under anaerobic conditions in bacteria. We will focus on one factor: FNR. The advantage of this dataset is its small size, which allows all steps of the workflow to be executed in real time.
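For readers who follow the command-line variant rather than GenePattern, the workflow steps (QC, read mapping, peak calling) could look roughly like the sketch below. The choice of tools (FastQC, Bowtie, SAMtools, MACS2) is an assumption and may differ from what is used in the training; the sample names, the index name and the genome size are hypothetical placeholders. By default the script only prints the commands (dry run), so it is safe to try without the tools installed.

```shell
# Dry-run sketch of a command-line ChIP-Seq workflow (QC, mapping, peak calling).
# All file names, the index name and the genome size are hypothetical placeholders.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

chipseq_workflow() {
  local chip="$1" control="$2" index="$3" s
  for s in "$chip" "$control"; do
    run fastqc "$s.fastq"                        # read quality control
    run bowtie -S "$index" "$s.fastq" "$s.sam"   # map reads, write SAM output
    run samtools sort -o "$s.bam" "$s.sam"       # sort alignments into a BAM file
    run samtools index "$s.bam"                  # index the BAM for genome browsers
  done
  # Call peaks in the ChIP sample, using the control as background
  run macs2 callpeak -t "$chip.bam" -c "$control.bam" -f BAM -g 4.6e6 -n "$chip"
}

chipseq_workflow FNR_chip FNR_input ecoli_index
```

Setting DRY_RUN=0 executes the same commands for real, provided the tools are installed and a Bowtie index with the given name exists.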

Suggested reading:

  • Bailey et al. Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data. PLoS Comput Biol 9, e1003326 (2013) [2].
  • Thomas-Chollier et al. A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs. Nature Protocols 7, 1551–1568 (2012) [3].

Raw data:


Additional files:

Exercises

Same training in command line instead of GenePattern


Links


Archive

Session of June 1st, 2015 by Morgane Thomas-Chollier
Session of February 24th, 2014 by Morgane Thomas-Chollier

HowTo Pages related to this training

 

Training 5: metagenomics

Slides

Data files

Tools

  • Lotus pipeline
  • Download usearch version 8 and copy it into the /usr/bin/tools/ folder (you need to be superuser for this).
    Make it executable:

    sudo chmod +x /usr/bin/tools/usearch8.1.1861_i86linux32

    Create a symbolic link in the folder where Lotus will search for it:

    sudo ln -s /usr/bin/tools/usearch8.1.1861_i86linux32 /usr/bin/tools/lotus_pipeline/bin/usearch_bin
  • You also need R with the vegan package installed
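After creating the symbolic link, it is worth checking that it resolves to an executable file before you start Lotus. The function check_tool_link below is a hypothetical helper written for this sketch; the path it is called with is the one from the instructions above.

```shell
# Sketch: verify that a symbolic link resolves to an executable file.

check_tool_link() {
  local link="$1"
  if [ ! -L "$link" ]; then
    echo "not a symbolic link: $link"
    return 1
  fi
  # -x follows the link and tests the file it points to
  if [ ! -x "$link" ]; then
    echo "target missing or not executable: $link"
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}

# Path taken from the instructions above
check_tool_link /usr/bin/tools/lotus_pipeline/bin/usearch_bin || true
```

If the check reports a missing or non-executable target, repeat the chmod and ln steps and make sure the file name matches the version you downloaded.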


Exercises



References:
  1. Kevin S Myers, Huihuang Yan, Irene M Ong, Dongjun Chung, Kun Liang, Frances Tran, Sündüz Keleş, Robert Landick, Patricia J Kiley
    Genome-scale analysis of Escherichia coli FNR reveals complex features of transcription factor binding.
    PLoS Genet: 2013, 9(6);e1003565
    [PubMed:23818864]

  2. Timothy Bailey, Pawel Krajewski, Istvan Ladunga, Celine Lefebvre, Qunhua Li, Tao Liu, Pedro Madrigal, Cenny Taslim, Jie Zhang
    Practical guidelines for the comprehensive analysis of ChIP-seq data.
    PLoS Comput Biol: 2013, 9(11);e1003326
    [PubMed:24244136]

  3. Morgane Thomas-Chollier, Elodie Darbo, Carl Herrmann, Matthieu Defrance, Denis Thieffry, Jacques van Helden
    A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs.
    Nat Protoc: 2012, 7(8);1551-68
    [PubMed:22836136]

  4. http://www.ncbi.nlm.nih.gov/geo/
  5. http://www.ebi.ac.uk/ena/
  6. http://bowtie-bio.sourceforge.net/
  7. http://rsat.eu
  8. http://www.cbrc.kaust.edu.sa/hmcan/


