Basic bioinformatics concepts, databases and tools
This wiki page is dedicated to the training course "Basic bioinformatics concepts, databases and tools". Here you can find more information about the topics covered, the software (with installation instructions) and databases, as well as additional useful links and recommended literature. Have fun!
Contents
Slides
- Module 1: Sequence databases.
- Module 2: Sequence comparison.
- Module 3: Sequence analysis.
- Module 4: Other biological data.
- Module 5: Integration of data.
Data
Group Exercises on:
- Motif and domain databases, phylogenetic trees, multiple sequence alignment, finding homologs
- Gene regulation using Ensembl, Transfac, JASPAR, RSAT, Contra, Physbinder, PScan
- Design primers for qPCR using Ensembl, Primer3, BLAST, OligoAnalyzer, In Silico PCR
- Functional analysis of a list of genes
Files for the exercises:
- BLAST - low complexity protein: chicken histone
- Pairwise alignment - Human MYOD protein sequence
- Pairwise alignment - Mouse MYOD protein sequence
- Pairwise alignment - Fruit fly MYOD protein sequence
- Pairwise alignment - Human MYOD CDS
- Pairwise alignment - Fruit fly MYOD CDS
- MSA - Garfield sequences
- Group exercise 1 - protein sequences of Indy and its orthologs in fasta format
- Group exercise 1 - phylogenetic tree of clean Indy alignment made with MrBayes in Ugene and saved in Newick format
- Group exercise 2 - protein sequence of talin in fasta format
- Group exercise 2 - protein sequences of talin and its orthologs from OrthoMCL
- Group exercise 2 - multiple sequence alignment of talin and its orthologs made by Muscle in Ugene
- Group exercise 3 and 4 - protein sequences of plant histones from UniProt
- Group exercise 3 - multiple sequence alignment of plant histones made by ClustalW in Ugene
- Group exercise 3 - HMM of multiple sequence alignment of plant histones made by HMMER in Ugene
- Group exercise 3 - protein sequences of plant histones and the novel ortholog that was found by HMMER
- Group exercise 3 - multiple sequence alignment of plant histones and the novel ortholog made by ClustalW in Ugene
- Group exercise 4 - protein sequences of plant histones and the novel ortholog that was found by PSI-BLAST
- Group exercise 4 - multiple sequence alignment of plant histones and the novel ortholog made by ClustalW in Ugene
- Group exercise 5 - protein sequences ending in YRGS from Prosite
- Group exercise 5 - multiple sequence alignment of sequences ending in YRGS made by Muscle in Ugene
- Group exercise 5 - HMM of multiple sequence alignment of sequences ending in YRGS made by HMMER in Ugene
- Group exercise 6 - HMM of multiple sequence alignment of rice cyclins obtained from Pfam
- Group exercise on gene regulation - PSM of AP1 motif obtained from Transfac
- Group exercise on gene regulation - sequence of HCST promoter in fasta format obtained from RSAT
- Group exercise on gene regulation - sequences of a set of 10 random promoters in fasta format obtained from RSAT
- Group exercise on primer design - sequences of the CDS of INS in fasta format obtained from Ensembl
- Group exercise on functional enrichment - HGNC symbols of genes downregulated in pituitary cancer after treatment
- Group exercise on functional enrichment - Ensembl gene IDs of genes downregulated in pituitary cancer after treatment
- Group exercise on functional enrichment - PSM of TP53 motif obtained from JASPAR
- Group exercise on functional enrichment - Ensembl Gene IDs of a set of potential targets of TF TP53
- Group exercise on functional enrichment - RefSeq IDs of a set of potential targets of TF TP53
- Group exercise on functional enrichment - sequences of the promoters of the potential TP53 targets in fasta format obtained from RSAT
- Group exercise on variants - vcf file containing variants of a patient with hemolytic anemia
FAQ
Q&A added during the Basic bioinformatics training
Exercises during the training
During these three days you will do exercises using public websites and free software running locally on your PC. Because most people use Windows, we will use a Windows installation. For convenience, some of the exercises use published sequences that are already stored in files, which you can find in the list above.
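Several of the exercise files listed above are in fasta format. As a minimal sketch (not part of the course material), the Python function below shows how such a file is structured: each record starts with a header line beginning with ">", followed by one or more sequence lines. The function name `parse_fasta` and the example records are hypothetical and only serve as illustration.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a dict mapping each header (without '>') to its full sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated onto the current record
            records[header] += line
    return records


# Hypothetical two-record example; the sequences are made up for illustration.
example = """>seq1 example protein
MKT
AYI
>seq2 another example
GGSW
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

To read one of the downloaded exercise files instead, pass an open file handle to the function, for example `parse_fasta(open("my_sequences.fasta"))`, where the file name is a placeholder.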
Module 1 Searching sequence databases
- Searching GenBank
- Searching NCBI's Proteins database
- Searching NCBI databases using Entrez
- Ensembl
- UniProt
- Archive for Module 1
Module 2 Sequence alignment
- Sequence similarity searches: BLAST, OrthoMCL...
- Pairwise sequence alignments
- Multiple sequence alignment
- Protein motifs and domains
- Archive for Module 2
Module 3 Sequence analysis: motifs, structure and function prediction
- Protein sequence analysis
- DNA sequence analysis: Gene Regulation
- DNA sequence analysis: Primer Design
- RNA sequence analysis
- Archive for Module 3
Module 4 Beyond sequences: other relevant biological data sources
- Exercises on PubMed
- Exercises on Protein Structure
- Exercises on Gene Expression
- Functional annotation and enrichment analysis
- Variation data
Module 5 Biological data integration and interpretation
Get social!
Perhaps the best tip of this course... Nowadays, knowledge is stored less in databases; instead, it flows on the internet. Ask your colleagues worldwide! Very valuable bioinformatics resources are:
You can ask or check bioinformatics-related questions on these fora. ScienceSeeker can also be helpful: it aggregates many science blogs and allows you to search all of them. And of course, Nature wouldn't be Nature if they hadn't made a ranking of 50 popular science blogs.
Speaking of science blogs: there are many valuable bioinformatics blogs on the web:
- Open Helix, my personal favourite, covers tutorials and FAQs for common bioinformatics tools. Also check out the Friday SNPpets (a weekly collection of popular Twitter feeds)
- Mass genomics, a medical genomics blog with great article reviews
- Getting genetics done, a well-maintained blog with a focus on technical issues
Solutions
- Solutions to group exercises on phylogenetic trees, finding homologs, primer design, gene regulation
- Solutions to group exercise on variant characterization
- Solutions to group exercise on functional characterization
Additional useful links
- Another wiki dedicated to bioinformatics, hosted by Bioinformatics.org
- Lecture slides on Bioinformatics as taught at the Max Planck Institute
Recommended literature
Introductory books on bioinformatics are listed here.
Obsolete
- obsolete - Ex 4 BLAST, fastA, siRNA design