Exercises on ENA
The ENA database
The European Bioinformatics Institute (EBI) hosts the European Nucleotide Archive (ENA). One part of ENA, called EMBL-bank, contains annotated primary sequence data. The other two parts, the Trace Archive and the Sequence Read Archive (SRA, formerly the Short Read Archive), contain batch-submitted primary sequence data.
EBI has multiple search portals. The exercises below use two of them: the ENA Browser and EBI Search.
Note: database content is fluid. Records change all the time as information is added and removed, so screenshots may not be up to date.
The ENA Browser
Go to the ENA Browser. You see two text fields: the upper one for "Text search", the lower one for "Sequence search". We will concentrate on the text search; sequence searches will be covered in Module 2. You can search using free text (e.g. species names, disease names, feature names, ...) or using an accession number.
Exercise 1: caspase
Perform a search for 'caspase complete cds'.
The ENA search returns records from the EMBL-bank part of ENA, divided into "Update" and "Release". "Update" contains records that were recently updated. Clicking the "+" sign expands the corresponding section, revealing the individual search results.
Each record can be further expanded by clicking on the "+" sign to see more details of the record. Do this for the first record of the "Update" list (JX912275 : Spodoptera frugiperda initiator caspase mRNA, complete cds).
Which data class does this ENA record belong to? |
---|
The record belongs to the STD class. |
The most useful entries with the most relevant annotations are from the 'STD' (standard) data class. See more info on ENA database structure.
Download the record in FASTA format. |
---|
In the "Download" section, click on "FASTA". This will create a file called "ena.fasta" in the "Downloads" folder of your computer. Open the file in WordPad. |
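
The same download can be scripted. The following is a minimal sketch, not an official client: it assumes that the ENA Browser exposes a FASTA endpoint at the base URL stored in `ENA_FASTA_BASE` (please check the current ENA documentation before relying on it). The function names `fasta_url` and `download_fasta` are hypothetical helpers made up for this example.

```python
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: base URL of the ENA Browser FASTA endpoint.
# Verify it against the current ENA documentation before use.
ENA_FASTA_BASE = "https://www.ebi.ac.uk/ena/browser/api/fasta"


def fasta_url(accession: str) -> str:
    """Build the download URL for one accession number (hypothetical helper)."""
    return f"{ENA_FASTA_BASE}/{quote(accession.strip())}"


def download_fasta(accession: str, path: str) -> str:
    """Fetch one record in FASTA format and save it to 'path'.

    Needs network access; raises an exception if the request fails.
    """
    with urlopen(fasta_url(accession), timeout=30) as response:
        text = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

Under these assumptions, calling `download_fasta("JX912275", "ena.fasta")` would save the record from Exercise 1 to a file with the same name the browser uses.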
Can you tell the major difference between a sequence stored in FASTA and a sequence in EMBL text format? |
---|
FASTA has been stripped of all annotations: it is basically just the sequence, and one description line (corresponds to the 'DE' line in the EMBL text file). |
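
To make the difference concrete, here is a small sketch that reads the description from both formats. The two records are toy data invented for this example (they are not real ENA entries), and `parse_fasta` and `embl_description` are hypothetical helper names. The sketch assumes the usual EMBL flat-file layout, in which each line starts with a two-letter code followed by three spaces.

```python
# Toy records invented for illustration; not real ENA data.
TOY_FASTA = ">TOY1 Example mRNA, complete cds.\nATGGCT\nAGCTAA\n"

TOY_EMBL = (
    "ID   TOY1; SV 1; linear; mRNA; STD; INV; 12 BP.\n"
    "DE   Example mRNA,\n"
    "DE   complete cds.\n"
    "FT   CDS             1..12\n"
    "SQ   Sequence 12 BP;\n"
    "     atggctagct aa        12\n"
    "//\n"
)


def parse_fasta(text):
    """Return (description, sequence) for the first record in FASTA text."""
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if description is not None:
                break  # a second header starts the next record
            description = line[1:]
        elif description is not None:
            chunks.append(line)
    return description, "".join(chunks)


def embl_description(text):
    """Join all 'DE' lines of an EMBL flat-file record into one string."""
    parts = [line[5:].strip() for line in text.splitlines()
             if line.startswith("DE   ")]
    return " ".join(parts)


description, sequence = parse_fasta(TOY_FASTA)
print(description)                  # the single FASTA description line
print(sequence)                     # the bare sequence
print(embl_description(TOY_EMBL))   # the same text, taken from the DE lines
```

The FASTA parser has only two things to extract, whereas the EMBL record carries additional line types (ID, FT, SQ, ...) that hold the annotations.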
Exercise 2: kinase
The nicest thing about the ENA Browser search is that the results are categorized by the part of ENA from which they originate. This becomes clear when you do a text search with "kinase".
The results page groups the entries according to type of sequence.
The text searches that you can perform using the ENA Browser are very 'crude'. For example, when you search for "kinase", every record that contains the word "kinase" anywhere is shown, including non-kinase sequences, just as in GenBank. Be aware of this, because it is often not what you want!
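
The effect of such a crude search can be imitated with a toy model. The sketch below is only an illustration of free-text matching in general, not a description of how the ENA search is implemented; the records, the field names and the two functions are all invented for this example.

```python
# Toy records invented for illustration; not real database entries.
RECORDS = [
    {"id": "REC1", "description": "Example protein kinase mRNA, complete cds",
     "notes": ""},
    {"id": "REC2", "description": "Example phosphatase mRNA",
     "notes": "dephosphorylates a MAP kinase substrate"},
    {"id": "REC3", "description": "Example kinase inhibitor gene",
     "notes": ""},
]


def free_text_search(records, term):
    """Return ids of records with 'term' anywhere in any field."""
    term = term.lower()
    return [r["id"] for r in records
            if any(term in str(value).lower() for value in r.values())]


def field_search(records, term, field):
    """Return ids of records with 'term' in one named field only."""
    term = term.lower()
    return [r["id"] for r in records if term in r.get(field, "").lower()]


print(free_text_search(RECORDS, "kinase"))             # all three records
print(field_search(RECORDS, "kinase", "description"))  # REC1 and REC3
```

The free-text search also returns REC2, a phosphatase that merely mentions a kinase in its notes. Restricting the search to the description removes that hit, but REC3 (a kinase inhibitor) still matches, so results always need to be checked by eye.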
EBI Search, cross-database search at EBI
EB-eye is a cross-database search tool for EBI databases, similar to Entrez for NCBI databases. You can access it via EBI Search.
Exercise 1: AF24735
Searching EBI Search for this accession number redirects you to the EBI summary record of this gene.
EBI provides very nice overview pages, with links to many other databases. They are a good place to start.