Exploring the Protein Data Bank exercises

Exercises created by Joost Van Durme

Part of Protein Structure Analysis training

Search for a structure

Via UniProt

How you search for a specific protein structure depends on the data you already have. If you already have the PDB ID (a unique identifier), the search is easy. More often, you only have the protein name or a sequence. In those cases I recommend starting from the UniProt website at http://www.uniprot.org, which is the best annotated protein database in the world. Our first model protein will be the molecular chaperone DnaK from E. coli.

Below is an image of the UniProt search box where you can start your search for proteins.

Go to the UniProt website and search for the DnaK protein

The UniProt search engine returns a list of DnaK protein sequences from a variety of organisms. An entry with accession code P0A6Y8 and entry name DNAK_ECOLI should be near the top of this list. Click on the accession code (column Entry) to view the protein page of this DnaK from the model organism Escherichia coli. Click on Structure in the left-side menu and then look at the 3D structure databases table.

Which structures (give the 4-character PDB ID) of the C-terminal domain of DnaK should preferentially be used for analysis, and why?

Try it yourself before reading the answer below!

Usually, the recommended selection criteria are an X-ray structure with high resolution (that is, a low resolution value in Å) and a low Rfree factor. Furthermore, the PDB provides a pre-calculated validation report for all of the structures.

As an example, have a look at https://www.ebi.ac.uk/pdbe/entry/pdb/4EZX under the section 'Experiments and Validation'. For many PDB structures, a re-refined structure is also available from PDB-REDO, with a vast amount of information on the quality of the X-ray structure and suggested 'better' models, e.g. https://pdb-redo.eu/db/4ezx. In our case, we could opt for the structures 1DKX and 4EZX.

This is a difficult example since there are so many high resolution structures available. So, it is recommended to study the articles and compare the available structures to find your favorite structure for further analysis.
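The selection criteria mentioned above can be sketched as a simple ranking. This is only an illustration under stated assumptions: the IDs and numbers below are made-up placeholders (not real PDB entries), and the names `Candidate` and `rank_candidates` are hypothetical. X-ray structures are sorted by resolution first (a lower value in Å is better), then by Rfree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    pdb_id: str
    method: str                   # e.g. "X-ray" or "NMR"
    resolution: Optional[float]   # in Å; lower means more detail; None if not applicable
    r_free: Optional[float]       # lower is better; None if not applicable


def rank_candidates(candidates):
    """Keep X-ray structures and sort them by (resolution, Rfree), best first."""
    xray = [c for c in candidates if c.method == "X-ray"]
    return sorted(xray, key=lambda c: (c.resolution, c.r_free))


# Placeholder data for illustration only; these are NOT real PDB entries.
examples = [
    Candidate("XXX1", "X-ray", 2.8, 0.29),
    Candidate("XXX2", "X-ray", 1.9, 0.23),
    Candidate("XXX3", "NMR", None, None),
    Candidate("XXX4", "X-ray", 1.9, 0.21),
]

ranked = rank_candidates(examples)
print([c.pdb_id for c in ranked])
```

Such a ranking is only a first filter. Reading the articles and the validation reports remains necessary before settling on a structure.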

Select 'RCSB PDB' on the left side of the table and then click on the first interesting candidate in the list e.g. 1DKX. This should open the structure page on the RCSB website.

Via the Protein Data Bank by PDB ID

You can find structural information directly at the PDB database. The web site of the wwPDB consortium is located at http://www.wwpdb.org. This web site provides links to all members of the PDB (left side). It is a question of taste which resource you start off with. For X-ray structures, the current members are PDBe, RCSB PDB and PDBj. For NMR structures, there is the BMRB. In today's course, we focus on the PDB resources only.

Below is an image of the RCSB search box (http://www.rcsb.org/pdb/home/home.do) where you can start your search for structures.

The PDB file with ID 1DKX contains the atomic coordinates of the molecular chaperone (DnaK) from E. coli.

Go to the PDB website and type 1DKX in the search box

This will lead you to the same page we got earlier through UniProt.

Via the Protein Data Bank by sequence

In many cases we only have a sequence and would like to find out whether structural information exists for it. The PDB can be searched using a sequence as input. Here is the sequence of the C-terminal substrate binding domain of DnaK:

DVKDVLLLDVTPLSLGIETMGGVMTTLIAKNTTIPTKHSQVFSTAEDNQSAVTIHVLQGE
RKRAADNKSLGQFNLDGINPAPRGMPQIEVTFDIDADGILHVSAKDKNSGKEQKITIKAS
SGLNEDEIQKMVRDAEANAEADRKFEELVQTRNQGDHLLHSTRKQVEEAGDKLPADDKTA
IESALTALETALKGEDKAAIEAKMQELAQVSQKLMEIAQQQHAQQQTAGADASANNAKDD
DVVDAEFEEVKDKK
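Before pasting a multi-line sequence into a search form, it can help to join the lines and check the length. The sketch below does that and also writes a FASTA record. The helper names and the FASTA header `DnaK_C-terminal_domain` are hypothetical labels chosen for this example.

```python
# The sequence as given in the exercise, one string per line.
LINES = [
    "DVKDVLLLDVTPLSLGIETMGGVMTTLIAKNTTIPTKHSQVFSTAEDNQSAVTIHVLQGE",
    "RKRAADNKSLGQFNLDGINPAPRGMPQIEVTFDIDADGILHVSAKDKNSGKEQKITIKAS",
    "SGLNEDEIQKMVRDAEANAEADRKFEELVQTRNQGDHLLHSTRKQVEEAGDKLPADDKTA",
    "IESALTALETALKGEDKAAIEAKMQELAQVSQKLMEIAQQQHAQQQTAGADASANNAKDD",
    "DVVDAEFEEVKDKK",
]


def clean_sequence(lines):
    """Join the lines and drop all whitespace, giving one uppercase string."""
    return "".join("".join(line.split()) for line in lines).upper()


def to_fasta(header, sequence, width=60):
    """Format a sequence as a FASTA record with lines of at most `width` letters."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(chunks)


sequence = clean_sequence(LINES)
print(len(sequence))  # number of residues in the fragment
print(to_fasta("DnaK_C-terminal_domain", sequence))
```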

The PDB allows sequence searches through the same search box we used before.

There is also an Advanced Search section, with a Blast/Fasta option in the Sequence Features section. Please select 'Sequence BLAST/PSI-BLAST' in the Query type drop down. This method allows you to change some parameters for the search.

Copy and paste the sequence in the 'Sequence' field and press 'Submit query'.
You should see the same structures popping up as you saw in the UniProt page of DnaK.

The PDB file

Introduction

A PDB (Protein Data Bank) file is a plain text file that contains the atom coordinates of a solved 3D structure of a protein or even DNA. Such coordinate files can be obtained at the Protein Data Bank at http://www.rcsb.org/pdb. Each PDB file has a unique identifier (ID) consisting of 4 characters, the first of which is always a digit. Note: It has been announced that the 4-character code will change in the future (https://www.wwpdb.org/news/news?year=2017#5910c8d8d3b1d333029d4ea8).
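The ID rule described above (4 characters, the first one a digit) can be checked with a short pattern. This is a minimal sketch of the classic format only: the function name `is_classic_pdb_id` is hypothetical, and the assumption that the other three characters are letters or digits is mine, not stated in the text. It would not cover the announced future ID format.

```python
import re

# Classic PDB ID: one digit followed by three letters or digits (assumed).
CLASSIC_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_classic_pdb_id(text):
    """Return True if `text` looks like a classic 4-character PDB ID."""
    return CLASSIC_PDB_ID.fullmatch(text.strip()) is not None


for candidate in ["1DKX", "4ezx", "DNAK", "1DKX1"]:
    print(candidate, is_classic_pdb_id(candidate))
```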

The PDB file with ID 1DKX contains the atomic coordinates of the molecular chaperone (DnaK) from E. coli.

Go to the PDB website at http://www.rcsb.org/pdb and type 1DKX in the search box. Answer these questions mainly by investigating the main page and the sequence page of the 1DKX entry.

How many molecules were solved in this PDB file? What kind of molecules are these (proteins, peptides, DNA, ...)?
* Two, called polymers or chains: they are polypeptides (see Type)

Does the structure represent the full protein? If not, how many residues are missing? Hint: Click on the UniProt KB link in the Sequence tab to see the full sequence.
* You can go to the sequence tab at the top
* On this page, we can switch between SEQRES view and UniProt view
* Investigating the displayed sequence reveals that the first N-terminal 387 residues are missing from the structure.
* For further comparison, we need to go to the corresponding UniProt entry and compare with the C-terminal residues.
* Summary: a large chunk of the N-terminus is missing from the structure, the C-terminus is virtually complete.

Was this structure solved by X-Ray or NMR?
* X-ray diffraction, as shown under Experimental Details.

What is the atomic resolution and R-factor?
* Atomic resolution: 2.00 Å; R-factor: 0.206.
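The experimental method and the resolution are also recorded in the header of a PDB-format text file, in the EXPDTA and REMARK 2 records. The sketch below reads both. The function name is hypothetical, and the sample header lines were written by hand in the standard layout using the values quoted above; they are not copied from the 1DKX file.

```python
import re


def read_method_and_resolution(pdb_text):
    """Return (method, resolution in Å or None) from PDB header records."""
    method = None
    resolution = None
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # The technique follows the 6-character record name.
            method = line[6:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            match = re.search(r"(\d+\.\d+)\s+ANGSTROMS", line)
            if match:
                resolution = float(match.group(1))
    return method, resolution


# Hand-written sample in the standard layout (illustration only).
SAMPLE_HEADER = """\
EXPDTA    X-RAY DIFFRACTION
REMARK   2
REMARK   2 RESOLUTION.    2.00 ANGSTROMS.
"""

print(read_method_and_resolution(SAMPLE_HEADER))
```

For structures without a resolution value (for example NMR entries), the function returns `None` as the second item.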

Downloading the structure

The file that holds the 3D coordinates can be downloaded by clicking on Download files in the top right corner and then choosing PDB file (text). For convenience, save this file on your Desktop. The filename is the 4-character unique PDB ID.
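The download can also be scripted. The following is a minimal sketch, assuming the RCSB file server serves PDB-format files under `https://files.rcsb.org/download/<ID>.pdb`. This URL pattern is not mentioned in the exercise and may change. The function names are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve


def pdb_download_url(pdb_id):
    """Build the download URL for a PDB-format file (URL pattern is an assumption)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id, folder="."):
    """Fetch the file and save it as <ID>.pdb in `folder`; needs network access."""
    target = Path(folder) / f"{pdb_id.upper()}.pdb"
    urlretrieve(pdb_download_url(pdb_id), str(target))
    return target


print(pdb_download_url("1dkx"))

# To actually download the file to the Desktop (requires an internet connection):
# download_pdb("1DKX", folder=str(Path.home() / "Desktop"))
```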

Open this file with a text editor; WordPad, for example, is an excellent tool for that.
Do you see the different sections in the PDB file? Analyse some ATOM lines and try to explain what kind of data is in each column.
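Once you have tried to interpret the columns yourself, the sketch below may help to check your answer. ATOM lines are fixed-width, so each field sits at fixed character positions. The column positions follow the wwPDB file format description as I understand it. The field names, the function name and the sample record are my own: the sample is a made-up line for illustration and is not taken from 1DKX.

```python
# Column layout of an ATOM record: (field name, first column, last column),
# 1-based and inclusive.
ATOM_FIELDS = [
    ("record", 1, 6),
    ("serial", 7, 11),
    ("atom_name", 13, 16),
    ("alt_loc", 17, 17),
    ("res_name", 18, 20),
    ("chain_id", 22, 22),
    ("res_seq", 23, 26),
    ("x", 31, 38),
    ("y", 39, 46),
    ("z", 47, 54),
    ("occupancy", 55, 60),
    ("b_factor", 61, 66),
    ("element", 77, 78),
]


def parse_atom_line(line):
    """Slice one ATOM/HETATM line into a dict of stripped strings."""
    return {name: line[start - 1:end].strip() for name, start, end in ATOM_FIELDS}


# Made-up record, assembled piece by piece so the column widths are visible.
SAMPLE_LINE = (
    "ATOM  "        # cols 1-6   record name
    "    1"         # cols 7-11  atom serial number
    " "             # col  12    unused
    " N  "          # cols 13-16 atom name
    " "             # col  17    alternate location
    "VAL"           # cols 18-20 residue name
    " "             # col  21    unused
    "A"             # col  22    chain identifier
    " 389"          # cols 23-26 residue number
    " "             # col  27    insertion code
    "   "           # cols 28-30 unused
    "  10.000"      # cols 31-38 x coordinate (Å)
    "  20.000"      # cols 39-46 y coordinate (Å)
    "  30.000"      # cols 47-54 z coordinate (Å)
    "  1.00"        # cols 55-60 occupancy
    " 25.00"        # cols 61-66 temperature factor (B-factor)
    "          "    # cols 67-76 unused
    " N"            # cols 77-78 element symbol
)

atom = parse_atom_line(SAMPLE_LINE)
print(SAMPLE_LINE)
print(atom)
```

Slicing by position rather than splitting on whitespace is deliberate: neighbouring fields in a real file can touch each other without a space in between.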

Additional exercises on searching PDB can be found on the basic bioinformatics exercises page.
