Implementing a basic PDB parser
As you know by now, the Bio.PDB parser is not complete. Here, we will develop a framework that allows you to parse other records in PDB files. Although we can expect a migration from PDB to the mmCIF format in the future, parsing PDB files directly is still useful in many situations.
Getting ready
In order to parse a format, we need its specification. You can find this at http://www.wwpdb.org/documentation/file-format.php. We will mostly be concerned with the secondary structure records (HELIX and SHEET), but the scaffold parser also covers other records. You can extend this scaffold to any other records that you may need.
You can find this content in the 06_Prot/Parser.ipynb notebook.
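To get a feel for what the specification describes before building anything, note that PDB records are fixed-column text lines: the record name occupies the first six columns, and each field sits at a fixed position after it. The following is only a minimal sketch of slicing a single HELIX record, not the framework developed in this recipe. The function name `parse_helix`, the dictionary keys, and the sample line are hypothetical, and the column positions reflect my reading of version 3.3 of the format, so check them against the specification before relying on them.

```python
def parse_helix(line):
    """Slice one HELIX record into a dict (hypothetical helper).

    The specification numbers columns from 1; Python slices are 0-based
    and exclude the end index, so columns 8-10 become line[7:10].
    """
    if not line.startswith("HELIX"):
        raise ValueError("not a HELIX record")
    line = line.ljust(80)  # tolerate lines with trailing spaces stripped
    return {
        "serial": int(line[7:10]),            # columns 8-10
        "helix_id": line[11:14].strip(),      # columns 12-14
        "start_residue": line[15:18],         # columns 16-18
        "start_chain": line[19],              # column 20
        "start_position": int(line[21:25]),   # columns 22-25
        "end_residue": line[27:30],           # columns 28-30
        "end_chain": line[31],                # column 32
        "end_position": int(line[33:37]),     # columns 34-37
        "helix_class": int(line[38:40]),      # columns 39-40
        "length": int(line[71:76]),           # columns 72-76
    }


# Made-up sample record, padded so that the helix length ends at column 76.
SAMPLE = "HELIX    1  HA GLY A   86  GLY A   94  1" + " " * 35 + "9"

helix = parse_helix(SAMPLE)
print(helix["helix_id"], helix["start_position"], helix["end_position"])
```

Slicing by position rather than splitting on whitespace matters here because adjacent fields are allowed to touch or to be blank, in which case a whitespace split would shift every later field.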
How to do it...
Take a look at the following steps:
First, let's retrieve a file to work with. At this point, we will only download it, not parse it:
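from __future__ import print_function
from Bio import PDB

# Download only (no parsing). Recent Biopython releases default to mmCIF,
# so request the legacy PDB format explicitly.
repository = PDB.PDBList()
repository.retrieve_pdb_file('1TUP', pdir='.', file_format='pdb')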
We will now devise a basic parsing framework...