[Tutor] List Indexing Issue

Jerry Hill malaclypse2 at gmail.com
Wed May 9 03:19:30 CEST 2012


On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <s.charonis at gmail.com> wrote:
> Hello python community,
>
> I'm having a small issue with list indexing. I am extracting certain
> information from a PDB (protein information) file and need certain fields of
> the file to be copied into a list. The entries look like this:
>
> ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
>
> I am using the following syntax to parse these lines into a list:
...
> charged_res_coord.append(atom_coord[i].split()[1:9])

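You're using split, which assumes that there will always be blank
spaces between your fields.  That's not true, though.  PDB is a
fixed-width record format, so two adjacent fields can run together
with no space between them (see the "A1005" in the last line of the
sample data further down).  The format is documented here:
http://www.wwpdb.org/docs.html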
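
To see why split is fragile here, this small sketch compares two of the ATOM lines from this thread.  The variable names are just for illustration.  When the residue number fills all four of its columns, it runs into the chain ID and split returns one field fewer:

```python
# Two ATOM records from this thread.  The second has a four-digit
# residue number, so there is no space between chain "A" and "1005".
line_a = "ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N"
line_b = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N"

fields_a = line_a.split()
fields_b = line_b.split()

# Same record type, but a different number of whitespace-separated fields.
print(len(fields_a), fields_a[4:6])   # 12 ['A', '222']
print(len(fields_b), fields_b[4:6])   # 11 ['A1005', '11.906']
```

Any index past that point, such as the [1:9] slice in the original code, then picks up different columns for different lines.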

If you only have a couple of items to pull out, you can slice the
string at the appropriate places.  Based on those docs, you could pull
the x, y, and z coordinates out like this:


x_coord = atom_line[30:38]
y_coord = atom_line[38:46]
z_coord = atom_line[46:54]
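
Those slices give you strings that still include the padding spaces.  As a minimal sketch, assuming atom_line holds one complete ATOM record (the sample line here is taken from the original question), you can pass each slice straight to float(), which ignores surrounding whitespace:

```python
atom_line = "ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N"

# Columns 31-38, 39-46 and 47-54 (counting from 1) hold x, y and z.
# Python slices count from 0 and exclude the end index.
x_coord = float(atom_line[30:38])
y_coord = float(atom_line[38:46])
z_coord = float(atom_line[46:54])

print(x_coord, y_coord, z_coord)   # 8.544 -7.133 25.697
```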

If you need to pull more of the data out, or you may want to reuse
this code in the future, it might be worth actually parsing the record
into all its parts.  For a fixed length record, I usually do something
like this:

pdbdata = """
ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N
""".splitlines()

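# Slices are 0-based and end-exclusive; the PDB docs number columns from 1.
atom_field_spec = [
    slice(0, 6),     # record name ("ATOM  ")
    slice(6, 11),    # atom serial number
    slice(12, 16),   # atom name
    slice(16, 17),   # alternate location indicator (a single column)
    slice(17, 20),   # residue name
    slice(21, 22),   # chain identifier
    slice(22, 26),   # residue sequence number
    slice(26, 27),   # insertion code
    slice(30, 38),   # x coordinate
    slice(38, 46),   # y coordinate
    slice(46, 54),   # z coordinate
    slice(54, 60),   # occupancy
    slice(60, 66),   # temperature factor
    slice(76, 78),   # element symbol
    slice(78, 80),   # charge
    ]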

for line in pdbdata:
    if line.startswith('ATOM'):
        data = [line[field_spec] for field_spec in atom_field_spec]
        print(data)


You can build all kinds of fancy data structures on top of that if you
want to.  You could use that extracted data to build a namedtuple for
convenient access to the data by name instead of by index into a list,
or to create instances of a custom class with whatever functionality
you need.
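
Here is a rough sketch of the namedtuple idea.  The field names (record, serial, res_name and so on) and the parse_atom helper are my own choices for illustration, not anything defined by the PDB docs, and every value is kept as a stripped string so you can convert only the ones you need:

```python
from collections import namedtuple

# (field name, column slice) pairs for an ATOM record.
# The names are illustrative; the slices are 0-based and end-exclusive.
ATOM_FIELDS = [
    ("record", slice(0, 6)),
    ("serial", slice(6, 11)),
    ("name", slice(12, 16)),
    ("alt_loc", slice(16, 17)),
    ("res_name", slice(17, 20)),
    ("chain_id", slice(21, 22)),
    ("res_seq", slice(22, 26)),
    ("i_code", slice(26, 27)),
    ("x", slice(30, 38)),
    ("y", slice(38, 46)),
    ("z", slice(46, 54)),
    ("occupancy", slice(54, 60)),
    ("temp_factor", slice(60, 66)),
    ("element", slice(76, 78)),
    ("charge", slice(78, 80)),
]

Atom = namedtuple("Atom", [name for name, _ in ATOM_FIELDS])

def parse_atom(line):
    """Split one ATOM line into an Atom of stripped strings."""
    return Atom(*(line[field].strip() for _, field in ATOM_FIELDS))

line = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N"
atom = parse_atom(line)

print(atom.res_name, atom.chain_id, atom.res_seq)    # GLU A 1005
print(float(atom.x), float(atom.y), float(atom.z))   # 11.906 -2.722 7.994
```

Slicing past the end of a short line just gives an empty string, so a record without the trailing element or charge columns still parses.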

-- 
Jerry

