Hi,<br>I would suggest using the Biopython package. Its PDB parser lets you extract specific information such as atom name, residue, and chain.<br>Bala<br><br><div class="gmail_quote">On Wed, May 9, 2012 at 3:19 AM, Jerry Hill <span dir="ltr"><<a href="mailto:malaclypse2@gmail.com" target="_blank">malaclypse2@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <<a href="mailto:s.charonis@gmail.com">s.charonis@gmail.com</a>> wrote:<br>
</div><div class="im">> Hello python community,<br>
><br>
> I'm having a small issue with list indexing. I am extracting certain<br>
> information from a PDB (protein information) file and need certain fields of<br>
> the file to be copied into a list. The entries look like this:<br>
><br>
> ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N<br>
> ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C<br>
> ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C<br>
><br>
> I am using the following syntax to parse these lines into a list:<br>
</div>...<br>
<div class="im">> charged_res_coord.append(atom_coord[i].split()[1:9])<br>
<br>
</div>You're using split, which assumes there will always be whitespace between<br>
your fields. That's not guaranteed, though: PDB is a fixed-width record<br>
format, so adjacent fields can run together (for example, chain A and<br>
residue 1005 appear as "A1005"). The format is documented here:<br>
<a href="http://www.wwpdb.org/docs.html" target="_blank">http://www.wwpdb.org/docs.html</a><br>
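<br>
As a rough illustration of the problem, here is a minimal sketch comparing split() with column slicing.<br>
The two sample lines are re-typed with fixed-width alignment reconstructed from the ATOM column definitions, which is an assumption about how the original file was laid out; the variable names are only illustrative.<br>

```python
# Two ATOM records laid out in fixed-width columns (alignment reconstructed
# from the published ATOM column definitions; treat it as an assumption).
short_resnum = "ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N"
long_resnum = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N"

# split() sees "A1005" as a single token, so the field count changes
# and every later index shifts by one.
print(len(short_resnum.split()))
print(len(long_resnum.split()))
print(long_resnum.split()[4])

# Column slices are unaffected by the missing space between the fields.
print(long_resnum[21:22], long_resnum[22:26])
```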
<br>
If you just have a couple of items to pull out, you can just slice the<br>
string at the appropriate places. Based on those docs, you could pull<br>
the x, y, and z coordinates out like this:<br>
<br>
<br>
x_coord = atom_line[30:38]<br>
y_coord = atom_line[38:46]<br>
z_coord = atom_line[46:54]<br>
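<br>
The slices are still strings padded with spaces, so you would probably want to convert them to numbers.<br>
A minimal sketch, assuming atom_line holds one correctly aligned ATOM record (the sample line here is re-typed with reconstructed column alignment):<br>

```python
# One ATOM record in fixed-width layout (column alignment is an assumption).
atom_line = "ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N"

# Each slice is an 8-character string padded with spaces, e.g. "   8.544".
# float() ignores surrounding whitespace, so no explicit strip() is needed.
x_coord = float(atom_line[30:38])
y_coord = float(atom_line[38:46])
z_coord = float(atom_line[46:54])

print(x_coord, y_coord, z_coord)
```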
<br>
If you need to pull out more of the data, or want to reuse<br>
this code in the future, it might be worth parsing the record<br>
into all of its fields. For a fixed-width record, I usually do something<br>
like this:<br>
<br>
pdbdata = """<br>
<div class="im">ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N<br>
ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C<br>
ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C<br>
</div><div class="im">ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N<br>
</div>""".splitlines()<br>
<br>
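atom_field_spec = [<br>
    slice(0,6),    # record name<br>
    slice(6,11),   # atom serial number<br>
    slice(12,16),  # atom name<br>
    slice(16,17),  # alternate location indicator<br>
    slice(17,20),  # residue name<br>
    slice(21,22),  # chain identifier<br>
    slice(22,26),  # residue sequence number<br>
    slice(26,27),  # insertion code<br>
    slice(30,38),  # x coordinate<br>
    slice(38,46),  # y coordinate<br>
    slice(46,54),  # z coordinate<br>
    slice(54,60),  # occupancy<br>
    slice(60,66),  # temperature factor<br>
    slice(76,78),  # element symbol<br>
    slice(78,80),  # charge<br>
]<br>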
<br>
for line in pdbdata:<br>
    if line.startswith('ATOM'):<br>
        data = [line[field_spec] for field_spec in atom_field_spec]<br>
        print(data)<br>
<br>
<br>
You can build all kinds of fancy data structures on top of that if you<br>
want to. You could use the extracted data to build a namedtuple for<br>
convenient access to the fields by name instead of by list index,<br>
or to create instances of a custom class with whatever functionality<br>
you need.<br>
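<br>
One way the namedtuple idea might look, as a minimal sketch rather than a definitive implementation.<br>
The field names, the Atom type, and the parse_atom helper are hypothetical names chosen for illustration; the slices follow the ATOM column layout, and the sample line is re-typed with reconstructed alignment.<br>

```python
from collections import namedtuple

# Field names are illustrative (hypothetical); the slices follow the
# ATOM column layout, with the alternate-location field one column wide.
ATOM_FIELDS = [
    ("record", slice(0, 6)),
    ("serial", slice(6, 11)),
    ("name", slice(12, 16)),
    ("alt_loc", slice(16, 17)),
    ("res_name", slice(17, 20)),
    ("chain_id", slice(21, 22)),
    ("res_seq", slice(22, 26)),
    ("i_code", slice(26, 27)),
    ("x", slice(30, 38)),
    ("y", slice(38, 46)),
    ("z", slice(46, 54)),
    ("occupancy", slice(54, 60)),
    ("temp_factor", slice(60, 66)),
    ("element", slice(76, 78)),
    ("charge", slice(78, 80)),
]

Atom = namedtuple("Atom", [name for name, _ in ATOM_FIELDS])


def parse_atom(line):
    """Slice one ATOM line into an Atom of whitespace-stripped strings."""
    return Atom(*(line[span].strip() for _, span in ATOM_FIELDS))


# Sample record with a four-digit residue number (alignment is an assumption).
sample = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N"
atom = parse_atom(sample)
print(atom.res_name, atom.chain_id, atom.res_seq)
print(float(atom.x), float(atom.y), float(atom.z))
```

All values stay as strings here, so numeric fields such as atom.x still need float() or int() before you do arithmetic with them.<br>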
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Jerry<br>
</font></span><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br><br clear="all"><br>-- <br>C. Balasubramanian<br><br>