mail
J. Cliff Dyer
jcd at sdf.lonestar.org
Wed Jul 15 15:40:36 EDT 2009
On Thu, 2009-07-16 at 00:16 +0530, amrita at iisermohali.ac.in wrote:
> Dear all,
>
> Sorry that I am disturbing you all again and again but this is the way I
> am trying to solve my problem:---
>
> >>> import re
> >>> exp = re.compile("CA")
> >>> infile = open("file1.txt")
> >>> for line in infile:
> ...     values = re.split("\s+", line)
> ...     if exp.search(line):
> ...         print ("%s %s CA = %s" %(values[2], values[3], values[6]))
> ...
> with this it is giving the output like:----
>
> 8 ALA CA = 54.67
> 15 ALA CA = 52.18
> 21 ALA CA = 54.33
> 23 ALA CA = 55.84
> 33 ALA CA = 55.58
> 38 ALA CA = 54.33
>
> which is all right, but I also want the CB and C values in each row, and it
> should take the values from the 5th column in front of them. The file looks
> something like this:-----
>
> 47 8 ALA H H 7.85 0.02 1
> 48 8 ALA HA H 2.98 0.02 1
> 49 8 ALA HB H 1.05 0.02 1
> 50 8 ALA C C 179.39 0.3 1
> 51 8 ALA CA C 54.67 0.3 1
> 52 8 ALA CB C 18.85 0.3 1
> 53 8 ALA N N 123.95 0.3 1
> 107 15 ALA H H 8.05 0.02 1
> 108 15 ALA HA H 4.52 0.02 1
> 109 15 ALA HB H 1.29 0.02 1
> 110 15 ALA C C 177.18 0.3 1
> 111 15 ALA CA C 52.18 0.3 1
> 112 15 ALA CB C 20.64 0.3 1
> 113 15 ALA N N 119.31 0.3 1
> 154 21 ALA H H 7.66 0.02 1
> 155 21 ALA HA H 4.05 0.02 1
> 156 21 ALA HB H 1.39 0.02 1
> 157 21 ALA C C 179.35 0.3 1
> 158 21 ALA CA C 54.33 0.3 1
> 159 21 ALA CB C 17.87 0.3 1
> 160 21 ALA N N 123.58 0.3 1
> 169 23 ALA H H 8.78 0.02 1
> 170 23 ALA HA H 4.14 0.02 1
> 171 23 ALA HB H 1.62 0.02 1
> 172 23 ALA C C 179.93 0.3 1
> 173 23 ALA CA C 55.84 0.3 1
> 174 23 ALA CB C 17.55 0.3 1
> 175 23 ALA N N 120.16 0.3 1
> 232 33 ALA H H 7.57 0.02 1
> 233 33 ALA HA H 3.89 0.02 1
> 234 33 ALA HB H 1.78 0.02 1
> 235 33 ALA C C 179.24 0.3 1
> 236 33 ALA CA C 55.58 0.3 1
> 237 33 ALA CB C 19.75 0.3 1
> 238 33 ALA N N 121.52 0.3 1
> 269 38 ALA H H 8.29 0.02 1
> 270 38 ALA HA H 4.04 0.02 1
> 271 38 ALA HB H 1.35 0.02 1
> 272 38 ALA C C 178.95 0.3 1
> 273 38 ALA CA C 54.33 0.3 1
> 274 38 ALA CB C 18.30 0.3 1
> 275 38 ALA N N 120.62 0.3 1
>
> I just want that it will give output something like:-----
>
> 8 ALA C = 179.39 CA = 54.67 CB = 18.85
> 15 ALA C = 177.18 CA = 52.18 CB = 20.64
> 21 ALA C = 179.35 CA = 54.33 CB = 17.87.....
>
> so first it will write the position of the amino acid (given by the second
> column), then the amino acid (here it is ALA), and then the corresponding
> values of C, CA and CB from the 5th column for each position of ALA.
>
Your program is structured wrong for doing what you want. Right now you
are essentially saying, "for each line in the input file, print
something", but each line of the input file only has part of the
information you want. Instead, you should create a data structure that
gathers the pieces you want, and only print once you have all of those
pieces in place.
Step one, create your data structure, so the data is grouped the way you
want it, not by line:
>>> thingies = {}  # create a dictionary
>>> for line in infile:
>>>     data = line.split()
>>>
>>>     if data[1] not in thingies:
>>>         # group data by data[1]
>>>         thingies[data[1]] = {}
>>>
>>>     thingies[data[1]][data[3]] = data[5]
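To make the grouping concrete, here is a small self-contained sketch (Python 3 syntax) that runs the same kind of loop over three lines taken from the sample data above. The names `group_by_residue` and `sample_lines` are made up for illustration, and skipping lines with fewer than six fields is an assumption about the file, not something the original post specifies.

```python
def group_by_residue(lines):
    """Group values by residue number (field 1), then by atom name (field 3)."""
    grouped = {}
    for line in lines:
        data = line.split()
        if len(data) < 6:
            continue  # assumption: blank or short lines carry no data
        # data[1] = residue number, data[3] = atom name, data[5] = value
        grouped.setdefault(data[1], {})[data[3]] = data[5]
    return grouped


# Three lines copied from the sample file in the question.
sample_lines = [
    "50 8 ALA C C 179.39 0.3 1",
    "51 8 ALA CA C 54.67 0.3 1",
    "52 8 ALA CB C 18.85 0.3 1",
]

grouped = group_by_residue(sample_lines)
print(grouped)
# {'8': {'C': '179.39', 'CA': '54.67', 'CB': '18.85'}}
```

`setdefault` does the same job as the explicit `if ... not in ...` check: it creates the inner dictionary the first time a residue number is seen. Note that the values stay as strings, since nothing here needs arithmetic.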
Step two, extract the data from the dictionary:
>>> for key, data in thingies.items():
>>>     print key,
>>>     for entry in data:
>>>         print '%s = %s' % (entry, data[entry]),
This should do what you want, minus some formatting issues.
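As for the formatting, one possible way to get a line per residue in the layout requested is sketched below in Python 3 syntax. The function name `format_shifts`, the `atoms` parameter, and the choice to keep only C, CA and CB are illustrative assumptions. It also relies on dictionaries preserving insertion order (Python 3.7 and later) so that residues come out in file order.

```python
def format_shifts(lines, atoms=("C", "CA", "CB")):
    """Return one formatted string per residue, in order of first appearance."""
    residues = {}  # residue number -> {"name": residue name, "shifts": {atom: value}}
    for line in lines:
        data = line.split()
        if len(data) < 6:
            continue  # assumption: blank or short lines carry no data
        number, name, atom, value = data[1], data[2], data[3], data[5]
        entry = residues.setdefault(number, {"name": name, "shifts": {}})
        entry["shifts"][atom] = value

    rows = []
    for number, entry in residues.items():
        # Keep only the requested atoms, in the requested order.
        parts = ["%s = %s" % (atom, entry["shifts"][atom])
                 for atom in atoms if atom in entry["shifts"]]
        rows.append("%s %s %s" % (number, entry["name"], " ".join(parts)))
    return rows


# Lines copied from the sample file in the question.
sample_lines = [
    "47 8 ALA H H 7.85 0.02 1",
    "50 8 ALA C C 179.39 0.3 1",
    "51 8 ALA CA C 54.67 0.3 1",
    "52 8 ALA CB C 18.85 0.3 1",
    "53 8 ALA N N 123.95 0.3 1",
    "110 15 ALA C C 177.18 0.3 1",
    "111 15 ALA CA C 52.18 0.3 1",
    "112 15 ALA CB C 20.64 0.3 1",
]

for row in format_shifts(sample_lines):
    print(row)
# 8 ALA C = 179.39 CA = 54.67 CB = 18.85
# 15 ALA C = 177.18 CA = 52.18 CB = 20.64
```

To run it on the real file, pass the open file object instead of `sample_lines`, for example `format_shifts(open("file1.txt"))`.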
Cheers,
Cliff