mail

J. Cliff Dyer jcd at sdf.lonestar.org
Wed Jul 15 21:40:36 CEST 2009


On Thu, 2009-07-16 at 00:16 +0530, amrita at iisermohali.ac.in wrote:
> Dear all,
> 
> Sorry that I am disturbing you all again and again, but this is how I
> am trying to solve my problem:
> 
> >>> import re
> >>> exp = re.compile("CA")
> >>> infile = open("file1.txt")
> >>> for line in infile:
> ...     values = re.split("\s+", line)
> ...     if exp.search(line):
> ...        print ("%s %s CA = %s" %(values[2], values[3], values[6]))
> ...
> With this it gives output like:
> 
> 8 ALA CA = 54.67
> 15 ALA CA = 52.18
> 21 ALA CA = 54.33
> 23 ALA CA = 55.84
> 33 ALA CA = 55.58
> 38 ALA CA = 54.33
> 
> which is all right, but I want the CB and C values in each row as well, and
> it should take the value from the 5th column in front of them. The file
> looks something like this:
> 
>  47     8   ALA       H     H      7.85     0.02     1
>  48     8   ALA       HA    H      2.98     0.02     1
>  49     8   ALA       HB    H      1.05     0.02     1
>  50     8   ALA       C     C    179.39      0.3     1
>  51     8   ALA       CA    C     54.67      0.3     1
>  52     8   ALA       CB    C     18.85      0.3     1
>  53     8   ALA       N     N    123.95      0.3     1
> 107    15   ALA       H     H      8.05     0.02     1
> 108    15   ALA       HA    H      4.52     0.02     1
> 109    15   ALA       HB    H      1.29     0.02     1
> 110    15   ALA       C     C    177.18      0.3     1
> 111    15   ALA       CA    C     52.18      0.3     1
> 112    15   ALA       CB    C     20.64      0.3     1
> 113    15   ALA       N     N    119.31      0.3     1
> 154    21   ALA       H     H      7.66     0.02     1
> 155    21   ALA       HA    H      4.05     0.02     1
> 156    21   ALA       HB    H      1.39     0.02     1
> 157    21   ALA       C     C    179.35      0.3     1
> 158    21   ALA       CA    C     54.33      0.3     1
> 159    21   ALA       CB    C     17.87      0.3     1
> 160    21   ALA       N     N    123.58      0.3     1
> 169    23   ALA       H     H      8.78     0.02     1
> 170    23   ALA       HA    H      4.14     0.02     1
> 171    23   ALA       HB    H      1.62     0.02     1
> 172    23   ALA       C     C    179.93      0.3     1
> 173    23   ALA       CA    C     55.84      0.3     1
> 174    23   ALA       CB    C     17.55      0.3     1
> 175    23   ALA       N     N    120.16      0.3     1
> 232    33   ALA       H     H      7.57     0.02     1
> 233    33   ALA       HA    H      3.89     0.02     1
> 234    33   ALA       HB    H      1.78     0.02     1
> 235    33   ALA       C     C    179.24      0.3     1
> 236    33   ALA       CA    C     55.58      0.3     1
> 237    33   ALA       CB    C     19.75      0.3     1
> 238    33   ALA       N     N    121.52      0.3     1
> 269    38   ALA       H     H      8.29     0.02     1
> 270    38   ALA       HA    H      4.04     0.02     1
> 271    38   ALA       HB    H      1.35     0.02     1
> 272    38   ALA       C     C    178.95      0.3     1
> 273    38   ALA       CA    C     54.33      0.3     1
> 274    38   ALA       CB    C     18.30      0.3     1
> 275    38   ALA       N     N    120.62      0.3     1
> 
> I just want it to give output something like this:
> 
> 8  ALA  C = 179.39  CA = 54.67  CB = 18.85
> 15 ALA  C = 177.18  CA = 52.18  CB = 20.64
> 21 ALA  C = 179.35  CA = 54.33  CB = 17.87.....
> 
> So first it should write the position of the amino acid (given by the second
> column), then the amino acid (here it is ALA), and then the corresponding
> values of C, CA and CB from the 5th column for each position of ALA.
> 

Your program is structured wrong for doing what you want.  Right now you
are essentially saying, "for each line in the input file, print
something", but each line of the input file only has part of the
information you want.  Instead, you should create a data structure that
gathers the pieces you want, and only print once you have all of those
pieces in place.

Step one, create your data structure, so the data is grouped the way you
want it, not by line:

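>>> infile = open("file1.txt")  # reopen: the loop above already read to the end
>>> thingies = {} # create a dictionary
>>> for line in infile:
...     data = line.split()
...     if not data:
...         continue  # skip blank lines
...     if data[1] not in thingies:
...         # group data by data[1], the residue position
...         thingies[data[1]] = {}
...     thingies[data[1]][data[3]] = data[5]
...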
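The same grouping can be written a little more compactly with collections.defaultdict, which creates the inner dictionary the first time a position is seen. This is a minimal, self-contained sketch: the StringIO text is just a few lines copied from the sample file above, standing in for file1.txt.

```python
from collections import defaultdict
from io import StringIO

# A few lines from the sample data in the question, standing in for file1.txt.
sample = StringIO("""\
 50     8   ALA       C     C    179.39      0.3     1
 51     8   ALA       CA    C     54.67      0.3     1
 52     8   ALA       CB    C     18.85      0.3     1
110    15   ALA       C     C    177.18      0.3     1
111    15   ALA       CA    C     52.18      0.3     1
""")

# defaultdict(dict) builds the inner dictionary on first access, so the
# explicit "if data[1] not in thingies" check is no longer needed.
thingies = defaultdict(dict)
for line in sample:
    data = line.split()
    if not data:
        continue  # skip blank lines
    thingies[data[1]][data[3]] = data[5]  # position -> atom name -> value

print(dict(thingies))
# {'8': {'C': '179.39', 'CA': '54.67', 'CB': '18.85'}, '15': {'C': '177.18', 'CA': '52.18'}}
```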

Step two, extract the data from the dictionary and print it:

>>> for key, data in thingies.items():
...     print(key, end=' ')
...     for entry in data:
...         print('%s = %s' % (entry, data[entry]), end=' ')
...     print()  # end the row for this position
...

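This should do what you want, minus some formatting issues: it still
prints every atom (H, HA, HB and N as well as C, CA and CB), and it
leaves out the residue name.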
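A fuller sketch that handles those formatting issues might look like the following. It keeps only C, CA and CB, remembers the residue name from the third column, and sorts positions numerically. The StringIO text stands in for file1.txt, and the names WANTED, residues and lines are illustrative choices, not anything from the original post.

```python
from io import StringIO

# Stand-in for open("file1.txt"): two residues from the sample data above.
infile = StringIO("""\
 47     8   ALA       H     H      7.85     0.02     1
 50     8   ALA       C     C    179.39      0.3     1
 51     8   ALA       CA    C     54.67      0.3     1
 52     8   ALA       CB    C     18.85      0.3     1
 53     8   ALA       N     N    123.95      0.3     1
110    15   ALA       C     C    177.18      0.3     1
111    15   ALA       CA    C     52.18      0.3     1
112    15   ALA       CB    C     20.64      0.3     1
""")

WANTED = ("C", "CA", "CB")  # atoms to report, in output order

residues = {}  # position -> {"name": residue name, atom name: value, ...}
for line in infile:
    data = line.split()
    if len(data) < 6:
        continue  # skip blank or truncated lines
    position, name, atom, value = data[1], data[2], data[3], data[5]
    entry = residues.setdefault(position, {"name": name})
    if atom in WANTED:
        entry[atom] = value

lines = []
for position in sorted(residues, key=int):  # numeric order, so 8 comes before 15
    entry = residues[position]
    shifts = "  ".join("%s = %s" % (atom, entry[atom])
                       for atom in WANTED if atom in entry)
    lines.append("%-2s %s  %s" % (position, entry["name"], shifts))

print("\n".join(lines))
# 8  ALA  C = 179.39  CA = 54.67  CB = 18.85
# 15 ALA  C = 177.18  CA = 52.18  CB = 20.64
```

Sorting with key=int matters because the positions are stored as strings, and a plain string sort would put "15" before "8".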

Cheers,
Cliff





More information about the Python-list mailing list