[Tutor] arrangement of datafile

Tue Dec 17 20:17:22 CET 2013

Amrita Kumari wrote:

> Hi,
> 
> I am new to programming and want to try Python (which is
> simple and easy to learn) to solve one problem. I have
> several long files like this:
> 
> 1 GLY HA2=3.7850 HA3=3.9130
> 2 SER H=8.8500 HA=4.3370 N=115.7570
> 3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
> 4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
> 5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
> 6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690
> HD12=0.7690 HD13=0.7690 N=117.3260
> 7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
> 8 PRO HD2=3.7450
> 9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
> 10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470
> HG12=1.6010 HG13=2.1670 N=119.0300
> 11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
> 12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
> 13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620
> HD13=0.8620 N=119.1360
> ........................
> .......................
> 
> where first column is the residue number, what I want is to print
> individual atom chemical shift value one by one along with residue
> number.....for example for atom HA2 it should be:
> 
> 1 HA2=3.7850
> 2 HA2=nil
> 3 HA2=nil
> .....
> ............
> ..........
> 13 HA2=nil
> 
> similarly for atom HA3 it should be same as above:
> 
> 1 HA3=3.9130
> 2 HA3=nil
> 3 HA3=nil
> ...........
> ............
> ............
> 13 HA3=nil
> 
> while for atom H it should be:
> 1  H=nil
> 2  H=8.8500
> 3  H=8.7530
> 4  H=7.9100
> 5  H=7.4450
> ........
> 
> but in some files the residue numbers are not continuous; some are missing
> (in between). I want to write Python code to solve this problem but don't
> know how to split the datafile and print the desired output. This problem
> is important in order to compare each atom's chemical shift value with
> other web-based generated chemical shift values. As the number of atoms
> differs from row to row and the same atom appears at different positions
> in different residues, I don't know how to split them. Please help to
> solve this problem.

You tell us what you want, but you don't give us an idea what you can do and 
what problems you run into.

Can you read a file line by line?
Can you split the line into a list of strings at whitespace occurrences?
Can you extract the first item from the list and convert it to an int?
Can you remove the first two items from the list?
Can you split the items in the list at the "="?
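
For a single line of the sample data, the steps asked about above might look
roughly like this. It is only a sketch: the variable names are made up for
illustration, and it assumes every item after the residue name has the form
atom=value.

```python
# One line taken from the sample data in the question.
line = "3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380"

fields = line.split()        # split at runs of whitespace
residue = int(fields[0])     # the first item is the residue number
pairs = fields[2:]           # drop the number and the residue name

shifts = {}
for pair in pairs:
    atom, value = pair.split("=")   # "H=8.7530" -> "H", "8.7530"
    shifts[atom] = float(value)

print(residue, shifts)
```

Reading a whole file is then a matter of running the same steps inside a
"for line in f:" loop over the open file.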

Do what you can and come back here when you run into problems.
Once you have finished the above agenda you can put your data into a nested 
dict (a dict of dicts, keyed first by residue number and then by atom name) 
that looks like this:

{1: {'HA2': 3.785, 'HA3': 3.913},
 2: {'H': 8.85, 'HA': 4.337, 'N': 115.757},
 3: {'H': 8.753, 'HA': 4.034, 'HB2': 1.808, 'N': 123.238},
 4: {'H': 7.91, 'HA': 3.862, 'HB2': 1.744, 'HG2': 1.441, 'N': 117.981},
 5: {'H': 7.445, 'HA': 4.077, 'HB2': 1.765, 'HG2': 1.413, 'N': 115.479},
 6: {'H': 7.687,
     'HA': 4.21,
     'HB2': 1.386,
     'HB3': 1.605,
     'HD11': 0.769,
     'HD12': 0.769,
     'HD13': 0.769,
     'HG': 1.513,
     'N': 117.326},
 7: {'H': 7.819, 'HA': 4.554, 'HB2': 3.136, 'N': 117.08},
 8: {'HD2': 3.745},
 9: {'H': 8.235, 'HA': 4.012, 'HB2': 2.137, 'N': 116.366},
 10: {'H': 7.979,
      'HA': 3.697,
      'HB': 1.88,
      'HG12': 1.601,
      'HG13': 2.167,
      'HG21': 0.847,
      'HG22': 0.847,
      'HG23': 0.847,
      'N': 119.03},
 11: {'H': 7.947, 'HA': 4.369, 'HB3': 2.514, 'N': 117.862},
 12: {'H': 8.191, 'HA': 4.192, 'HB2': 3.156, 'N': 121.264},
 13: {'H': 8.133,
      'HA': 3.817,
      'HB3': 1.788,
      'HD11': 0.862,
      'HD12': 0.862,
      'HD13': 0.862,
      'HG': 1.581,
      'N': 119.136}}
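
One possible way to build such a dict is sketched below. The function name
read_residues is hypothetical, and the sketch makes two assumptions that may
not hold for the real files: a line that does not start with a number is a
wrapped continuation of the previous residue (as in the sample for residues
6, 10 and 13), and anything without an "=" in it, such as the rows of dots,
can be ignored.

```python
def read_residues(lines):
    """Build {residue_number: {atom: shift}} from an iterable of lines."""
    residues = {}
    current = None  # shift dict of the residue started most recently
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0].isdigit():
            # New residue: number, residue name, then atom=value items.
            current = residues.setdefault(int(fields[0]), {})
            fields = fields[2:]
        elif current is None:
            continue  # stray text before the first residue
        # Otherwise the line is assumed to continue the previous residue.
        for field in fields:
            atom, sep, value = field.partition("=")
            if sep:  # skip items that contain no "="
                current[atom] = float(value)
    return residues


sample = """\
6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690
HD12=0.7690 HD13=0.7690 N=117.3260
8 PRO HD2=3.7450
........................
"""
residues = read_residues(sample.splitlines())
print(residues[8])
```

Because the function only iterates over its argument, an open file object
can be passed in place of the list of lines.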

Once you are there we can help you print this out nicely. Below is a 
spoiler ;)

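def show(residues):
    # Collect every atom name that occurs in any residue.
    atoms = set().union(*(r.keys() for r in residues.values()))
    # Sort the (number, shifts) pairs by residue number.
    residues = sorted(residues.items())
    for atom in sorted(atoms):
        for residue, lookup in residues:
            # Atoms missing from a residue are reported as "nil".
            print("{} {}={}".format(residue, atom, lookup.get(atom, "nil")))
        # Separator between the per-atom blocks.
        print()
        print("-----------")
        print()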
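
The spoiler only visits residue numbers that are present in the dict. If
residue numbers missing from the file should also show up as "nil", which is
one reading of the question and purely an assumption here, a variation for a
single atom could look like the sketch below. The name atom_lines and its
parameters are made up for illustration, and the four-decimal formatting is
chosen only to match the values shown in the question.

```python
def atom_lines(residues, atom, missing="nil"):
    """Return one 'number atom=value' string per residue number in range."""
    lines = []
    for number in range(min(residues), max(residues) + 1):
        # A residue number absent from the data yields an empty lookup.
        shift = residues.get(number, {}).get(atom)
        text = missing if shift is None else "{:.4f}".format(shift)
        lines.append("{} {}={}".format(number, atom, text))
    return lines


# Small hand-made example; residue 3 is deliberately left out.
residues = {1: {"HA2": 3.785, "HA3": 3.913},
            2: {"H": 8.85, "HA": 4.337, "N": 115.757},
            4: {"H": 7.91, "HA": 3.862}}

for line in atom_lines(residues, "H"):
    print(line)
# 1 H=nil
# 2 H=8.8500
# 3 H=nil
# 4 H=7.9100
```

Returning the lines instead of printing them directly makes it easy to write
them to a file or compare them with values from another source.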