[Tutor] arrangement of datafile

Thu Jan 9 13:41:58 CET 2014

Amrita Kumari wrote:

> On 17th Dec. I posted a question about how to arrange a datafile in a
> particular fashion so that I get only the residue no. and the chemical
> shift value of one atom, like this:
> 1  H=nil
> 2  H=8.8500
> 3  H=8.7530
> 4  H=7.9100
> 5  H=7.4450
> ........
> Peter replied to this mail, but since I hadn't subscribed to the
> tutor mailing list at the time I didn't receive the reply. I
> apologize for my mistake. Today I read his reply, and he asked me to
> do a few things:

I'm sorry, I'm currently lacking the patience to tune into your problem 
again, but maybe the script that I wrote (but did not post) back then is of 
help.

The data sample:

$ cat residues.txt
1 GLY HA2=3.7850 HA3=3.9130
2 SER H=8.8500 HA=4.3370 N=115.7570
3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690 HD12=0.7690 HD13=0.7690 N=117.3260
7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
8 PRO HD2=3.7450
9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470 HG12=1.6010 HG13=2.1670 N=119.0300
11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620 HD13=0.8620 N=119.1360

The script:

$ cat residues.py
def process(filename):
    residues = {}
    with open(filename) as infile:
        for line in infile:
            parts = line.split()            # split line at whitespace
            if not parts:                   # skip blank lines
                continue
            residue = int(parts.pop(0))     # convert first item to integer
            if residue in residues:
                raise ValueError("duplicate residue {}".format(residue))
            parts.pop(0)                    # discard second item

            # split remaining items at "=" and put them in a dict,
            # e. g. {"HA2": 3.7, "HA3": 3.9}
            pairs = (pair.split("=") for pair in parts)
            lookup = {atom: float(value) for atom, value in pairs}

            # put previous lookup dict in residues dict
            # e. g. {1: {"HA2": 3.7, "HA3": 3.9}}
            residues[residue] = lookup

    return residues

def show(residues):
    # collect every atom name that occurs in any residue
    atoms = set().union(*(r.keys() for r in residues.values()))
    residues = sorted(residues.items())
    for atom in sorted(atoms):
        for residue, lookup in residues:
            # print() with a single argument works in Python 2 and 3
            print("{} {}={}".format(residue, atom, lookup.get(atom, "nil")))
        print("")
        print("-----------")
        print("")

if __name__ == "__main__":
    r = process("residues.txt")
    show(r)

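Note that converting the values to float can be omitted if all you want to
do is print them; keeping them as strings also preserves trailing zeros, so
8.8500 is printed as 8.8500 rather than 8.85. Finally, the output of the
script: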

$ python residues.py 
1 H=nil
2 H=8.85
3 H=8.753
4 H=7.91
5 H=7.445
6 H=7.687
7 H=7.819
8 H=nil
9 H=8.235
10 H=7.979
11 H=7.947
12 H=8.191
13 H=8.133

-----------

1 HA=nil
2 HA=4.337
3 HA=4.034
4 HA=3.862
5 HA=4.077
6 HA=4.21
7 HA=4.554
8 HA=nil
9 HA=4.012
10 HA=3.697
11 HA=4.369
12 HA=4.192
13 HA=3.817

-----------

[snip]
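
If only one atom is wanted, as in the original question, the note above
about skipping the float conversion could be sketched roughly like this.
The names parse_line, format_atom and sample are made up for this
illustration and are not part of residues.py; the sample lines are copied
from residues.txt. Treat it as a starting point, not a finished solution.

```python
# Hypothetical helpers for illustration; not part of residues.py.

def parse_line(line):
    """Split one data line into (residue number, {atom: value string})."""
    parts = line.split()
    residue = int(parts[0])
    # parts[1] is the residue name (e.g. "GLY"); it is not needed here.
    # Values stay strings, so "8.8500" is printed exactly as it was read.
    lookup = dict(pair.split("=") for pair in parts[2:])
    return residue, lookup


def format_atom(lines, atom):
    """Return one 'residue atom=value' string per non-blank input line."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        residue, lookup = parse_line(line)
        rows.append("{} {}={}".format(residue, atom, lookup.get(atom, "nil")))
    return rows


# First three lines of residues.txt, used as in-memory sample data.
sample = [
    "1 GLY HA2=3.7850 HA3=3.9130",
    "2 SER H=8.8500 HA=4.3370 N=115.7570",
    "3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380",
]

for row in format_atom(sample, "H"):
    print(row)
```

To run it on the real data, pass an open file instead of the list, e. g.
format_atom(open("residues.txt"), "H"). Unlike process(), this sketch does
not check for duplicate residue numbers.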