[Tutor] Fwd: arrangement of datafile

Amrita Kumari amrita.g13 at gmail.com
Thu Jan 9 07:51:21 CET 2014


Hi,

Sorry for the delay in replying (the internet has been very slow for the
past two days). I tried the code you suggested, saving it in a file:

import csv
with open('19162.csv') as f:
   reader = csv.reader(f)
   for row in reader:
      print(row)
      row[0] = int(row[0])
      key,value = item.split('=', 1)
      value = float(value)
      print(value)

and I got this output:

C:\Python33>python 8.py
['2', 'ALA', 'C=178.255', 'CA=53.263', 'CB=18.411', '', '', '', '', '', '', '',
'', '', '']
Traceback (most recent call last):
  File "8.py", line 7, in <module>
    key,value = item.split('=', 1)
NameError: name 'item' is not defined
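
The traceback says that `item` was never assigned: the split lines were
suggested for a single item, so they need a loop that takes each field of
the row in turn. Below is a minimal sketch of such a loop, not a definitive
solution. It assumes the empty trailing fields should be skipped, the
function name parse_row is made up for illustration, and the two sample
lines are held in memory (via io.StringIO) instead of being read from
19162.csv, so that the sketch runs on its own.

```python
import csv
import io

# Two sample lines from the data file, held in memory so the sketch runs
# without 19162.csv (an assumption made only for illustration).
SAMPLE = (
    "2,ALA,C=178.255,CA=53.263,CB=18.411,,,,,,,,,,\n"
    "3,LYS,H=8.607,C=176.752,CA=57.816,CB=31.751,N=119.081,,,,,,,,\n"
)

def parse_row(row):
    """Return (residue number, residue name, {atom: shift}) for one CSV row."""
    resno = int(row[0])
    name = row[1].strip()
    shifts = {}
    for item in row[2:]:          # 'item' is defined by this loop
        item = item.strip()
        if not item:              # skip the empty padding fields
            continue
        key, value = item.split('=', 1)
        shifts[key] = float(value)
    return resno, name, shifts

parsed = [parse_row(row) for row in csv.reader(io.StringIO(SAMPLE))]
for resno, name, shifts in parsed:
    print(resno, name, shifts)
```

With the real file, csv.reader(f) inside the `with open('19162.csv') as f:`
block would take the place of csv.reader(io.StringIO(SAMPLE)).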

My data file looks like this:

2,ALA,C=178.255,CA=53.263,CB=18.411,,,,,,,,,,
3,LYS,H=8.607,C=176.752,CA=57.816,CB=31.751,N=119.081,,,,,,,,
4,ASN,H=8.185,C=176.029,CA=54.712,CB=38.244,N=118.255,,,,,,,,
5,VAL,H=7.857,HG11=0.892,HG12=0.892,HG13=0.892,HG21=0.954,HG22=0.954,HG23=0.954,C=177.259,CA=64.232,CB=31.524,CG1=21.402,CG2=21.677,N=119.998
6,ILE,H=8.062,HG21=0.827,HG22=0.827,HG23=0.827,HD11=0.807,HD12=0.807,HD13=0.807,C=177.009,CA=63.400,CB=37.177,CG2=17.565,CD1=13.294,N=122.474
7,VAL,H=7.993,HG11=0.879,HG12=0.879,HG13=0.879,HG21=0.957,HG22=0.957,HG23=0.957,C=177.009,CA=65.017,CB=31.309,CG1=21.555,CG2=22.369,N=120.915
8,LEU,H=8.061,HD11=0.844,HD12=0.844,HD13=0.844,HD21=0.810,HD22=0.810,HD23=0.810,C=178.655,CA=56.781,CB=41.010,CD1=25.018,CD2=23.824,N=121.098
9,ASN,H=8.102,C=176.695,CA=54.919,CB=38.674,N=118.347,,,,,,,,
10,ALA,H=8.388,HB1=1.389,HB2=1.389,HB3=1.389,C=178.263,CA=54.505,CB=17.942,N=124.124,,,,,
----------------------
------------------------
where the 1st element of each row is the residue no., but the numbering
is not continuous (some residues are missing; for example, the 1st row
starts from residue no. 2, not from 1). The second element of each row
is the name of the amino acid, and the remaining elements are the atoms
with the chemical shift information for that particular amino acid; for
example, H=8.388 means that the atom is H and its chemical shift value
is 8.388. The order of these atoms within a row is quite random, and
some rows have many more atoms than others. I got these values from the
Shiftx2 web server. I just want to align the chemical shift values of
the same atom into one column (along with the residue no.); for example,
for atom C, it could be:

2 C=178.255
3 C=176.752
4 C=176.029
5 C=177.259
-----------
-----------

for atom H, it could be:

2 H=nil
3 H=8.607
4 H=8.185
5 H=7.857
6 H=8.062
----------------
-----------
and so on. If a row doesn't have that atom (for example, the 1st row
doesn't have an H atom), it could print nil so that I can understand
that it is missing for that particular residue. I need this arrangement
in order to compare these chemical shift values with the values
generated by other web server programs.
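
The arrangement described above could be sketched roughly as follows. This
is only an illustration under some assumptions: the names read_shifts and
column_for are made up, three sample lines are held in memory instead of
being read from the real file, and the shift values are kept as text so
that they print exactly as they appear in the file.

```python
import csv
import io

# Three sample lines from the data file (in memory, for illustration only).
SAMPLE = (
    "2,ALA,C=178.255,CA=53.263,CB=18.411,,,,,,,,,,\n"
    "3,LYS,H=8.607,C=176.752,CA=57.816,CB=31.751,N=119.081,,,,,,,,\n"
    "4,ASN,H=8.185,C=176.029,CA=54.712,CB=38.244,N=118.255,,,,,,,,\n"
)

def read_shifts(lines):
    """Return a list of (residue number, {atom: shift text}) pairs."""
    records = []
    for row in csv.reader(lines):
        if not row:                    # ignore blank lines
            continue
        shifts = {}
        for item in row[2:]:
            item = item.strip()
            if item:                   # skip the empty padding fields
                atom, value = item.split('=', 1)
                shifts[atom] = value   # keep the text so values print unchanged
        records.append((int(row[0]), shifts))
    return records

def column_for(atom, records):
    """Return one 'resno atom=value' line per residue, nil if the atom is missing."""
    return ["%d %s=%s" % (resno, atom, shifts.get(atom, "nil"))
            for resno, shifts in records]

records = read_shifts(io.StringIO(SAMPLE))
for atom in ("C", "H"):
    print("\n".join(column_for(atom, records)))
    print()
```

Keeping one dictionary per residue means a missing atom is simply a missing
key, which shifts.get(atom, "nil") turns into nil. To compare the numbers
with another server's output, the text values could be converted with
float() at that point.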

Thanks,
Amrita




On 1/7/14, Steven D'Aprano <steve at pearwood.info> wrote:
> On Mon, Jan 06, 2014 at 04:57:38PM +0800, Amrita Kumari wrote:
>> Hi Steven,
>>
>> I tried this code:
>>
>> import csv
>> with open('file.csv') as f:
>>      reader = csv.reader(f)
>>      for row in reader:
>>          print(row)
>>          row[0] = int(row[0])
>>
>> up to this point it is OK; it gives this output:
>>
>> ['1' , ' GLY' ,  'HA2=3.7850' ,  'HA3=3.9130' , ' ' , ' ' , ' ' , ' ']
>> [ '2' ,  'SER' ,  'H=8.8500' ,  'HA=4.3370' ,  'N=115.7570' , ' ' , ' ' ,
>> '
>> ']
>
> It looks like you are re-typing the output into your email. It is much
> better if you copy and paste it so that we can see exactly what happens.
>
>
>> but the command :
>>
>> key, value = row[2].split('=', 1)
>>         value = float(value.strip())
>>         print(value)
>>
>> is giving the value of row[2] element as
>>
>> ['1' , ' GLY' ,  'HA2=3.7850' ,  'HA3=3.9130' , ' ' , ' ' , ' ' , ' ']
>> 3.7850
>> [ '2' ,  'SER' ,  'H=8.8500' ,  'HA=4.3370' ,  'N=115.7570' , ' ' , ' ' ,
>> '
>> ']
>> 8.8500
>
> So far, the code is doing exactly what you told it to do. Take the third
> column (index 2), and split on the equals sign. Convert the part on the
> right of the equals sign to a float, and print the float.
>
>
>> so this is not what I want. I want to print all the chemical shift values
>> of the same atom from each row at one time
>
> Okay, then do so. You'll have to write some code to do this.
>
>
>> like this:
>>
>> 1 HA2=3.7850
>> 2 HA2=nil
>> 3 HA2=nil
>
> Where do these values come from?
>
>
>
>> .....
>> ............
>> ..........
>> 13 HA2=nil
>>
>> similarly, for atom HA3:
>>
>> 1 HA3=3.9130
>> 2 HA3=nil
>> 3 HA3=nil
>> ...........
>> ............
>> ............
>> 13 HA3=nil  and so on.
>>
>> so how to split each item into a key and a numeric value
>
> I've already shown you how to split an item into a key and numeric
> value. Here it is again:
>
> key, value = item.split('=', 1)
> value = float(value)
>
>
>> and then search
>> for the same atom and print its chemical shift values at one time, along
>> with the residue no.
>
> I don't know what a chemical shift value and residue number are.
> Remember, we are Python programmers, not chemists or biochemists or
> whatever your field is. We don't know how to solve your problem, but if
> you describe in simple English terms how you would solve that problem,
> we can probably help you turn it into Python code.
>
> Start with one row, containing this data:
>
> '2', 'SER', 'H=8.8500', 'HA=4.3370', 'N=115.7570', '', '', ''
>
> There are eight columns. What do those columns represent? In simple
> English terms, what would you like to do with those columns? Tell us
> step by step, as if you were explaining to a small child or a computer.
>
>
>
> --
> Steven
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>

