[Tutor] map one file and print it out following the sequence

Thu Oct 13 17:43:15 CEST 2011

On 10/13/2011 09:09 AM, lina wrote:
> <snip>
>
>> I think your final version of sortfile() might look something like:
>>
>> def sortfile(infilename=INFILENAME, outfilename=OUTFILENAME):
>>     infile = open(infilename, "r")
>>     intext = infile.readlines()
>>     outfile = open(outfilename, "w")
>>     for chainid in CHAINID:
>>         print("chain id = ",chainid)
>>         sortoneblock(chainid, intext, outfile)
>>     infile.close()
>>     outfile.close()
>>
>
> $ python3 map-to-itp.py
> {'O4': '2', 'C19': '3', 'C21': '1'}
> C
> Traceback (most recent call last):
>    File "map-to-itp.py", line 55, in <module>
>      sortfile()
>    File "map-to-itp.py", line 17, in sortfile
>      sortoneblock(chainid,intext,OUTFILENAME)
>    File "map-to-itp.py", line 29, in sortoneblock
>      f.write(line[1].strip() for line in temp)
> TypeError: must be str, not generator
>
>

When you see an error message that mentions a generator, it usually 
means you have used a for-expression as a value.

At your stage of learning you should probably be ignoring generators and 
list comprehensions, and just writing simple for loops.  So you should 
replace the f.write with a loop.

         for item in temp:
             f.write(something + "\n")

One advantage is that you can easily stuff print() functions into the 
loop, to debug what's really happening.  After you're sure it's right, 
it might be appropriate to use either a generator or a list comprehension.
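
Here is a rough sketch of the failure and the loop fix.  It uses 
io.StringIO as a stand-in for the real output file, and the contents of 
temp are made up for illustration, so treat it as a demonstration rather 
than your actual program.

```python
import io

# Hypothetical sample data, shaped like the (key, line) tuples stored in temp.
temp = [("1", "ATOM 1 C21 CUR C\n"), ("2", "ATOM 2 O4 CUR C\n")]

f = io.StringIO()  # stand-in for the real output file

# Passing a generator expression to write() fails: write() wants one str.
error_message = None
try:
    f.write(line[1].strip() for line in temp)
except TypeError as err:
    error_message = str(err)

# A plain loop writes one string at a time, and is easy to debug with print().
for item in temp:
    f.write(item[1].strip() + "\n")

result = f.getvalue()
print(error_message)
print(result, end="")
```

The try/except is only there so the sketch runs to completion; in your 
script you would simply delete the failing call.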

> I don't know how to fix the writing issue.
>
> can I write the different chainID one into the same OUTFILE?
>
> Thanks, I attached the code I used below:
>
>   #!/usr/bin/python3
>
> import os.path
>
> LINESTOSKIP=0
> CHAINID="CDEFGHI"
> INFILENAME="pdbone.pdb"
> OUTFILENAME="sortedone.pdb"
> DICTIONARYFILE="itpone.itp"
> mapping={}
> valuefromdict={}
>
> def sortfile():
>      intext=fetchonefiledata(INFILENAME)
>      for chainid in CHAINID:
>          print(chainid)
>          sortoneblock(chainid,intext,OUTFILENAME)
>
One way to get all the output into one file is to create the file in 
sortfile(), and pass the file object.  Look again at what I suggested 
for sortfile().  If you can open the file once, here, you won't have the 
overhead of constantly opening the same file that nobody closed, and 
you'll have the side benefit that the old contents of the file will be 
overwritten.

Andreas' suggestion of using append would make more sense if you wanted 
the output to accumulate over multiple runs of the program.  If you 
don't want the output file to be the history of all the runs, then 
you'll need to do one open(name, "w"), probably in sortfile(), and then 
you might as well pass the file object as I suggested.
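
To make the difference between "w" and "a" concrete, here is a small 
sketch.  The helper names write_blocks() and read_text() are made up for 
this example, and it writes to a temporary directory rather than your 
real sortedone.pdb.

```python
import os
import tempfile

def write_blocks(path, mode, chain_ids):
    """Simulate one run: open the file once, write a line per chain id."""
    with open(path, mode) as outfile:
        for chain_id in chain_ids:
            outfile.write(chain_id + "\n")

def read_text(path):
    """Return the whole file as one string."""
    with open(path) as infile:
        return infile.read()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sortedone.pdb")

    # Two "runs" with mode "w": the second run replaces the first.
    write_blocks(path, "w", "CD")
    write_blocks(path, "w", "CD")
    after_w = read_text(path)

    # One more "run" with mode "a": the output accumulates.
    write_blocks(path, "a", "CD")
    after_a = read_text(path)

print(repr(after_w))
print(repr(after_a))
```

Note that within a single run, all the chain ids end up in the same file 
either way, because the file is opened only once per run.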

>
>
> def sortoneblock(cID,TEXT,OUTFILE):

If you followed my suggestions for sortfile(), then the last parameter to 
this function would be outfile, and you could use outfile.write().
As Andreas says, don't use uppercase for non-constants.

>      temp = []

         #this writes the cID to the output file, once per cID
         outfile.write(cID + "\n")

>      for line in TEXT:
>          blocks=line.strip().split()
>          if len(blocks)== 11 and  blocks[3] == "CUR" and blocks[4] == cID and
> blocks[2] in mapping.keys():

           if (len(blocks)== 11 and  blocks[3] == "CUR"
                 and blocks[4] == cID and blocks[2] in mapping ):

Having the .keys() in that test is redundant.  "in" already knows how to 
look things up efficiently in a dictionary.  In Python 2, .keys() builds 
a list first, and searching that list is slow; in Python 3 it returns a 
view, so the cost is small, but the call still adds nothing.
Also, if you put parentheses around the whole if clause, you can span it 
across multiple lines without doing anything special.
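
A quick sketch of that test on its own.  The mapping is the one your 
script printed; the input line is invented for the example and merely has 
the 11 fields the test expects.

```python
mapping = {'O4': '2', 'C19': '3', 'C21': '1'}

# Made-up line with 11 whitespace-separated fields.
line = "ATOM 1 C21 CUR C 1 0.0 0.0 0.0 1.00 0.00\n"
blocks = line.strip().split()
cID = "C"

# The parentheses let the condition span several lines.
matched = (len(blocks) == 11 and blocks[3] == "CUR"
           and blocks[4] == cID and blocks[2] in mapping)

# "in mapping" and "in mapping.keys()" always give the same answer.
same_answer = (blocks[2] in mapping) == (blocks[2] in mapping.keys())

print(matched, same_answer)
```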

>              temp.append((mapping[blocks[2]],line))
>      temp.sort()
>      with open(OUTFILE,"w") as f:
>          f.write(line[1].strip() for line in temp)
>
See comment above for splitting this write into a loop.  You are also 
going to have to decide what to write, as each item of temp is a tuple 
containing both an index and a string.  Probably you want to write the 
second item of the tuple.  Combining these changes, you would have:
        for index, line in temp:
            outfile.write(line + "\n")

Note that the following are equivalent:
        for item in temp:
             index, line = item
             outfile.write(line + "\n")

        for item in temp:
             outfile.write(item[1] + "\n")

But I like the first form, since it makes it clear what's been stored in 
temp.  That sort of thing is important if you ever change it.
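
A small sketch showing that the two forms write the same thing.  The 
tuples in temp are invented, and io.StringIO stands in for the output 
file.

```python
import io

# Made-up (index, line) tuples, deliberately out of order.
temp = [("3", "line for C19"), ("1", "line for C21"), ("2", "line for O4")]
temp.sort()  # tuples sort by their first item, then their second

# First form: unpack each tuple in the for statement.
out_unpacked = io.StringIO()
for index, line in temp:
    out_unpacked.write(line + "\n")

# Second form: pick the second item by position.
out_indexed = io.StringIO()
for item in temp:
    out_indexed.write(item[1] + "\n")

first_form = out_unpacked.getvalue()
second_form = out_indexed.getvalue()
print(first_form, end="")
```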
>
>
>
> def generatedictionary(dictfilename):
>      text=fetchonefiledata(DICTIONARYFILE)
>      for line in text:
>          parts=line.strip().split()
>          if len(parts)==8:
>              mapping[parts[4]]=parts[0]
>      print(mapping)
>
>
>
> def fetchonefiledata(infilename):
>      text=open(infilename).readlines()
>      if os.path.splitext(infilename)[1]==".itp":
>          return text
>      if os.path.splitext(infilename)[1]==".pdb":
>          return text[LINESTOSKIP:]
>      infilename.close()
>
>
> if __name__=="__main__":
>      generatedictionary(DICTIONARYFILE)
>      sortfile()
>

Final note: write() doesn't automatically append a newline, so I tend to 
add an explicit one in the write() itself.  But if you start seeing 
double spacing, that's presumably because the line already had a newline 
in it.  You could use rstrip() on it (my choice), or remove the + "\n" 
in the write() method.
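
For instance, here is a sketch of the double-spacing problem and the 
rstrip() fix, with invented lines and io.StringIO in place of a real 
file:

```python
import io

# Lines as readlines() would return them: each already ends in a newline.
lines = ["first line\n", "second line\n"]

# Adding "\n" to a line that already has one gives double spacing.
doubled = io.StringIO()
for line in lines:
    doubled.write(line + "\n")

# rstrip() removes the trailing newline first, so exactly one is written.
clean = io.StringIO()
for line in lines:
    clean.write(line.rstrip() + "\n")

doubled_text = doubled.getvalue()
clean_text = clean.getvalue()
print(repr(doubled_text))
print(repr(clean_text))
```

Be aware that rstrip() with no argument also removes trailing spaces and 
tabs, not just the newline.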

-- 

DaveA