[Tutor] Logical Structure of Snippet

Steven D'Aprano steve at pearwood.info
Tue May 24 01:56:54 CEST 2011


On Tue, 24 May 2011 06:53:30 am Spyros Charonis wrote:
> Hello List,
>
> I'm trying to read some sequence files and modify them to a
> particular
[...]

You should almost never modify files in place, especially if you need to 
insert text. It *might*, sometimes, be acceptable to modify files in 
place if you are just over-writing what is already there, but 
absolutely not if you have to insert text!

The problem is that file systems don't support insert. They support 
shrinking files, adding to the end, and overwriting in place. To 
insert, you have to do a LOT more work, which is slow, fragile and 
risky: if something goes bad, you end up with a corrupted file.
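
If you want to see the overwriting behaviour for yourself, here is a small sketch. The scratch file and its contents are made up purely for the demonstration; binary mode is used so that the byte offsets mean the same thing on every platform.

```python
import os
import tempfile

# Create a small scratch file to experiment on (made-up contents).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(b'AAAA\nCCCC\n')

# Seek to the start of the second line and write there.
# The new bytes replace what was at that position; nothing is shifted along.
with open(path, 'r+b') as f:
    f.seek(5)
    f.write(b'BBBB\n')

with open(path, 'rb') as f:
    result = f.read()
os.remove(path)

# The CCCC line is gone: it was overwritten, not pushed down.
print(result)
```

To get a real insert you would have to read everything after the insertion point, write the new text, and then write the saved tail back again, which is exactly the extra work described above.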

It is almost always better to read the file into memory, process it, 
then write the output back out to the file.
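
As a rough sketch of that read, process, write pattern (the helper name insert_after, the scratch file and the sample lines are all my own inventions for illustration, not anything from your program):

```python
import os
import tempfile

def insert_after(lines, prefix, extra):
    """Return a new list with `extra` added after each line starting with `prefix`."""
    result = []
    for line in lines:
        result.append(line)
        if line.startswith(prefix):
            result.append(extra)
    return result

# Scratch file standing in for the real sequence file (made-up contents).
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w') as f:
    f.write('>P1; ICA1_HUMAN\nsequence data\n')

# Step 1: read the whole file into memory.
with open(path) as f:
    lines = f.readlines()

# Step 2: do all the processing on the in-memory list.
new_lines = insert_after(lines, '>P1;', 'inserted line\n')

# Step 3: write the finished result back out.
with open(path, 'w') as f:
    f.writelines(new_lines)

with open(path) as f:
    final_text = f.read()
os.remove(path)
print(final_text)
```

This assumes the file fits comfortably in memory, which should be true of most sequence files. If you are nervous about losing data, write to a second file instead of back to the original.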


You ask:

> for line sequence file:
>    if line.startswith('>P1; ICA ....)
>        make a newline
>        go to list with extracted tt; fields*
>        find the one with the same query (tt; ICA1 ...)*
>        insert this field in the newline


This is better written as some variation of:

infile = open('sequence file', 'r')
outfile = open('processed file', 'w')
for line in infile:
    outfile.write(line)
    if line.startswith('>P1; ICA'):
        new_line = ... #### what to do here???
        outfile.write(new_line)
outfile.close()
infile.close()


The problem then becomes, how to calculate the new_line above. Break 
that into steps:

You have a line that looks like ">P1; ICA1_HUMAN" and you want to 
extract the ICA... part.

def extract_ica(line):
    line = line.strip()
    if not line.startswith('>P1;'):
        raise ValueError('not a >P1 line')
    p = line.index(';')
    s = line[p+1:]
    s = s.strip()
    if s.startswith('ICA'):
        return s
    else:
        raise ValueError('no ICA... field in line')


Meanwhile, you have a dict (not a list, a dictionary) that looks like 
this:

descriptions = {
    'ICA1_BOVINE': description, 
    'ICA1_HUMAN': description, 
    ...}

If you need help assembling this dict, just ask.
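
In the meantime, here is one possible way to build it. I am guessing at the layout of your extracted fields: this sketch assumes each one is a string of the form "tt; KEY description text", and the function name build_descriptions and the sample data are made up. Adjust it to match what your list really contains.

```python
def build_descriptions(tt_fields):
    """Build a dict mapping each key to its description.

    Assumes each field looks like 'tt; KEY description text'.
    """
    descriptions = {}
    for field in tt_fields:
        field = field.strip()
        if not field.startswith('tt;'):
            continue  # skip anything that is not a tt; field
        rest = field[len('tt;'):].strip()
        parts = rest.split(None, 1)  # split off the first word only
        if not parts:
            continue  # a bare 'tt;' with nothing after it
        key = parts[0]
        description = parts[1] if len(parts) > 1 else ''
        descriptions[key] = description
    return descriptions

# Made-up sample data, just to show the shape of the result.
tt_fields = [
    'tt; ICA1_BOVINE made-up bovine description',
    'tt; ICA1_HUMAN made-up human description',
]
descriptions = build_descriptions(tt_fields)
print(descriptions['ICA1_HUMAN'])
```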

With a dict, searches are easy. Making the new line takes three short 
lines of code:

    key = extract_ica(line)
    descr = descriptions[key]
    new_line = 'tt; ' + key + ' ' + descr
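
Putting the pieces together might look something like the sketch below. The names make_new_line and process are my own, the descriptions and sequence data are invented, and io.StringIO objects stand in for the real files so the example can be run without touching the disk. I have also added a trailing newline to new_line, since write() does not add one for you.

```python
import io

def extract_ica(line):
    """Return the ICA... part of a '>P1; ICA...' line."""
    line = line.strip()
    if not line.startswith('>P1;'):
        raise ValueError('not a >P1 line')
    p = line.index(';')
    s = line[p+1:].strip()
    if s.startswith('ICA'):
        return s
    raise ValueError('no ICA... field in line')

def make_new_line(line, descriptions):
    """Build the 'tt; ...' line to go after a '>P1; ICA...' line."""
    key = extract_ica(line)
    descr = descriptions[key]  # raises KeyError if the key is unknown
    return 'tt; ' + key + ' ' + descr + '\n'

def process(infile, outfile, descriptions):
    """Copy infile to outfile, adding a tt; line after each >P1; ICA line."""
    for line in infile:
        outfile.write(line)
        if line.startswith('>P1; ICA'):
            outfile.write(make_new_line(line, descriptions))

# Made-up data standing in for the real dict and sequence file.
descriptions = {'ICA1_HUMAN': 'made-up description'}
infile = io.StringIO('>P1; ICA1_HUMAN\nMSGHK\n>P1; OTHER\nAAAA\n')
outfile = io.StringIO()
process(infile, outfile, descriptions)
output = outfile.getvalue()
print(output)
```

With real files you would pass in the objects returned by open() instead, and close them afterwards as in the loop shown earlier.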




-- 
Steven D'Aprano

