[Tutor] Logical Structure of Snippet
Steven D'Aprano
steve at pearwood.info
Tue May 24 01:56:54 CEST 2011
On Tue, 24 May 2011 06:53:30 am Spyros Charonis wrote:
> Hello List,
>
> I'm trying to read some sequence files and modify them to a
> particular
[...]
You should almost never modify files in place, especially if you need to
insert text. It *might*, sometimes, be acceptable to modify files in
place if you are just over-writing what is already there, but
absolutely not if you have to insert text!
The problem is that file systems don't support inserting. They support
shrinking (truncating) files, adding to the end, and overwriting in
place. To insert, you have to do a LOT more work, which is slow, fragile
and risky: if something goes wrong partway through, you end up with a
corrupted file.
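To see what "overwriting in place" means in practice, here is a small demonstration using only the standard library. Writing into the middle of a file opened in 'r+b' mode replaces the bytes that are already there rather than pushing them along. The file and its contents are throwaway placeholders.

```python
import os
import tempfile

# Create a throwaway file containing six bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'ABCDEF')

# Open for reading and writing without truncating ('r+b'), move to
# byte offset 2 and write two bytes. They replace b'CD'; nothing is
# inserted and the file stays six bytes long.
with open(path, 'r+b') as f:
    f.seek(2)
    f.write(b'xy')

with open(path, 'rb') as f:
    result = f.read()
os.remove(path)

print(result)  # b'ABxyEF'
```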
It is almost always better to read the file into memory, process it,
then write the output back out to the file.
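A minimal sketch of that read, process, write pattern, assuming the file is small enough to fit comfortably in memory. The helper name rewrite_file and the transform callback are hypothetical. It also takes one precaution beyond the advice above: the result goes to a temporary file in the same directory, which is then swapped into place with os.replace, so a crash halfway through the write leaves the original file untouched.

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Read all lines of `path`, pass the list to `transform`,
    and write the returned lines back to `path`."""
    # 1. Read the whole file into memory.
    with open(path) as f:
        lines = f.readlines()

    # 2. Process it entirely in memory.
    new_lines = transform(lines)

    # 3. Write to a temporary file in the same directory, then rename it
    #    over the original (atomic on POSIX systems).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as f:
            f.writelines(new_lines)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # clean up the half-written temp file
        raise

# Example transform: add a line after every line starting with '>P1;'.
def add_marker(lines):
    out = []
    for line in lines:
        out.append(line)
        if line.startswith('>P1;'):
            out.append('inserted line\n')
    return out

# Demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as f:
    f.write('>P1; ICA1_HUMAN\nSEQ\n')
rewrite_file(demo_path, add_marker)
with open(demo_path) as f:
    demo_result = f.read()
os.remove(demo_path)
print(demo_result)
```

Note that tempfile.mkstemp creates the file readable only by its owner, so on POSIX the rewritten file ends up with those permissions; copy the original's mode across first if that matters.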
You ask:
> for line sequence file:
> if line.startswith('>P1; ICA ....)
> make a newline
> go to list with extracted tt; fields*
> find the one with the same query (tt; ICA1 ...)*
> insert this field in the newline
This is better written as some variation of:
infile = open('sequence file', 'r')
outfile = open('processed file', 'w')
for line in infile:
    # Copy every line through unchanged...
    outfile.write(line)
    # ...and after each matching header, add the extra line.
    if line.startswith('>P1; ICA'):
        new_line = ...  #### what to do here???
        outfile.write(new_line + '\n')
outfile.close()
infile.close()
The problem then becomes how to calculate the new_line value above.
Break that into steps:
You have a line that looks like ">P1; ICA1_HUMAN" and you want to
extract the ICA... part.
def extract_ica(line):
    line = line.strip()
    if not line.startswith('>P1;'):
        raise ValueError('not a >P1 line')
    p = line.index(';')
    s = line[p+1:]
    s = s.strip()
    if s.startswith('ICA'):
        return s
    else:
        raise ValueError('no ICA... field in line')
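For comparison, the same extraction can be written with the standard re module. This is a swapped-in alternative, not the approach of the function above, and it assumes the ICA identifier is a single token with no spaces in it: anything after the first whitespace is ignored, whereas extract_ica returns the whole rest of the line. The name extract_ica_re is hypothetical.

```python
import re

# '>P1;' at the start, optional whitespace, then a token beginning with 'ICA'.
ICA_PATTERN = re.compile(r'^>P1;\s*(ICA\S*)')

def extract_ica_re(line):
    match = ICA_PATTERN.match(line.strip())
    if match is None:
        raise ValueError('not a >P1 line with an ICA... field')
    return match.group(1)

print(extract_ica_re('>P1; ICA1_HUMAN\n'))  # ICA1_HUMAN
```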
Meanwhile, you have a dict (not a list, a dictionary) that looks like
this:
descriptions = {
    'ICA1_BOVINE': description,
    'ICA1_HUMAN': description,
    ...}
If you need help assembling this dict, just ask.
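As a starting point, here is one way such a dict could be built. The original post doesn't show the format of the extracted fields, so this sketch assumes, hypothetically, that each one is a string like 'tt; ICA1_HUMAN some description text': the tt; prefix, the ID, then the description. The function name build_descriptions and the sample data are invented for illustration.

```python
def build_descriptions(fields):
    """Map each ICA... ID to its description.

    Assumes (hypothetically) that every field looks like
    'tt; <ID> <description text>'.
    """
    descriptions = {}
    for field in fields:
        field = field.strip()
        if not field.startswith('tt;'):
            continue  # ignore anything that is not a tt; field
        rest = field[len('tt;'):].strip()
        # The first word is the ID; everything after it is the description.
        key, _, description = rest.partition(' ')
        descriptions[key] = description.strip()
    return descriptions

# Placeholder data in the assumed format.
sample = [
    'tt; ICA1_BOVINE first description',
    'tt; ICA1_HUMAN second description',
]
descriptions = build_descriptions(sample)
print(descriptions['ICA1_HUMAN'])  # second description
```

If the same ID appears twice, the later entry silently replaces the earlier one; check for an existing key first if that would be a mistake in your data.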
With a dict, searches are easy. Making the new line takes three short
lines of code:
key = extract_ica(line)
descr = descriptions[key]
new_line = 'tt; ' + key + ' ' + descr
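Putting the pieces together, here is a self-contained sketch of the whole loop. To stay runnable without real files it reads from and writes to in-memory io.StringIO objects; with real data you would open the files as in the skeleton above. The sample lines and descriptions are placeholders, the process function is a hypothetical name, and the choice to skip the extra line (rather than stop with a KeyError) when an ID has no description is an assumption.

```python
import io

def extract_ica(line):
    # Same logic as the extract_ica function above.
    line = line.strip()
    if not line.startswith('>P1;'):
        raise ValueError('not a >P1 line')
    s = line[line.index(';') + 1:].strip()
    if s.startswith('ICA'):
        return s
    raise ValueError('no ICA... field in line')

def process(infile, outfile, descriptions):
    for line in infile:
        outfile.write(line)
        if line.startswith('>P1; ICA'):
            key = extract_ica(line)
            descr = descriptions.get(key)
            if descr is None:
                continue  # no description known for this ID
            new_line = 'tt; ' + key + ' ' + descr
            outfile.write(new_line + '\n')

# Placeholder data standing in for the real files.
descriptions = {'ICA1_HUMAN': 'description of ICA1_HUMAN'}
infile = io.StringIO('>P1; ICA1_HUMAN\nSEQUENCE\n>P1; ICA1_MOUSE\nSEQUENCE\n')
outfile = io.StringIO()
process(infile, outfile, descriptions)
result = outfile.getvalue()
print(result)
```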
--
Steven D'Aprano