[Tutor] Simple string processing problem

Max Noel maxnoel_fr at yahoo.fr
Fri May 13 22:17:27 CEST 2005


On May 13, 2005, at 20:36, cgw501 at york.ac.uk wrote:

> Hi,
>
> I am a Biology student taking some early steps with programming. I'm
> currently trying to write a Python script to do some simple
> processing of a gene sequence file.

     Welcome aboard!

> A line in the file looks like:
> SCER   ATCGATCGTAGCTAGCTATGCTCAGCTCGATCagctagtcgatagcgat
>
> There are many lines like this. What I want to do is read the file,
> remove the trailing lowercase letters, and create a new file
> containing the remaining information. I have some ideas of how to do
> this (using the islower() method of strings). I was hoping someone
> could help me with the file handling. I was thinking I'd use
> .readlines() to get a list of the lines, but I'm not sure how to
> delete the right letters or write to a new file. Sorry if this is
> trivially easy.

     First of all, you shouldn't use readlines() unless you really
need access to several lines at the same time. It loads the entire
file into memory at once, which scales poorly as files grow.
Whenever possible, you should iterate over the file, like this:


# The with block closes the file automatically, even on errors.
with open("foo.txt") as foo:
    for line in foo:
        pass  # do stuff with line...


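     As for the rest of your problem, the rstrip() method of string
objects is what you're looking for. It removes the given characters
from the right-hand end only, whereas strip() would also eat any
lowercase a, t, g or c at the start of the string:


 >>> "SCER   ATCGATCGTAGCTAGCTATGCTCAGCTCGATCagctagtcgatagcgat".rstrip("atgc")
'SCER   ATCGATCGTAGCTAGCTATGCTCAGCTCGATC'


     Note that lines read from a file still end with a newline, so
remove that first (a bare rstrip() call does it) before stripping the
letters.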


     Combining those 2 pieces of advice should solve your problem.
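
     Put together, it could look something like the sketch below.
This is only one way to do it, not the definitive one. The function
names and the file names are made up for the example, and it assumes
the trailing lowercase part only ever contains a, t, g and c, and
that you want one trimmed line written out per input line.

```python
def trim_trailing_lowercase(line):
    """Drop the line ending, then any trailing lowercase a/t/g/c."""
    # Remove the newline first, otherwise rstrip("atgc") stops at "\n".
    return line.rstrip("\r\n").rstrip("atgc")


def process_file(in_path, out_path):
    """Copy in_path to out_path one line at a time, trimming each line."""
    # Both files are closed automatically when the with block ends.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(trim_trailing_lowercase(line) + "\n")
```

     You would call it as process_file("sequences.txt", "trimmed.txt"),
with your own file names. Only one line is held in memory at a time,
so the size of the file doesn't matter.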

-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
"Look at you hacker... A pathetic creature of meat and bone, panting  
and sweating as you run through my corridors... How can you challenge  
a perfect, immortal machine?"


