[Fwd: Re: [Tutor] searching for data in one file from another]

Scott Melnyk melnyk at gmail.com
Mon Nov 8 14:58:11 CET 2004

Hello all!
Thanks for the guidance so far.
The fasta file is laid out so that one line begins with > and contains the
name of the exon, the transcript it is from, the gene, etc. The next
line down contains the actual sequence.
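For reference, a minimal sketch of reading such a file as (title, sequence) records. It assumes the title fields are '|'-separated with the exon name first (inferred from the `split('|')[0]` in the code further down), and it joins any wrapped sequence lines, so it also works if a sequence spans more than one line. The function name `read_fasta` and the example exon/transcript/gene names are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs from FASTA-formatted lines.

    The title is the text after '>' on a title line; the sequence is all the
    non-title lines that follow it, joined, so wrapped sequences also work.
    Lines before the first title line are ignored.
    """
    title: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            # A new title line closes the previous record, if any.
            if title is not None:
                yield title, ''.join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, ''.join(chunks)


# Hypothetical two-record input; the exon name is the first '|' field.
example = [">exon1|transcript1|gene1\n", "ACGT\n",
           ">exon2|transcript1|gene1\n", "GGCC\n"]
for title, seq in read_fasta(example):
    print(title.split('|')[0], seq)   # prints "exon1 ACGT" then "exon2 GGCC"
```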

I needed to match on the "title line" (the line beginning with >) and then
remove that line and the one following it; or, more precisely, not write
those two lines into the new file.
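If every record really is exactly two lines (title, then one sequence line), the "skip the title and the line after it" idea can be written literally by reading lines in pairs. This is only a sketch under that assumption: `zip(it, it)` silently drops a dangling odd line at the end, and the '|'-separated title layout and the name `drop_records_pairwise` are assumptions, not taken from the thread.

```python
from typing import AbstractSet, Iterable, Iterator


def drop_records_pairwise(lines: Iterable[str],
                          excise: AbstractSet[str]) -> Iterator[str]:
    """Yield the lines of a FASTA file whose records are exactly two lines
    (title, then sequence), leaving out every record whose exon name
    (first '|'-separated field of the title) is in `excise`."""
    it = iter(lines)
    # zip(it, it) pulls two consecutive lines at a time from the same iterator.
    for title, sequence in zip(it, it):
        if not title.startswith('>'):
            raise ValueError('expected a title line, got %r' % title)
        exon = title[1:].split('|')[0].strip()
        if exon not in excise:
            yield title
            yield sequence
```

The general approach (tracking the current exon until the next title line) is more robust if a sequence is ever wrapped over several lines; the pairwise version just makes the two-line assumption explicit and fails loudly when it is violated.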

I modified things to:

import sys

def deleteExons(fname2='Z:/datasets/altsplice1.fasta', exons_to_delete='Z:/datasets/Exonlist.txt'):
    # Build the set of exon names to excise (one name per line in the list file).
    sExcise = set()
    for line in open(exons_to_delete):
        sExcise.add(line.strip())

    exon = None
    for line in open(fname2):
        if line.startswith('>'):
            # Title line: the exon name is the first '|'-separated field.
            exon = line[1:].split('|')[0].strip()
        if exon in sExcise:
            continue    # skip this title line and the sequence line(s) under it
        yield line

if __name__ == '__main__':
    WFILE = open(sys.argv[1], 'w')
    for line in deleteExons():
        WFILE.write(line)    # write new file minus the redundant exons
    WFILE.close()

Everything seems to be working now. The original fasta file was approx.
85 MB and is now down to 47 MB after the records matching the
excise file were removed.
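The drop in file size shows that records were removed, not that the right ones were. A minimal sanity check, assuming the same '|'-separated title layout, is to compare the exon names on the title lines before and after filtering. The function names here are hypothetical; only the names are kept in memory, not the sequences.

```python
from typing import AbstractSet, Iterable, List, Tuple


def exon_names(lines: Iterable[str]) -> List[str]:
    """Exon names (first '|'-separated field) of every title line."""
    return [line[1:].split('|')[0].strip()
            for line in lines if line.startswith('>')]


def check_filtered(original: Iterable[str], filtered: Iterable[str],
                   excise: AbstractSet[str]) -> Tuple[int, int]:
    """Return (records before, records after); raise ValueError if an excised
    exon survived or if a record that should have been kept went missing."""
    before = exon_names(original)
    after = exon_names(filtered)
    leftover = set(after) & set(excise)
    if leftover:
        raise ValueError('excised exons still present: %s' % sorted(leftover))
    expected = sum(1 for name in before if name not in excise)
    if len(after) != expected:
        raise ValueError('kept %d records, expected %d' % (len(after), expected))
    return len(before), len(after)
```

Both arguments can be open file objects, e.g. the original fasta file and the newly written one, with `excise` being the same set of names the filter used.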

I am moving on to my next steps now, but I am still interested in comments
on how this could be done more effectively.
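One possible direction, offered only as a sketch: take all three paths as arguments instead of hard-coded defaults, close the files with `with` blocks, and report how many records were dropped. The function names and the usage message are hypothetical, and the exon list is assumed to hold one name per line.

```python
import sys
from typing import List, Set


def load_exon_names(path: str) -> Set[str]:
    """Read one exon name per line, ignoring blank lines."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def filter_fasta_file(fasta_path: str, exon_list_path: str, out_path: str) -> int:
    """Copy fasta_path to out_path minus records whose exon is listed in
    exon_list_path; return the number of records removed."""
    excise = load_exon_names(exon_list_path)
    removed = 0
    skipping = False
    with open(fasta_path) as src, open(out_path, 'w') as dst:
        for line in src:
            if line.startswith('>'):
                # Decide once per record, on its title line.
                skipping = line[1:].split('|')[0].strip() in excise
                if skipping:
                    removed += 1
            if not skipping:
                dst.write(line)
    return removed


def main(argv: List[str]) -> int:
    if len(argv) != 4:
        sys.stderr.write('usage: %s FASTA EXON_LIST OUTPUT\n' % argv[0])
        return 2
    removed = filter_fasta_file(argv[1], argv[2], argv[3])
    print('removed %d records' % removed)
    return 0
```

To run it as a script, call `sys.exit(main(sys.argv))` under an `if __name__ == '__main__':` guard. Keeping the filtering in a function that takes paths also makes it easy to try on a small hand-made file before running it on the full dataset.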

Thanks again to all for their input.


On Sat, 06 Nov 2004 01:12:26 -0500, Kent Johnson
<kent_johnson at skillsoft.com> wrote:
> At 12:52 AM 11/6/2004 -0500, Rich Krauter wrote:
> >Kent,
> >Thanks very much for the reply. Right on the money and within a few
> >minutes, as usual. I'm starting to think you're an automated help system.
> >The OP should find these suggestions helpful for cleaning up his code.
> Just call me the Kent-bot!
> :-)
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor

Scott Melnyk
