[Fwd: Re: [Tutor] searching for data in one file from another]

Rich Krauter rmkrauter at yahoo.com
Fri Nov 5 15:16:10 CET 2004


[mail originally sent to my address only; attachment removed]

Hi Rich!

Thanks again for the help.

I grabbed your script and put it together like this:


import sys,string
WFILE=open(sys.argv[1], 'w')
def deleteExons(fname2='Z:/datasets/altsplice1.fasta',exons_to_delete='Z:/datasets/Exonlist.txt'):
    f = open(fname2)
    f2 = open(exons_to_delete)
    list = f2.readlines()
    exon = None
    for line in f:
        if line.startswith('>'):
            exon = line[1:].split('|')[0]
        if exon in list:
            continue
        yield line


if __name__ == '__main__':
    for line in deleteExons():
        print >> WFILE, line,

Exonlist.txt was produced by the last program you helped me with and
contains one exon ID per line.

altsplice1.fasta is 85583 KB. When I run the program it does not shrink
the file at all; in fact, although the first and last 40 lines appear to
be the same, the output file is larger than the original.
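
One possible explanation, offered as a sketch rather than a confirmed diagnosis: readlines() returns each line with its trailing newline still attached, so an ID taken from a header never equals an entry in the list and nothing is skipped. The version below strips the entries and keeps them in a set. It uses Python 3 syntax, and the names load_exon_ids and delete_exons as well as the sample records are hypothetical stand-ins, not part of the original script or data.

```python
import io

def load_exon_ids(lines):
    """Return the set of exon IDs, one per input line, with whitespace stripped."""
    return {line.strip() for line in lines if line.strip()}

def delete_exons(fasta_lines, exon_ids):
    """Yield every FASTA line whose record's exon ID is not in exon_ids."""
    skip = False
    for line in fasta_lines:
        if line.startswith('>'):
            # Header: the exon ID is the text between '>' and the first '|'.
            skip = line[1:].split('|')[0].strip() in exon_ids
        if not skip:
            yield line

# Tiny in-memory stand-ins for Exonlist.txt and altsplice1.fasta (made-up data).
exon_list = io.StringIO("EXON_A.1\n")
fasta = io.StringIO(
    ">EXON_A.1|GENE|TRANSCRIPT\nACGT\n"
    ">EXON_B.1|GENE|TRANSCRIPT\nTTGA\n"
)

# readlines() keeps the trailing newline, so a plain list lookup fails:
raw = exon_list.readlines()
print("EXON_A.1" in raw)   # False, because the entry is "EXON_A.1\n"

# With stripped IDs the first record is dropped and the second is kept.
kept = list(delete_exons(fasta, load_exon_ids(raw)))
print("".join(kept), end="")
```

The larger output may have a separate cause. If the input uses bare "\n" line endings and the output is written in text mode on Windows, each newline would be written as "\r\n"; that is an assumption about the files, not something the message confirms.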

It is a normal FASTA file:


>ENSE00001383339.1|ENSG00000187908.1|ENST00000339871.1 assembly=NCBI34|chr=10_NT_078087|strand=forward|bases 57203 to 57283|exons plus upstream and downstream regions for exon
ACCCAGCAAAATGGGGATCTCCACAGTCATCCTTGAAATGTGTCTTTTATGGGGACAAGTTCTATCTACAGGTATTACGT
T

>ENSE00001387275.1|ENSG00000187908.1|ENST00000339871.1 assembly=NCBI34|chr=10_NT_078087|strand=forward|bases 72877 to 72981|exons plus upstream and downstream regions for exon
GAGATGGCAGGTGTCAGGGCCGAGTGGAGATCCTATACCGAGGCTCCTGGGGCACCGTGTGTGATGACAGCTGGGACACC
AATGATGCCAACGTGGTCTGTAGGC

>ENSE00001378578.1|ENSG00000187908.1|ENST00000339871.1 assembly=NCBI34|chr=10_NT_078087|strand=forward|bases 82505 to 82835|exons plus upstream and downstream regions for exon
CTGAATCCAGTTTGGCCCTGAGGCTGGTGAATGGAGGTGACAGGTGTCAGGGCCGAGTGGAGGTCCTATACCGAGGCTCC
TGGGGCACCGTGTGTGATGACAGCTGGGACACCAATGATGCCAATGTGGTCTGCAGGCAGCTGGGCTGTGGCTGGGCCAT
GTTGGCCCCAGGAAATGCCCGGTTTGGTCAGGGCTCAGGACCCATTGTCCTGGATGACGTGCGCTGCTCAGGGAATGAGT
CCTACTTGTGGAGCTGCCCCCACAATGGCTGGCTCTCCCATAACTGTGGCCATAGTGAAGACGCTGGTGTCATCTGCTCA
GGTGGGCCTCC

>ENSE00001379544.1|ENSG00000187908.1|ENST00000339871.1 assembly=NCBI34|chr=10_NT_078087|strand=forward|bases 88623 to 89087|exons plus upstream and downstream regions for exon

Any thoughts?

Scott

