Need help with a program

nn pruebauno at latinmail.com
Thu Jan 28 11:13:09 EST 2010


On Jan 28, 10:50 am, evilweasel <karthikramaswam... at gmail.com> wrote:
> I will make my question a little clearer. I have close to 60,000
> lines of the data similar to the one I posted. There are various
> numbers next to the sequence (this is basically the number of times
> the sequence has been found in a particular sample). So, I would need
> to ignore the ones containing '0' and write all other sequences
> (excluding the number, since it is trivial) in a new text file, in the
> following format:
>
> >seq59902
>
> TTTTTTTATAAAATATATAGT
>
> >seq59903
>
> TTTTTTTATTTCTTGGCGTTGT
>
> >seq59904
>
> TTTTTTTGGTTGCCCTGCGTGG
>
> >seq59905
>
> TTTTTTTGTTTATTTTTGGG
>
> The number next to 'seq' is the line number of the sequence. When I
> run the above program, what I expect is an output file that is similar
> to the above output but with the ones containing '0' ignored. But, I
> am getting all the sequences printed in the file.
>
> Kindly excuse the 'newbieness' of the program. :) I am hoping to
> improve in the next few months. Thanks to all those who replied. I
> really appreciate it. :)

People have already given you some pointers on your problem. In the
end you will have to "tweak the details" yourself, because only you
have access to the data, not us.

Just as an example, here is another way to do what you are doing:

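with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
   # Split every input line into its whitespace-separated fields.
   partgen=(line.split() for line in infile)
   # Keep lines that have a second field other than '0'; write the
   # 1-based line number, then the sequence (the first field).
   dnagen=(str(i+1)+'\n'+part[0]+'\n'
           for i,part in enumerate(partgen)
           if len(part)>1 and part[1]!='0')
   outfile.writelines(dnagen)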
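
If you want the '>seqN' header from your example, the same idea can be
put into a small generator function. This is only a sketch under some
assumptions: each line holds the sequence followed by a single count,
and the blank lines in your sample output come from the mail formatting
rather than the file. The name filter_sequences is made up for
illustration.

```python
def filter_sequences(lines):
    """Yield '>seqN' records for lines whose count field is not '0'."""
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        # Assumed layout: sequence in the first field, count in the second.
        # Lines with fewer than two fields (e.g. blank lines) are skipped.
        if len(parts) > 1 and parts[1] != '0':
            yield '>seq%d\n%s\n' % (lineno, parts[0])


if __name__ == '__main__':
    # Small in-memory sample instead of the real 60,000-line file.
    sample = ['TTTTTTTATAAAATATATAGT 0\n', 'TTTTTTTATTTCTTGGCGTTGT 3\n']
    print(''.join(filter_sequences(sample)), end='')
```

To use it on files, pass the open input file as lines and hand the
result to outfile.writelines(), as in the snippet above. If your lines
carry several counts, the test on parts[1] is the detail you would
have to tweak.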



