Need help with a program

Arnaud Delobelle arnodel at googlemail.com
Thu Jan 28 12:00:54 EST 2010


nn <pruebauno at latinmail.com> writes:

> On Jan 28, 10:50 am, evilweasel <karthikramaswam... at gmail.com> wrote:
>> I will make my question a little clearer. I have close to 60,000
>> lines of the data similar to the one I posted. There are various
>> numbers next to the sequence (this is basically the number of times
>> the sequence has been found in a particular sample). So, I would need
>> to ignore the ones containing '0' and write all other sequences
>> (excluding the number, since it is trivial) in a new text file, in the
>> following format:
>>
>> >seq59902
>>
>> TTTTTTTATAAAATATATAGT
>>
>> >seq59903
>>
>> TTTTTTTATTTCTTGGCGTTGT
>>
>> >seq59904
>>
>> TTTTTTTGGTTGCCCTGCGTGG
>>
>> >seq59905
>>
>> TTTTTTTGTTTATTTTTGGG
>>
>> The number next to 'seq' is the line number of the sequence. When I
>> run the above program, what I expect is an output file that is similar
>> to the above output but with the ones containing '0' ignored. But, I
>> am getting all the sequences printed in the file.
>>
>> Kindly excuse the 'newbieness' of the program. :) I am hoping to
>> improve in the next few months. Thanks to all those who replied. I
>> really appreciate it. :)
>
> People have already given you some pointers to your problem. In the
> end you will have to "tweak the details" because only you have access
> to the data, not us.
>
> Just as an example, here is another way to do what you are doing:
>
> with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
>    partgen=(line.split() for line in infile)
>    dnagen=('>seq'+str(i+1)+'\n'+part[0]+'\n'
>            for i,part in enumerate(partgen)
>            if len(part)>1 and part[1]!='0')
>    outfile.writelines(dnagen)

I think that generator expressions are overrated :) What's wrong with:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
    for i, line in enumerate(infile):
        parts = line.split()
        if len(parts) > 1 and parts[1] != '0':
            outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))

(untested)
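
For anyone who wants to try the loop without touching real files, here is
a minimal sketch that wraps the same logic in a function and runs it on
in-memory data. The function name filter_sequences and the counts in the
sample are made up for illustration, and it assumes each line has the
form "<sequence> <count>" separated by whitespace, with only the first
count being checked. If the real data has several count columns, the
test on parts[1] would need adjusting.

```python
import io


def filter_sequences(infile, outfile):
    """Write every sequence whose count is not '0' in '>seqN' format.

    Hypothetical helper: assumes each line looks like '<sequence> <count>'.
    Returns the number of sequences written.
    """
    written = 0
    # start=1 so that the number after 'seq' is the 1-based line number.
    for lineno, line in enumerate(infile, start=1):
        parts = line.split()
        # Skip blank or one-column lines, and lines whose count is '0'.
        if len(parts) > 1 and parts[1] != '0':
            outfile.write(">seq%s\n%s\n" % (lineno, parts[0]))
            written += 1
    return written


# Sample input: sequences taken from the post, counts invented.
sample = io.StringIO(
    "TTTTTTTATAAAATATATAGT 5\n"
    "TTTTTTTATTTCTTGGCGTTGT 0\n"
    "TTTTTTTGGTTGCCCTGCGTGG 2\n"
)
out = io.StringIO()
count = filter_sequences(sample, out)
print(out.getvalue(), end="")
```

Since the function only needs something it can iterate over and something
with a write() method, the objects returned by open('dnain.dat') and
open('dnaout.dat', 'w') can be passed in directly.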

-- 
Arnaud


