[Tutor] Problem When Iterating Over Large Test Files

Thu Jul 19 05:33:21 CEST 2012

On Wed, Jul 18, 2012 at 8:23 PM, Lee Harr <missive at hotmail.com> wrote:
>
>>   grep ^TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT$> with no results
>
> How about:
> grep TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT outfile
> Just in case there is some non-printing character in there...

There are many instances of that sequence of characters in the RAW
input file, but that is what I would expect.
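The same check can be sketched in Python, where repr() makes any stray non-printing characters visible. This is only an illustration: the function name classify_lines and the sample lines are made up here, not taken from the real script or data.

```python
SEQ = 'TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT'

def classify_lines(lines, seq=SEQ):
    """Split lines containing seq into exact matches and everything else."""
    exact, partial = [], []
    for line in lines:
        if seq not in line:
            continue
        # Like grep ^SEQ$: the line is the sequence and nothing else.
        if line.rstrip('\n') == seq:
            exact.append(line)
        else:
            partial.append(line)
    return exact, partial

# Hypothetical sample lines; a real run would iterate over the open file.
sample = [SEQ + '\n', SEQ + '\r\n', 'AAA' + SEQ + '\n', 'GGGG\n']
exact, partial = classify_lines(sample)

for line in partial:
    # repr() shows hidden characters such as '\r' explicitly.
    print(repr(line))
```

A line that contains the sequence but fails the exact comparison (for example because of a trailing carriage return) lands in partial, which is the case an anchored grep would silently miss.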

>
> Beyond that ... my guess would be that you are either not reading the file you think you are, or not writing the file you think you are  :o)
> out = each.replace('/gzip', '/rem_clusters2')
> Seems pretty bulletproof, but maybe just print each and out here to make sure...

Checked this multiple times.
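For anyone following along, the check amounts to something like the sketch below. The path is a hypothetical example, and the guard against an unchanged path is an extra assumption that is not in the original code.

```python
def output_path(each):
    """Derive the output path the same way the quoted code does."""
    out = each.replace('/gzip', '/rem_clusters2')
    # Extra safety check (an assumption, not in the original script):
    # if nothing was replaced, output and input would be the same file.
    if out == each:
        raise ValueError('no "/gzip" in %r' % each)
    return out

each = '/data/run1/gzip/sample_1.fastq'  # hypothetical input path
out = output_path(each)

# %r prints the repr, so stray whitespace in a path would be visible.
print('%r -> %r' % (each, out))
```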

>
> Also, I'm curious... Reading your code, I sort of feel like when I am listening to a non-native speaker. I always get the urge to throw out the correct "Americanisms" for people -- to help them fit in better. So, I hope it does not make me a jerk, but ...
> infile = open(each, 'r') # I'd probably drop the 'r' also...

Working in science, I try to be as explicit as possible; I've come to
dislike Perl for this reason.

> while not check_for_end_of_file:
> reads += 1
> head, sep, tail = id_line_1.partition(' ')
> # or, if I'm only using the one thing ...
> _, _, meaningful_name = id_line_1.partition(' ')
> # maybe call it "selector", then ...
> if selector in ('1:N:0:', '2:N:0:'):
>

Points taken, thanks.
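Put together, the suggested idiom would look roughly like this. It is a minimal sketch: the function name is_selected and the sample id lines are hypothetical, and stripping the trailing newline is an assumption about how the lines are read.

```python
def is_selected(id_line):
    """Return True if the part after the first space is a wanted selector."""
    # partition() splits on the first separator only and always returns
    # a 3-tuple, so this is safe even when the line has no space at all.
    _, _, selector = id_line.rstrip('\n').partition(' ')
    return selector in ('1:N:0:', '2:N:0:')

# Hypothetical id lines, for illustration only.
print(is_selected('@read_1 1:N:0:\n'))
print(is_selected('@read_2 1:Y:0:\n'))
```

Using `_` for the two unused pieces makes it explicit that only the selector matters.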

> Hope this helps.