Need help with a program
jjposner at optimum.net
Thu Jan 28 18:23:36 CET 2010
On 1/28/2010 10:50 AM, evilweasel wrote:
> I will make my question a little clearer. I have close to 60,000
> lines of the data similar to the one I posted. There are various
> numbers next to the sequence (this is basically the number of times
> the sequence has been found in a particular sample). So, I would need
> to ignore the ones containing '0' and write all other sequences
> (excluding the number, since it is trivial) in a new text file, in the
> following format:
> The number next to 'seq' is the line number of the sequence. When I
> run the above program, what I expect is an output file that is similar
> to the above output but with the ones containing '0' ignored. But, I
> am getting all the sequences printed in the file.
> Kindly excuse the 'newbieness' of the program. :) I am hoping to
> improve in the next few months. Thanks to all those who replied. I
> really appreciate it. :)
Your program is a good first try. It contains a newbie error (looking
for the number 0 instead of the string "0"). But more importantly,
you're doing too much work yourself, rather than letting Python do the
heavy lifting for you. These practices and tools make life a lot easier:
* As others have noted, don't accumulate output in a list. Just write
data to the output file line-by-line.
* You don't need to initialize every variable at the beginning of the
program. But there's no harm in it.
* Use the enumerate() function to provide a line counter:
    for counter, line in enumerate(file1):
This eliminates the need to accumulate output data in a list and then
use the index variable "j" as the line counter.
* Use string formatting. Each chunk of output is a two-line string, with
the line-counter and the DNA sequence as variables:
    outformat = """seq%05d
%s
"""
... later, inside your loop ...
    resultsfile.write(outformat % (counter, sequence))
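Putting the tips above together, here is a minimal sketch rather than a drop-in replacement for your program. It makes several assumptions that are not in your post: each input line is the sequence followed by whitespace-separated counts, a line is skipped when any count is the string "0", the line counter starts at 1, and the two-line template is the "seq" label followed by the sequence. The function name filter_sequences and the sample data are made up for illustration.

```python
import io

# Two-line output template: zero-padded line number, then the sequence.
# (Assumed layout; adjust it to the exact format you need.)
OUTFORMAT = "seq%05d\n%s\n"


def filter_sequences(infile, outfile):
    """Write every sequence whose counts contain no "0"; return how many were written."""
    written = 0
    # enumerate() supplies the line counter; start=1 is an assumption.
    for counter, line in enumerate(infile, 1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sequence, counts = fields[0], fields[1:]
        # split() yields strings, so test for the string "0", not the number 0.
        if "0" in counts:
            continue
        # Write each record immediately instead of accumulating a list.
        outfile.write(OUTFORMAT % (counter, sequence))
        written += 1
    return written


# Hypothetical sample data standing in for the real input file.
sample = io.StringIO("AAACCG 3 5\nTTGACA 0 2\nGGGTTT 7 1\n")
result = io.StringIO()
kept = filter_sequences(sample, result)
print(result.getvalue(), end="")
```

With real data you would pass open file objects instead of the StringIO stand-ins, for example the handles from open("input.txt") and open("output.txt", "w") (both file names are placeholders).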