[Tutor] same python script now running much slower
Danny Yoo
dyoo at hashcollision.org
Tue Dec 31 04:41:48 CET 2013
On Mon, Dec 30, 2013 at 5:27 PM, William Ray Wing <wrw at mac.com> wrote:
> On Dec 30, 2013, at 7:54 PM, "Protas, Meredith" <ProtasM at vision.ucsf.edu> wrote:
>
>> Thanks for all of your comments! I am working with human genome information which is in the form of many very short DNA sequence reads. I am using a script that sorts through all of these sequences and picks out ones that contain a particular sequence I'm interested in. Because my data set is so big, I have the data on an external hard drive (but that's where I had it before when it was faster too).
A strong suggestion: please show the content of the program to a
professional programmer and get their informed analysis of the
program. If possible, a clear description of the problem the
program is trying to solve would also be very helpful. It's
very possible that the current program you're working with is not
written with efficiency in mind. In many domains, efficiency isn't
such a concern because the input is relatively small. But in
bioinformatics, the inputs are huge (on the order of gigabytes or
terabytes), and the proper use of memory and CPU matters a lot.
In a previous thread on python-tutor, a bioinformatician was asking
how to load their whole data set into memory. After a few questions,
we realized their data set was about 100 gigabytes or so. Most of us
here then tried to convince the original questioner to reconsider,
that whatever performance gains they thought they were getting by
reading the whole file into memory were probably delusional dreams.
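To make the alternative concrete, here is a minimal sketch of scanning
reads one at a time instead of loading everything first. It assumes a
plain text file with one read per line, which may not match your actual
format, and the names matching_reads and scan_file are made up for
illustration; they are not from your script.

```python
def matching_reads(lines, target):
    """Yield each read that contains target, handling one line at a time."""
    for line in lines:
        read = line.strip()
        if target in read:
            yield read


def scan_file(path, target):
    """Stream a file of reads (assumed: one read per line) without loading it whole."""
    with open(path) as handle:
        # Iterating over a file object reads it lazily, line by line,
        # so memory use stays small no matter how large the file is.
        yield from matching_reads(handle, target)
```

Something like "for read in scan_file(path, target): print(read)" would
then handle each hit as it is found, with only one line held in memory
at a time.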
I guess I'm trying to say: if you can, show us the source. Maybe
there's something there that needs to be fixed. And maybe Python
isn't even the right tool for the job. From the limited description
you've provided of the problem---searching for a pattern among a
database of short sequences---I'm wondering if you're using BLAST or
not. (http://blast.ncbi.nlm.nih.gov/Blast.cgi)