iteration slowing, no increase in memory
Hello,

I have a routine that is iterating through a series of directories, loading files, plotting, then moving on... It runs very well for the first few iterations, but then slows tremendously - there is nothing significantly different about the files or the directory in which it slows. I've monitored the memory use, and it is not increasing. I've looked at what other possible explanations there may be, but I am at a loss. Does anyone have suggestions for where to start looking? I recognize that without the code it is difficult, but I don't know that there is any one 'piece' of code to post, and it's probably not of interest for me to post the entire script here.

Thanks!
On Thu, Sep 10, 2009 at 12:03, John [H2O]<washakie@gmail.com> wrote:
Hello,
I have a routine that is iterating through a series of directories, loading files, plotting, then moving on...
It runs very well for the first few iterations, but then slows tremendously - there is nothing significantly different about the files or directory in which it slows.
One thing you can do to verify this is to change the order of iteration. You will also want to profile your code; then you can see what is taking up so much time.

http://docs.python.org/library/profile

-- Robert Kern
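To make that concrete, here is a minimal cProfile sketch; main() is a placeholder for the actual routine, which wasn't posted:

import cProfile
import pstats

def main():
    # Placeholder for the real directory-iterating routine.
    return sum(i * i for i in range(100000))

cProfile.run('main()', 'profile.out')
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative')
stats.print_stats(20)  # the 20 most expensive calls by cumulative time

Sorting by cumulative time is usually the quickest way to see which function is absorbing the extra time as the iterations slow down.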
On 10-Sep-09, at 1:09 PM, Robert Kern wrote:
One thing you can do to verify this is to change the order of iteration. You will also want to profile your code. Then you can see what is taking up so much time.
Because apparently Robert is too modest to mention his own contribution to profiling:

http://packages.python.org/line_profiler/

David
On Thu, Sep 10, 2009 at 14:39, David Warde-Farley<dwf@cs.toronto.edu> wrote:
On 10-Sep-09, at 1:09 PM, Robert Kern wrote:
One thing you can do to verify this is to change the order of iteration. You will also want to profile your code. Then you can see what is taking up so much time.
Because apparently Robert is too modest to mention his own contribution to profiling:
Not at all. It's just not relevant yet. line_profiler is good if you know that a particular function is taking too long but don't know why. You have to use cProfile first to figure out which function, if any, is the bottleneck.

-- Robert Kern
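For reference, that workflow might look like the following sketch; load_and_plot is a hypothetical function name singled out by cProfile, and the @profile decorator is injected as a builtin by kernprof when the script is run with kernprof.py -l -v script.py:

try:
    profile  # injected as a builtin when run under kernprof -l
except NameError:
    def profile(func):
        # No-op fallback so the script also runs without kernprof.
        return func

@profile
def load_and_plot(directory):
    # Hypothetical hot spot identified by cProfile; per-line timings
    # for this function are printed when run under kernprof.py -l -v.
    pass

load_and_plot('/some/directory')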
On Thu, Sep 10, 2009 at 10:03 AM, John [H2O] <washakie@gmail.com> wrote:
It runs very well for the first few iterations, but then slows tremendously - there is nothing significantly different about the files or directory in which it slows. I've monitored the memory use, and it is not increasing.
The memory use itself is not a good indicator, as modern operating systems (Linux, Windows, Mac, et al.) generally use all available free memory as a disk cache. So the system memory use may remain quite steady while old data is flushed and new data paged in. The first few iterations could be "fast" if they are already in memory, although the behavior should probably change on repeated runs.

If you reboot, then immediately run the script, is it slow on all directories? Or if you can't reboot, can you at least remount the filesystem (which should flush all the cached data and metadata)? Or, for recent Linux kernels: http://linux-mm.org/Drop_Caches

Are other operations slow/fast for the different directories, such as tar'ing them up, or "du -s"? Can you verify the integrity of the drive with SMART tools? If it's Linux, can you get data on the actual disk device I/O (using "iostat" or "vmstat")?

Or you could test by iterating over the same directory repeatedly; it should be fast after the first iteration. Then move to a "problem" directory and see if the first iteration only is slow, or if all iterations are slow.

-C
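That last test is easy to script. A minimal sketch, assuming a placeholder path and treating "the work" as simply reading every file once:

import os
import time

def read_directory(path):
    # Stand-in for the real per-directory work: read each file once.
    for name in os.listdir(path):
        fname = os.path.join(path, name)
        if os.path.isfile(fname):
            with open(fname, 'rb') as f:
                f.read()

path = '/path/to/problem_directory'  # hypothetical
for i in range(3):
    t0 = time.time()
    read_directory(path)
    print('pass %d: %.2f s' % (i, time.time() - t0))

If only the first pass is slow, the data simply wasn't in the OS disk cache yet; if every pass on a "problem" directory is slow, the drive or filesystem itself deserves a closer look.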
On Thursday 10 September 2009 19:03:20 John [H2O] wrote:
I have a routine that is iterating through a series of directories, loading files, plotting, then moving on...
It runs very well for the first few iterations, but then slows tremendously
Maybe you "collect" some data into growing data structures, and some of your algorithms have non-constant time complexity w.r.t. the size of these?

HTH,
Hans
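A sketch of one common numpy incarnation of this pattern: growing an array with concatenate inside the loop copies everything accumulated so far on every pass (quadratic overall), while collecting chunks in a list and joining once stays linear:

import numpy as np

def slow_accumulate(chunks):
    # Quadratic overall: each concatenate copies all data gathered so far.
    out = np.empty((0, 10))
    for c in chunks:
        out = np.concatenate([out, c])
    return out

def fast_accumulate(chunks):
    # Linear overall: the list holds references; copy once at the end.
    return np.concatenate(chunks)

chunks = [np.random.rand(100, 10) for _ in range(500)]
assert np.allclose(slow_accumulate(chunks), fast_accumulate(chunks))

The slow version starts out fast and degrades as the accumulated array grows, which matches the reported symptom without any increase an OS memory monitor would flag as unusual.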
On Thu, Sep 10, 2009 at 10:03 AM, John [H2O]<washakie@gmail.com> wrote:
I have a routine that is iterating through a series of directories, loading files, plotting, then moving on...
It runs very well for the first few iterations, but then slows tremendously
You mention plotting. I'd suggest checking that you aren't holding state inside matplotlib, which is exceedingly easy to do without noticing if you only use the pylab/pyplot interface and don't take care to clear things out.

As a quick check, disable the plotting by commenting the plot commands out, but leave the rest of the code to run. That will help isolate whether the problem is indeed in your plotting code.

Cheers,

f
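A sketch of the kind of pyplot state involved; the Agg backend, file names, and plotting calls are assumptions about the unposted script. Every figure created through pyplot stays registered with its figure manager until it is explicitly closed:

import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend, appropriate for a batch loop
import matplotlib.pyplot as plt

for i in range(100):
    data = np.random.rand(1000)  # stand-in for data loaded from one directory
    fig = plt.figure()
    plt.plot(data)
    plt.savefig('plot_%03d.png' % i)
    plt.close(fig)  # without this, every figure stays alive in pyplot's registry

Without the close() call, each new iteration adds another live figure for pyplot to track, so per-iteration work can creep upward even when the script's own data structures stay a constant size.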
participants (6)

- Chad Netzer
- David Warde-Farley
- Fernando Perez
- Hans Meine
- John [H2O]
- Robert Kern