[Tutor] Python execution timer/proficiency testing

Dave Angel davea at davea.name
Tue Aug 27 02:04:19 CEST 2013


On 26/8/2013 14:20, Dino Bektešević wrote:

Please post using plain-text email, not HTML.  All the extra markup is a
waste of space and really slows down reading.  It sometimes causes other
problems as well, but you haven't hit those yet.

> Hello,
>
> I'm interested in learning more about testing a program proficiency and
> how to measure execution times in seconds.

I think you mean efficiency, not proficiency.

> I have very repetitive functions and methods that work on images with
> a large amount of measuring points, each one working with
> numpy.ndarrays, which is really taxing and has to be really repetitive
> because I either use .fill() or have to go pix by pix.

I think you're saying it's slow, but you're not sure how slow.

> I don't dare to run my program on a batch of ~9.5 million images
> because I can't tell how long it could last, and because of obvious
> space issues my program edits the information on the images and then
> overwrites the original data.  Should something go awry, I'd have to
> spend a long time cleaning it up.

Doesn't matter how slow or fast a program is.  If it's not reliable,
you'd better not run it with in-place updating of valuable data.  And if
it is reliable enough that you trust it, but too slow to run in one
pass, then you'd better arrange that each file is recognizably done or
not done.  For example, you might drop a marker into a directory, make a
copy of that directory, work on the copy, and only when the whole
directory is finished move the files back where they belong and remove
the marker.  Pick your method such that it takes only a few seconds for
the program to figure out where it had left off.
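
Here is a rough sketch of that marker idea, assuming one batch per
directory; the directory names and process_image() are placeholders for
whatever your program actually does:

import os
import shutil

TODO_DIR = "originals"   # the untouched images (placeholder path)
WORK_DIR = "work"        # scratch copies that get modified
MARKER = os.path.join(WORK_DIR, "IN_PROGRESS")

def process_image(path):
    pass                 # stand-in for the real per-image processing

def process_batch(names):
    """Copy a batch, process the copies, then move the results back."""
    if os.path.exists(MARKER):
        # A previous run died mid-batch; the copies are suspect,
        # so throw them away and redo the whole batch.
        shutil.rmtree(WORK_DIR)
    if not os.path.isdir(WORK_DIR):
        os.makedirs(WORK_DIR)
    open(MARKER, "w").close()        # drop the marker

    for name in names:
        shutil.copy2(os.path.join(TODO_DIR, name), WORK_DIR)
        process_image(os.path.join(WORK_DIR, name))

    # Only after the whole batch succeeds do the results overwrite
    # the originals, and only then does the marker go away.
    for name in names:
        shutil.copy2(os.path.join(WORK_DIR, name),
                     os.path.join(TODO_DIR, name))
        os.remove(os.path.join(WORK_DIR, name))
    os.remove(MARKER)
    os.rmdir(WORK_DIR)

The point is that the marker makes "done" versus "not done" unambiguous,
so a restart only has to check for the marker instead of inspecting
millions of files.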

> My plan is to test average profficiency on ~100 000 images to see how
> it fares and what to do next.
>
> So far it takes about 1.5-2 sec per image using the "guess_by_eye"
> method (which isn't long,

If we assume 1 second per file, you're talking roughly four months of
continuous execution time for 10 million files (10,000,000 seconds is
about 116 days), and at your current 1.5-2 seconds per image it's more
like six to eight months.  Assuming you can wait that long, you'll also
have to figure that the program/OS/computer will crash a few times in
that span.  So restartability is mandatory.  Is anything else going to
be using these files, or this computer, in the meantime?

How big are these files, in total?  It may be much more practical to
have a separate drive (or drives) to hold the results.
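
You can answer the size question from a small sample and scale up (the
glob pattern here is just a placeholder for wherever the images live):

import glob
import os

sample = glob.glob("images/*")[:500]      # a few hundred is plenty
avg = sum(os.path.getsize(p) for p in sample) / float(len(sample))

total_files = 9500000
print("average file size: %.1f MB" % (avg / 1e6))
print("estimated total:   %.1f TB" % (avg * total_files / 1e12))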

> the first version took 25sec xD, but I will still add couple of
> functions) but I still get the following warning:
>
>     Warning (from warnings module):
>       File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 152
>         warnings.warn(msg, RuntimeWarning)
>     RuntimeWarning: The iteration is not making good progress, as measured by the
>       improvement from the last ten iterations.
>
> It doesn't seem to produce any error in my data, but how dangerous is this?
>
> thanks!
> Dino

That sounds like a question specific to scipy.  If I had to guess what
it means, I'd say the iterative solver in scipy.optimize isn't
converging well on some of your images, so the values it returns for
those particular images may not be trustworthy.
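
If the call behind that warning is scipy.optimize.fsolve (a guess on my
part; the traceback only shows minpack.py), you can ask the solver for
its status flag and record which images it had trouble with, roughly
like this:

from scipy import optimize

def solve_checked(func, x0, image_name, bad_images):
    # func, x0 and the bookkeeping are placeholders for the real
    # fitting code; ier == 1 means fsolve reports convergence.
    x, infodict, ier, mesg = optimize.fsolve(func, x0, full_output=True)
    if ier != 1:
        bad_images.append((image_name, mesg))
    return x

That way the big run can still finish, and you can revisit just the
files where the fit was questionable.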

Perhaps that last question should be asked on a scipy forum, or at least
on the main python-list.  This tutor list is intended for questions
about the language itself.  No harm in asking here, but you reduce the
odds that somebody has actually used scipy enough to know that message.

You can do your own timings, if you like, using time.time() or other
mechanisms.  Which one is more accurate depends on which OS you're
running.  But if you measure a batch of, say, 100 files, any of them is
accurate enough.
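
For example (the glob pattern and process_image() are placeholders for
your own test set and routine):

import glob
import time

def process_image(path):
    with open(path, "rb") as f:   # stand-in for the real work
        f.read()

sample_images = glob.glob("samples/*")[:100]

start = time.time()
for path in sample_images:
    process_image(path)
elapsed = time.time() - start

print("%.1f s total, %.2f s per image"
      % (elapsed, elapsed / len(sample_images)))

Timing the whole loop rather than individual images keeps clock
resolution from mattering.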

-- 
DaveA



