[Tutor] Python execution timer/proficiency testing (Dino Bektešević)

Tue Aug 27 16:05:19 CEST 2013

Hello,

First off, thank you for the good responses. English is not my native
language, so the small difference between "proficiency" and "efficiency"
escaped me; that's most likely why I couldn't find anything on Google.
I apologize for the HTML text. The first time I sent something from
Gmail there seemed to be no issues.

2013/8/27  <tutor-request at python.org>:
> Message: 5
> Date: Tue, 27 Aug 2013 00:04:19 +0000 (UTC)
> From: Dave Angel <davea at davea.name>
> To: tutor at python.org
> Subject: Re: [Tutor] Python execution timer/proficiency testing
>
>> I don't dare to run my program on a batch of ~9.5 million images because
>> I can't estimate how long it would take, and because of obvious space
>> issues my program edits the information in the images and then
>> overwrites the original data. Should something go awry I'd have to
>> spend a long time cleaning it up.
>
> Doesn't matter how slow or fast a program is.  If it's not reliable,
> you'd better not run it with in-place updating of valuable data.  And if
> it is reliable, so you trust it, but too slow to run in one pass, then
> you'd better arrange that each file is recognizably done or not done.
> For example, you might drop a marker into a directory, make a copy of
> that directory, work on the copy, then only when the whole directory is
> finished do you move the files back where they belong and remove the
> marker.  Pick your method such that it would only take a few seconds for
> the program to figure out where it had left off.
>
>> My plan is to test the average proficiency on ~100 000 images to see how
>> it fares and what to do next.
>> So far it takes about 1.5-2 sec per image using the "guess_by_eye" method (which isn't long,
>
> If we assume 1 second per file, you're talking nearly four months
> (about 116 days) of execution time for 10 million files.  Assuming you
> can wait that long, you'll also have to figure that the program/os/computer
> will crash a few times in that span.  So restartability is mandatory.  Is
> anything else going to be using these files, or this computer, in the
> meantime?
>
> How big are these files, in total?  It may be much more practical to
> have a separate drive(s) to hold the results.
>

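The arithmetic behind such an estimate is easy to script. A minimal sketch, assuming the files are spread evenly over identical machines and ignoring I/O contention and restarts; the function name runtime_days and the worker count are illustrative, not from the thread:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def runtime_days(n_files, seconds_per_file, n_workers=1):
    """Wall-clock days to process n_files, assuming an even split across workers.

    Ignores I/O contention, restarts and scheduling overhead.
    """
    return n_files * seconds_per_file / n_workers / SECONDS_PER_DAY


# 10 million files at 1 s each on a single machine
print(round(runtime_days(10_000_000, 1.0), 1))               # 115.7
# ~9.5 million images at 2 s each, spread over 10 machines
print(round(runtime_days(9_500_000, 2.0, n_workers=10), 1))  # 22.0
```

At 1.5-2 s per image a single machine would need roughly 165-220 days for ~9.5 million images, which is why splitting the work across several computers matters.
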
The entire database is ~60 TB, so you don't have to worry that I will
try to run the whole thing at once on one computer. It will be split
into smaller sections called 'runs'; a couple of runs will be processed
on each computer, and multiple computers will be used. The number of
files per run varies and I will most likely not try to change that.
The computers will not be used for anything else in the meantime. I'm
still waiting to hear from my mentor whether the server will be available.
They are FITS files from the Sloan Digital Sky Survey (SDSS) database,
and each is 12-16 MB. Because of how FITS files are organised, I need
the information in the 'headers' to further process the image itself;
by overwriting the original data I save space and the time it would
take to copy the files. However, you are right: I will most likely add
a 'FLAG' entry to the header so I can restart should something happen.

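A minimal sketch of how the per-file FLAG idea could be combined with Dave's concern about in-place updates, assuming plain byte files: each file is rewritten through a temporary file in the same directory and swapped in with os.replace(), so an interruption leaves either the old or the new version, never a half-written one, and a rerun skips files already marked done. The b"FLAG=DONE\n" prefix, transform and process_run are hypothetical stand-ins; with real FITS files the flag would be a header keyword written through a FITS library instead.

```python
import os
import tempfile

# Hypothetical marker; a real FITS pipeline would set a header keyword instead.
DONE_MARKER = b"FLAG=DONE\n"


def is_done(path):
    """True if the file already starts with the (hypothetical) done marker."""
    with open(path, "rb") as fh:
        return fh.read(len(DONE_MARKER)) == DONE_MARKER


def overwrite_atomically(path, data):
    """Write data to a temp file next to path, then swap it into place.

    On POSIX, os.replace() renames atomically when both paths are on the
    same filesystem, so a crash leaves either the old file or the new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit the disk first
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file, keep the original
        raise


def process_run(paths, transform):
    """Process every file not yet marked done; return how many were processed."""
    processed = 0
    for path in paths:
        if is_done(path):
            continue  # finished in an earlier, possibly interrupted, pass
        with open(path, "rb") as fh:
            original = fh.read()
        overwrite_atomically(path, DONE_MARKER + transform(original))
        processed += 1
    return processed
```

Only one extra file (12-16 MB) of free space is needed at any moment, rather than a copy of a whole directory, and finding where a run left off costs one small read per file.
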
> Date: Tue, 27 Aug 2013 00:04:19 +0000 (UTC)
> From: eryksun <eryksun at gmail.com>
> To: tutor at python.org, Dino Bektešević <ljetibo at gmail.com>
> Subject: Re: [Tutor] Python execution timer/proficiency testing
>The MINPACK routine called by fsolve() failed to converge; it quit
>after making little or no progress over 10 consecutive iterations.
>Maybe you need a better initial estimate; maybe there's no solution.

> From: Oscar Benjamin <oscar.j.benjamin at gmail.com>
> To: tutor at python.org, Dino Bektešević <ljetibo at gmail.com>, eryksun <eryksun at gmail.com>
> Subject: Re: [Tutor] Python execution timer/proficiency testing
>Exactly. Dino, whatever scipy routine you're using is warning you that
>it has failed. You should heed this warning since it likely means that
>your code is not doing what you want it to do. Without knowing what
>you're trying to do and what function you're calling I can't say more
>than that.

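According to the SciPy documentation, fsolve(..., full_output=True) returns (x, infodict, ier, mesg), where ier is 1 only if a solution was found and infodict["fvec"] is the function evaluated at the returned x; on an unsuccessful call x is simply the result of the last iteration. A minimal sketch of a checked wrapper follows; the name solve_checked and the residual tolerance of 1e-6 are assumptions to adapt to the problem's units.

```python
import numpy as np
from scipy.optimize import fsolve


def solve_checked(func, x0, tol=1e-6):
    """Run fsolve and report whether it really found a root.

    Returns (x, ok, message). On failure fsolve still returns the last
    iterate, so both the ier flag and the residual are checked.
    """
    x, info, ier, mesg = fsolve(func, x0, full_output=True)
    residual = np.max(np.abs(info["fvec"]))
    ok = ier == 1 and residual < tol
    return x, ok, mesg


root, ok, msg = solve_checked(lambda x: x**2 - 4.0, 1.0)  # ok is True, root close to [2.]
bad, ok_bad, msg_bad = solve_checked(lambda x: x**2 + 1.0, 1.0)  # no real root: ok_bad is False
print(ok_bad, msg_bad)
```

The same wrapper works unchanged for vector systems such as the two-equation (row, col) problem.
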
Thank you both, I did not know it quits(!). Since the rest of my code
never reported an error, I assume it returned something similar to the
initial guess?
I will add a test of the returned variable ier and try to find another
initial guess, or handle it some other way.
The equations I'm solving are under 'Narrow-field astrometry', and
here's the code snippet:

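import scipy.optimize
from numpy import array, zeros

# Initial guess: invert the linear part of the (row, col) -> (mu, nu)
# transform; det is presumably fd['b']*fd['f'] - fd['c']*fd['e']
row_guess = (mudiff*fd['f'] - fd['c']*nudiff) / det
col_guess = (fd['b']*nudiff - mudiff*fd['e']) / det

row = zeros(mu.size, dtype='f8')
col = zeros(mu.size, dtype='f8')
for i in xrange(mu.size):
    # per-object inputs, presumably read by _pix2munu_for_fit
    self._tmp_color = color[i]
    self._tmp_munu = array([mu[i], nu[i]])

    rowcol_guess = array([row_guess[i], col_guess[i]])

    # solve for the (row, col) that maps onto the target (mu, nu)
    rowcol = scipy.optimize.fsolve(self._pix2munu_for_fit, rowcol_guess)
    row[i] = rowcol[0]
    col[i] = rowcol[1]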

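One possible way to "handle it somehow else", sketched under assumptions: check ier and the residual for every object, retry from a few shifted initial guesses, and store NaN for objects that never converge so they cannot be mistaken for real positions. The residual(x, i) signature, the jitter offsets (in pixels) and the name solve_all are hypothetical; per-object data such as color[i] and (mu[i], nu[i]) could be passed through fsolve's args parameter instead of temporary attributes on self.

```python
import numpy as np
from scipy.optimize import fsolve


def solve_all(residual, guesses, tol=1e-6, jitter=(0.0, 0.5, -0.5, 2.0)):
    """Solve residual(x, i) = 0 for each row of guesses, retrying from shifted guesses.

    Returns (solutions, ok). Rows that never converge stay NaN and have ok False.
    """
    guesses = np.asarray(guesses, dtype="f8")
    out = np.full(guesses.shape, np.nan)
    ok = np.zeros(len(guesses), dtype=bool)
    for i, guess in enumerate(guesses):
        for shift in jitter:
            x, info, ier, _ = fsolve(residual, guess + shift, args=(i,), full_output=True)
            if ier == 1 and np.max(np.abs(info["fvec"])) < tol:
                out[i], ok[i] = x, True
                break  # converged; no need to try other starting points
    return out, ok


# Hypothetical stand-in for self._pix2munu_for_fit: roots are cube roots of targets[i]
targets = np.array([[8.0, 1.0], [27.0, -8.0]])


def residual(x, i):
    return x**3 - targets[i]


rowcol, ok = solve_all(residual, [[1.0, 1.0], [2.0, -1.0]])
# expected: ok is [True, True] and rowcol is close to [[2, 1], [3, -2]]
```
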
Thanks,
Dino