return "full_output" or how to stop throwing away already calculated results
scipy functions throw away a lot of intermediate results.

Examples: fmin_slsqp (ticket:1129), and scipy.linalg.lstsq, which throws away the residuals:

    if n < m:
        x1 = x[:n]
        if rank == n:
            resids = sum(x[n:]**2, axis=0)
        x = x1
    return x, resids, rank, s

In statsmodels, I am switching more to the pattern where, if full_output=True (or a similar keyword) is passed to a function, then all intermediate results are returned, attached to a generic class instance (or what could be a struct in matlab, or a bunch for Gael, ...).

Is it possible to start the same pattern in scipy? E.g. for optimizers like slsqp: if we want to change the returns, then each time it breaks the API, or we have to keep increasing full_output = 1, 2, 3 (like nargout in matlab), which is ugly.

I saw linalg.lstsq by chance this morning, but I assume there is additional useful information hidden in other functions as well.

Any opinions?

Josef
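[For illustration, a minimal sketch of the pattern described above: a generic "bunch" container that a function fills with its main result plus the intermediate results it would otherwise discard. The class and function names here are illustrative, not from statsmodels or scipy.]

```python
import numpy as np


class Bunch(object):
    """Generic attribute container for function results."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def lstsq_full(a, b):
    """Least squares that keeps intermediate results instead of
    throwing them away (a simplified stand-in for scipy.linalg.lstsq)."""
    x, resids, rank, s = np.linalg.lstsq(a, b, rcond=None)
    return Bunch(x=x, resids=resids, rank=rank, singular_values=s)


a = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
res = lstsq_full(a, b)
# everything is available by name; nothing is thrown away
print(res.x, res.resids, res.rank)
```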
On 9-Mar-10, at 4:01 PM, josef.pktd@gmail.com wrote:
> in statsmodels, I am switching more to the pattern when full_output=True (or similar keyword) for a function, then all intermediate results are returned, attached to a generic class instance (or could be a struct in matlab or a bunch for Gael, ...)

David Cournapeau pointed out (in the thread "anyone to look at #1402?" on the numpy-discussion list) that the wider Python community frowns on the type of the returned value depending on a boolean flag argument, preferring instead different function names that call some common (private) helper function. It provided some validation for my uncomfortable feelings about the scipy.optimize methods that do this.

I do think it's worth thinking about returning some sort of proxy object whose attributes can be set to either None or the appropriate values. It would certainly make code more readable (except in the situation where you use argument unpacking, but even then it isn't obvious to an outsider how crucial that full_output thing is).

David
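[A sketch of the alternative David Cournapeau describes: instead of a full_output flag that changes the return type, expose two public functions with stable return types that share a common private helper. All names and the toy "optimizer" body are hypothetical.]

```python
def _minimize_impl(f, x0):
    """Private helper doing the actual work; computes everything."""
    x = x0 - 0.1  # toy "optimization": one fixed step, purely illustrative
    info = {"fun": f(x), "iterations": 1, "exit_status": 0}
    return x, info


def fmin(f, x0):
    """Public function: always returns just the minimizer."""
    x, _ = _minimize_impl(f, x0)
    return x


def fmin_info(f, x0):
    """Public function: always returns (minimizer, info dict)."""
    return _minimize_impl(f, x0)


x = fmin(lambda x: x**2, 1.0)              # always a float
x2, info = fmin_info(lambda x: x**2, 1.0)  # always a tuple
```

Each public name has one fixed return type, so callers never need to know about a flag.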
Returning an object would be my preference as well; it seems more pythonic. Most optimizers should be able to return, at a minimum:

    result.x
    result.objfun
    result.iterations
    result.exit_status

Some optimizers have other useful data to return, such as the Lagrange multipliers from slsqp. Going down that path probably means breaking current implementations of the optimizers, but doing it right would be worth it, in my opinion. We should also agree upon names for the common attributes.

-- 
- Rob Falck

_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
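[A sketch of the minimal result object Rob describes, carrying the four attributes every optimizer could return. The class and attribute names follow his suggestion and are not an existing scipy API.]

```python
class OptimizeResult(object):
    def __init__(self, x, objfun, iterations, exit_status):
        self.x = x                      # the minimizer found
        self.objfun = objfun            # objective value at x
        self.iterations = iterations    # number of iterations used
        self.exit_status = exit_status  # 0 == converged

    def __repr__(self):
        return ("OptimizeResult(x=%r, objfun=%r, iterations=%r, "
                "exit_status=%r)" % (self.x, self.objfun,
                                     self.iterations, self.exit_status))


res = OptimizeResult(x=1.5, objfun=0.25, iterations=12, exit_status=0)
```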
On 9-Mar-10, at 10:37 PM, Rob Falck wrote:
> Returning an object would be my preference as well, it seems more pythonic. Most optimizers should be able to return at a minimum result.x, result.objfun, result.iterations, result.exit_status. [...] We should also agree upon names for the common attributes.

It would require at least one version's worth of deprecation of the old interface, probably two or more. Better to settle on the interface, implement it, and introduce it alongside.

In my opinion, the more descriptive the names, the better: result.minimizer instead of result.x, result.minimum_found or result.minimum_value rather than result.objfun, etc. If we're going to add a layer of complexity, then we might as well make the names as unambiguous as possible.

Also often of interest: the number of function evaluations, gradient evaluations, Hessian evaluations, etc. The current report that gets printed could also become a print_summary() method for these objects, or something like that.

David
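[A sketch of the descriptively named result object David suggests, with the evaluation counters he mentions and a print_summary() method modeled on the report the optimizers currently print. All names are suggestions from this discussion, not an existing API.]

```python
class MinimizeResult(object):
    def __init__(self, minimizer, minimum_value, iterations,
                 function_evaluations, gradient_evaluations, exit_status):
        self.minimizer = minimizer                      # instead of result.x
        self.minimum_value = minimum_value              # instead of result.objfun
        self.iterations = iterations
        self.function_evaluations = function_evaluations
        self.gradient_evaluations = gradient_evaluations
        self.exit_status = exit_status

    def print_summary(self):
        """Reproduce the report the optimizers currently print."""
        print("Optimization terminated (exit status %d)" % self.exit_status)
        print("  Current function value: %g" % self.minimum_value)
        print("  Iterations: %d" % self.iterations)
        print("  Function evaluations: %d" % self.function_evaluations)
        print("  Gradient evaluations: %d" % self.gradient_evaluations)


res = MinimizeResult(minimizer=0.0, minimum_value=0.0, iterations=6,
                     function_evaluations=8, gradient_evaluations=6,
                     exit_status=0)
res.print_summary()
```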
On Tue, Mar 9, 2010 at 10:37 PM, Rob Falck <robfalck@gmail.com> wrote:
> Returning an object would be my preference as well, it seems more pythonic. [...] We should also agree upon names for the common attributes.

If we create new functions, as David argued, then the old signatures could still be kept.

I was curious what the increase in call overhead would be; for an essentially empty function it can be up to 60%. However, since this is mainly for functions that do heavier work, that will not really be relevant. (It might add a little bit to my Monte Carlo or bootstrap runs, but this should be negligible.) And we gain whenever we don't have to redo some calculations.

The main reason I liked the full_output option, instead of always returning a result instance, is that if full_output is true, then additional results can be calculated that I don't want when I just need a fast minimal result.

Josef
On Tue, Mar 9, 2010 at 11:54 PM, <josef.pktd@gmail.com> wrote:
On Tue, Mar 9, 2010 at 10:37 PM, Rob Falck <robfalck@gmail.com> wrote:
Returning an object would be my preference as well, it seems more pythonic. Most optimizers should be able to return at a minimum
result.x result.objfun result.iterations result.exit_status
Some optimizers have other useful data to return, such as the Lagrange multiplers from slsqp. Going down that path means probably breaking current implementations of the optimizers, but doing it right would be worth it, in my opinion. We should also agree upon names for the common attributes.
If we create new functions, as David argued, then the old signature could still be kept.
I was curious what the increase in call overhead is, and for an essentially empty function it would be up to 60%. However, since this is mainly for functions that do heavier work, this will not be really relevant. (It might add a little bit to my Monte Carlos or bootstrap, but this should be negligible), and we will gain if we don't have to redo some calculations.
The main reason I liked the full output option instead of always returning a result instance, is that, if full_output is true, then additional results can be calculated, that I don't want when I just need a fast minimal result.
something like this might work, without any undesired calculations and any break in current API, and return type is independent of "full_output" flag. (short of going to a class with lazy evaluation) def function2a(): '''current function''' return np.arange(5) new: def function5a(full_output=True): '''current function rewritten to return Store''' result = Store() result.a = np.arange(5) if full_output: #results that require additional memory or additional calculations result.b = function0() return result def function5(): '''replacement for current function''' result = function5a(full_output=False) return result.a The extra calculations might not apply for many functions, but I have some in statsmodels for which I could also switch to this kind of pattern instead of making the return type conditional on a keyword. Josef
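[A runnable version of Josef's sketch above. Store is the generic attribute container he refers to, and extra_calculation() stands in for his hypothetical function0(); both names are illustrative only.]

```python
import numpy as np


class Store(object):
    """Empty container; attributes are attached by the caller."""
    pass


def extra_calculation():
    # stands in for work we only want to do on request
    return np.arange(5) ** 2


def function5a(full_output=True):
    """Current function rewritten to return a Store instance."""
    result = Store()
    result.a = np.arange(5)
    if full_output:
        # results that need extra memory or extra calculations
        result.b = extra_calculation()
    return result


def function5():
    """Drop-in replacement for the current function's old return."""
    return function5a(full_output=False).a


full = function5a()  # has both .a and .b
fast = function5()   # just the array, no extra work done
```

The return type is a Store either way; only the set of attached attributes differs.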
At least in the case of fmin_slsqp (and probably most of the other optimizers), the attributes returned when full_output=True are things that are being calculated anyway, so there really shouldn't be much penalty in terms of speed.

I agree that keeping the old call signatures in a deprecated mode would be a good idea, though I admit I don't really like the idea of changing the function names away from fmin_xxxx. Perhaps we keep the current function definitions, and only the full_output argument becomes deprecated?

-- 
- Rob Falck
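[A sketch of Rob's suggestion: keep the fmin_xxxx function name, always return a result object, and deprecate only the full_output argument. The function name, its internals, and the Result class are all illustrative.]

```python
import warnings


class Result(object):
    def __init__(self, x, objfun):
        self.x = x
        self.objfun = objfun


def fmin_demo(func, x0, full_output=None):
    """Always returns a Result; full_output is accepted but deprecated."""
    if full_output is not None:
        warnings.warn("the full_output argument is deprecated; all "
                      "results are now attributes of the return value",
                      DeprecationWarning, stacklevel=2)
    x = x0  # a real optimizer would iterate here
    return Result(x=x, objfun=func(x))


res = fmin_demo(lambda x: (x - 1.0) ** 2, 1.0)
```

Old callers still run (with a warning), while new callers never see a flag-dependent return type.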
participants (3):
- David Warde-Farley
- josef.pktd@gmail.com
- Rob Falck