return "full_output" or how to stop throwing away already calculated results
scipy functions throw away a lot of intermediate results.

Examples: fmin_slsqp (ticket:1129), and scipy.linalg.lstsq, which throws away the residuals:

    if n < m:
        x1 = x[:n]
        if rank == n:
            resids = sum(x[n:]**2, axis=0)
        x = x1
    return x, resids, rank, s

In statsmodels, I am switching more to the pattern where, if full_output=True (or a similar keyword) is passed to a function, then all intermediate results are returned, attached to a generic class instance (or what could be a struct in matlab, or a bunch for Gael, ...).

Is it possible to start the same pattern in scipy? E.g. for optimizers like slsqp: if we want to change the returns, then each time it breaks the API, or we have to keep increasing full_output = 1, 2, 3 (like nargout in matlab), which is ugly.

I saw linalg.lstsq by chance this morning, but I assume there is additional useful information hidden in other functions as well.

Any opinions?

Josef
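[For illustration, a minimal sketch of the pattern described above: a generic "bunch" container that a function fills with its main result plus the intermediate results it would otherwise discard. The class and function names here are illustrative, not from statsmodels or scipy.]

```python
import numpy as np


class Bunch(object):
    """Generic attribute container for function results."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def lstsq_full(a, b):
    """Least squares that keeps intermediate results instead of
    throwing them away (a simplified stand-in for scipy.linalg.lstsq)."""
    x, resids, rank, s = np.linalg.lstsq(a, b, rcond=None)
    return Bunch(x=x, resids=resids, rank=rank, singular_values=s)


a = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
res = lstsq_full(a, b)
# everything is available by name; nothing is thrown away
print(res.x, res.resids, res.rank)
```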
On 9-Mar-10, at 4:01 PM, josef.pktd@gmail.com wrote:
> in statsmodels, I am switching more to the pattern when full_output=True (or similar keyword) for a function, then all intermediate results are returned, attached to a generic class instance (or could be a struct in matlab or a bunch for Gael, ...)

David Cournapeau pointed out (in the thread "anyone to look at #1402?" on the numpy-discussion list) that the wider Python community frowns on the type of the returned value depending on a boolean flag argument, preferring instead different function names that call some common (private) helper function. It provided some validation for my uncomfortable feelings about the scipy.optimize methods that do this.

I do think it's worth thinking about returning some sort of proxy object whose attributes can be set to either None or the appropriate values. It would certainly make code more readable (except in the situation where you use argument unpacking, but even then it isn't obvious to an outsider how crucial that full_output thing is).

David
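[A sketch of the alternative David Cournapeau describes: instead of a full_output flag that changes the return type, expose two public functions with stable return types that share a common private helper. All names and the toy "optimizer" body are hypothetical.]

```python
def _minimize_impl(f, x0):
    """Private helper doing the actual work; computes everything."""
    x = x0 - 0.1  # toy "optimization": one fixed step, purely illustrative
    info = {"fun": f(x), "iterations": 1, "exit_status": 0}
    return x, info


def fmin(f, x0):
    """Public function: always returns just the minimizer."""
    x, _ = _minimize_impl(f, x0)
    return x


def fmin_info(f, x0):
    """Public function: always returns (minimizer, info dict)."""
    return _minimize_impl(f, x0)


x = fmin(lambda x: x**2, 1.0)              # always a float
x2, info = fmin_info(lambda x: x**2, 1.0)  # always a tuple
```

Each public name has one fixed return type, so callers never need to know about a flag.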
Returning an object would be my preference as well; it seems more pythonic. Most optimizers should be able to return, at a minimum:

    result.x
    result.objfun
    result.iterations
    result.exit_status

Some optimizers have other useful data to return, such as the Lagrange multipliers from slsqp. Going down that path probably means breaking current implementations of the optimizers, but doing it right would be worth it, in my opinion. We should also agree upon names for the common attributes.

-- 
- Rob Falck

_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
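[A sketch of the minimal result object Rob describes, carrying the four attributes every optimizer could return. The class and attribute names follow his suggestion and are not an existing scipy API.]

```python
class OptimizeResult(object):
    def __init__(self, x, objfun, iterations, exit_status):
        self.x = x                      # the minimizer found
        self.objfun = objfun            # objective value at x
        self.iterations = iterations    # number of iterations used
        self.exit_status = exit_status  # 0 == converged

    def __repr__(self):
        return ("OptimizeResult(x=%r, objfun=%r, iterations=%r, "
                "exit_status=%r)" % (self.x, self.objfun,
                                     self.iterations, self.exit_status))


res = OptimizeResult(x=1.5, objfun=0.25, iterations=12, exit_status=0)
```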
On 9-Mar-10, at 10:37 PM, Rob Falck wrote:
> Returning an object would be my preference as well, it seems more pythonic. Most optimizers should be able to return at a minimum result.x, result.objfun, result.iterations, result.exit_status. [...] We should also agree upon names for the common attributes.

It would require at least one version's worth of deprecation of the old interface, probably two or more. Better to settle on the interface, implement it, and introduce it alongside.

In my opinion, the more descriptive the names, the better: result.minimizer instead of result.x, result.minimum_found or result.minimum_value rather than result.objfun, etc. If we're going to add a layer of complexity, then we might as well make the names as unambiguous as possible.

Also often of interest: the number of function evaluations, gradient evaluations, Hessian evaluations, etc. The current report that gets printed could also become a print_summary() method for these objects, or something like that.

David
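[A sketch of the descriptively named result object David suggests, with the evaluation counters he mentions and a print_summary() method modeled on the report the optimizers currently print. All names are suggestions from this discussion, not an existing API.]

```python
class MinimizeResult(object):
    def __init__(self, minimizer, minimum_value, iterations,
                 function_evaluations, gradient_evaluations, exit_status):
        self.minimizer = minimizer                      # instead of result.x
        self.minimum_value = minimum_value              # instead of result.objfun
        self.iterations = iterations
        self.function_evaluations = function_evaluations
        self.gradient_evaluations = gradient_evaluations
        self.exit_status = exit_status

    def print_summary(self):
        """Reproduce the report the optimizers currently print."""
        print("Optimization terminated (exit status %d)" % self.exit_status)
        print("  Current function value: %g" % self.minimum_value)
        print("  Iterations: %d" % self.iterations)
        print("  Function evaluations: %d" % self.function_evaluations)
        print("  Gradient evaluations: %d" % self.gradient_evaluations)


res = MinimizeResult(minimizer=0.0, minimum_value=0.0, iterations=6,
                     function_evaluations=8, gradient_evaluations=6,
                     exit_status=0)
res.print_summary()
```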
On Tue, Mar 9, 2010 at 10:37 PM, Rob Falck <robfalck@gmail.com> wrote:
> Returning an object would be my preference as well, it seems more pythonic. [...] We should also agree upon names for the common attributes.

If we create new functions, as David argued, then the old signatures could still be kept.

I was curious what the increase in call overhead would be; for an essentially empty function it can be up to 60%. However, since this is mainly for functions that do heavier work, that will not really be relevant. (It might add a little bit to my Monte Carlo or bootstrap runs, but this should be negligible.) And we gain whenever we don't have to redo some calculations.

The main reason I liked the full_output option, instead of always returning a result instance, is that if full_output is true, then additional results can be calculated that I don't want when I just need a fast minimal result.

Josef
On Tue, Mar 9, 2010 at 11:54 PM, <josef.pktd@gmail.com> wrote:
On Tue, Mar 9, 2010 at 10:37 PM, Rob Falck <robfalck@gmail.com> wrote:
Returning an object would be my preference as well, it seems more pythonic. Most optimizers should be able to return at a minimum
result.x result.objfun result.iterations result.exit_status
Some optimizers have other useful data to return, such as the Lagrange multiplers from slsqp. Going down that path means probably breaking current implementations of the optimizers, but doing it right would be worth it, in my opinion. We should also agree upon names for the common attributes.
If we create new functions, as David argued, then the old signature could still be kept.
I was curious what the increase in call overhead is, and for an essentially empty function it would be up to 60%. However, since this is mainly for functions that do heavier work, this will not be really relevant. (It might add a little bit to my Monte Carlos or bootstrap, but this should be negligible), and we will gain if we don't have to redo some calculations.
The main reason I liked the full output option instead of always returning a result instance, is that, if full_output is true, then additional results can be calculated, that I don't want when I just need a fast minimal result.
something like this might work, without any undesired calculations and any break in current API, and return type is independent of "full_output" flag. (short of going to a class with lazy evaluation) def function2a(): '''current function''' return np.arange(5) new: def function5a(full_output=True): '''current function rewritten to return Store''' result = Store() result.a = np.arange(5) if full_output: #results that require additional memory or additional calculations result.b = function0() return result def function5(): '''replacement for current function''' result = function5a(full_output=False) return result.a The extra calculations might not apply for many functions, but I have some in statsmodels for which I could also switch to this kind of pattern instead of making the return type conditional on a keyword. Josef
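[A runnable version of Josef's sketch above. Store is the generic attribute container he refers to, and extra_calculation() stands in for his hypothetical function0(); both names are illustrative only.]

```python
import numpy as np


class Store(object):
    """Empty container; attributes are attached by the caller."""
    pass


def extra_calculation():
    # stands in for work we only want to do on request
    return np.arange(5) ** 2


def function5a(full_output=True):
    """Current function rewritten to return a Store instance."""
    result = Store()
    result.a = np.arange(5)
    if full_output:
        # results that need extra memory or extra calculations
        result.b = extra_calculation()
    return result


def function5():
    """Drop-in replacement for the current function's old return."""
    return function5a(full_output=False).a


full = function5a()  # has both .a and .b
fast = function5()   # just the array, no extra work done
```

The return type is a Store either way; only the set of attached attributes differs.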
At least in the case of fmin_slsqp (and probably most of the other optimizers), the attributes returned when full_output=True are things that are being calculated anyway, so there really shouldn't be much penalty in terms of speed.

I agree that keeping the old call signatures in a deprecated mode would be a good idea, though I admit I don't really like the idea of changing the function names away from fmin_xxxx. Perhaps we keep the current function definitions, and only the full_output argument becomes deprecated?

-- 
- Rob Falck
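[A sketch of Rob's suggestion: keep the fmin_xxxx function name, always return a result object, and deprecate only the full_output argument. The function name, its internals, and the Result class are all illustrative.]

```python
import warnings


class Result(object):
    def __init__(self, x, objfun):
        self.x = x
        self.objfun = objfun


def fmin_demo(func, x0, full_output=None):
    """Always returns a Result; full_output is accepted but deprecated."""
    if full_output is not None:
        warnings.warn("the full_output argument is deprecated; all "
                      "results are now attributes of the return value",
                      DeprecationWarning, stacklevel=2)
    x = x0  # a real optimizer would iterate here
    return Result(x=x, objfun=func(x))


res = fmin_demo(lambda x: (x - 1.0) ** 2, 1.0)
```

Old callers still run (with a warning), while new callers never see a flag-dependent return type.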
participants (3):
- David Warde-Farley
- josef.pktd@gmail.com
- Rob Falck