[SciPy-Dev] Regarding taking up project ideas and GSoC 2015

Maniteja Nandana maniteja.modesty067 at gmail.com
Wed Mar 18 16:04:16 EDT 2015


Hi everyone, thanks for the feedback.

On Mon, Mar 16, 2015 at 4:24 AM, Christoph Deil <
deil.christoph at googlemail.com> wrote:

> Hi Maniteja,
>
>
> On 15 Mar 2015, at 17:44, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
>
> On Sat, Mar 14, 2015 at 4:53 PM, Maniteja Nandana <
> maniteja.modesty067 at gmail.com> wrote:
>
>> Hi everyone,
>>
>> I was hoping I could get some suggestions regarding the API for the
>> *scipy.diff* package.
>>
>>    1. Type of input to be given - callable function objects or a set of
>>    points as in scipy.integrate.
>>
> I would expect functions.
>
>
> I think most users will pass a function in, so that should be the input to
> the main API functions.
>
> But it can’t hurt to implement the scipy.diff methods that work on fixed
> samples as functions that take these fixed samples as input, just like
> these in scipy.integrate:
>
> http://docs.scipy.org/doc/scipy/reference/integrate.html#integrating-functions-given-fixed-samples
>
> Whether people have use cases for this, and thus whether it should be part
> of the public scipy.diff API, I’m not sure.
>
I suppose it would be better for the API to be able to find the derivative
of a callable function at a given set of inputs.
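
To make that concrete, here is a rough sketch of what a sample-based
routine could look like (the name and signature are made up, not a proposed
API): second-order central differences in the interior of the grid and
one-sided differences at the boundaries.

    import numpy as np

    def derivative_from_samples(y, x):
        # Hypothetical sample-based routine: estimate dy/dx from fixed
        # samples y at points x. Central differences in the interior
        # (second order on uniform grids), one-sided at the ends.
        y = np.asarray(y, dtype=float)
        x = np.asarray(x, dtype=float)
        dydx = np.empty_like(y)
        dydx[1:-1] = (y[2:] - y[:-2]) / (x[2:] - x[:-2])
        dydx[0] = (y[1] - y[0]) / (x[1] - x[0])
        dydx[-1] = (y[-1] - y[-2]) / (x[-1] - x[-2])
        return dydx

    x = np.linspace(0.0, np.pi, 101)
    # maximum error against the exact derivative cos(x)
    print(np.max(np.abs(derivative_from_samples(np.sin(x), x) - np.cos(x))))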

>
>
>>    1. Parameters to be given to derivative methods, like *method* (as in
>>    scipy.optimize) to accommodate options like *central, forward,
>>    backward, complex or richardson*.
>>
> There may be a lot of parameters that make sense, depending on the exact
> differentiation method(s) used. I think it's important to think about which
> ones will be used regularly, and which are only for niche use cases or power
> users that really understand the methods. Limit the number of parameters,
> and provide some kind of configuration object to tweak detailed behavior.
>
> This is the constructor of numdifftools.Derivative, as an example of too
> many params:
>
>     def __init__(self, fun, n=1, order=2, method='central',
>                  romberg_terms=2, step_max=2.0, step_nom=None,
>                  step_ratio=2.0, step_num=26, delta=None,
>                  vectorized=False, verbose=False, use_dea=True):
>
>
> I do like the idea of a single function that’s good enough for 90% of
> users with ~ 5 parameters and a `method` option.
> This will probably work very well for all fixed-step methods.
> For the iterative ones the extra parameters will probably be different for
> each method … I guess an `options` dict parameter as in
> `scipy.optimize.minimize` is the best way to expose those?
>
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
>
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.show_options.html
>
I agree that there needs to be a limit on the number of parameters, and a
decision on whether the rest should be taken as a dict of options, as in
scipy.optimize.
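
As an illustration, the top-level signature could look something like the
following. This is purely a sketch, with only the 'central', n=1 branch
filled in; the point is that common parameters stay in the signature while
method-specific knobs go into `options`, mirroring
scipy.optimize.minimize(..., options={...}).

    import math

    def approx_derivative(f, x, method='central', n=1, order=2,
                          args=(), options=None):
        # Sketch only: method-specific parameters (step size, step
        # ratio, number of Richardson terms, ...) live in `options`.
        options = {} if options is None else dict(options)
        if method == 'central' and n == 1:
            h = options.get('step', 1e-6)
            return (f(x + h, *args) - f(x - h, *args)) / (2.0 * h)
        raise NotImplementedError("sketch handles only method='central', n=1")

    print(approx_derivative(math.sin, 0.0, options={'step': 1e-7}))  # ~1.0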

>
>
>>    1. The maximum order of derivative needed? Also the values of order
>>    *k* used in the basic method to determine the truncation error O(h^k)?
>>
>>
> Maybe propose to implement max order=2 and k=2 only?
> I think this is the absolute minimum that’s needed, and then you can wait
> if someone says “I want order=3” or “I want k=4” for my application.
> It’s easy to implement additional orders or k’s with just a few lines of
> code and without changing the API, but there should be an expressed need
> before you put this in.
>
>
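To make the k=2 case concrete: a central difference has an O(h^2)
truncation error, so halving h should cut the error roughly four-fold,
until roundoff starts to dominate. A quick standalone check (not proposed
API):

    import math

    # Central difference: f'(x) ~ (f(x+h) - f(x-h)) / (2h), error O(h^2).
    f, x, exact = math.exp, 1.0, math.e
    for h in (1e-2, 5e-3, 2.5e-3):
        est = (f(x + h) - f(x - h)) / (2.0 * h)
        print(h, abs(est - exact))  # error shrinks ~4x per halving
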
>>    1. API defined in terms of functions(as in statsmodels) or classes(as
>>    in numdifftools) ?
>>
> No strong preference, as long as it's a coherent API. The scipy.optimize
> API (minimize, root) is nice, something similar but as classes is also fine.
>
>
> My understanding is that classes are used in numdifftools as a way of code
> re-use … the constructor does no computation, everything happens in
> __call__.
>
> I think maybe using functions and returning results objects would be best.
>
> But then numdifftools would have to be either restructured or you’d keep
> it as-is and implement a small wrapper to it where you __init__ and
> __call__ the Derivative etc. objects in the function.
>
I do think that using functions is more intuitive, but classes provide the
flexibility of changing some parameters at runtime, for example the error
tolerance, the method, or the epsilon value (if taken from the user; it
depends on the method).
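
A rough sketch of the wrapper idea, with a hypothetical stand-in for the
numdifftools class (its real constructor, quoted above, takes many more
parameters): configuration lives in __init__, the computation in __call__,
and a plain function fronts both.

    class Derivative(object):
        # Stand-in for a numdifftools-style class: configuration in
        # __init__, work in __call__; parameters such as the step can
        # be changed between calls.
        def __init__(self, fun, step=1e-6):
            self.fun = fun
            self.step = step

        def __call__(self, x):
            h = self.step
            return (self.fun(x + h) - self.fun(x - h)) / (2.0 * h)

    def approx_fprime(f, x, step=1e-6):
        # Function front-end: build the object and call it, so users
        # who just want a number never see the class.
        return Derivative(f, step=step)(x)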

>
>>    1. Return type of the methods should contain the details of the
>>    result, like *error *?( on lines of OptimizeResult, as
>>    in scipy.optimize )
>>
> I do have a strong preference for a Results object where the number of
> return values can be changed later on without breaking backwards
> compatibility.
>
>
> +1 to always return a DiffResult object in analogy to OptimizeResult.
>
> There will be cases where you want to return more info than (derivative
> estimate, derivative error estimate), e.g. number of function calls or even
> the function samples or a status code.
> It’s easy to attach useful extra info to the results object, and the extra
> cost for simple use cases of having to type `.value` to get at the
> derivative estimate is acceptable.
>
>
I have updated a wiki page here
<https://github.com/maniteja123/GSoC/wiki/Proposal:-add-finite-difference-numerical-derivatives-as-%60%60scipy.diff%60%60#approx_fprime>
with a simple API idea for a possible derivative interface.
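
On the result object idea, mimicking scipy.optimize.OptimizeResult, which
is essentially a dict with attribute access, seems simplest; a minimal
sketch of what a DiffResult could be:

    class DiffResult(dict):
        # Dict whose keys are also readable as attributes, in the
        # style of OptimizeResult, so new fields can be added later
        # without breaking existing callers.
        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError:
                raise AttributeError(name)
        __setattr__ = dict.__setitem__

    res = DiffResult(x=1.0, success=True, nfev=2)
    print(res.x, res['nfev'])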

>> I would really appreciate some feedback and suggestions on these issues.
>> The whole draft of the proposal can be seen here
>> <https://github.com/maniteja123/GSoC/wiki/Proposal%3A-add-finite-difference-numerical-derivatives-as-%60%60scipy.diff%60%60>
>> .
>>
>
> Regarding your "to be discussed" list:
> - Don't worry about the module name (diff/derivative/...), this can be
> changed easily later on.
> - Broadcasting: I'm not sure what needs to be broadcasted. If you provide
> a function and the derivative order as int, that seems OK to me.
>
>
Thanks. Broadcasting is something I got confused about for
higher-dimensional inputs or functions, for instance when we need to
calculate a Hessian or Jacobian. If we look into specific cases, a need to
handle broadcasting elegantly might well arise. But I get the point you are
trying to make. From my reading of the code, broadcasting is handled more
elegantly in statsmodels than in numdifftools.
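
As a small example of what broadcasting could mean here: if x is an ndarray
and f is vectorized, a central difference evaluates the derivative at all
points at once (illustrative only):

    import numpy as np

    def central_diff(f, x, h=1e-6):
        # x may be a scalar or an ndarray; for a vectorized f the
        # arithmetic broadcasts, returning one estimate per point.
        x = np.asarray(x, dtype=float)
        return (f(x + h) - f(x - h)) / (2.0 * h)

    print(central_diff(np.sin, np.array([0.0, 0.5, 1.0])))  # ~cos(x)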

> Broadcasting was one of the major points of discussion in
> https://github.com/scipy/scipy/pull/2835.
> If someone has examples that illustrate how it should work, that would be
> great.
> Otherwise we’ll try to read through the code and discussion there and try
> to understand the issue / proposed solution.
>
> - Parallel evaluation should be out of scope imho.
>
>
>
Maybe this makes sense to have for fixed-step derivatives, but the
implementation details need looking into: whether it would be Python, C or
Cython, and whether multiprocessing is already used elsewhere in scipy. As
of now I don't have enough knowledge to venture into this; any feedback is
very welcome and I will look into it.
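
Just to record the idea Christoph describes in the quoted text below, a
minimal sketch of a `processes` option (the helper name is hypothetical,
and as noted this may well be out of scope):

    import multiprocessing

    import numpy as np

    def _eval_samples(f, points, processes=1):
        # Evaluate f at each sample point, optionally in parallel.
        # processes=1 keeps everything in the calling process; for
        # processes>1, f must be picklable (a module-level function)
        # for multiprocessing.Pool.map to work.
        if processes > 1:
            pool = multiprocessing.Pool(processes=processes)
            try:
                return np.array(pool.map(f, points))
            finally:
                pool.close()
                pool.join()
        return np.array([f(p) for p in points])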

> It would be really nice to be able to use multiple cores in scipy.diff,
> e.g. to compute the Hesse matrix of a likelihood function.
>
> Concretely I think this could be implemented via a single `processes`
> option,
> where `processes=1` means no parallel function evaluation by default,
> and `processes>1` means evaluating the function samples via a
> `multiprocessing.Pool(processes=processes)`.
>
> Although I have to admit that the fact that multiprocessing is used
> nowhere else in scipy (as far as I know) is a strong hint that maybe you
> shouldn’t try to introduce it as part of your GSoC project on scipy.diff.
>
> Exposing the fixed-step derivative computation functions using samples as
> input as mentioned above would also allow the user to perform the function
> calls in parallel if they like.
>
> Cheers,
> Christoph
>
>
> Cheers,
> Ralf
>
>
I am also including a simple thought on a possible API for calculating the
derivative (gradient or Jacobian), though it is only a first approach.
Please do correct me if I have overlooked some important points. Waiting to
hear from you.


   1. approx_fprime

      arguments:

          f : callable
              Function whose derivative is to be calculated.
          x : ndarray
              Values at which the derivative needs to be calculated.
          method : str
              Method used to approximate the derivative: 'central',
              'forward', 'backward', 'complex' or 'richardson'.
          n : int from 1 to 4 (default 1)
              Order of the derivative.
          order : int from 1 to 4 (default 2)
              Order of the basic method used. For 'central' methods, it
              must be from the set [2, 4].
          args : tuple
              Arguments for the function f.
          kwargs : dict
              Keyword arguments for the function f.
          epsabs : float or int, optional
              Absolute error tolerance.
          epsrel : float or int, optional
              Relative error tolerance.
          disp : bool
              Set to True to print error messages.

      return:

          res : DiffResult
              The differentiation result represented as a DiffResult
              object. Important attributes are:

              x : ndarray
                  Solution array.
              success : bool
                  Flag indicating whether the derivative was calculated
                  successfully.
              message : str
                  Describes the cause of the error, if one occurred.
              nfev : int
                  Number of function evaluations.
              abserr_round : float
                  Absolute value of the roundoff error, if applicable.
              abserr_truncate : float
                  Absolute value of the truncation error, if applicable.
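
To show the intended call and return shape, here is a toy stand-in that
covers only the n=1, order=2 central case; everything in it is
illustrative, not an implementation.

    import numpy as np

    class DiffResult(dict):
        # Dict with attribute access, as sketched earlier.
        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError:
                raise AttributeError(name)
        __setattr__ = dict.__setitem__

    def approx_fprime(f, x, method='central', n=1, order=2, args=(),
                      epsabs=1e-8, epsrel=1e-8, disp=False):
        # Toy stand-in: first-order central difference only, just to
        # show the proposed call signature and result object.
        x = np.asarray(x, dtype=float)
        h = 1e-6
        val = (f(x + h, *args) - f(x - h, *args)) / (2.0 * h)
        return DiffResult(x=val, success=True, message='', nfev=2,
                          abserr_round=np.nan, abserr_truncate=np.nan)

    res = approx_fprime(np.exp, np.array([0.0, 1.0]))
    print(res.success, res.x, res.nfev)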


Cheers,
Maniteja