Dear Lucas,

I want the ability to reuse the bin numbers for a new input dataset.
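
To make the idea concrete, here is a minimal sketch (not the code in the PR). It assumes the statistic is a sum and reuses the `binnumber` returned by a first call through `np.bincount`. The variable names and the random data are only illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.random((1000, 2))   # fixed sample points
values_t0 = rng.random(1000)     # values at a first time step
values_t1 = rng.random(1000)     # values at a later time step

# First call: computes the bin edges and the bin number of every sample point.
res = stats.binned_statistic_dd(sample, values_t0, statistic="sum", bins=[4, 5])

# Reuse res.binnumber for the new values instead of binning the samples again.
# The bin numbers are linearized indices into an array that has one extra
# outlier bin on each side of every dimension, so the padded shape is needed.
padded_shape = tuple(len(edges) + 1 for edges in res.bin_edges)
sums = np.bincount(res.binnumber, weights=values_t1,
                   minlength=int(np.prod(padded_shape)))
sums_t1 = sums.reshape(padded_shape)[1:-1, 1:-1]   # drop the outlier bins
```

This only covers sums on a 2-D sample; other statistics would need their own reduction over the same bin numbers.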

Indeed one should already be able to compute several statistics at once (and also for several datasets available at the same time).
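
For the case of several datasets available at the same time, a small sketch, assuming the datasets share the same sample points and are passed together as a list of arrays in `values` (the data here are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.random((500, 2))
# Two datasets defined on the same sample points.
values = [rng.random(500), rng.random(500)]

# One call bins the sample once and reduces both datasets.
res = stats.binned_statistic_dd(sample, values, statistic="mean", bins=3)

# The result has one leading axis for the datasets, followed by the bin axes.
print(res.statistic.shape)
```

This does not help when the datasets arrive at different times, which is the case the proposal is about.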

I have a PR ready to submit.
Thank you for proposing to review it.

Best regards,

Edouard

On Wed, Sep 18, 2019 at 9:59 PM <rlucas7@vt.edu> wrote:
 
> On Sep 18, 2019, at 9:45 AM, scipy-dev-request@python.org wrote:
>
> Message: 1
> Date: Wed, 18 Sep 2019 15:02:17 +0200
> From: Ralf Gommers <ralf.gommers@gmail.com>
> To: SciPy Developers List <scipy-dev@python.org>
> Subject: Re: [SciPy-Dev] improvement to binned statistic
> Message-ID:
>    <CABL7CQhHJ-qJmbNnmJeGYATLKZQZCc6z9EB-RivXxKBUo8pscA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Edouard,
>
>
> On Wed, Sep 18, 2019 at 11:29 AM Edouard Goudenhoofdt <egouden@gmail.com>
> wrote:
>
>> Dear scipy developers,
>>
>> One could use scipy.stats.binned_statistic_dd for the same sample points
>> but for values available at different times.
>> Currently this involves the computation of the bin numbers every time the
>> function is called.
>> Therefore I would like to add an optional argument "binnumbers" to skip
>> this step when calling the function again.
>>
>
> That seems sensible. Could you check that creating the bin numbers really
> takes the majority of the time? There's also a fair amount of input
> validation that shouldn't be skipped even when a new `binnumbers` is passed
> in. If that is the case, sending a PR with a benchmark would be very
> welcome.
>
> Cheers,
> Ralf

IIUC, Edouard, what you’d like to do is take input data, run binned_statistic_dd(), and then reuse the bin edges calculated in that first call, either on a new input dataset or on the same data (perhaps to calculate a different statistic?).

AFAIK the binned_statistic_dd() function isn’t able to take bin edges as an argument. If you want multiple stats for the same data, I think you can achieve that via a custom callable that returns multiple statistics rather than a single scalar, but I haven’t done this, so you should confirm that the approach works.
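
On the bin-edges point, it may be worth double-checking the docstring: `bins` is documented as also accepting a sequence of arrays describing the bin edges along each dimension. Under that assumption, a minimal sketch of reusing the edges from a first call on a new input dataset could look like this (names and data are illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# First dataset: let the function derive the bin edges.
sample_a = rng.random((400, 2))
values_a = rng.random(400)
first = stats.binned_statistic_dd(sample_a, values_a,
                                  statistic="mean", bins=[4, 4])

# Second dataset: pass the edges from the first call back in through `bins`.
sample_b = rng.random((300, 2))
values_b = rng.random(300)
second = stats.binned_statistic_dd(sample_b, values_b,
                                   statistic="count", bins=first.bin_edges)
```

Points of the second dataset that fall outside the reused edges are not counted in any bin. The bin numbers themselves are still recomputed in the second call, so this does not avoid that cost.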

If you want to take that up I’m happy to review the PR.

If not, and this is something others agree is useful and should be implemented, it seems reasonable to do. I can implement it if you don’t have time or are otherwise unable to open a PR.

Let me know either way.

-Lucas Roberts