[Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

Joseph Fox-Rabinovitz jfoxrabinovitz at gmail.com
Tue Feb 16 14:48:26 EST 2016


Please correct me if I misunderstood, but the code in that commit does
a full sort, somewhat similar to what
`scipy.stats.scoreatpercentile` does. If that is correct, I will run some
benchmarks first, but I think there is value in going forward with a
numpy version that extends the current partitioning scheme.
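For comparison, here is a minimal sketch of the sort-based approach. It is an illustration only, not the code in that commit or in statsmodels. It sorts once, accumulates the weights, and picks the first element whose cumulative weight reaches q% of the total. `weighted_percentile_sort` is a hypothetical name. The sketch assumes non-negative real weights and uses an "inverted CDF" definition with no interpolation, whereas `np.percentile` interpolates linearly by default.

```python
import numpy as np


def weighted_percentile_sort(a, q, weights):
    """Weighted percentile via a full sort (hypothetical helper, not a NumPy API).

    Returns the smallest element x of `a` such that the elements <= x carry
    at least q% of the total weight ("inverted CDF" definition, no
    interpolation between neighbouring elements).
    """
    a = np.asarray(a, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    if a.shape != w.shape:
        raise ValueError("a and weights must have the same number of elements")
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")
    if np.any(w < 0) or not w.sum() > 0:
        raise ValueError("weights must be non-negative with a positive sum")

    order = np.argsort(a, kind="stable")  # the O(n log n) full sort
    a_sorted = a[order]
    cum_w = np.cumsum(w[order])           # running weight in sorted order
    target = q / 100.0 * cum_w[-1]        # weight that must lie at or below x
    # First position whose running weight reaches the target; min() guards
    # against round-off pushing the index past the end when q is ~100.
    idx = np.searchsorted(cum_w, target, side="left")
    return a_sorted[min(idx, a.size - 1)]
```

With integer weights this gives the same answer as repeating each element `weight` times. The full sort dominates the cost, and a partition-based version would avoid it.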

    - Joe

On Tue, Feb 16, 2016 at 2:39 PM,  <josef.pktd at gmail.com> wrote:
>
>
> On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz
> <jfoxrabinovitz at gmail.com> wrote:
>>
>> Thanks for pointing me to that. I had something a bit different in
>> mind but that definitely looks like a good start.
>>
>> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee <antony.lee at berkeley.edu>
>> wrote:
>> > See earlier discussion here: https://github.com/numpy/numpy/issues/6326
>> > Basically, naïvely sorting may be faster than a not-so-optimized
>> > version of quickselect.
>> >
>> > Antony
>> >
>> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz
>> > <jfoxrabinovitz at gmail.com>:
>> >>
>> >> I would like to add a `weights` keyword to `np.partition`,
>> >> `np.percentile` and `np.median`. My reason for doing so is to allow
>> >> `np.histogram` to support automatic bin selection with weights.
>> >> Currently, weights are not supported for the automatic bin selection
>> >> and would be difficult to support in `auto` mode without having
>> >> `np.percentile` support a `weights` keyword. I suspect that there are
>> >> many other uses for such a feature.
>> >>
>> >> I have taken a preliminary look at the C implementation of the
>> >> partition functions that are the basis for `partition`, `median` and
>> >> `percentile`. I think that it would be possible to add versions (or
>> >> just extend the functionality of existing ones) that check the ratio
>> >> of the weights below the partition point to the total sum of the
>> >> weights instead of just counting elements.
>> >>
>> >> One of the main advantages of such an implementation is that it would
>> >> allow any real weights to be handled correctly, not just integers.
>> >> Complex weights would not be supported.
>> >>
>> >> The purpose of this email is to see if anybody objects, has ideas or
>> >> cares at all about this proposal before I spend a significant amount
>> >> of time working on it. For example, did I miss any functions in my
>> >> list?
>> >>
>> >> Regards,
>> >>
>> >>     -Joe
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion at scipy.org
>> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> >
>
>
>
> statsmodels just got weighted quantiles
> https://github.com/statsmodels/statsmodels/pull/2707
>
> I didn't try to figure out its computational efficiency, and we would
> gladly delegate to whatever fast algorithm ends up in numpy.
>
> Josef
>
>
>
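The partition-based idea from the original proposal compares the weight below the pivot with the target fraction of the total weight instead of counting elements. That idea can be sketched in pure NumPy as a weighted quickselect. This is a hypothetical helper, not an existing NumPy function; a real implementation would live in the C partition code. It uses the same inverted-CDF definition (no interpolation) and assumes non-negative real weights and no NaNs in `a`.

```python
import numpy as np


def weighted_select(a, q, weights, seed=0):
    """Weighted quickselect (hypothetical helper, not a NumPy API).

    Returns the smallest element x of `a` such that the elements <= x carry
    at least q% of the total weight, without fully sorting `a`.
    """
    a = np.asarray(a, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    if a.shape != w.shape:
        raise ValueError("a and weights must have the same number of elements")
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")
    if np.any(w < 0) or not w.sum() > 0:
        raise ValueError("weights must be non-negative with a positive sum")

    target = q / 100.0 * w.sum()
    rng = np.random.default_rng(seed)
    below = 0.0  # weight of discarded elements smaller than everything left
    while True:
        pivot = a[rng.integers(a.size)]
        lo, eq, hi = a < pivot, a == pivot, a > pivot
        w_lo, w_eq = w[lo].sum(), w[eq].sum()
        if lo.any() and below + w_lo >= target:
            # Enough weight strictly below the pivot: keep only that side.
            a, w = a[lo], w[lo]
        elif below + w_lo + w_eq >= target or not hi.any():
            # The pivot's own weight crosses the target (the hi check guards
            # against round-off when q is ~100).
            return pivot
        else:
            # Everything <= pivot is too light: discard it and go right.
            below += w_lo + w_eq
            a, w = a[hi], w[hi]
```

Only the side of the pivot that contains the answer is kept, so the expected cost is linear rather than O(n log n). With integer weights the result matches repeating each element `weight` times, and real weights need no special handling.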


