[Numpy-discussion] Vectorized percentile function in Numpy (PR #2970)

Wed Apr 24 12:03:17 EDT 2013

On Wed, Apr 24, 2013 at 4:11 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Tue, 2013-04-23 at 23:33 -0400, josef.pktd at gmail.com wrote:
>> On Tue, Apr 23, 2013 at 6:16 PM, Sebastian Berg
>> <sebastian at sipsolutions.net> wrote:
>> > On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
>> >>      Back in December it was pointed out on the scipy-user list[1] that
>> >> numpy has a percentile function which has similar functionality to
>> >> scipy's stats.scoreatpercentile.  I've been trying to harmonize these
>> >> two functions into a single version which has the features of both.
>> >>      Scipy PR 374[2] introduced a version which took the parameters from
>> >> both the scipy and numpy percentile functions and was accepted into Scipy
>> >> with the plan that it would be deprecated when a similar function was
>> >> introduced into Numpy.  Then I moved to enhancing the Numpy version with
>> >> Pull Request 2970 [3].  With some input from Sebastian Berg the
>> >> percentile function was rewritten with further vectorization, but
>> >> neither of us felt fully comfortable with the final product.  Can
>> >> someone look at the implementation in the PR and suggest what should be done
>> >> from here?
>> >>
>> >
>> > Thanks! For me the main question is the vectorized usage when both
>> > haystack (`a`) and needle (`q`) are vectorized. What I mean is for:
>> >
>> > np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)
>> >
>> > I would probably expect an output shape of (n1, n2, 3), but currently
>> > you will get the needle dimensions first, because it is roughly the same
>> > as
>> >
>> > [np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25., 50., 75.]]
>> >
>> > so for the (probably rare) vectorization of both `a` and `q`, would it
>> > be preferable to do some kind of long term behaviour change, or just put
>> > the dimensions of `q` first, which should be compatible with the current
>> > list behaviour?
>>
>> I don't have much of a preference either way, but I'm glad this is
>> going into numpy.
>> We can work with it either way.
>>
>> In stats, the most common case will be axis=0, and then the two are
>> the same, aren't they?
>>
>> What I like about the second version is unrolling (with 2 or 3
>> quantiles), which I think will work
>>
>> u, l = np.random.randn(2,5)
>> or
>> res = np.percentile(...)
>> func(*res)
>>
>> The first case will be nicer when there are lots of percentiles, but I
>> guess I won't need it much except for axis=0.
>>
>> Actually, I would prefer the second version, because it might be a bit
>> more cumbersome to get the individual percentiles out if the axis is
>> somewhere in the middle. However, I don't think I have a case like
>> that.
>>
>
> I never thought about the axis being where to insert the dimensions of
> the quantiles. That would be a third option. It feels simpler to me to
> just always use the end (or the start) though.

If the choices are start or end, then I prefer start for unpacking.
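
A minimal sketch of what I mean by unpacking, assuming a numpy version
in which np.percentile puts the dimension of `q` first in the result
(the array sizes and variable names here are only for illustration):

```python
import numpy as np

# Illustrative sizes only: n1=4, n2=5, N=100.
np.random.seed(0)
a = np.random.randn(4, 5, 100)
q = [25., 50., 75.]

# Assumption: the q dimension comes first in the output, so the
# result has shape (len(q), n1, n2).
res = np.percentile(a, q, axis=-1)

# With q first, the individual percentiles unpack directly.
lower, median, upper = res

# The "end" layout (n1, n2, len(q)) is one moveaxis away if needed.
res_last = np.moveaxis(res, 0, -1)
```

With the quantiles at the end instead, the same unpacking would need
a transpose or moveaxis first, which is why I lean towards start.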

Josef

>
> - Sebastian
>
>> The first version would be consistent with reduceat, and that would be
>> more numpythonic. I would go for that in numpy.
>>
>> my 2.5c
>>
>> Josef
>>
>> >
>> > Regards,
>> >
>> > Sebastian
>> >
>> >>   Cheers,
>> >>
>> >>      - Jonathan Helmus
>> >>
>> >>
>> >> [1] http://thread.gmane.org/gmane.comp.python.scientific.user/33331
>> >> [2] https://github.com/scipy/scipy/pull/374
>> >> [3] https://github.com/numpy/numpy/pull/2970
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>> >
>> >