[Numpy-discussion] Medians that ignore values

David Cournapeau david at ar.media.kyoto-u.ac.jp
Fri Sep 19 03:11:05 EDT 2008


Anne Archibald wrote:
>
> Well, for example, you might ask that all the non-nan elements be in
> order, even if you don't specify where the nan goes.


Ah, there are two problems, then:
    - sort
    - how median uses sort.
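To make the second problem concrete, here is a minimal sketch of a median computed purely by sorting. `sort_median` is a hypothetical helper, not a NumPy function. It assumes a current NumPy release, where `np.sort` places NaN at the end (this was not guaranteed when this thread was written). Once the NaN sorts to the end, the "middle" element is silently wrong: the result is neither nan nor the nan-ignoring median.

```python
import numpy as np

def sort_median(a):
    """Hypothetical helper: median obtained only from a sorted copy."""
    s = np.sort(np.asarray(a, dtype=float).ravel())
    n = s.size
    if n == 0:
        return np.nan
    mid = n // 2
    if n % 2:
        return s[mid]                      # odd length: the middle element
    return 0.5 * (s[mid - 1] + s[mid])     # even length: mean of the two middles

data = [1.0, np.nan, 3.0]
print(np.sort(data))      # NaN ends up last in current NumPy
print(sort_median(data))  # 3.0: not nan, and not the nan-ignoring median 2.0
```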

For sort, I don't know how sort speed would be affected by handling
nan. In a way, calling sort with nan inside is a user error (if you take
the point of view that nan values are not comparable), but nan is used for
all kinds of purposes, hence maybe having a nansort would be nice. OTOH (I
took a look at this when I fixed nanmean and co a while ago in scipy),
MATLAB and R treat sort differently from mean and co.
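A nansort could look something like the following minimal sketch. The name `nansort` and its `nan_position` parameter are hypothetical (no such NumPy function is implied); the three choices mirror the options David describes for R: drop the nans, put them in front, or put them at the end. It only handles 1-D input.

```python
import numpy as np

def nansort(a, nan_position="last"):
    """Hypothetical 1-D nansort: sort the non-NaN values, then place
    the NaNs first, last, or drop them entirely."""
    a = np.asarray(a, dtype=float).ravel()
    is_nan = np.isnan(a)
    values = np.sort(a[~is_nan])   # ordinary sort on comparable values only
    nans = a[is_nan]
    if nan_position == "last":
        return np.concatenate((values, nans))
    if nan_position == "first":
        return np.concatenate((nans, values))
    if nan_position == "drop":
        return values
    raise ValueError("nan_position must be 'first', 'last' or 'drop'")

print(nansort([3.0, np.nan, 1.0], "first"))
print(nansort([3.0, np.nan, 1.0], "drop"))
```

Splitting out the NaNs with a boolean mask costs one extra pass and a copy, which is one way to answer the speed question without touching the sort kernel itself.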

I am puzzled by this:
    - R sort handles arrays containing nan the way you want by default
(nan can be dropped, put in front or at the end of the array).
    - R max does not ignore nan by default.
    - R median does not ignore nan by default.

I don't know how to make this consistent. I don't think we are
consistent by having max/amax/etc... ignore nan but sort not ignore
it. OTOH, R is not consistent either.
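For comparison, here is a minimal sketch of how current NumPy releases split this (the behavior has changed since this thread was written, so this is not what max/amax did at the time): the plain reductions propagate nan, and separate nan-prefixed functions ignore it.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

# Plain reductions propagate NaN in current NumPy
print(np.max(x), np.median(x))

# The nan-prefixed variants skip NaN instead
print(np.nanmax(x), np.nanmedian(x))
```

This is roughly the R convention for max and median, with the ignoring behavior moved into separately named functions rather than a keyword argument.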

>
> You can always just set numpy to raise an exception whenever it comes
> across a nan. In fact, apart from the difficulty of correctly frobbing
> numpy's floating-point handling, how reasonable is it for (say) median
> to just run as it is now, but if an exception is thrown, fall back to
> a nan-aware version?
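A minimal sketch of the fallback idea, with one swapped-in technique: instead of trapping a floating-point exception (the part described as hard to get right), it runs the ordinary median and only calls the nan-aware version when the result comes back nan. `median_with_fallback` is a hypothetical name; `np.median` and `np.nanmedian` are real NumPy functions, and the sketch assumes a NumPy release where `np.median` propagates nan.

```python
import numpy as np

def median_with_fallback(a, axis=None):
    """Hypothetical helper: fast path first, nan-aware path only if needed."""
    result = np.median(a, axis=axis)
    # A NaN in the result signals NaN in the input; only then pay
    # for the slower nan-ignoring computation.
    if np.any(np.isnan(result)):
        return np.nanmedian(a, axis=axis)
    return result

print(median_with_fallback([1.0, 2.0, 4.0]))     # no NaN: ordinary median
print(median_with_fallback([1.0, np.nan, 3.0]))  # NaN present: nanmedian path
```

Note this silently changes the meaning of the result when nan is present, which is exactly the consistency concern raised in the reply.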

It would be different from the current split between the nan-aware and
the usual functions for median/mean/etc...: why should sort handle nan by
default, but not the other functions? For mean/std/variance/median, if
having nan is an error, you see it in the result (once we fix our median),
but not with sort.
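The asymmetry can be shown in a few lines. A nan in the input of mean shows up as nan in the output, while a sorted array looks plausible. One way to make sort's error visible too is a sort that rejects nan outright; `strict_sort` below is a hypothetical helper sketching that option, not a proposed NumPy API.

```python
import numpy as np

def strict_sort(a, axis=-1):
    """Hypothetical sort that treats NaN as a user error and says so."""
    a = np.asarray(a)
    # Only floating/complex dtypes can hold NaN; integers skip the check.
    if np.issubdtype(a.dtype, np.inexact) and np.isnan(a).any():
        raise ValueError("NaN found in input to strict_sort")
    return np.sort(a, axis=axis)

x = np.array([2.0, np.nan, 1.0])
print(np.mean(x))  # nan: the problem is visible in the result
try:
    strict_sort(x)
except ValueError as exc:
    print(exc)     # the error is reported instead of hidden in the ordering
```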

Hm, I am always puzzled when I think about nan handling :) It always
seems there is no good answer.

David


