[Numpy-discussion] Apropos ticket #913

Thu Mar 5 00:37:44 EST 2009

Charles R Harris wrote:
>
>
> On Wed, Mar 4, 2009 at 9:09 PM, David Cournapeau
> <david at ar.media.kyoto-u.ac.jp> wrote:
>
>     Charles R Harris wrote:
>     >
>     >
>     > On Wed, Mar 4, 2009 at 1:57 PM, Pauli Virtanen <pav at iki.fi> wrote:
>     >
>     >     Wed, 04 Mar 2009 13:18:55 -0700, Charles R Harris wrote:
>     >     [clip]
>     >     > There are python max/min and their behaviour depends on the
>     >     > scalar type. I haven't looked at the numpy scalars to see
>     >     > precisely what they do.
>     >     >
>     >     > Numpy max/min are aliases for amax/amin defined when the core is
>     >     > imported. The functions amax/amin in turn map to the array
>     >     > methods max/min which call the maximum.reduce/minimum.reduce
>     >     > ufuncs, so they all propagate nans, i.e., if the array contains
>     >     > a nan, nan will be the return value.
>     >     >
>     >     > The nonpropagating comparisons are the ufuncs fmax/fmin and
>     >     > there are no corresponding array methods. I think fmax/fmin
>     >     > should be renamed fmaximum/fminimum before the release of 1.3
>     >     > and the names fmax/fmin reserved for the reduced versions to
>     >     > match the names amax/amin. I'll do that if there are no
>     >     > objections.
>     >
>     >     Aren't the nonpropagating versions of `amax` and `amin` called
>     >     `nanmax` and `nanmin`? But these are functions, not array methods.
>     >
>     >     What does the `f` in the beginning of `fmax` and `fmin` stand for?
>     >
>     >
>     > The functions fmax/fmin are C standard library names, I assume the f
>     > stands for floating like the f in fabs. Nanmax and nanmin work by
>     > replacing nans with a fill value and then performing the specified
>     > operation. For instance, nanmin replaces nans with inf. In contrast,
>     > the functions fmax and fmin are real ufuncs and return nan when
>     > *both* the inputs are nans, return the non-nan value when only one
>     > of the inputs is a nan, and do the normal comparisons when both
>     > inputs are valid.
>
>     Thanks for the clarification. I agree fmax/fmin is better because
>     of the C convention.
>
>
> Better in what way? I was suggesting renaming them to
> fmaximum/fminimum but am perfectly happy with the current names if you
> feel fmax/fmin are better because of the C connection.

Oops, I read the opposite of what you meant :) My rationale for keeping
the names fmax/fmin is that their behavior is a bit surprising for people
not used to C, so giving them a different name from C would only add to
the confusion. It is obviously not a strong rationale.

> One thing that still bothers me a bit is the return value of fmax/fmin
> when comparing two complex nan values. A complex number is a nan
> whenever the real or imaginary part is nan, and currently the
> functions return such a number but originally they returned a complex
> number with both parts set to nan. The current implementation was a
> compromise that kept the code simple while never explicitly using a
> nan value, i.e., the nan came from one of the inputs. I avoided the
> explicit use of a nan value because the NAN macro was possibly
> unreliable at the time. I'm open to thoughts on what the behavior
> should be.

Is it a problem if only one part (real or imaginary) is nan? We should
have a reliable NAN macro; this should be part of the npymath library,
IMO. I will look into it.
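
To make the question concrete, here is a minimal sketch of the "nan whenever the real or imaginary part is nan" rule, using np.isnan on complex scalars. The variable names are only illustrative, and nothing here shows what fmax/fmin themselves return for complex input.

```python
import numpy as np

# One value with nan in the real part only, one with nan in both parts.
partly_nan = np.complex128(complex(np.nan, 1.0))
fully_nan = np.complex128(complex(np.nan, np.nan))

# np.isnan reports a complex number as nan if either part is nan.
flags = (bool(np.isnan(partly_nan)), bool(np.isnan(fully_nan)))

# Inspecting the parts separately tells the two cases apart.
parts = (bool(np.isnan(partly_nan.real)), bool(np.isnan(partly_nan.imag)))

print(flags)  # (True, True)
print(parts)  # (True, False)
```

So code that only tests np.isnan on the result cannot tell the two conventions apart; only code that looks at the real and imaginary parts separately can.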

>  
>
>     We should clearly document the difference between those
>     functions, though.
>
>
> You mean the differences with nanmax/nanmin?

max (undefined behavior with nan) vs fmax (same semantics as the C
counterpart) vs nanmax (ignore nan). In particular, I think it would be
helpful to document the differences with matlab and R, and to suggest
how to replace each function from those environments with equivalent
numpy code. I can do this.
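
As a rough illustration of the distinction, the sketch below compares the elementwise ufuncs and the reductions on small arrays. It assumes a recent numpy where maximum/amax propagate nan as described earlier in the thread; the array contents and variable names are made up for the example.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, np.nan])

# Elementwise: maximum propagates nan, fmax follows the C fmax rules.
propagating = np.maximum(a, b)       # nan wherever either input is nan
c_like = np.fmax(a, b)               # the non-nan input wins
both_nan = np.fmax(np.nan, np.nan)   # nan only when both inputs are nan

# Reductions over a single array.
reduced_propagating = np.max(a)      # nan, because a contains a nan
reduced_ignoring = np.nanmax(a)      # nans are ignored
reduced_fmax = np.fmax.reduce(a)     # fmax applied pairwise along the array

print(propagating, c_like, both_nan)
print(reduced_propagating, reduced_ignoring, reduced_fmax)
```

For this input, nanmax and fmax.reduce agree (both give 3.0); they differ in how they get there, since nanmax is a function built on top of the array while fmax is a ufunc.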

>
>     Would you have time to implement something similar for
>     sort (sort is important for correct and relatively efficient
>     support of nanmedian I think)? If not, that's ok, we'll do without
>     for the 1.3 series,
>
>
> I would rather take more time for the sort functions.

Sure. My own experience is that this kind of nan-handling code is
difficult to get right. We especially need a reasonably good set of
tests, because of compiler- and platform-specific behavior.

> I'm also not convinced that would solve the median problem. If 60% of
> the entries were nans would nan be the median? If not we would have to
> find where the nans began or ended and that would most likely need
> searchsorted to be fixed also.

I meant nanmedian, sorry. The current implementation is slow and/or
buggy (I should check the related tickets, though; it may have been a
scipy ticket).
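
For reference, a sort-based nanmedian could look roughly like the sketch below. It is not the existing implementation: nanmedian_sketch is a hypothetical helper, and it assumes that sort places nans after all valid values, which is exactly the behavior under discussion here and not something every numpy version guarantees.

```python
import numpy as np

def nanmedian_sketch(values):
    """Median of the non-nan entries, or nan if there are none.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(values, dtype=float).ravel()
    n_valid = int(np.count_nonzero(~np.isnan(x)))
    if n_valid == 0:
        return float("nan")
    # Assumption: np.sort puts the nans at the end, so the first
    # n_valid entries of the sorted array are the valid ones.
    valid_sorted = np.sort(x)[:n_valid]
    mid = n_valid // 2
    if n_valid % 2:
        return float(valid_sorted[mid])
    return float(0.5 * (valid_sorted[mid - 1] + valid_sorted[mid]))

# 60% of the entries are nan; the median is taken over the other two.
print(nanmedian_sketch([np.nan, np.nan, np.nan, 1.0, 3.0]))  # 2.0
```

Counting the valid entries up front avoids having to search the sorted array for where the nans begin, so this version does not depend on searchsorted handling nan.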

David