[Numpy-discussion] Apropos ticked #913
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Thu Mar 5 00:37:44 EST 2009
Charles R Harris wrote:
>
>
> On Wed, Mar 4, 2009 at 9:09 PM, David Cournapeau
> <david at ar.media.kyoto-u.ac.jp> wrote:
>
> Charles R Harris wrote:
> >
> >
> > On Wed, Mar 4, 2009 at 1:57 PM, Pauli Virtanen <pav at iki.fi> wrote:
> >
> > Wed, 04 Mar 2009 13:18:55 -0700, Charles R Harris wrote:
> > [clip]
> > > > There are python max/min and their behaviour depends on the
> > > > scalar type. I haven't looked at the numpy scalars to see
> > > > precisely what they do.
> > > >
> > > > Numpy max/min are aliases for amax/amin defined when the core is
> > > > imported. The functions amax/amin in turn map to the array
> > > > methods max/min which call the maximum.reduce/minimum.reduce
> > > > ufuncs, so they all propagate nans, i.e., if the array contains
> > > > a nan, nan will be the return value.
> > > >
> > > > The nonpropagating comparisons are the ufuncs fmax/fmin and
> > > > there are no corresponding array methods. I think fmax/fmin
> > > > should be renamed fmaximum/fminimum before the release of 1.3
> > > > and the names fmax/fmin reserved for the reduced versions to
> > > > match the names amax/amin. I'll do that if there are no
> > > > objections.
> >
> > > Aren't the nonpropagating versions of `amax` and `amin` called
> > > `nanmax` and `nanmin`? But these are functions, not array methods.
> > >
> > > What does the `f` in the beginning of `fmax` and `fmin` stand for?
> >
> >
> > The functions fmax/fmin are C standard library names; I assume the f
> > stands for floating, like the f in fabs. Nanmax and nanmin work by
> > replacing nans with a fill value and then performing the specified
> > operation. For instance, nanmin replaces nans with inf. In contrast,
> > the functions fmax and fmin are real ufuncs and return nan when
> > *both* the inputs are nans, return the non-nan value when only one
> > of the inputs is a nan, and do the normal comparisons when both
> > inputs are valid.
>
> Thanks for the clarification. I agree fmax/fmin is better because of
> the C convention.
>
>
> Better in what way? I was suggesting renaming them to
> fmaximum/fminimum but am perfectly happy with the current names if you
> feel fmax/fmin are better because of the c connection.
Oops, I read the opposite of what you meant :) My rationale for the name
fmax/fmin is that their behavior is a bit surprising for people not used
to C, so giving them a different name than in C would only add to the
confusion. It is obviously not a strong rationale.
> One thing that still bothers me a bit is the return value of fmax/fmin
> when comparing two complex nan values. A complex number is a nan
> whenever the real or imaginary part is nan, and currently the
> functions return such a number, but originally they returned a complex
> number with both parts set to nan. The current implementation was a
> compromise that kept the code simple while never explicitly using a
> nan value, i.e., the nan came from one of the inputs. I avoided the
> explicit use of a nan value because the NAN macro was possibly
> unreliable at the time. I'm open to thoughts on what the behavior
> should be.
Is it a problem if only one part (real or imaginary) is nan? We should
have a reliable NAN macro - this should be part of the npymath library,
IMO. I will look into it.
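
To make the complex case concrete, here is a small sketch, assuming a
current NumPy release (the results at the time of this thread may have
differed). It only checks whether the result counts as a nan, not which
part carries it, since that is exactly the open question above. The
variable names are made up for illustration.

```python
import numpy as np

# A complex value counts as nan if either part is nan.
z_nan_real = complex(np.nan, 1.0)
z_nan_imag = complex(5.0, np.nan)
z_valid = complex(2.0, 3.0)

is_nan_flags = np.isnan(np.array([z_nan_real, z_nan_imag, z_valid]))

# Only one input is nan: fmax returns the valid one, whatever the order.
one_nan = np.fmax(z_nan_real, z_valid)
one_nan_swapped = np.fmax(z_valid, z_nan_imag)

# Both inputs are nan: the result is still a nan in the sense above,
# i.e. at least one of its parts is nan.
both_nan = np.fmax(z_nan_real, z_nan_imag)

print(is_nan_flags, one_nan, one_nan_swapped, np.isnan(both_nan))
```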
>
>
> We should clearly document the differences between those
> functions, though.
>
>
> You mean the differences with nanmax/nanmin?
max (undefined behavior with nan) vs fmax (same semantics as C
counterpart) vs nanmax (ignore nan). In particular, I think it would be
helpful to document the differences with matlab and R, and suggestions
on how to replace which function from those environments with numpy
equivalent code. I can do this.
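
As a rough illustration of the three behaviors, here is a sketch that
assumes a current NumPy release, where max/maximum propagate nan as
Charles describes earlier in the thread. The arrays are made-up sample
data.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([np.nan, np.nan, 2.0])

# Element-wise: maximum propagates nan, fmax returns the non-nan input
# and yields nan only when both inputs are nan.
elementwise_max = np.maximum(a, b)   # nan, nan, 3.0
elementwise_fmax = np.fmax(a, b)     # 1.0, nan, 3.0

# Reductions over a single array.
reduced_max = np.max(a)              # nan: propagates
reduced_fmax = np.fmax.reduce(a)     # 3.0: nan skipped pairwise
reduced_nanmax = np.nanmax(a)        # 3.0: nan ignored

print(elementwise_max, elementwise_fmax)
print(reduced_max, reduced_fmax, reduced_nanmax)
```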
>
> Would you have time to implement something similar for sort? (Sort
> is important for correct and relatively efficient support of
> nanmedian, I think.) If not, that's ok, we'll do without for the
> 1.3 series.
>
>
> I would rather take more time for the sort functions.
Sure. My own experience is that this kind of nan-handling code is
difficult to get right. We especially need a relatively good set of
tests, because of compiler- and platform-specific behavior.
> I'm also not convinced that would solve the median problem. If 60% of
> the entries were nans would nan be the median? If not we would have to
> find where the nans began or ended and that would most likely need
> searchsorted to be fixed also.
I meant nanmedian, sorry. The current implementation is slow and/or
buggy (I should check the related tickets, though, maybe it was a scipy
ticket)
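
For what it's worth, the sort-based idea can be sketched as follows. This
assumes a NumPy whose np.sort places nans at the end of the sorted array
(documented in current releases, not something to rely on as of this
thread), and nanmedian_via_sort is a hypothetical helper name, not an
existing function. It is a sketch under those assumptions, not a proposed
implementation.

```python
import numpy as np

def nanmedian_via_sort(values):
    """Median of the non-nan entries of a 1-D input (hypothetical helper)."""
    # Assumption: np.sort puts nans last, so the valid entries form a
    # sorted prefix whose length is the number of non-nan values.
    data = np.sort(np.asarray(values, dtype=float).ravel())
    n_valid = int(np.count_nonzero(~np.isnan(data)))
    if n_valid == 0:
        return np.nan  # nothing but nans
    valid = data[:n_valid]
    mid = n_valid // 2
    if n_valid % 2:
        return float(valid[mid])
    return float(0.5 * (valid[mid - 1] + valid[mid]))

# 60% of the entries are nan; the median is taken over the rest.
x = np.array([3.0, np.nan, 1.0, np.nan, np.nan])
result = nanmedian_via_sort(x)
print(result)  # 2.0
```

Counting the valid entries sidesteps the search for where the nans begin,
at the cost of one extra pass over the data.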
David