[Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)

Mon Mar 26 14:09:00 EDT 2018

On Mon, 2018-03-26 at 17:40 +0000, Eric Wieser wrote:
> The difficulty in supporting object arrays is that func.reduce(arr,
> initial=func.identity) and func.reduce(arr) have different meanings -
> whereas with the current patch, they are equivalent.
> 

True, but the current meaning is:

func.reduce(arr, intial=<NoValue>, default=func.identity)

in the case for object dtype. Luckily for normal dtypes, func.identity
is both the correct default "default" and a no-op for initial. Thus the
name "identity" kinda works there. I am also not really sure that both
kwargs would make real sense (plus initial probably disallows
default...), but I got some feeling that the "default" meaning may be
even more useful to simplify special casing the empty case.

Anyway, still just pointing out that I it gives me some headaches to
see such a special case for objects :(.

- Sebastian

> 
> On Mon, 26 Mar 2018 at 10:10 Sebastian Berg <sebastian at sipsolutions.n
> et> wrote:
> > On Mon, 2018-03-26 at 12:59 -0400, Hameer Abbasi wrote:
> > > That may be complicated. Currently, the identity isn't used in
> > object
> > > dtype reductions. We may need to change that, which could cause a
> > > whole lot of other backwards incompatible changes. For example,
> > sum
> > > actually including zero in object reductions. Or we could pass in
> > a
> > > flag saying an initializer was passed in to change that
> > behaviour. If
> > > this is agreed upon and someone is kind enough to point me to the
> > > code, I'd be willing to make this change.
> > 
> > I realize the implication, I am not suggesting to change the
> > default
> > behaviour (when no initial=... is passed), I would think about
> > deprecating it, but probably only if we also have the `default`
> > argument, since otherwise you cannot replicate the old behaviour.
> > 
> > What I think I would like to see is to change how it works if (and
> > only
> > if) the initializer is passed in. Yes, this will require holding on
> > to
> > some extra information since you will have to know/remember whether
> > the
> > "identity" was passed in or defined otherwise.
> > 
> > I did not check the code, but I would hope that it is not awfully
> > tricky to do that.
> > 
> > - Sebastian
> > 
> > 
> > PS: A side note, but I see your emails as a single block of text
> > with
> > no/broken new-lines.
> > 
> > 
> > >  On 26/03/2018 at 18:54,
> > > Sebastian wrote: On Mon, 2018-03-26 at 18:48 +0200, Sebastian
> > Berg
> > > wrote: On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote:
> > It'll
> > > need to be thought out for object arrays and subclasses. But for
> > > Regular numeric stuff, Numpy uses fmin and this would have the
> > > desired
> > > effect. I do not want to block this, but I would like a clearer
> > > opinion about this issue, `np.nansum` as Benjamin noted would
> > require
> > > something like: np.nansum([np.nan], default=np.nan) because
> > > np.sum([1], initializer=np.nan) np.nansum([1],
> > initializer=np.nan)
> > > would both give NaN if the logic is the same as the current
> > `np.sum`.
> > > And yes, I guess for fmin/fmax NaN happens to work. And then
> > there
> > > are
> > > many nonsense reduces which could make sense with `initializer`.
> > Now
> > > nansum is not implemented in a way that could make use of the new
> > > kwarg anyway, so maybe it does not matter in some sense. We can
> > in
> > > principle use `default` in nansum and at some point possibly add
> > > `default` to the normal ufuncs. If we argue like that, the only
> > > annoying thing is the `object` dtype which confuses the two use
> > cases
> > > currently. This confusion IMO is not harmless, because I might
> > want
> > > to
> > > use it (e.g. sum with initializer=5), and I would expect things
> > like
> > > dropping in `decimal.Decimal` to work most of the time, while
> > here it
> > > would give silently bad results. In other words: I am very very
> > much
> > > in favor if you get rid that object dtype special case. I frankly
> > not
> > > see why not (except that it needs a bit more code change). If
> > given
> > > explicitly, we might as well force the use and not do the funny
> > stuff
> > > which is designed to be more type agnostic! If it happens to fail
> > due
> > > to not being type agnostic, it will at least fail loudly. If you
> > > leave
> > > that object special case I am *very* hesitant about it. That I
> > think
> > > I
> > > would like a `default` argument as well, is another issue and it
> > can
> > > wait to another day. - Sebastian - Sebastian On 26/03/2018 at
> > 17:45,
> > > Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi
> > > wrote: That is the idea, but NaN functions are in a separate
> > branch
> > > for another PR to be discussed later. You can see it on my fork,
> > if
> > > you're interested. Except that as far as I understand I am not
> > sure
> > > it
> > > will help much with it, since it is not a default, but an
> > > initializer.
> > > Initializing to NaN would just make all results NaN. - Sebastian
> > On
> > > 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine
> > it
> > > would finally give some people a choice on what
> > np.nansum([np.nan])
> > > should return? It caused a huge hullabeloo a few years ago when
> > we
> > > changed it from returning NaN to returning zero. Ben Root On Mon,
> > Mar
> > > 26, 2018 at 11:16 AM, Sebastian Berg <sebastian at sipsolutions.net>
> > > wrote: OK, the new documentation is actually clear: initializer :
> > > scalar, optional The value with which to start the reduction.
> > > Defaults
> > > to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is
> > given,
> > > the
> > > first element of the reduction is used, and an error is thrown if
> > the
> > > reduction is empty. If ``a.dtype`` is ``object``, then the
> > > initializer
> > > is _only_ used if reduction is empty. I would actually like to
> > say
> > > that I do not like the object special case much (and it is
> > probably
> > > the reason why I was confused), nor am I quite sure this is what
> > > helps
> > > a lot? Logically, I would argue there are two things: 1.
> > > initializer/start (always used) 2. default (oly used for empty
> > > reductions) For example, I might like to give `np.nan` as the
> > default
> > > for some empty reductions, this will not work. I understand that
> > this
> > > is a minimal invasive PR and I am not sure I find the solution
> > bad
> > > enough to really dislike it, but what do other think? My first
> > > expectation was the default behaviour (in all cases, not just
> > object
> > > case) for some reason. To be honest, for now I just wonder a bit:
> > How
> > > hard would it be to do both, or is that too annoying? It would at
> > > least get rid of that annoying thing with object ufuncs (which
> > > currently have a default, but not really an
> > identity/initializer).
> > > Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi
> > > wrote: > Actually, the behavior right now isn’t that of `default`
> > but
> > > that of > `initializer` or `start`. > > This was discussed
> > further
> > > down in the PR but to reiterate: > `np.sum([10], initializer=5)`
> > > becomes `15`. > > Also, `np.min([5], initializer=0)` becomes `0`,
> > so
> > > it isn’t really > the default value, it’s the initial value among
> > > which the reduction > is performed. > > This was the reason to
> > call
> > > it
> > > initializer in the first place. I like > `initial` and
> > > `initial_value`
> > > as well, and `start` also makes sense > but isn’t descriptive
> > enough.
> > > > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at
> > 12:06,
> > >
> > > Sebastian Berg <sebastian at sipsolutions.ne > > t> wrote: > > > >
> > > Initializer or this sounds fine to me. As an other data point
> > which >
> > > > I > > think has been mentioned before, `sum` uses start and
> > min/max
> > >
> > > use > > default. `start` does not work, unless we also change the
> > > code
> > > to > > always use the identity if given (currently that is not
> > the
> > > case), > > in > > which case it might be nice. However, "start"
> > seems
> > > a bit like > > solving > > a different issue in any case. > > > >
> > > Anyway, mostly noise. I really like adding this, the only thing >
> > >
> > > worth > > discussing a bit is the name :). - Sebastian > > > > >
> > > On
> > > Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It
> > calls
> > > it
> > > `initializer` - See https://docs.python.org/3.5/libra > > > ry/f
> > > >
> > > >
> > > unctools.html#functools.reduce > > > > > > Sent from Astro for
> > Mac On
> > > Mar 26, 2018 at 09:54, Eric Wieser <wieser.eric+numpy at gmail. > >
> > com>
> > > > > > > wrote: > > > > > > > > It turns out I mispoke -
> > >
> > > functools.reduce calls the argument > > > > `initial` > > > > > >
> > >
> > > On
> > > Mon, 26 Mar 2018 at 00:17 Stephan Hoyer <shoyer at gmail.com> > > >
> > > wrote: > > > > > This looks like a very logical addition to the
> > > reduce
> > > interface. > > > > > It has my support! > > > > > > > > > I would
> > > have
> > > preferred the more descriptive name > > > > > "initial_value", >
> > > >
> > > >
> > > > but consistency with functools.reduce makes a compelling case >
> > > >
> > > > > for > > > > > "initializer". > > > > > On Sun, Mar 25, 2018
> > at
> > >
> > > 1:15 PM Eric Wieser <wieser.eric+nump y at gm > > > > > ail.com>
> > wrote:
> > > To reiterate my comments in the issue - I'm in favor of > this. >
> > > >
> > > > > > > > > > > > It seems seem especially valuable for identity-
> > less
> > > > > > > > >
> > > > > > > > > functions > > > > (`min`, `max`, `lcm`), and the
> > argument
> > >
> > > name is consistent > > > with > `functools.reduce`. too. > > > >
> > > >
> > > >
> > > > > > > > The only argument I can see against merging this would
> > be >
> > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough
> > use
> > > > > > > > >
> > > > > >
> > > > > > cases to justify that. > > > > > > > > > > > > I'd like to
> > > > > > merge
> > >
> > > in a few days, if no one else has any > > > > > > opinions. > > >
> > > >
> > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13
> > Hameer
> > >
> > > Abbasi <einstein.edison > > > > > > @gma > > > > > > il.com>
> > wrote:
> > > Hello, everyone. I’ve submitted a PR to add a initializer kwarg
> > to
> > > ufunc.reduce. This is useful in a few cases, e.g., > > > > > > >
> > it
> > > allows one to supply a “default” value for identity- > > > > > >
> > >
> > > less > > > > > > > ufunc reductions, and specify an initial value
> > for
> > > > > > > > > > reductions such as sum (other than zero.) > > > > >
> > > >
> > > > > > > >
> > > > > > > > Please feel free to review or leave feedback, (although
> > I >
> > > > > > >
> > > > > > > think Eric and Marten have picked it apart pretty well).
> > > >
> > > > > > > >
> > > >
> > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > >
> > > >
> > >
> > > Hameer > > > > Sent from Astro for Mac > > > > >
> > > _______________________________________________ > > > NumPy-
> > > Discussion
> > > mailing list > > > > > > > NumPy-Discussion at python.org > > > > >
> > > >
> > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > >
> > > _______________________________________________ > > > > >
> > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python
> > .o
> > > rg
> > > https://mail.python.org/mailman/listinfo/numpy-discussi on
> > > _______________________________________________ > > > >
> > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.o
> > rg
> > > https://mail.python.org/mailman/listinfo/numpy-discussion > >
> > > _______________________________________________ > > > >
> > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org
> >  >
> > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > > _______________________________________________ > > > NumPy-
> > > Discussion mailing list > > > NumPy-Discussion at python.org > > >
> > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > > _______________________________________________ > > NumPy-
> > Discussion
> > > mailing list > > NumPy-Discussion at python.org > >
> > > https://mail.python.org/mailman/listinfo/numpy-discussion > >
> > > _______________________________________________ > NumPy-
> > Discussion
> > > mailing list > NumPy-Discussion at python.org >
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________ NumPy-Discussion
> > > mailing list NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________ NumPy-Discussion
> > > mailing list NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________ NumPy-Discussion
> > > mailing list NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________ NumPy-Discussion
> > > mailing list NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion________
> > _______________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180326/e11bc0e0/attachment.sig>