[Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)

Eric Wieser wieser.eric+numpy at gmail.com
Mon Mar 26 13:40:34 EDT 2018


The difficulty in supporting object arrays is that func.reduce(arr,
initial=func.identity) and func.reduce(arr) have different meanings -
whereas with the current patch, they are equivalent.
​

On Mon, 26 Mar 2018 at 10:10 Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Mon, 2018-03-26 at 12:59 -0400, Hameer Abbasi wrote:
> > That may be complicated. Currently, the identity isn't used in object
> > dtype reductions. We may need to change that, which could cause a
> > whole lot of other backwards incompatible changes. For example, sum
> > actually including zero in object reductions. Or we could pass in a
> > flag saying an initializer was passed in to change that behaviour. If
> > this is agreed upon and someone is kind enough to point me to the
> > code, I'd be willing to make this change.
>
> I realize the implication, I am not suggesting to change the default
> behaviour (when no initial=... is passed), I would think about
> deprecating it, but probably only if we also have the `default`
> argument, since otherwise you cannot replicate the old behaviour.
>
> What I think I would like to see is to change how it works if (and only
> if) the initializer is passed in. Yes, this will require holding on to
> some extra information since you will have to know/remember whether the
> "identity" was passed in or defined otherwise.
>
> I did not check the code, but I would hope that it is not awfully
> tricky to do that.
>
> - Sebastian
>
>
> PS: A side note, but I see your emails as a single block of text with
> no/broken new-lines.
>
>
> >  On 26/03/2018 at 18:54,
> > Sebastian wrote: On Mon, 2018-03-26 at 18:48 +0200, Sebastian Berg
> > wrote: On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote: It'll
> > need to be thought out for object arrays and subclasses. But for
> > Regular numeric stuff, Numpy uses fmin and this would have the
> > desired
> > effect. I do not want to block this, but I would like a clearer
> > opinion about this issue, `np.nansum` as Benjamin noted would require
> > something like: np.nansum([np.nan], default=np.nan) because
> > np.sum([1], initializer=np.nan) np.nansum([1], initializer=np.nan)
> > would both give NaN if the logic is the same as the current `np.sum`.
> > And yes, I guess for fmin/fmax NaN happens to work. And then there
> > are
> > many nonsense reduces which could make sense with `initializer`. Now
> > nansum is not implemented in a way that could make use of the new
> > kwarg anyway, so maybe it does not matter in some sense. We can in
> > principle use `default` in nansum and at some point possibly add
> > `default` to the normal ufuncs. If we argue like that, the only
> > annoying thing is the `object` dtype which confuses the two use cases
> > currently. This confusion IMO is not harmless, because I might want
> > to
> > use it (e.g. sum with initializer=5), and I would expect things like
> > dropping in `decimal.Decimal` to work most of the time, while here it
> > would give silently bad results. In other words: I am very very much
> > in favor if you get rid that object dtype special case. I frankly not
> > see why not (except that it needs a bit more code change). If given
> > explicitly, we might as well force the use and not do the funny stuff
> > which is designed to be more type agnostic! If it happens to fail due
> > to not being type agnostic, it will at least fail loudly. If you
> > leave
> > that object special case I am *very* hesitant about it. That I think
> > I
> > would like a `default` argument as well, is another issue and it can
> > wait to another day. - Sebastian - Sebastian On 26/03/2018 at 17:45,
> > Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi
> > wrote: That is the idea, but NaN functions are in a separate branch
> > for another PR to be discussed later. You can see it on my fork, if
> > you're interested. Except that as far as I understand I am not sure
> > it
> > will help much with it, since it is not a default, but an
> > initializer.
> > Initializing to NaN would just make all results NaN. - Sebastian On
> > 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine it
> > would finally give some people a choice on what np.nansum([np.nan])
> > should return? It caused a huge hullabeloo a few years ago when we
> > changed it from returning NaN to returning zero. Ben Root On Mon, Mar
> > 26, 2018 at 11:16 AM, Sebastian Berg <sebastian at sipsolutions.net>
> > wrote: OK, the new documentation is actually clear: initializer :
> > scalar, optional The value with which to start the reduction.
> > Defaults
> > to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is given,
> > the
> > first element of the reduction is used, and an error is thrown if the
> > reduction is empty. If ``a.dtype`` is ``object``, then the
> > initializer
> > is _only_ used if reduction is empty. I would actually like to say
> > that I do not like the object special case much (and it is probably
> > the reason why I was confused), nor am I quite sure this is what
> > helps
> > a lot? Logically, I would argue there are two things: 1.
> > initializer/start (always used) 2. default (oly used for empty
> > reductions) For example, I might like to give `np.nan` as the default
> > for some empty reductions, this will not work. I understand that this
> > is a minimal invasive PR and I am not sure I find the solution bad
> > enough to really dislike it, but what do other think? My first
> > expectation was the default behaviour (in all cases, not just object
> > case) for some reason. To be honest, for now I just wonder a bit: How
> > hard would it be to do both, or is that too annoying? It would at
> > least get rid of that annoying thing with object ufuncs (which
> > currently have a default, but not really an identity/initializer).
> > Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi
> > wrote: > Actually, the behavior right now isn’t that of `default` but
> > that of > `initializer` or `start`. > > This was discussed further
> > down in the PR but to reiterate: > `np.sum([10], initializer=5)`
> > becomes `15`. > > Also, `np.min([5], initializer=0)` becomes `0`, so
> > it isn’t really > the default value, it’s the initial value among
> > which the reduction > is performed. > > This was the reason to call
> > it
> > initializer in the first place. I like > `initial` and
> > `initial_value`
> > as well, and `start` also makes sense > but isn’t descriptive enough.
> > > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at 12:06,
> >
> > Sebastian Berg <sebastian at sipsolutions.ne > > t> wrote: > > > >
> > Initializer or this sounds fine to me. As an other data point which >
> > > I > > think has been mentioned before, `sum` uses start and min/max
> >
> > use > > default. `start` does not work, unless we also change the
> > code
> > to > > always use the identity if given (currently that is not the
> > case), > > in > > which case it might be nice. However, "start" seems
> > a bit like > > solving > > a different issue in any case. > > > >
> > Anyway, mostly noise. I really like adding this, the only thing > >
> > worth > > discussing a bit is the name :). - Sebastian > > > > > > On
> > Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It calls
> > it
> > `initializer` - See https://docs.python.org/3.5/libra > > > ry/f > >
> > >
> > unctools.html#functools.reduce > > > > > > Sent from Astro for Mac On
> > Mar 26, 2018 at 09:54, Eric Wieser <wieser.eric+numpy at gmail. > > com>
> > > > > > wrote: > > > > > > > > It turns out I mispoke -
> >
> > functools.reduce calls the argument > > > > `initial` > > > > > > >
> > On
> > Mon, 26 Mar 2018 at 00:17 Stephan Hoyer <shoyer at gmail.com> > > >
> > wrote: > > > > > This looks like a very logical addition to the
> > reduce
> > interface. > > > > > It has my support! > > > > > > > > > I would
> > have
> > preferred the more descriptive name > > > > > "initial_value", > > >
> > >
> > > but consistency with functools.reduce makes a compelling case > > >
> > > > for > > > > > "initializer". > > > > > On Sun, Mar 25, 2018 at
> >
> > 1:15 PM Eric Wieser <wieser.eric+nump y at gm > > > > > ail.com> wrote:
> > To reiterate my comments in the issue - I'm in favor of > this. > > >
> > > > > > > > > > > It seems seem especially valuable for identity-less
> > > > > > > >
> > > > > > > > functions > > > > (`min`, `max`, `lcm`), and the argument
> >
> > name is consistent > > > with > `functools.reduce`. too. > > > > > >
> > >
> > > > > > > The only argument I can see against merging this would be >
> > > > > > > `kwarg`-creep of `reduce`, and I think this has enough use
> > > > > > > >
> > > > >
> > > > > cases to justify that. > > > > > > > > > > > > I'd like to
> > > > > merge
> >
> > in a few days, if no one else has any > > > > > > opinions. > > > > >
> > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 Hameer
> >
> > Abbasi <einstein.edison > > > > > > @gma > > > > > > il.com> wrote:
> > Hello, everyone. I’ve submitted a PR to add a initializer kwarg to
> > ufunc.reduce. This is useful in a few cases, e.g., > > > > > > > it
> > allows one to supply a “default” value for identity- > > > > > > >
> > less > > > > > > > ufunc reductions, and specify an initial value for
> > > > > > > > > reductions such as sum (other than zero.) > > > > > > >
> > > > > > >
> > > > > > > Please feel free to review or leave feedback, (although I >
> > > > > >
> > > > > > think Eric and Marten have picked it apart pretty well). > >
> > > > > > >
> > >
> > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > >
> >
> > Hameer > > > > Sent from Astro for Mac > > > > >
> > _______________________________________________ > > > NumPy-
> > Discussion
> > mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > >
> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > >
> > _______________________________________________ > > > > >
> > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.o
> > rg
> > https://mail.python.org/mailman/listinfo/numpy-discussi on
> > _______________________________________________ > > > >
> > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion > >
> > _______________________________________________ > > > >
> > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org >
> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > _______________________________________________ > > > NumPy-
> > Discussion mailing list > > > NumPy-Discussion at python.org > > >
> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > _______________________________________________ > > NumPy- Discussion
> > mailing list > > NumPy-Discussion at python.org > >
> > https://mail.python.org/mailman/listinfo/numpy-discussion > >
> > _______________________________________________ > NumPy-Discussion
> > mailing list > NumPy-Discussion at python.org >
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________ NumPy-Discussion
> > mailing list NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________ NumPy-Discussion
> > mailing list NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________ NumPy-Discussion
> > mailing list NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________ NumPy-Discussion
> > mailing list NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180326/843704cd/attachment-0001.html>


More information about the NumPy-Discussion mailing list