From ralf.gommers at gmail.com Fri Jun 1 00:57:06 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 31 May 2018 21:57:06 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: On Thu, May 31, 2018 at 4:50 PM, Matti Picus wrote: > At the recent NumPy sprint at BIDS (thanks to those who made the trip) we > spent some time brainstorming about a roadmap for NumPy, in the spirit of > similar work that was done for Jupyter. The idea is that a document with > wide community acceptance can guide the work of the full-time developer(s), > and be a source of ideas for expanding development efforts. > > I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, > and hope to discuss it at a BOF session during SciPy in the middle of July > in Austin. > Thanks for writing that up! > > Eventually it could become a NEP or formalized in another way. > A NEP doesn't sound quite right, but moving from wiki to somewhere more formal and with more control over the contents (e.g. numpy.org or in the docs) would be useful. A roadmap could/should also include things like required effort, funding and knowledge/people required. A couple of comments on the content: - a mention of stability or backwards compatibility goals under philosophy would be useful - the "Could potentially be split out into separate packages..." should be removed I think - the maskedarray one was already rejected, and the rest are similarly unhelpful. - "internal refactorings": MaskedArray yes, but the other ones no. numpy.distutils and f2py are very hard to test, a big refactor pretty much guarantees breakage. there's also not much need for refactoring, because those things are not coupled to the numpy.core internals. numpy.financial is simply uninteresting - we wish it wasn't there but it is, so now it simply stays where it is. - One item that I think is missing under "New functionality" is runtime switching of backend for numpy.linalg (IIRC discussed on this list before) and numpy.random (MKL devs are interested in this). Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Jun 1 07:43:32 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 1 Jun 2018 07:43:32 -0400 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: Hi Matti, Thanks for sharing the roadmap. Overall, it looks very nice. A practical question is on whether you want input via the mailing list, or should one just edit the wiki and add questions or so? As the roadmap mentioned interaction with python proper (and a possible PEP): one thing that always slightly annoyed me is that numpy math is way slower for scalars than python math - and duplicates all the function names. It would seem to make sense to allow python's math module to be overridden for non-python input, including arrays. That could be another PEP... All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... 
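As a rough, machine-dependent check of the scalar overhead Marten describes above, one can compare Python's math.sin with np.sin on a single float (illustrative sketch only; exact numbers will vary):

import math
import timeit

import numpy as np

# Rough check of scalar-math overhead: np.sin on a Python float goes
# through ufunc dispatch and returns a NumPy scalar, so it is typically
# many times slower than math.sin for a single value.
x = 0.5
n = 100000
t_math = timeit.timeit(lambda: math.sin(x), number=n)
t_numpy = timeit.timeit(lambda: np.sin(x), number=n)
print("math.sin: {:.4f} s for {} calls".format(t_math, n))
print("np.sin:   {:.4f} s for {} calls".format(t_numpy, n))
print("ratio:    {:.1f}x".format(t_numpy / t_math))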
URL: From nussbaum at uni-mainz.de Fri Jun 1 08:24:48 2018 From: nussbaum at uni-mainz.de (=?UTF-8?Q?Andreas_Nu=C3=9Fbaumer?=) Date: Fri, 1 Jun 2018 14:24:48 +0200 Subject: [Numpy-discussion] Change in default behavior of np.polyfit Message-ID: Hi, in [1] the scaling factor for the covariance matrix of `np.polyfit` was discussed. The conclusion was, that it is non-standard and a patch might be in order to correct this. Pull request [2] changes the factor from chisq(popt)/(M-N-2) to chisq(popt)/(M-N) (with M=number of point, N=number of parameters) essentially removing the "-2". Clearly, this changes the result for the covariance matrix (but not the result for the polynomial coefficients) and therefore the current behavior if `cov=True` is set. It should be noted, that `scipy.optimize.curve_fit` also uses the chisq(popt)/(M-N) as scaling factor (without "-2"). Therefore, the change would remove a discrepancy. Additionally, patch [2] adds an option that sets the scaling factor of the covariance matrix to 1 . This can be useful in occasions, where the weights are given by 1/sigma with sigma being the (known) standard errors of (Gaussian distributed) data points, in which case the un-scaled matrix is already a correct estimate for the covariance matrix. Best, Andreas [1] http://numpy-discussion.10968.n7.nabble.com/Inconsistent-results-for-the-covariance-matrix-between-scipy-optimize-curve-fit-and-numpy-polyfit-td45582.html [2] https://github.com/numpy/numpy/pull/11197 -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Jun 1 08:29:39 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 1 Jun 2018 08:29:39 -0400 Subject: [Numpy-discussion] Change in default behavior of np.polyfit In-Reply-To: References: Message-ID: Hi Andreas, Thanks for noticing and correcting this unexpected scaling! The addition to get the unscaled version is also very welcome. All the best, Marten ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Fri Jun 1 11:27:52 2018 From: toddrjen at gmail.com (Todd) Date: Fri, 1 Jun 2018 11:27:52 -0400 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: On Thu, May 31, 2018, 19:50 Matti Picus wrote: > At the recent NumPy sprint at BIDS (thanks to those who made the trip) > we spent some time brainstorming about a roadmap for NumPy, in the > spirit of similar work that was done for Jupyter. The idea is that a > document with wide community acceptance can guide the work of the > full-time developer(s), and be a source of ideas for expanding > development efforts. > > I put the document up at > https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss > it at a BOF session during SciPy in the middle of July in Austin. > > Eventually it could become a NEP or formalized in another way. > > Matti > Some things I have seen mentioned but don't know the current plans for: * Categorical arrays * Releasing the GIL wherever possible * Using multithreading internally * making use of the next generation blas when available and stay involved in planning to make sure it supports our needs * Figure out where to use Cython and were not to > -------------- next part -------------- An HTML attachment was scrubbed... 
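Returning to the np.polyfit covariance thread above: to make the competing scaling factors concrete, here is a small sketch. The three variants are computed by hand from the weighted design matrix, because which factor np.polyfit(..., cov=True) itself applies depends on whether the change in the linked pull request is in your NumPy version.

import numpy as np

# Sketch of the three covariance scalings discussed in the np.polyfit
# thread above, computed by hand from the weighted design matrix.
rng = np.random.RandomState(0)
M, deg = 50, 1
N = deg + 1                                  # number of fitted parameters
sigma = 0.3                                  # known per-point standard error
x = np.linspace(0.0, 10.0, M)
y = 2.0 * x + 1.0 + rng.normal(scale=sigma, size=M)
w = np.ones(M) / sigma                       # polyfit-style weights, w = 1/sigma

coef = np.polyfit(x, y, deg, w=w)
chi2 = np.sum((w * (y - np.polyval(coef, x))) ** 2)

A = np.vander(x, N) * w[:, np.newaxis]        # weighted Vandermonde matrix
cov_unscaled = np.linalg.inv(np.dot(A.T, A))  # already correct when w = 1/sigma

cov_old = cov_unscaled * chi2 / (M - N - 2)   # historical polyfit factor
cov_new = cov_unscaled * chi2 / (M - N)       # factor proposed in the PR

for name, c in [("unscaled", cov_unscaled), ("old", cov_old), ("new", cov_new)]:
    print(name, np.sqrt(np.diag(c)))          # 1-sigma parameter uncertainties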
URL: From toddrjen at gmail.com Fri Jun 1 11:48:32 2018 From: toddrjen at gmail.com (Todd) Date: Fri, 1 Jun 2018 11:48:32 -0400 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: On Fri, Jun 1, 2018, 11:27 Todd wrote: > > > On Thu, May 31, 2018, 19:50 Matti Picus wrote: > >> At the recent NumPy sprint at BIDS (thanks to those who made the trip) >> we spent some time brainstorming about a roadmap for NumPy, in the >> spirit of similar work that was done for Jupyter. The idea is that a >> document with wide community acceptance can guide the work of the >> full-time developer(s), and be a source of ideas for expanding >> development efforts. >> >> I put the document up at >> https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss >> it at a BOF session during SciPy in the middle of July in Austin. >> >> Eventually it could become a NEP or formalized in another way. >> >> Matti >> > > > Some things I have seen mentioned but don't know the current plans for: > > * Categorical arrays > * Releasing the GIL wherever possible > * Using multithreading internally > * making use of the next generation blas when available and stay involved > in planning to make sure it supports our needs > * Figure out where to use Cython and were not to > Also: * Figure out the best way to handle strings. This may involve multiple approaches for different situations but the current approach may not be the best default approach. * Decimal and/or rational arrays * if yes to labeled arrays, then there should probably be a pep about label-based indexing * A decision about how to handle numpy 2.0 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jun 1 12:46:57 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 1 Jun 2018 09:46:57 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: On Fri, Jun 1, 2018 at 4:43 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > one thing that always slightly annoyed me is that numpy math is way > slower for scalars than python math > numpy is also quite a bit slower than raw python for math with (very) small arrays: In [31]: % timeit t2 = (t[0] * 10, t[1] * 10) 162 ns ? 0.79 ns per loop (mean ? std. dev. of 7 runs, 10000000 loops each) In [32]: a Out[32]: array([ 3.4, 5.6]) In [33]: % timeit a2 = a * 10 941 ns ? 7.95 ns per loop (mean ? std. dev. of 7 runs, 1000000 loops each) (I often want to so this sort of thing, not for performance, but for ease of computation -- say you have 2 or three coordinates that represent a point -- it's really nice to be able to scale or shift with array operations, rather than all that indexing -- but it is pretty slo with numpy. I've wondered if numpy could be optimized for small 1D arrays, and maybe even 2d arrays with a small fixed second dimension (N x 2, N x 3), by special-casing / short-cutting those cases. It would require some careful profiling to see if it would help, but it sure seems possible. And maybe scalars could be fit into the same system. -CHB -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Jun 1 12:57:14 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 1 Jun 2018 09:57:14 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: <20180601165714.26ykmtlmi3p75iog@carbo> Hi Ralf, On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote: > - "internal refactorings": MaskedArray yes, but the other ones no. > numpy.distutils and f2py are very hard to test, a big refactor pretty much > guarantees breakage. there's also not much need for refactoring, because > those things are not coupled to the numpy.core internals. numpy.financial > is simply uninteresting - we wish it wasn't there but it is, so now it > simply stays where it is. I want to clarify that in the current notes we put down ideas that prompted active discussion, even if they weren't necessarily feasible. I feel it is important to keep the conversation open to run its course until we have a good understanding of the various issues at hand. You may find that, in person, people are more willing to admit to their support for some "heretical" ideas than they are here on the list. E.g., you say that the financial functions "now simply stay", but that promises a future of a NumPy that never shrinks, while there is certainly some support for allowing NumPy to contract so that we can release maintenance burden and allow development of other core areas that have been neglected for a long time. You will *always* have small, vocal proponents of any specific piece of functionality; that doesn't necessarily mean that such functionality contributes to the health of a project as a whole. So, I gently urge us carefully reconsider the narrative that nothing can change/be removed, and evaluate each suggestion carefully, not weighing only the very evident negatives but also the longer term positives. Best regards, St?fan From chris.barker at noaa.gov Fri Jun 1 13:06:48 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 1 Jun 2018 10:06:48 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: On Fri, Jun 1, 2018 at 9:46 AM, Chris Barker wrote: > numpy is also quite a bit slower than raw python for math with (very) > small arrays: > doing a bit more experimentation, the advantage is with pure python for over 10 elements (I got bored...). but I noticed that the time for numpy computation is pretty much constant for 2 up to around 100 elements. Which implies that the bulk of the issue is with "startup" costs, rather than fancy indexing or anything like that. so maybe a short cut wouldn't be helpful. Note if you use a list comp (the pythonic translation of an array operation) thecrossover point is about 15 elements (in my tests, on my machine...) In [90]: % timeit t2 = [x * 10 for x in t] 920 ns ? 4.88 ns per loop (mean ? std. dev. of 7 runs, 1000000 loops each) -CHB > In [31]: % timeit t2 = (t[0] * 10, t[1] * 10) > 162 ns ? 0.79 ns per loop (mean ? std. dev. of 7 runs, 10000000 loops each) > > In [32]: a > Out[32]: array([ 3.4, 5.6]) > > In [33]: % timeit a2 = a * 10 > 941 ns ? 7.95 ns per loop (mean ? std. 
dev. of 7 runs, 1000000 loops each) > > > (I often want to so this sort of thing, not for performance, but for ease > of computation -- say you have 2 or three coordinates that represent a > point -- it's really nice to be able to scale or shift with array > operations, rather than all that indexing -- but it is pretty slo with > numpy. > > I've wondered if numpy could be optimized for small 1D arrays, and maybe > even 2d arrays with a small fixed second dimension (N x 2, N x 3), by > special-casing / short-cutting those cases. > > It would require some careful profiling to see if it would help, but it > sure seems possible. > > And maybe scalars could be fit into the same system. > > -CHB > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Fri Jun 1 13:11:54 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Fri, 1 Jun 2018 13:11:54 -0400 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: <20180601165714.26ykmtlmi3p75iog@carbo> References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> <20180601165714.26ykmtlmi3p75iog@carbo> Message-ID: I would love to see gufuncs become more general. Specifically I would like an optional prologue and epilogue function. The prologue could potentially 1) inspect parameterized dtypes 2) kwargs 3) set non-trivial output array sizes 4) initialize data structures 5) defer processing to other functions (BLAS). The epilogue function could do any clean up of data structures. On Fri, Jun 1, 2018 at 12:57 PM, Stefan van der Walt wrote: > Hi Ralf, > > On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote: > > - "internal refactorings": MaskedArray yes, but the other ones no. > > numpy.distutils and f2py are very hard to test, a big refactor pretty > much > > guarantees breakage. there's also not much need for refactoring, because > > those things are not coupled to the numpy.core internals. numpy.financial > > is simply uninteresting - we wish it wasn't there but it is, so now it > > simply stays where it is. > > I want to clarify that in the current notes we put down ideas that > prompted active discussion, even if they weren't necessarily feasible. > I feel it is important to keep the conversation open to run its course > until we have a good understanding of the various issues at hand. > > You may find that, in person, people are more willing to admit to their > support for some "heretical" ideas than they are here on the list. > > E.g., you say that the financial functions "now simply stay", but that > promises a future of a NumPy that never shrinks, while there is > certainly some support for allowing NumPy to contract so that we can > release maintenance burden and allow development of other core areas > that have been neglected for a long time. > > You will *always* have small, vocal proponents of any specific piece of > functionality; that doesn't necessarily mean that such functionality > contributes to the health of a project as a whole. 
> > So, I gently urge us carefully reconsider the narrative that nothing can > change/be removed, and evaluate each suggestion carefully, not weighing > only the very evident negatives but also the longer term positives. > > Best regards, > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Fri Jun 1 13:19:00 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Fri, 1 Jun 2018 13:19:00 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Stephan, good point about use cases. I think its still an odd fit. For example I think np.array_equal(np.zeros((3,3)), np.zeros((2,2))) or np.array_equal([1], ['foo']) would be difficult or impossible to replicate with a potential all_equal gufunc On Thu, May 31, 2018 at 2:00 PM, Stephan Hoyer wrote: > On Wed, May 30, 2018 at 5:01 PM Matthew Harrigan < > harrigan.matthew at gmail.com> wrote: > >> "short-cut to automatically return False if m != n", that seems like a >> silent bug >> > > I guess it depends on the use-cases. This is how np.array_equal() works: > https://docs.scipy.org/doc/numpy/reference/generated/ > numpy.array_equal.html > > We could even imagine incorporating this hypothetical "equality along some > axes with broadcasting" functionality into axis/axes arguments for > array_equal() if we choose this behavior. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Jun 1 13:19:41 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 1 Jun 2018 19:19:41 +0200 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> <20180601165714.26ykmtlmi3p75iog@carbo> Message-ID: <20180601171941.sllo2kikmqnzk2d3@phare.normalesup.org> While we are in the crazy wish-list: having dtypes that are universal enough for pandas to use them and export their columns with them would be my crazy wish. I hope that it would help adding more uniform support for things like categorical variables in the pydata ecosystem. Ga?l From ralf.gommers at gmail.com Fri Jun 1 15:11:17 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 1 Jun 2018 12:11:17 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: <20180601165714.26ykmtlmi3p75iog@carbo> References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> <20180601165714.26ykmtlmi3p75iog@carbo> Message-ID: On Fri, Jun 1, 2018 at 9:57 AM, Stefan van der Walt wrote: > Hi Ralf, > > On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote: > > - "internal refactorings": MaskedArray yes, but the other ones no. > > numpy.distutils and f2py are very hard to test, a big refactor pretty > much > > guarantees breakage. there's also not much need for refactoring, because > > those things are not coupled to the numpy.core internals. numpy.financial > > is simply uninteresting - we wish it wasn't there but it is, so now it > > simply stays where it is. 
> > I want to clarify that in the current notes we put down ideas that > prompted active discussion, even if they weren't necessarily feasible. > I feel it is important to keep the conversation open to run its course > until we have a good understanding of the various issues at hand. > > You may find that, in person, people are more willing to admit to their > support for some "heretical" ideas than they are here on the list. > Thanks Stefan, good points. I totally agree that anything can be discussed. > > E.g., you say that the financial functions "now simply stay", but that > promises a future of a NumPy that never shrinks, while there is > certainly some support for allowing NumPy to contract so that we can > release maintenance burden and allow development of other core areas > that have been neglected for a long time. > > You will *always* have small, vocal proponents of any specific piece of > functionality; that doesn't necessarily mean that such functionality > contributes to the health of a project as a whole. > > So, I gently urge us carefully reconsider the narrative that nothing can > change/be removed, and evaluate each suggestion carefully, not weighing > only the very evident negatives but also the longer term positives. > I don't think there's such a narrative - e.g. the removal of np.matrix that we've planned and getting rid of MaskedArray at some point once we have a better new masked array implementation are *major* removals. We do plan those things because they have major benefits. Imho "major benefits" is a bar that needs to be passed before listing features as up for removal on a roadmap (even a draft one). It would be helpful maybe to find a form for the roadmap where the essentials of such discussions (key pros/cons) can be captured. Or at least split it in good/desirable/planned items and "wild ideas". Re `financial`, there isn't much of a pro as far as I can tell - there's almost zero maintenance cost now, and it doesn't hinder any of the proposed new features. Plus it's a discussion we've had a couple of times before. I know that the current roadmap doc is only draft, but it still says "NumPy Roadmap" and it's the best thing we have now, so I'd prefer to not have things there (or have them in a separate random/controversial ideas section) that are unlikely to happen or for which it's unclear if they're good ideas. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jun 1 16:17:12 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jun 2018 14:17:12 -0600 Subject: [Numpy-discussion] Python 3 compatible examples Message-ID: Hi All, This post is prompted by this PR . It would be good to come up with a timeline and plan for rewriting the examples to be Python 3 compatible. When we do so, we should also make it assumed that `from __future__ import print_function` has been executed when the examples are executed in Python 2.7. Might want to include `division` in that future import as well. Anyway, wanted to raise the subject. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Jun 1 16:22:20 2018 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 01 Jun 2018 22:22:20 +0200 Subject: [Numpy-discussion] Python 3 compatible examples In-Reply-To: References: Message-ID: pe, 2018-06-01 kello 14:17 -0600, Charles R Harris kirjoitti: > This post is prompted by this PR /11222>. 
> It would be good to come up with a timeline and plan for rewriting > the > examples to be Python 3 compatible. When we do so, we should also > make it > assumed that `from __future__ import print_function` has been > executed when > the examples are executed in Python 2.7. Might want to include > `division` > in that future import as well. > > Anyway, wanted to raise the subject. Thoughts? For Scipy, we converted the examples in the documentation to Python 3, and have essentially ignored Python 2 compatibility. So far, I remember no complaints about it. Pauli From jni.soma at gmail.com Fri Jun 1 16:43:19 2018 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Sat, 02 Jun 2018 06:43:19 +1000 Subject: [Numpy-discussion] Python 3 compatible examples In-Reply-To: References: Message-ID: <1527885799.4058856.1393493968.7F81BC66@webmail.messagingengine.com> On Sat, Jun 2, 2018, at 6:22 AM, Pauli Virtanen wrote: > For Scipy, we converted the examples in the documentation to Python 3, > and have essentially ignored Python 2 compatibility. So far, I remember > no complaints about it. I vote for what Pauli said. From m.h.vankerkwijk at gmail.com Fri Jun 1 17:21:32 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 1 Jun 2018 17:21:32 -0400 Subject: [Numpy-discussion] Python 3 compatible examples In-Reply-To: <1527885799.4058856.1393493968.7F81BC66@webmail.messagingengine.com> References: <1527885799.4058856.1393493968.7F81BC66@webmail.messagingengine.com> Message-ID: Agreed, good to get started and stop worrying about python2 in the examples at least. ?If someone cuts&pastes and it fails, it is just a good reminder to get moving... -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Fri Jun 1 17:29:13 2018 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 1 Jun 2018 14:29:13 -0700 Subject: [Numpy-discussion] Python 3 compatible examples In-Reply-To: <1527885799.4058856.1393493968.7F81BC66@webmail.messagingengine.com> References: <1527885799.4058856.1393493968.7F81BC66@webmail.messagingengine.com> Message-ID: +1 On Fri, Jun 1, 2018 at 1:43 PM, Juan Nunez-Iglesias wrote: > > On Sat, Jun 2, 2018, at 6:22 AM, Pauli Virtanen wrote: >> For Scipy, we converted the examples in the documentation to Python 3, >> and have essentially ignored Python 2 compatibility. So far, I remember >> no complaints about it. > > I vote for what Pauli said. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From millman at berkeley.edu Fri Jun 1 17:31:00 2018 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 1 Jun 2018 14:31:00 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> <20180601165714.26ykmtlmi3p75iog@carbo> Message-ID: I like the idea of a random/controversial ideas section. On Fri, Jun 1, 2018 at 12:11 PM, Ralf Gommers wrote: > > > On Fri, Jun 1, 2018 at 9:57 AM, Stefan van der Walt > wrote: >> >> Hi Ralf, >> >> On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote: >> > - "internal refactorings": MaskedArray yes, but the other ones no. >> > numpy.distutils and f2py are very hard to test, a big refactor pretty >> > much >> > guarantees breakage. there's also not much need for refactoring, because >> > those things are not coupled to the numpy.core internals. 
>> > numpy.financial >> > is simply uninteresting - we wish it wasn't there but it is, so now it >> > simply stays where it is. >> >> I want to clarify that in the current notes we put down ideas that >> prompted active discussion, even if they weren't necessarily feasible. >> I feel it is important to keep the conversation open to run its course >> until we have a good understanding of the various issues at hand. >> >> You may find that, in person, people are more willing to admit to their >> support for some "heretical" ideas than they are here on the list. > > > Thanks Stefan, good points. I totally agree that anything can be discussed. > >> >> >> E.g., you say that the financial functions "now simply stay", but that >> promises a future of a NumPy that never shrinks, while there is >> certainly some support for allowing NumPy to contract so that we can >> release maintenance burden and allow development of other core areas >> that have been neglected for a long time. >> >> You will *always* have small, vocal proponents of any specific piece of >> functionality; that doesn't necessarily mean that such functionality >> contributes to the health of a project as a whole. >> >> So, I gently urge us carefully reconsider the narrative that nothing can >> change/be removed, and evaluate each suggestion carefully, not weighing >> only the very evident negatives but also the longer term positives. > > > I don't think there's such a narrative - e.g. the removal of np.matrix that > we've planned and getting rid of MaskedArray at some point once we have a > better new masked array implementation are *major* removals. We do plan > those things because they have major benefits. Imho "major benefits" is a > bar that needs to be passed before listing features as up for removal on a > roadmap (even a draft one). > > It would be helpful maybe to find a form for the roadmap where the > essentials of such discussions (key pros/cons) can be captured. Or at least > split it in good/desirable/planned items and "wild ideas". > > Re `financial`, there isn't much of a pro as far as I can tell - there's > almost zero maintenance cost now, and it doesn't hinder any of the proposed > new features. Plus it's a discussion we've had a couple of times before. > > I know that the current roadmap doc is only draft, but it still says "NumPy > Roadmap" and it's the best thing we have now, so I'd prefer to not have > things there (or have them in a separate random/controversial ideas section) > that are unlikely to happen or for which it's unclear if they're good ideas. > > Cheers, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From m.h.vankerkwijk at gmail.com Fri Jun 1 17:41:18 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 1 Jun 2018 17:41:18 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi Nathaniel, On Matt's prompting, I added release notes to the frozen/flexible PR [1]; see text attached below. Having done that, I felt the examples actually justified the frozen dimensions quite well. Given that you're the who expressed most doubts about them, could you have a look? Ideally, I'd avoid having to write a NEP for this, and the examples do seem to make it quite obvious that this change to the signature is the way to go, as its meaning is dead obvious. 
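For concreteness, the following is a plain-Python sketch (not part of the pull request) of what an elementary function with the fixed-size signature ``(),()->(3)`` computes; a real gufunc would get the broadcasting and the length-3 core dimension from the ufunc machinery itself.

.. code:: python

    import numpy as np

    # Plain-Python sketch of a "(),()->(3)" gufunc: each pair of scalar
    # angles (theta, phi) maps to a fixed-size core dimension of length 3
    # (a Cartesian unit vector), while the loop dimensions broadcast.
    def angles_to_unit_vector(theta, phi):
        theta, phi = np.broadcast_arrays(theta, phi)
        out = np.empty(theta.shape + (3,))
        out[..., 0] = np.sin(theta) * np.cos(phi)
        out[..., 1] = np.sin(theta) * np.sin(phi)
        out[..., 2] = np.cos(theta)
        return out

    theta = np.array([0.0, np.pi / 2])
    phi = np.array([0.0, np.pi / 2])
    print(angles_to_unit_vector(theta, phi))        # shape (2, 3)
    print(angles_to_unit_vector(0.3, phi).shape)    # broadcasting: (2, 3)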
And the implementation is super-straightforward... For the broadcasted core dimensions, I do agree the case is less strong and the meaning perhaps less obvious (implementation is relatively simple), and I think a short NEP may be called for (unless others on the list have super-convincing use cases...). I will add here, though, that even if we implement `all_equal` as a method on `equal`, it would still be useful to have a signature that can actually describe it. -- Marten [1] https://github.com/numpy/numpy/pull/11175/files Generalized ufunc signatures now allow fixed-size dimensions ------------------------------------------------------------ By using a numerical value in the signature of a generalized ufunc, one can indicate that the given function requires input or output to have dimensions with the given size. E.g., the signature of a function that converts a polar angle to a two-dimensional cartesian unit vector would be ``()->(2)``; that for one that converts two spherical angles to a three-dimensional unit vector would be ``(),()->(3)``; and that for the cross product of two three-dimensional vectors would be ``(3),(3)->(3)``. Note that to the elementary function these dimensions are not treated any differently from variable ones indicated with a letter; the loop still is passed the corresponding size, but it can now count on that being equal to the fixed size given in the signature. Generalized ufunc signatures now allow flexible dimensions ---------------------------------------------------------- Some functions, in particular numpy's implementation of ``@`` as ``matmul``, are very similar to generalized ufuncs in that they operate over core dimensions, but one could not present them as such because they were able to deal with inputs in which a dimension is missing. To support this, it is now allowed to postfix a dimension name with a question mark to indicate that that dimension does not necessarily have to be present. With this addition, the signature for ``matmul`` can be expressed as ``(m?,n),(n,p?)->(m?,p?)``. This indicates that if, e.g., the second operand has only one dimension, for the purposes of the elementary function it will be treated as if that input has core shape ``(n, 1)``, and the output has the corresponding core shape of ``(m, 1)``. The actual output array, however, has flexible dimension removed, i.e., it will have shape ``(..., n)``. Similarly, if both arguments have only a single dimension, the inputs will be presented as having shapes ``(1, n)`` and ``(n, 1)`` to the elementary function, and the output as ``(1, 1)``, while the actual output array returned will have shape ``()``. In this way, the signature thus allows one to use a single elementary function for four related but different signatures, ``(m,n),(n,p)->(m,p)``, ``(n),(n,p)->(p)``, ``(m,n),(n)->(m)`` and ``(n),(n)->()``. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jun 1 17:43:48 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jun 2018 15:43:48 -0600 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: On Thu, May 31, 2018 at 5:50 PM, Matti Picus wrote: > At the recent NumPy sprint at BIDS (thanks to those who made the trip) we > spent some time brainstorming about a roadmap for NumPy, in the spirit of > similar work that was done for Jupyter. 
The idea is that a document with > wide community acceptance can guide the work of the full-time developer(s), > and be a source of ideas for expanding development efforts. > > I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, > and hope to discuss it at a BOF session during SciPy in the middle of July > in Austin. > > Eventually it could become a NEP or formalized in another way. > > Matti > Under maintenance we could add something about the transition to Python 3, in particular cleaning up the code and updating the documentation examples. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Jun 1 18:45:39 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 1 Jun 2018 15:45:39 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Fri, Jun 1, 2018 at 2:42 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Having done that, I felt the examples actually justified the frozen > dimensions quite well. Given that you're the who expressed most doubts > about them, could you have a look? Ideally, I'd avoid having to write a NEP > for this, and the examples do seem to make it quite obvious that this > change to the signature is the way to go, as its meaning is dead obvious. > And the implementation is super-straightforward... > I do think it would be valuable to have a brief NEP on this, especially on the solution for matmul. NEPs don't have to be long, and don't need to go into the full detail of implementations. But they are a nice place to summarize design discussions. In fact, I would say the text you have below is nearly enough for one or two NEPs. The parts that are missing would be valuable to add anyways: - A brief discussion (a sentence or two) of potential broader use-cases for optional dimensions (ufuncs that act on row/column vectors and matrices). - A brief discussion of rejected alternatives (only a few sentences for each alternative). -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Jun 1 19:38:41 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 1 Jun 2018 19:38:41 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: For the flexible dimensions, that would be up to Nathaniel -- it's his idea ;-) And happily that means that I don't have to spend time looking up how this NEP business actually works, but can just copy & paste... -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Jun 2 15:04:32 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 2 Jun 2018 12:04:32 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy Message-ID: As promised distressingly many months ago, I have written up a NEP about relaxing the stream-compatibility policy that we currently have. https://github.com/numpy/numpy/pull/11229 https://github.com/rkern/numpy/blob/nep/rng/doc/neps/nep-0019-rng-policy.rst I particularly invite comment on the two lists of methods that we still would make strict compatibility guarantees for. 
--- ============================== Random Number Generator Policy ============================== :Author: Robert Kern :Status: Draft :Type: Standards Track :Created: 2018-05-24 Abstract -------- For the past decade, NumPy has had a strict backwards compatibility policy for the number stream of all of its random number distributions. Unlike other numerical components in ``numpy``, which are usually allowed to return different when results when they are modified if they remain correct, we have obligated the random number distributions to always produce the exact same numbers in every version. The objective of our stream-compatibility guarantee was to provide exact reproducibility for simulations across numpy versions in order to promote reproducible research. However, this policy has made it very difficult to enhance any of the distributions with faster or more accurate algorithms. After a decade of experience and improvements in the surrounding ecosystem of scientific software, we believe that there are now better ways to achieve these objectives. We propose relaxing our strict stream-compatibility policy to remove the obstacles that are in the way of accepting contributions to our random number generation capabilities. The Status Quo -------------- Our current policy, in full: A fixed seed and a fixed series of calls to ``RandomState`` methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged. This policy was first instated in Nov 2008 (in essence; the full set of weasel words grew over time) in response to a user wanting to be sure that the simulations that formed the basis of their scientific publication could be reproduced years later, exactly, with whatever version of ``numpy`` that was current at the time. We were keen to support reproducible research, and it was still early in the life of ``numpy.random``. We had not seen much cause to change the distribution methods all that much. We also had not thought very thoroughly about the limits of what we really could promise (and by ?we? in this section, we really mean Robert Kern, let?s be honest). Despite all of the weasel words, our policy overpromises compatibility. The same version of ``numpy`` built on different platforms, or just in a different way could cause changes in the stream, with varying degrees of rarity. The biggest is that the ``.multivariate_normal()`` method relies on ``numpy.linalg`` functions. Even on the same platform, if one links ``numpy`` with a different LAPACK, ``.multivariate_normal()`` may well return completely different results. More rarely, building on a different OS or CPU can cause differences in the stream. We use C ``long`` integers internally for integer distribution (it seemed like a good idea at the time), and those can vary in size depending on the platform. Distribution methods can overflow their internal C ``longs`` at different breakpoints depending on the platform and cause all of the random variate draws that follow to be different. And even if all of that is controlled, our policy still does not provide exact guarantees across versions. We still do apply bug fixes when correctness is at stake. 
And even if we didn?t do that, any nontrivial program does more than just draw random numbers. They do computations on those numbers, transform those with numerical algorithms from the rest of ``numpy``, which is not subject to so strict a policy. Trying to maintain stream-compatibility for our random number distributions does not help reproducible research for these reasons. The standard practice now for bit-for-bit reproducible research is to pin all of the versions of code of your software stack, possibly down to the OS itself. The landscape for accomplishing this is much easier today than it was in 2008. We now have ``pip``. We now have virtual machines. Those who need to reproduce simulations exactly now can (and ought to) do so by using the exact same version of ``numpy``. We do not need to maintain stream-compatibility across ``numpy`` versions to help them. Our stream-compatibility guarantee has hindered our ability to make improvements to ``numpy.random``. Several first-time contributors have submitted PRs to improve the distributions, usually by implementing a faster, or more accurate algorithm than the one that is currently there. Unfortunately, most of them would have required breaking the stream to do so. Blocked by our policy, and our inability to work around that policy, many of those contributors simply walked away. Implementation -------------- We propose first freezing ``RandomState`` as it is and developing a new RNG subsystem alongside it. This allows anyone who has been relying on our old stream-compatibility guarantee to have plenty of time to migrate. ``RandomState`` will be considered deprecated, but with a long deprecation cycle, at least a few years. Deprecation warnings will start silent but become increasingly noisy over time. Bugs in the current state of the code will *not* be fixed if fixing them would impact the stream. However, if changes in the rest of ``numpy`` would break something in the ``RandomState`` code, we will fix ``RandomState`` to continue working (for example, some change in the C API). No new features will be added to ``RandomState``. Users should migrate to the new subsystem as they are able to. Work on a proposed `new PRNG subsystem `_ is already underway. The specifics of the new design are out of scope for this NEP and up for much discussion, but we will discuss general policies that will guide the evolution of whatever code is adopted. First, we will maintain API source compatibility just as we do with the rest of ``numpy``. If we *must* make a breaking change, we will only do so with an appropriate deprecation period and warnings. Second, breaking stream-compatibility in order to introduce new features or improve performance will be *allowed* with *caution*. Such changes will be considered features, and as such will be no faster than the standard release cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness is not a bug. Correctness bug fixes that break stream-compatibility can happen on bugfix releases, per usual, but developers should consider if they can wait until the next feature release. We encourage developers to strongly weight user?s pain from the break in stream-compatibility against the improvements. One example of a worthwhile improvement would be to change algorithms for a significant increase in performance, for example, moving from the `Box-Muller transform `_ method of Gaussian variate generation to the faster `Ziggurat algorithm `_. 
An example of an unworthy improvement would be tweaking the Ziggurat tables just a little bit. Any new design for the RNG subsystem will provide a choice of different core uniform PRNG algorithms. We will be more strict about a select subset of methods on these core PRNG objects. They MUST guarantee stream-compatibility for a minimal, specified set of methods which are chosen to make it easier to compose them to build other distributions. Namely, * ``.bytes()`` * ``.random_uintegers()`` * ``.random_sample()`` Furthermore, the new design should also provide one generator class (we shall call it ``StableRandom`` for discussion purposes) that provides a slightly broader subset of distribution methods for which stream-compatibility is *guaranteed*. The point of ``StableRandom`` is to provide something that can be used in unit tests so projects that currently have tests which rely on the precise stream can be migrated off of ``RandomState``. For the best transition, ``StableRandom`` should use as its core uniform PRNG the current MT19937 algorithm. As best as possible, the API for the distribution methods that are provided on ``StableRandom`` should match their counterparts on ``RandomState``. They should provide the same stream that the current version of ``RandomState`` does. Because their intended use is for unit tests, we do not need the performance improvements from the new algorithms that will be introduced by the new subsystem. The list of ``StableRandom`` methods should be chosen to support unit tests: * ``.randint()`` * ``.uniform()`` * ``.normal()`` * ``.standard_normal()`` * ``.choice()`` * ``.shuffle()`` * ``.permutation()`` Not Versioning -------------- For a long time, we considered that the way to allow algorithmic improvements while maintaining the stream was to apply some form of versioning. That is, every time we make a stream change in one of the distributions, we increment some version number somewhere. ``numpy.random`` would keep all past versions of the code, and there would be a way to get the old versions. Proposals of how to do this exactly varied widely, but we will not exhaustively list them here. We spent years going back and forth on these designs and were not able to find one that sufficed. Let that time lost, and more importantly, the contributors that we lost while we dithered, serve as evidence against the notion. Concretely, adding in versioning makes maintenance of ``numpy.random`` difficult. Necessarily, we would be keeping lots of versions of the same code around. Adding a new algorithm safely would still be quite hard. But most importantly, versioning is fundamentally difficult to *use* correctly. We want to make it easy and straightforward to get the latest, fastest, best versions of the distribution algorithms; otherwise, what's the point? The way to make that easy is to make the latest the default. But the default will necessarily change from release to release, so the user?s code would need to be altered anyway to specify the specific version that one wants to replicate. Adding in versioning to maintain stream-compatibility would still only provide the same level of stream-compatibility that we currently do, with all of the limitations described earlier. Given that the standard practice for such needs is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` alone is superfluous. 
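To make the intended role of the proposed ``StableRandom`` concrete, the sketch below shows the kind of seeded unit test it is meant to keep working.  ``StableRandom`` does not exist yet, so the example uses today's ``RandomState``; under this proposal only the constructor would change, while the MT19937-based stream, and therefore any captured reference values, would stay the same.

.. code:: python

    import numpy as np

    # Shape of the unit tests StableRandom is meant to serve: seed once,
    # draw from a handful of distributions, and compare against values
    # captured from a reference run.
    def reference_draws(seed):
        rs = np.random.RandomState(seed)    # under the NEP: StableRandom(seed)
        return {
            "normal": rs.normal(size=3),
            "randint": rs.randint(0, 100, size=3),
            "choice": rs.choice(["a", "b", "c"], size=3),
        }

    # In a real test suite, "captured" would be generated once and pasted
    # in as literals; draws from the same seed must keep matching it.
    captured = reference_draws(seed=2018)
    new = reference_draws(seed=2018)
    for key in captured:
        np.testing.assert_array_equal(captured[key], new[key])
    print("stream matches the captured reference")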
Discussion
----------

- https://mail.python.org/pipermail/numpy-discussion/2018-January/077608.html
- https://github.com/numpy/numpy/pull/10124#issuecomment-350876221

Copyright
---------

This document has been placed in the public domain.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com  Sat Jun  2 18:55:23 2018
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sat, 2 Jun 2018 15:55:23 -0700
Subject: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy's high level API
Message-ID:

Matthew Rocklin and I have written NEP-18, which proposes a new dispatch
mechanism for NumPy's high level API:
http://www.numpy.org/neps/nep-0018-array-function-protocol.html

There has already been a little bit of scattered discussion on the pull
request (https://github.com/numpy/numpy/pull/11189), but per NEP-0 let's
try to keep high-level discussion here on the mailing list.

The full text of the NEP is reproduced below:

==================================================
NEP: Dispatch Mechanism for NumPy's high level API
==================================================

:Author: Stephan Hoyer
:Author: Matthew Rocklin
:Status: Draft
:Type: Standards Track
:Created: 2018-05-29

Abstract
--------

We propose a protocol to allow arguments of numpy functions to define
how that function operates on them.  This allows other libraries that
implement NumPy's high level API to reuse Numpy functions.  This allows
libraries that extend NumPy's high level API to apply to more
NumPy-like libraries.

Detailed description
--------------------

Numpy's high level ndarray API has been implemented several times
outside of NumPy itself for different architectures, such as for GPU
arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel
arrays (Dask array) as well as various Numpy-like implementations in the
deep learning frameworks, like TensorFlow and PyTorch.

Similarly there are several projects that build on top of the Numpy API
for labeled and indexed arrays (XArray), automatic differentiation
(Autograd, Tangent), higher order array factorizations (TensorLy), etc.
that add additional functionality on top of the Numpy API.

We would like to be able to use these libraries together, for example we
would like to be able to place a CuPy array within XArray, or perform
automatic differentiation on Dask array code.  This would be easier to
accomplish if code written for NumPy ndarrays could also be used by
other NumPy-like projects.

For example, we would like for the following code example to work
equally well with any Numpy-like array object:

.. code:: python

    def f(x):
        y = np.tensordot(x, x.T)
        return np.mean(np.exp(y))

Some of this is possible today with various protocol mechanisms within
Numpy.

- The ``np.exp`` function checks the ``__array_ufunc__`` protocol
- The ``.T`` method works using Python's method dispatch
- The ``np.mean`` function explicitly checks for a ``.mean`` method on
  the argument

However other functions, like ``np.tensordot``, do not dispatch, and
instead are likely to coerce to a Numpy array (using the ``__array__``
protocol) or err outright.  To achieve enough coverage of the NumPy API
to support downstream projects like XArray and autograd we want to
support *almost all* functions within Numpy, which calls for a more
far-reaching protocol than just ``__array_ufunc__``.
We would like a protocol that allows arguments of a NumPy function to take
control and divert execution to another function (for example a GPU or
parallel implementation) in a way that is safe and consistent across
projects.

Implementation
--------------

We propose adding support for a new protocol in NumPy,
``__array_function__``.

This protocol is intended to be a catch-all for NumPy functionality that
is not covered by existing protocols, like reductions (like ``np.sum``)
or universal functions (like ``np.exp``).  The semantics are very similar
to ``__array_ufunc__``, except the operation is specified by an arbitrary
callable object rather than a ufunc instance and method.

The interface
~~~~~~~~~~~~~

We propose the following signature for implementations of
``__array_function__``:

.. code-block:: python

    def __array_function__(self, func, types, args, kwargs)

- ``func`` is an arbitrary callable exposed by NumPy's public API, which
  was called in the form ``func(*args, **kwargs)``.
- ``types`` is a list of types for all arguments to the original NumPy
  function call that will be checked for an ``__array_function__``
  implementation.
- The tuple ``args`` and dict ``**kwargs`` are directly passed on from the
  original call.

Unlike ``__array_ufunc__``, there are no high-level guarantees about the
type of ``func``, or about which of ``args`` and ``kwargs`` may contain
objects implementing the array API.

As a convenience for ``__array_function__`` implementors of the NumPy
API, the ``types`` keyword contains a list of all types that implement
the ``__array_function__`` protocol.  This allows downstream
implementations to quickly determine if they are likely able to support
the operation.

Still to be determined: what guarantees can we offer for ``types``?
Should we promise that types are unique, and appear in the order in which
they are checked?

Example for a project implementing the NumPy API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most implementations of ``__array_function__`` will start with two
checks:

1. Is the given function something that we know how to overload?
2. Are all arguments of a type that we know how to handle?

If these conditions hold, ``__array_function__`` should return the result
from calling its implementation for ``func(*args, **kwargs)``.  Otherwise,
it should return the sentinel value ``NotImplemented``, indicating that
the function is not implemented by these types.

.. code:: python

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            if not all(issubclass(t, MyArray) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    HANDLED_FUNCTIONS = {
        np.concatenate: my_concatenate,
        np.broadcast_to: my_broadcast_to,
        np.sum: my_sum,
        ...
    }

Necessary changes within the Numpy codebase itself
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This will require two changes within the Numpy codebase:

1. A function to inspect available inputs, look for the
   ``__array_function__`` attribute on those inputs, and call those
   methods appropriately until one succeeds.  This needs to be fast in the
   common all-NumPy case.  This is one additional function of moderate
   complexity.
2. Calling this function within all relevant Numpy functions.  This
   affects many parts of the Numpy codebase, although with very low
   complexity.
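As a rough illustration of point 1 above, the following is a simplified, pure-Python sketch of what such a helper might look like.  It is not NumPy's implementation: it ignores the subclass-before-superclass reordering described below and does not look inside nested lists.

.. code:: python

    def do_array_function_dance(func, relevant_arguments, args, kwargs):
        # Collect overloaded arguments left to right, one per distinct type.
        overloaded_args = []
        overloaded_types = []
        for arg in relevant_arguments:
            if hasattr(arg, '__array_function__'):
                if type(arg) not in overloaded_types:
                    overloaded_types.append(type(arg))
                    overloaded_args.append(arg)

        if not overloaded_args:
            return False, None  # fast path: plain NumPy inputs, no overloads

        # Try each implementation until one accepts the operation.
        for arg in overloaded_args:
            result = arg.__array_function__(func, overloaded_types, args, kwargs)
            if result is not NotImplemented:
                return True, result

        raise TypeError('no implementation of {!r} found for these types'
                        .format(getattr(func, '__name__', func)))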
Finding and calling the right ``__array_function__`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a Numpy function, ``*args`` and ``**kwargs`` inputs, we need to search through ``*args`` and ``**kwargs`` for all appropriate inputs that might have the ``__array_function__`` attribute. Then we need to select among those possible methods and execute the right one. Negotiating between several possible implementations can be complex. Finding arguments ''''''''''''''''' Valid arguments may be directly in the ``*args`` and ``**kwargs``, such as in the case for ``np.tensordot(left, right, out=out)``, or they may be nested within lists or dictionaries, such as in the case of ``np.concatenate([x, y, z])``. This can be problematic for two reasons: 1. Some functions are given long lists of values, and traversing them might be prohibitively expensive 2. Some function may have arguments that we don't want to inspect, even if they have the ``__array_function__`` method To resolve these we ask the functions to provide an explicit list of arguments that should be traversed. This is the ``relevant_arguments=`` keyword in the examples below. Trying ``__array_function__`` methods until the right one works ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Many arguments may implement the ``__array_function__`` protocol. Some of these may decide that, given the available inputs, they are unable to determine the correct result. How do we call the right one? If several are valid then which has precedence? The rules for dispatch with ``__array_function__`` match those for ``__array_ufunc__`` (see `NEP-13 `_). In particular: - NumPy will gather implementations of ``__array_function__`` from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases, this differs slightly from the `current behavior `_ of Python. - Implementations of ``__array_function__`` indicate that they can handle the operation by returning any value other than ``NotImplemented``. - If all ``__array_function__`` methods return ``NotImplemented``, NumPy will raise ``TypeError``. Changes within Numpy functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a function defined above, for now call it ``do_array_function_dance``, we now need to call that function from within every relevant Numpy function. This is a pervasive change, but of fairly simple and innocuous code that should complete quickly and without effect if no arguments implement the ``__array_function__`` protocol. Let us consider a few examples of NumPy functions and how they might be affected by this change: .. code:: python def broadcast_to(array, shape, subok=False): success, value = do_array_function_dance( func=broadcast_to, relevant_arguments=[array], args=(array,), kwargs=dict(shape=shape, subok=subok)) if success: return value ... # continue with the definition of broadcast_to def concatenate(arrays, axis=0, out=None) success, value = do_array_function_dance( func=concatenate, relevant_arguments=[arrays, out], args=(arrays,), kwargs=dict(axis=axis, out=out)) if success: return value ... # continue with the definition of concatenate The list of objects passed to ``relevant_arguments`` are those that should be inspected for ``__array_function__`` implementations. Alternatively, we could write these overloads with a decorator, e.g., .. code:: python @overload_for_array_function(['array']) def broadcast_to(array, shape, subok=False): ... 
        # continue with the definition of broadcast_to

    @overload_for_array_function(['arrays', 'out'])
    def concatenate(arrays, axis=0, out=None):
        ...  # continue with the definition of concatenate

The decorator ``overload_for_array_function`` would be written in terms of
``do_array_function_dance`` (a rough sketch of such a decorator appears
below, just before the Alternatives section).

The downside of this approach would be a loss of introspection capability
for NumPy functions on Python 2, since this requires the use of
``inspect.Signature`` (only available on Python 3). However, NumPy won't
be supporting Python 2 for `very much longer
<http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_.

Use outside of NumPy
~~~~~~~~~~~~~~~~~~~~

Nothing about this protocol is particular to NumPy itself. Should we
encourage use of the same ``__array_function__`` protocol in third-party
libraries for overloading non-NumPy functions, e.g., for making
array-implementation generic functionality in SciPy?

This would offer significant advantages (SciPy wouldn't need to invent its
own dispatch system) and no downsides that we can think of, because every
function that dispatches with ``__array_function__`` already needs to be
explicitly recognized. Libraries like Dask, CuPy, and Autograd already
wrap a limited subset of SciPy functionality (e.g., ``scipy.linalg``)
similarly to how they wrap NumPy.

If we want to do this, we should consider exposing the helper function
``do_array_function_dance()`` above as a public API.

Non-goals
---------

We are aiming for a basic strategy that can be relatively mechanistically
applied to almost all functions in NumPy's API in a relatively short
period of time, the development cycle of a single NumPy release.

We hope to get both the ``__array_function__`` protocol and all specific
overloads right on the first try, but our explicit aim here is to get
something that mostly works (and can be iterated upon), rather than to
wait for an optimal implementation. The price of moving fast is that for
now **this protocol should be considered strictly experimental**. We
reserve the right to change the details of this protocol and how specific
NumPy functions use it at any time in the future -- even in otherwise
bug-fix only releases of NumPy.

In particular, we don't plan to write additional NEPs that list all
specific functions to overload, with exactly how they should be
overloaded. We will leave this up to the discretion of committers on
individual pull requests, trusting that they will surface any
controversies for discussion by interested parties.

However, we already know several families of functions that should be
explicitly excluded from ``__array_function__``. These will need their own
protocols:

- universal functions, which already have their own protocol.
- ``array`` and ``asarray``, because they are explicitly intended for
  coercion to actual ``numpy.ndarray`` objects.
- dispatch for methods of any kind, e.g., methods on
  ``np.random.RandomState`` objects.

As a concrete example of how we expect to break behavior in the future,
some functions such as ``np.where`` are currently not NumPy universal
functions, but conceivably could become universal functions in the future.
When/if this happens, we will change such overloads from using
``__array_function__`` to the more specialized ``__array_ufunc__``.

Backward compatibility
----------------------

This proposal does not change existing semantics, except for those
arguments that currently have ``__array_function__`` methods, which should
be rare.
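Returning to the decorator idea from "Changes within Numpy functions"
above, a minimal sketch of how ``overload_for_array_function`` might be
written is shown below. It is only a sketch: it relies on the hypothetical
``do_array_function_dance`` helper from the earlier examples, uses
``inspect.signature`` (hence Python 3 only), and none of its details are
part of the proposal.

.. code:: python

    import functools
    import inspect

    def overload_for_array_function(relevant_arg_names):
        """Hypothetical decorator: try __array_function__ dispatch first."""
        def decorator(implementation):
            signature = inspect.signature(implementation)

            @functools.wraps(implementation)
            def public_api(*args, **kwargs):
                bound = signature.bind(*args, **kwargs)
                bound.apply_defaults()
                relevant_args = [bound.arguments[name]
                                 for name in relevant_arg_names]
                # do_array_function_dance is the (assumed) helper
                # described earlier in this NEP.
                success, value = do_array_function_dance(
                    func=public_api, relevant_arguments=relevant_args,
                    args=args, kwargs=kwargs)
                if success:
                    return value
                return implementation(*args, **kwargs)
            return public_api
        return decorator

With such a decorator, the explicit boilerplate in the ``broadcast_to``
and ``concatenate`` examples above would reduce to a single line above
each existing definition.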
Alternatives
------------

Specialized protocols
~~~~~~~~~~~~~~~~~~~~~

We could (and should) continue to develop protocols like
``__array_ufunc__`` for cohesive subsets of Numpy functionality.

As mentioned above, if this means that some functions that we overload
with ``__array_function__`` should switch to a new protocol instead, that
is explicitly OK for as long as ``__array_function__`` retains its
experimental status.

Separate namespace
~~~~~~~~~~~~~~~~~~

A separate namespace for overloaded functions is another possibility,
either inside or outside of NumPy.

This has the advantage of alleviating any possible concerns about
backwards compatibility and would provide the maximum freedom for quick
experimentation. In the long term, it would provide a clean abstraction
layer, separating NumPy's high level API from default implementations on
``numpy.ndarray`` objects.

The downsides are that this would require an explicit opt-in from all
existing code, e.g., ``import numpy.api as np``, and in the long term
would result in the maintenance of two separate NumPy APIs. Also, many
functions from ``numpy`` itself are already overloaded (but inadequately),
so confusion about high vs. low level APIs in NumPy would still persist.

Multiple dispatch
~~~~~~~~~~~~~~~~~

An alternative to our suggestion of the ``__array_function__`` protocol
would be implementing NumPy's core functions as `multi-methods `_.
Although one of us wrote a `multiple dispatch library `_ for Python, we
don't think this approach makes sense for NumPy in the near term.

The main reason is that NumPy already has a well-proven dispatching
mechanism with ``__array_ufunc__``, based on Python's own dispatching
system for arithmetic, and it would be confusing to add another mechanism
that works in a very different way. This would also be a more invasive
change to NumPy itself, which would need to gain a multiple dispatch
implementation.

It is possible that a multiple dispatch implementation for NumPy's high
level API could make sense in the future. Fortunately,
``__array_function__`` does not preclude this possibility, because it
would be straightforward to write a shim for a default
``__array_function__`` implementation in terms of multiple dispatch.

Implementations in terms of a limited core API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The internal implementations of some NumPy functions are extremely simple.
For example:

- ``np.stack()`` is implemented in only a few lines of code by combining
  indexing with ``np.newaxis``, ``np.concatenate`` and the ``shape``
  attribute.
- ``np.mean()`` is implemented internally in terms of ``np.sum()``,
  ``np.divide()``, ``.astype()`` and ``.shape``.

This suggests the possibility of defining a minimal "core" ndarray
interface, and relying upon it internally in NumPy to implement the full
API. This is an attractive option, because it could significantly reduce
the work required for new array implementations.

However, this also comes with several downsides:

1. The details of how NumPy implements a high-level function in terms of
   overloaded functions now becomes an implicit part of NumPy's public
   API. For example, refactoring ``stack`` to use ``np.block()`` instead
   of ``np.concatenate()`` internally would now become a breaking change.
2. Array libraries may prefer to implement high level functions
   differently than NumPy. For example, a library might prefer to
   implement a fundamental operation like ``mean()`` directly rather than
   relying on ``sum()`` followed by division.
   More generally, it's not clear yet what exactly qualifies as core
   functionality, and figuring this out could be a large project.
3. We don't yet have an overloading system for attributes and methods on
   array objects, e.g., for accessing ``.dtype`` and ``.shape``. This
   should be the subject of a future NEP, but until then we should be
   reluctant to rely on these properties.

Given these concerns, we encourage relying on this approach only in
limited cases.

Coercion to a NumPy array as a catch-all fallback
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With the current design, classes that implement ``__array_function__`` to
overload at least one function implicitly declare an intent to implement
the entire NumPy API. It's not possible to implement *only*
``np.concatenate()`` on a type while falling back to NumPy's default
behavior of casting with ``np.asarray()`` for all other functions.

This could present a backwards compatibility concern that would discourage
libraries from adopting ``__array_function__`` in an incremental fashion.
For example, currently most numpy functions will implicitly convert
``pandas.Series`` objects into NumPy arrays, behavior that assuredly many
pandas users rely on. If pandas implemented ``__array_function__`` only
for ``np.concatenate``, unrelated NumPy functions like ``np.nanmean``
would suddenly break on pandas objects by raising TypeError.

With ``__array_ufunc__``, it's possible to alleviate this concern by
casting all arguments to numpy arrays and re-calling the ufunc, but the
heterogeneous function signatures supported by ``__array_function__`` make
it impossible to implement this generic fallback behavior for
``__array_function__``.

We could resolve this issue by changing the handling of return values in
``__array_function__`` in either of two possible ways:

1. Change the meaning of all arguments returning ``NotImplemented`` to
   indicate that all arguments should be coerced to NumPy arrays instead.
   However, many array libraries (e.g., scipy.sparse) really don't want
   implicit conversions to NumPy arrays, and often avoid implementing
   ``__array__`` for exactly this reason. Implicit conversions can result
   in silent bugs and performance degradation.
2. Use another sentinel value of some sort to indicate that a class
   implementing part of the higher level array API is coercible as a
   fallback, e.g., a return value of ``np.NotImplementedButCoercible``
   from ``__array_function__``.

If we take this second approach, we would need to define additional rules
for how coercible array arguments are coerced, e.g.,

- Would we try for ``__array_function__`` overloads again after coercing
  coercible arguments?
- If so, would we coerce coercible arguments one-at-a-time, or
  all-at-once?

These are slightly tricky design questions, so for now we propose to defer
this issue. We can always implement ``np.NotImplementedButCoercible`` at
some later time if it proves critical to the numpy community in the
future. Importantly, we don't think this will stop critical libraries that
desire to implement most of the high level NumPy API from adopting this
proposal.

NOTE: If you are reading this NEP in its draft state and disagree, please
speak up on the mailing list!
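To make the second option above concrete, a hypothetical pandas-like
container might then write something like the sketch below. Note that
``np.NotImplementedButCoercible`` does not exist today, and
``series_concatenate`` is a placeholder helper invented purely for
illustration.

.. code:: python

    import numpy as np

    class Series:
        """Hypothetical pandas-like container adopting option 2 above."""

        def __array_function__(self, func, types, args, kwargs):
            if func is np.concatenate:
                # series_concatenate stands in for a real implementation.
                return series_concatenate(*args, **kwargs)
            # For every other function, ask NumPy to fall back to coercing
            # this object with np.asarray(). The sentinel below is the one
            # proposed (but not adopted) above; it is not NumPy API.
            return np.NotImplementedButCoercible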
Drawbacks of this approach
--------------------------

Future difficulty extending NumPy's API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One downside of passing all arguments directly on to
``__array_function__`` is that it makes it hard to extend the signatures
of overloaded NumPy functions with new arguments, because adding even an
optional keyword argument would break existing overloads.

This is not a new problem for NumPy. NumPy has occasionally changed the
signature for functions in the past, including functions like
``numpy.sum`` which support overloads.

For adding new keyword arguments that do not change default behavior, we
would only include these as keyword arguments when they have changed from
default values. This is similar to `what NumPy already has done
<https://github.com/numpy/numpy/blob/v1.14.2/numpy/core/fromnumeric.py#L1865-L1867>`_,
e.g., for the optional ``keepdims`` argument in ``sum``:

.. code:: python

    def sum(array, ..., keepdims=np._NoValue):
        kwargs = {}
        if keepdims is not np._NoValue:
            kwargs['keepdims'] = keepdims
        return array.sum(..., **kwargs)

In other cases, such as deprecated arguments, preserving the existing
behavior of overloaded functions may not be possible. Libraries that use
``__array_function__`` should be aware of this risk: we don't propose to
freeze NumPy's API in stone any more than it already is.

Difficulty adding implementation specific arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some array implementations generally follow NumPy's API, but have
additional optional keyword arguments (e.g., ``dask.array.sum()`` has
``split_every`` and ``tensorflow.reduce_sum()`` has ``name``). A generic
dispatching library could potentially pass on all unrecognized keyword
arguments directly to the implementation, but extending ``np.sum()`` to
pass on ``**kwargs`` would entail public facing changes in NumPy.
Customizing the detailed behavior of array libraries will require using
library specific functions, which could be limiting in the case of
libraries that consume the NumPy API such as xarray.

Discussion
----------

Various alternatives to this proposal were discussed in a few Github
issues:

1. `pydata/sparse #1 `_
2. `numpy/numpy #11129 `_

Additionally it was the subject of `a blogpost `_. Following this it was
discussed at a `NumPy developer sprint `_ at the `UC Berkeley Institute
for Data Science (BIDS) `_.

References and Footnotes
------------------------

.. [1] Each NEP must either be explicitly labeled as placed in the public
   domain (see this NEP as an example) or licensed under the `Open
   Publication License`_.

.. _Open Publication License: http://www.opencontent.org/openpub/

Copyright
---------

This document has been placed in the public domain. [1]_

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nathan12343 at gmail.com Sat Jun 2 19:58:23 2018
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Sat, 2 Jun 2018 18:58:23 -0500
Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?=
 =?utf-8?q?=E2=80=99s_high_level_API?=
In-Reply-To: 
References: 
Message-ID: 

Perhaps I missed this but I didn't see: what happens when both
__array_ufunc__ and __array_function__ are defined? I might want to do
this to, for example, add support for functions like concatenate or stack
to a class that already has an __array_ufunc__ defined.
On Sat, Jun 2, 2018 at 5:56 PM Stephan Hoyer wrote: > Matthew Rocklin and I have written NEP-18, which proposes a new dispatch > mechanism for NumPy's high level API: > http://www.numpy.org/neps/nep-0018-array-function-protocol.html > > There has already been a little bit of scattered discussion on the pull > request (https://github.com/numpy/numpy/pull/11189), but per NEP-0 let's > try to keep high-level discussion here on the mailing list. > > The full text of the NEP is reproduced below: > > ================================================== > NEP: Dispatch Mechanism for NumPy's high level API > ================================================== > > :Author: Stephan Hoyer > :Author: Matthew Rocklin > :Status: Draft > :Type: Standards Track > :Created: 2018-05-29 > > Abstact > ------- > > We propose a protocol to allow arguments of numpy functions to define > how that function operates on them. This allows other libraries that > implement NumPy's high level API to reuse Numpy functions. This allows > libraries that extend NumPy's high level API to apply to more NumPy-like > libraries. > > Detailed description > -------------------- > > Numpy's high level ndarray API has been implemented several times > outside of NumPy itself for different architectures, such as for GPU > arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel > arrays (Dask array) as well as various Numpy-like implementations in the > deep learning frameworks, like TensorFlow and PyTorch. > > Similarly there are several projects that build on top of the Numpy API > for labeled and indexed arrays (XArray), automatic differentation > (Autograd, Tangent), higher order array factorizations (TensorLy), etc. > that add additional functionality on top of the Numpy API. > > We would like to be able to use these libraries together, for example we > would like to be able to place a CuPy array within XArray, or perform > automatic differentiation on Dask array code. This would be easier to > accomplish if code written for NumPy ndarrays could also be used by > other NumPy-like projects. > > For example, we would like for the following code example to work > equally well with any Numpy-like array object: > > .. code:: python > > def f(x): > y = np.tensordot(x, x.T) > return np.mean(np.exp(y)) > > Some of this is possible today with various protocol mechanisms within > Numpy. > > - The ``np.exp`` function checks the ``__array_ufunc__`` protocol > - The ``.T`` method works using Python's method dispatch > - The ``np.mean`` function explicitly checks for a ``.mean`` method on > the argument > > However other functions, like ``np.tensordot`` do not dispatch, and > instead are likely to coerce to a Numpy array (using the ``__array__``) > protocol, or err outright. To achieve enough coverage of the NumPy API > to support downstream projects like XArray and autograd we want to > support *almost all* functions within Numpy, which calls for a more > reaching protocol than just ``__array_ufunc__``. We would like a > protocol that allows arguments of a NumPy function to take control and > divert execution to another function (for example a GPU or parallel > implementation) in a way that is safe and consistent across projects. > > Implementation > -------------- > > We propose adding support for a new protocol in NumPy, > ``__array_function__``. 
> > This protocol is intended to be a catch-all for NumPy functionality that > is not covered by existing protocols, like reductions (like ``np.sum``) > or universal functions (like ``np.exp``). The semantics are very similar > to ``__array_ufunc__``, except the operation is specified by an > arbitrary callable object rather than a ufunc instance and method. > > The interface > ~~~~~~~~~~~~~ > > We propose the following signature for implementations of > ``__array_function__``: > > .. code-block:: python > > def __array_function__(self, func, types, args, kwargs) > > - ``func`` is an arbitrary callable exposed by NumPy's public API, > which was called in the form ``func(*args, **kwargs)``. > - ``types`` is a list of types for all arguments to the original NumPy > function call that will be checked for an ``__array_function__`` > implementation. > - The tuple ``args`` and dict ``**kwargs`` are directly passed on from the > original call. > > Unlike ``__array_ufunc__``, there are no high-level guarantees about the > type of ``func``, or about which of ``args`` and ``kwargs`` may contain > objects > implementing the array API. As a convenience for ``__array_function__`` > implementors of the NumPy API, the ``types`` keyword contains a list of all > types that implement the ``__array_function__`` protocol. This allows > downstream implementations to quickly determine if they are likely able to > support the operation. > > Still be determined: what guarantees can we offer for ``types``? Should > we promise that types are unique, and appear in the order in which they > are checked? > > Example for a project implementing the NumPy API > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Most implementations of ``__array_function__`` will start with two > checks: > > 1. Is the given function something that we know how to overload? > 2. Are all arguments of a type that we know how to handle? > > If these conditions hold, ``__array_function__`` should return > the result from calling its implementation for ``func(*args, **kwargs)``. > Otherwise, it should return the sentinel value ``NotImplemented``, > indicating > that the function is not implemented by these types. > > .. code:: python > > class MyArray: > def __array_function__(self, func, types, args, kwargs): > if func not in HANDLED_FUNCTIONS: > return NotImplemented > if not all(issubclass(t, MyArray) for t in types): > return NotImplemented > return HANDLED_FUNCTIONS[func](*args, **kwargs) > > HANDLED_FUNCTIONS = { > np.concatenate: my_concatenate, > np.broadcast_to: my_broadcast_to, > np.sum: my_sum, > ... > } > > Necessary changes within the Numpy codebase itself > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > This will require two changes within the Numpy codebase: > > 1. A function to inspect available inputs, look for the > ``__array_function__`` attribute on those inputs, and call those > methods appropriately until one succeeds. This needs to be fast in the > common all-NumPy case. > > This is one additional function of moderate complexity. > 2. Calling this function within all relevant Numpy functions. > > This affects many parts of the Numpy codebase, although with very low > complexity. > > Finding and calling the right ``__array_function__`` > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Given a Numpy function, ``*args`` and ``**kwargs`` inputs, we need to > search through ``*args`` and ``**kwargs`` for all appropriate inputs > that might have the ``__array_function__`` attribute. 
Then we need to > select among those possible methods and execute the right one. > Negotiating between several possible implementations can be complex. > > Finding arguments > ''''''''''''''''' > > Valid arguments may be directly in the ``*args`` and ``**kwargs``, such > as in the case for ``np.tensordot(left, right, out=out)``, or they may > be nested within lists or dictionaries, such as in the case of > ``np.concatenate([x, y, z])``. This can be problematic for two reasons: > > 1. Some functions are given long lists of values, and traversing them > might be prohibitively expensive > 2. Some function may have arguments that we don't want to inspect, even > if they have the ``__array_function__`` method > > To resolve these we ask the functions to provide an explicit list of > arguments that should be traversed. This is the ``relevant_arguments=`` > keyword in the examples below. > > Trying ``__array_function__`` methods until the right one works > ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' > > Many arguments may implement the ``__array_function__`` protocol. Some > of these may decide that, given the available inputs, they are unable to > determine the correct result. How do we call the right one? If several > are valid then which has precedence? > > The rules for dispatch with ``__array_function__`` match those for > ``__array_ufunc__`` (see > `NEP-13 `_). > In particular: > > - NumPy will gather implementations of ``__array_function__`` from all > specified inputs and call them in order: subclasses before > superclasses, and otherwise left to right. Note that in some edge cases, > this differs slightly from the > `current behavior `_ of Python. > - Implementations of ``__array_function__`` indicate that they can > handle the operation by returning any value other than > ``NotImplemented``. > - If all ``__array_function__`` methods return ``NotImplemented``, > NumPy will raise ``TypeError``. > > Changes within Numpy functions > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Given a function defined above, for now call it > ``do_array_function_dance``, we now need to call that function from > within every relevant Numpy function. This is a pervasive change, but of > fairly simple and innocuous code that should complete quickly and > without effect if no arguments implement the ``__array_function__`` > protocol. Let us consider a few examples of NumPy functions and how they > might be affected by this change: > > .. code:: python > > def broadcast_to(array, shape, subok=False): > success, value = do_array_function_dance( > func=broadcast_to, > relevant_arguments=[array], > args=(array,), > kwargs=dict(shape=shape, subok=subok)) > if success: > return value > > ... # continue with the definition of broadcast_to > > def concatenate(arrays, axis=0, out=None) > success, value = do_array_function_dance( > func=concatenate, > relevant_arguments=[arrays, out], > args=(arrays,), > kwargs=dict(axis=axis, out=out)) > if success: > return value > > ... # continue with the definition of concatenate > > The list of objects passed to ``relevant_arguments`` are those that should > be inspected for ``__array_function__`` implementations. > > Alternatively, we could write these overloads with a decorator, e.g., > > .. code:: python > > @overload_for_array_function(['array']) > def broadcast_to(array, shape, subok=False): > ... # continue with the definition of broadcast_to > > @overload_for_array_function(['arrays', 'out']) > def concatenate(arrays, axis=0, out=None): > ... 
# continue with the definition of concatenate > > The decorator ``overload_for_array_function`` would be written in terms > of ``do_array_function_dance``. > > The downside of this approach would be a loss of introspection capability > for NumPy functions on Python 2, since this requires the use of > ``inspect.Signature`` (only available on Python 3). However, NumPy won't > be supporting Python 2 for `very much longer < > http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_. > > Use outside of NumPy > ~~~~~~~~~~~~~~~~~~~~ > > Nothing about this protocol that is particular to NumPy itself. Should > we enourage use of the same ``__array_function__`` protocol third-party > libraries for overloading non-NumPy functions, e.g., for making > array-implementation generic functionality in SciPy? > > This would offer significant advantages (SciPy wouldn't need to invent > its own dispatch system) and no downsides that we can think of, because > every function that dispatches with ``__array_function__`` already needs > to be explicitly recognized. Libraries like Dask, CuPy, and Autograd > already wrap a limited subset of SciPy functionality (e.g., > ``scipy.linalg``) similarly to how they wrap NumPy. > > If we want to do this, we should consider exposing the helper function > ``do_array_function_dance()`` above as a public API. > > Non-goals > --------- > > We are aiming for basic strategy that can be relatively mechanistically > applied to almost all functions in NumPy's API in a relatively short > period of time, the development cycle of a single NumPy release. > > We hope to get both the ``__array_function__`` protocol and all specific > overloads right on the first try, but our explicit aim here is to get > something that mostly works (and can be iterated upon), rather than to > wait for an optimal implementation. The price of moving fast is that for > now **this protocol should be considered strictly experimental**. We > reserve the right to change the details of this protocol and how > specific NumPy functions use it at any time in the future -- even in > otherwise bug-fix only releases of NumPy. > > In particular, we don't plan to write additional NEPs that list all > specific functions to overload, with exactly how they should be > overloaded. We will leave this up to the discretion of committers on > individual pull requests, trusting that they will surface any > controversies for discussion by interested parties. > > However, we already know several families of functions that should be > explicitly exclude from ``__array_function__``. These will need their > own protocols: > > - universal functions, which already have their own protocol. > - ``array`` and ``asarray``, because they are explicitly intended for > coercion to actual ``numpy.ndarray`` object. > - dispatch for methods of any kind, e.g., methods on > ``np.random.RandomState`` objects. > > As a concrete example of how we expect to break behavior in the future, > some functions such as ``np.where`` are currently not NumPy universal > functions, but conceivably could become universal functions in the > future. When/if this happens, we will change such overloads from using > ``__array_function__`` to the more specialized ``__array_ufunc__``. > > > Backward compatibility > ---------------------- > > This proposal does not change existing semantics, except for those > arguments > that currently have ``__array_function__`` methods, which should be rare. 
> > > Alternatives > ------------ > > Specialized protocols > ~~~~~~~~~~~~~~~~~~~~~ > > We could (and should) continue to develop protocols like > ``__array_ufunc__`` for cohesive subsets of Numpy functionality. > > As mentioned above, if this means that some functions that we overload > with ``__array_function__`` should switch to a new protocol instead, > that is explicitly OK for as long as ``__array_function__`` retains its > experimental status. > > Separate namespace > ~~~~~~~~~~~~~~~~~~ > > A separate namespace for overloaded functions is another possibility, > either inside or outside of NumPy. > > This has the advantage of alleviating any possible concerns about > backwards compatibility and would provide the maximum freedom for quick > experimentation. In the long term, it would provide a clean abstration > layer, separating NumPy's high level API from default implementations on > ``numpy.ndarray`` objects. > > The downsides are that this would require an explicit opt-in from all > existing code, e.g., ``import numpy.api as np``, and in the long term > would result in the maintainence of two separate NumPy APIs. Also, many > functions from ``numpy`` itself are already overloaded (but > inadequately), so confusion about high vs. low level APIs in NumPy would > still persist. > > Multiple dispatch > ~~~~~~~~~~~~~~~~~ > > An alternative to our suggestion of the ``__array_function__`` protocol > would be implementing NumPy's core functions as > `multi-methods `_. > Although one of us wrote a `multiple dispatch > library `_ for Python, we > don't think this approach makes sense for NumPy in the near term. > > The main reason is that NumPy already has a well-proven dispatching > mechanism with ``__array_ufunc__``, based on Python's own dispatching > system for arithemtic, and it would be confusing to add another > mechanism that works in a very different way. This would also be more > invasive change to NumPy itself, which would need to gain a multiple > dispatch implementation. > > It is possible that multiple dispatch implementation for NumPy's high > level API could make sense in the future. Fortunately, > ``__array_function__`` does not preclude this possibility, because it > would be straightforward to write a shim for a default > ``__array_function__`` implementation in terms of multiple dispatch. > > Implementations in terms of a limited core API > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The internal implemenations of some NumPy functions is extremely simple. > For example: - ``np.stack()`` is implemented in only a few lines of code > by combining indexing with ``np.newaxis``, ``np.concatenate`` and the > ``shape`` attribute. - ``np.mean()`` is implemented internally in terms > of ``np.sum()``, ``np.divide()``, ``.astype()`` and ``.shape``. > > This suggests the possibility of defining a minimal "core" ndarray > interface, and relying upon it internally in NumPy to implement the full > API. This is an attractive option, because it could significantly reduce > the work required for new array implementations. > > However, this also comes with several downsides: 1. The details of how > NumPy implements a high-level function in terms of overloaded functions > now becomes an implicit part of NumPy's public API. For example, > refactoring ``stack`` to use ``np.block()`` instead of > ``np.concatenate()`` internally would now become a breaking change. 2. > Array libraries may prefer to implement high level functions differently > than NumPy. 
For example, a library might prefer to implement a > fundamental operations like ``mean()`` directly rather than relying on > ``sum()`` followed by division. More generally, it's not clear yet what > exactly qualifies as core functionality, and figuring this out could be > a large project. 3. We don't yet have an overloading system for > attributes and methods on array objects, e.g., for accessing ``.dtype`` > and ``.shape``. This should be the subject of a future NEP, but until > then we should be reluctant to rely on these properties. > > Given these concerns, we encourage relying on this approach only in > limited cases. > > Coersion to a NumPy array as a catch-all fallback > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > With the current design, classes that implement ``__array_function__`` > to overload at least one function implicitly declare an intent to > implement the entire NumPy API. It's not possible to implement *only* > ``np.concatenate()`` on a type, but fall back to NumPy's default > behavior of casting with ``np.asarray()`` for all other functions. > > This could present a backwards compatibility concern that would > discourage libraries from adopting ``__array_function__`` in an > incremental fashion. For example, currently most numpy functions will > implicitly convert ``pandas.Series`` objects into NumPy arrays, behavior > that assuredly many pandas users rely on. If pandas implemented > ``__array_function__`` only for ``np.concatenate``, unrelated NumPy > functions like ``np.nanmean`` would suddenly break on pandas objects by > raising TypeError. > > With ``__array_ufunc__``, it's possible to alleviate this concern by > casting all arguments to numpy arrays and re-calling the ufunc, but the > heterogeneous function signatures supported by ``__array_function__`` > make it impossible to implement this generic fallback behavior for > ``__array_function__``. > > We could resolve this issue by change the handling of return values in > ``__array_function__`` in either of two possible ways: 1. Change the > meaning of all arguments returning ``NotImplemented`` to indicate that > all arguments should be coerced to NumPy arrays instead. However, many > array libraries (e.g., scipy.sparse) really don't want implicit > conversions to NumPy arrays, and often avoid implementing ``__array__`` > for exactly this reason. Implicit conversions can result in silent bugs > and performance degradation. 2. Use another sentinel value of some sort > to indicate that a class implementing part of the higher level array API > is coercible as a fallback, e.g., a return value of > ``np.NotImplementedButCoercible`` from ``__array_function__``. > > If we take this second approach, we would need to define additional > rules for how coercible array arguments are coerced, e.g., - Would we > try for ``__array_function__`` overloads again after coercing coercible > arguments? - If so, would we coerce coercible arguments one-at-a-time, > or all-at-once? > > These are slightly tricky design questions, so for now we propose to > defer this issue. We can always implement > ``np.NotImplementedButCoercible`` at some later time if it proves > critical to the numpy community in the future. Importantly, we don't > think this will stop critical libraries that desire to implement most of > the high level NumPy API from adopting this proposal. > > NOTE: If you are reading this NEP in its draft state and disagree, > please speak up on the mailing list! 
> > Drawbacks of this approach > -------------------------- > > Future difficulty extending NumPy's API > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > One downside of passing on all arguments directly on to > ``__array_function__`` is that it makes it hard to extend the signatures > of overloaded NumPy functions with new arguments, because adding even an > optional keyword argument would break existing overloads. > > This is not a new problem for NumPy. NumPy has occasionally changed the > signature for functions in the past, including functions like > ``numpy.sum`` which support overloads. > > For adding new keyword arguments that do not change default behavior, we > would only include these as keyword arguments when they have changed > from default values. This is similar to `what NumPy already has > done < > https://github.com/numpy/numpy/blob/v1.14.2/numpy/core/fromnumeric.py#L1865-L1867 > >`_, > e.g., for the optional ``keepdims`` argument in ``sum``: > > .. code:: python > > def sum(array, ..., keepdims=np._NoValue): > kwargs = {} > if keepdims is not np._NoValue: > kwargs['keepdims'] = keepdims > return array.sum(..., **kwargs) > > In other cases, such as deprecated arguments, preserving the existing > behavior of overloaded functions may not be possible. Libraries that use > ``__array_function__`` should be aware of this risk: we don't propose to > freeze NumPy's API in stone any more than it already is. > > Difficulty adding implementation specific arguments > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Some array implementations generally follow NumPy's API, but have > additional optional keyword arguments (e.g., ``dask.array.sum()`` has > ``split_every`` and ``tensorflow.reduce_sum()`` has ``name``). A generic > dispatching library could potentially pass on all unrecognized keyword > argument directly to the implementation, but extending ``np.sum()`` to > pass on ``**kwargs`` would entail public facing changes in NumPy. > Customizing the detailed behavior of array libraries will require using > library specific functions, which could be limiting in the case of > libraries that consume the NumPy API such as xarray. > > > Discussion > ---------- > > Various alternatives to this proposal were discussed in a few Github > issues: > > 1. `pydata/sparse #1 `_ > 2. `numpy/numpy #11129 `_ > > Additionally it was the subject of `a blogpost > `_ Following > this > it was discussed at a `NumPy developer sprint > `_ at the `UC > Berkeley Institute for Data Science (BIDS) `_. > > > References and Footnotes > ------------------------ > > .. [1] Each NEP must either be explicitly labeled as placed in the public > domain (see > this NEP as an example) or licensed under the `Open Publication > License`_. > > .. _Open Publication License: http://www.opencontent.org/openpub/ > > > Copyright > --------- > > This document has been placed in the public domain. [1]_ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From einstein.edison at gmail.com Sun Jun 3 00:45:55 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sat, 2 Jun 2018 21:45:55 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?=
 =?utf-8?q?=E2=80=99s_high_level_API?=
In-Reply-To: 
References: 
Message-ID: 

Perhaps I missed this but I didn't see: what happens when both
__array_ufunc__ and __array_function__ are defined? I might want to do
this to, for example, add support for functions like concatenate or stack
to a class that already has an __array_ufunc__ defined.

This is mentioned in the section "Non-goals", which says that ufuncs and
their methods should be excluded, along with a few other classes of
functions/methods.

Sent from Astro for Mac

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From m.h.vankerkwijk at gmail.com Sun Jun 3 11:19:01 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 3 Jun 2018 11:19:01 -0400
Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?=
 =?utf-8?q?=E2=80=99s_high_level_API?=
In-Reply-To: 
References: 
Message-ID: 

Hi Stephan,

Thanks for posting. Overall, this is great!

My more general comment is one of speed: for *normal* operation
performance should be impacted as minimally as possible. I think this is a
serious issue and feel strongly it *has* to be possible to avoid all
arguments being checked for the `__array_function__` attribute, i.e.,
there should be an obvious way to ensure no type checking dance is done.

Some possible solutions (which I think should be in the NEP, even if as
discounted options):

A. Two "namespaces", one for the undecorated base functions, and one
completely trivial one for the decorated ones. The idea would be that if
one knows one is dealing with arrays only, one would do `import
numpy.array_only as np` (i.e., the reverse of the suggestion currently in
the NEP, where the decorated ones are in their own namespace - I agree
with the reasons for discounting that one). Note that in this suggestion
the array-only namespace serves as the one used for
`ndarray.__array_function__`.

B. Automatic insertion by the decorator of an `array_only=np._NoValue` (or
`coerce` and perhaps `subok=...` if not present) in the function
signature, so that users who know that they have arrays only could pass
`array_only=True` (name to be decided). This would be most useful if there
were also some type of configuration parameter that could set the default
of `array_only`.

Note that both A and B could also address, at least partially, the problem
of sometimes wanting to just use the old coercion methods, i.e., not
having to implement every possible numpy function in one go in a new
`__array_function__` on one's class.

Two other general comments:

1. I'm rather unclear about the use of `types`. It can help me decide what
to do, but I would still have to find the argument in question (e.g., for
Quantity, the unit of the relevant argument). I'd recommend passing
instead a tuple of all arguments that were inspected, in the inspection
order; after all, it is just an `arg.__class__` away from the type, and in
your example you'd only have to replace `issubclass` by `isinstance`.

2. For subclasses, it would be very handy to have
`ndarray.__array_function__`, so one can call super after changing
arguments. (For `__array_ufunc__`, there was lots of question about
whether this was useful, but it really is!!).
[I think you already agreed with this, but want to have it in-place, as for subclasses of ndarray this is just as useful as it would be for subclasses of dask arrays.) Note that any `ndarray.__array_function__` might also help solve the problem of cases where coercion is fine: it could have an extra keyword argument (say `coerce`) that would call the function with coercion in place. Indeed, if the `ndarray.__array_function__` were used inside the "dance" function, and then the actual implementation of a given function would just be a separate, private one. Again, overall a great idea, and thanks to all those involved for taking it on. All the best, Marten On Sat, Jun 2, 2018 at 6:55 PM, Stephan Hoyer wrote: > Matthew Rocklin and I have written NEP-18, which proposes a new dispatch > mechanism for NumPy's high level API: http://www.numpy.org/neps/nep- > 0018-array-function-protocol.html > > There has already been a little bit of scattered discussion on the pull > request (https://github.com/numpy/numpy/pull/11189), but per NEP-0 let's > try to keep high-level discussion here on the mailing list. > > The full text of the NEP is reproduced below: > > ================================================== > NEP: Dispatch Mechanism for NumPy's high level API > ================================================== > > :Author: Stephan Hoyer > :Author: Matthew Rocklin > :Status: Draft > :Type: Standards Track > :Created: 2018-05-29 > > Abstact > ------- > > We propose a protocol to allow arguments of numpy functions to define > how that function operates on them. This allows other libraries that > implement NumPy's high level API to reuse Numpy functions. This allows > libraries that extend NumPy's high level API to apply to more NumPy-like > libraries. > > Detailed description > -------------------- > > Numpy's high level ndarray API has been implemented several times > outside of NumPy itself for different architectures, such as for GPU > arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel > arrays (Dask array) as well as various Numpy-like implementations in the > deep learning frameworks, like TensorFlow and PyTorch. > > Similarly there are several projects that build on top of the Numpy API > for labeled and indexed arrays (XArray), automatic differentation > (Autograd, Tangent), higher order array factorizations (TensorLy), etc. > that add additional functionality on top of the Numpy API. > > We would like to be able to use these libraries together, for example we > would like to be able to place a CuPy array within XArray, or perform > automatic differentiation on Dask array code. This would be easier to > accomplish if code written for NumPy ndarrays could also be used by > other NumPy-like projects. > > For example, we would like for the following code example to work > equally well with any Numpy-like array object: > > .. code:: python > > def f(x): > y = np.tensordot(x, x.T) > return np.mean(np.exp(y)) > > Some of this is possible today with various protocol mechanisms within > Numpy. > > - The ``np.exp`` function checks the ``__array_ufunc__`` protocol > - The ``.T`` method works using Python's method dispatch > - The ``np.mean`` function explicitly checks for a ``.mean`` method on > the argument > > However other functions, like ``np.tensordot`` do not dispatch, and > instead are likely to coerce to a Numpy array (using the ``__array__``) > protocol, or err outright. 
To achieve enough coverage of the NumPy API > to support downstream projects like XArray and autograd we want to > support *almost all* functions within Numpy, which calls for a more > reaching protocol than just ``__array_ufunc__``. We would like a > protocol that allows arguments of a NumPy function to take control and > divert execution to another function (for example a GPU or parallel > implementation) in a way that is safe and consistent across projects. > > Implementation > -------------- > > We propose adding support for a new protocol in NumPy, > ``__array_function__``. > > This protocol is intended to be a catch-all for NumPy functionality that > is not covered by existing protocols, like reductions (like ``np.sum``) > or universal functions (like ``np.exp``). The semantics are very similar > to ``__array_ufunc__``, except the operation is specified by an > arbitrary callable object rather than a ufunc instance and method. > > The interface > ~~~~~~~~~~~~~ > > We propose the following signature for implementations of > ``__array_function__``: > > .. code-block:: python > > def __array_function__(self, func, types, args, kwargs) > > - ``func`` is an arbitrary callable exposed by NumPy's public API, > which was called in the form ``func(*args, **kwargs)``. > - ``types`` is a list of types for all arguments to the original NumPy > function call that will be checked for an ``__array_function__`` > implementation. > - The tuple ``args`` and dict ``**kwargs`` are directly passed on from the > original call. > > Unlike ``__array_ufunc__``, there are no high-level guarantees about the > type of ``func``, or about which of ``args`` and ``kwargs`` may contain > objects > implementing the array API. As a convenience for ``__array_function__`` > implementors of the NumPy API, the ``types`` keyword contains a list of all > types that implement the ``__array_function__`` protocol. This allows > downstream implementations to quickly determine if they are likely able to > support the operation. > > Still be determined: what guarantees can we offer for ``types``? Should > we promise that types are unique, and appear in the order in which they > are checked? > > Example for a project implementing the NumPy API > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Most implementations of ``__array_function__`` will start with two > checks: > > 1. Is the given function something that we know how to overload? > 2. Are all arguments of a type that we know how to handle? > > If these conditions hold, ``__array_function__`` should return > the result from calling its implementation for ``func(*args, **kwargs)``. > Otherwise, it should return the sentinel value ``NotImplemented``, > indicating > that the function is not implemented by these types. > > .. code:: python > > class MyArray: > def __array_function__(self, func, types, args, kwargs): > if func not in HANDLED_FUNCTIONS: > return NotImplemented > if not all(issubclass(t, MyArray) for t in types): > return NotImplemented > return HANDLED_FUNCTIONS[func](*args, **kwargs) > > HANDLED_FUNCTIONS = { > np.concatenate: my_concatenate, > np.broadcast_to: my_broadcast_to, > np.sum: my_sum, > ... > } > > Necessary changes within the Numpy codebase itself > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > This will require two changes within the Numpy codebase: > > 1. A function to inspect available inputs, look for the > ``__array_function__`` attribute on those inputs, and call those > methods appropriately until one succeeds. 
This needs to be fast in the > common all-NumPy case. > > This is one additional function of moderate complexity. > 2. Calling this function within all relevant Numpy functions. > > This affects many parts of the Numpy codebase, although with very low > complexity. > > Finding and calling the right ``__array_function__`` > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Given a Numpy function, ``*args`` and ``**kwargs`` inputs, we need to > search through ``*args`` and ``**kwargs`` for all appropriate inputs > that might have the ``__array_function__`` attribute. Then we need to > select among those possible methods and execute the right one. > Negotiating between several possible implementations can be complex. > > Finding arguments > ''''''''''''''''' > > Valid arguments may be directly in the ``*args`` and ``**kwargs``, such > as in the case for ``np.tensordot(left, right, out=out)``, or they may > be nested within lists or dictionaries, such as in the case of > ``np.concatenate([x, y, z])``. This can be problematic for two reasons: > > 1. Some functions are given long lists of values, and traversing them > might be prohibitively expensive > 2. Some function may have arguments that we don't want to inspect, even > if they have the ``__array_function__`` method > > To resolve these we ask the functions to provide an explicit list of > arguments that should be traversed. This is the ``relevant_arguments=`` > keyword in the examples below. > > Trying ``__array_function__`` methods until the right one works > ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' > > Many arguments may implement the ``__array_function__`` protocol. Some > of these may decide that, given the available inputs, they are unable to > determine the correct result. How do we call the right one? If several > are valid then which has precedence? > > The rules for dispatch with ``__array_function__`` match those for > ``__array_ufunc__`` (see > `NEP-13 `_). > In particular: > > - NumPy will gather implementations of ``__array_function__`` from all > specified inputs and call them in order: subclasses before > superclasses, and otherwise left to right. Note that in some edge cases, > this differs slightly from the > `current behavior `_ of Python. > - Implementations of ``__array_function__`` indicate that they can > handle the operation by returning any value other than > ``NotImplemented``. > - If all ``__array_function__`` methods return ``NotImplemented``, > NumPy will raise ``TypeError``. > > Changes within Numpy functions > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Given a function defined above, for now call it > ``do_array_function_dance``, we now need to call that function from > within every relevant Numpy function. This is a pervasive change, but of > fairly simple and innocuous code that should complete quickly and > without effect if no arguments implement the ``__array_function__`` > protocol. Let us consider a few examples of NumPy functions and how they > might be affected by this change: > > .. code:: python > > def broadcast_to(array, shape, subok=False): > success, value = do_array_function_dance( > func=broadcast_to, > relevant_arguments=[array], > args=(array,), > kwargs=dict(shape=shape, subok=subok)) > if success: > return value > > ... 
# continue with the definition of broadcast_to > > def concatenate(arrays, axis=0, out=None) > success, value = do_array_function_dance( > func=concatenate, > relevant_arguments=[arrays, out], > args=(arrays,), > kwargs=dict(axis=axis, out=out)) > if success: > return value > > ... # continue with the definition of concatenate > > The list of objects passed to ``relevant_arguments`` are those that should > be inspected for ``__array_function__`` implementations. > > Alternatively, we could write these overloads with a decorator, e.g., > > .. code:: python > > @overload_for_array_function(['array']) > def broadcast_to(array, shape, subok=False): > ... # continue with the definition of broadcast_to > > @overload_for_array_function(['arrays', 'out']) > def concatenate(arrays, axis=0, out=None): > ... # continue with the definition of concatenate > > The decorator ``overload_for_array_function`` would be written in terms > of ``do_array_function_dance``. > > The downside of this approach would be a loss of introspection capability > for NumPy functions on Python 2, since this requires the use of > ``inspect.Signature`` (only available on Python 3). However, NumPy won't > be supporting Python 2 for `very much longer nep-0014-dropping-python2.7-proposal.html>`_. > > Use outside of NumPy > ~~~~~~~~~~~~~~~~~~~~ > > Nothing about this protocol that is particular to NumPy itself. Should > we enourage use of the same ``__array_function__`` protocol third-party > libraries for overloading non-NumPy functions, e.g., for making > array-implementation generic functionality in SciPy? > > This would offer significant advantages (SciPy wouldn't need to invent > its own dispatch system) and no downsides that we can think of, because > every function that dispatches with ``__array_function__`` already needs > to be explicitly recognized. Libraries like Dask, CuPy, and Autograd > already wrap a limited subset of SciPy functionality (e.g., > ``scipy.linalg``) similarly to how they wrap NumPy. > > If we want to do this, we should consider exposing the helper function > ``do_array_function_dance()`` above as a public API. > > Non-goals > --------- > > We are aiming for basic strategy that can be relatively mechanistically > applied to almost all functions in NumPy's API in a relatively short > period of time, the development cycle of a single NumPy release. > > We hope to get both the ``__array_function__`` protocol and all specific > overloads right on the first try, but our explicit aim here is to get > something that mostly works (and can be iterated upon), rather than to > wait for an optimal implementation. The price of moving fast is that for > now **this protocol should be considered strictly experimental**. We > reserve the right to change the details of this protocol and how > specific NumPy functions use it at any time in the future -- even in > otherwise bug-fix only releases of NumPy. > > In particular, we don't plan to write additional NEPs that list all > specific functions to overload, with exactly how they should be > overloaded. We will leave this up to the discretion of committers on > individual pull requests, trusting that they will surface any > controversies for discussion by interested parties. > > However, we already know several families of functions that should be > explicitly exclude from ``__array_function__``. These will need their > own protocols: > > - universal functions, which already have their own protocol. 
> - ``array`` and ``asarray``, because they are explicitly intended for > coercion to actual ``numpy.ndarray`` object. > - dispatch for methods of any kind, e.g., methods on > ``np.random.RandomState`` objects. > > As a concrete example of how we expect to break behavior in the future, > some functions such as ``np.where`` are currently not NumPy universal > functions, but conceivably could become universal functions in the > future. When/if this happens, we will change such overloads from using > ``__array_function__`` to the more specialized ``__array_ufunc__``. > > > Backward compatibility > ---------------------- > > This proposal does not change existing semantics, except for those > arguments > that currently have ``__array_function__`` methods, which should be rare. > > > Alternatives > ------------ > > Specialized protocols > ~~~~~~~~~~~~~~~~~~~~~ > > We could (and should) continue to develop protocols like > ``__array_ufunc__`` for cohesive subsets of Numpy functionality. > > As mentioned above, if this means that some functions that we overload > with ``__array_function__`` should switch to a new protocol instead, > that is explicitly OK for as long as ``__array_function__`` retains its > experimental status. > > Separate namespace > ~~~~~~~~~~~~~~~~~~ > > A separate namespace for overloaded functions is another possibility, > either inside or outside of NumPy. > > This has the advantage of alleviating any possible concerns about > backwards compatibility and would provide the maximum freedom for quick > experimentation. In the long term, it would provide a clean abstration > layer, separating NumPy's high level API from default implementations on > ``numpy.ndarray`` objects. > > The downsides are that this would require an explicit opt-in from all > existing code, e.g., ``import numpy.api as np``, and in the long term > would result in the maintainence of two separate NumPy APIs. Also, many > functions from ``numpy`` itself are already overloaded (but > inadequately), so confusion about high vs. low level APIs in NumPy would > still persist. > > Multiple dispatch > ~~~~~~~~~~~~~~~~~ > > An alternative to our suggestion of the ``__array_function__`` protocol > would be implementing NumPy's core functions as > `multi-methods `_. > Although one of us wrote a `multiple dispatch > library `_ for Python, we > don't think this approach makes sense for NumPy in the near term. > > The main reason is that NumPy already has a well-proven dispatching > mechanism with ``__array_ufunc__``, based on Python's own dispatching > system for arithemtic, and it would be confusing to add another > mechanism that works in a very different way. This would also be more > invasive change to NumPy itself, which would need to gain a multiple > dispatch implementation. > > It is possible that multiple dispatch implementation for NumPy's high > level API could make sense in the future. Fortunately, > ``__array_function__`` does not preclude this possibility, because it > would be straightforward to write a shim for a default > ``__array_function__`` implementation in terms of multiple dispatch. > > Implementations in terms of a limited core API > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The internal implemenations of some NumPy functions is extremely simple. > For example: - ``np.stack()`` is implemented in only a few lines of code > by combining indexing with ``np.newaxis``, ``np.concatenate`` and the > ``shape`` attribute. 
> - ``np.mean()`` is implemented internally in terms of ``np.sum()``,
>   ``np.divide()``, ``.astype()`` and ``.shape``.
>
> This suggests the possibility of defining a minimal "core" ndarray
> interface, and relying upon it internally in NumPy to implement the full
> API. This is an attractive option, because it could significantly reduce
> the work required for new array implementations.
>
> However, this also comes with several downsides:
>
> 1. The details of how NumPy implements a high-level function in terms of
>    overloaded functions now become an implicit part of NumPy's public
>    API. For example, refactoring ``stack`` to use ``np.block()`` instead
>    of ``np.concatenate()`` internally would now become a breaking change.
> 2. Array libraries may prefer to implement high level functions
>    differently than NumPy. For example, a library might prefer to
>    implement a fundamental operation like ``mean()`` directly rather
>    than relying on ``sum()`` followed by division. More generally, it's
>    not clear yet what exactly qualifies as core functionality, and
>    figuring this out could be a large project.
> 3. We don't yet have an overloading system for attributes and methods on
>    array objects, e.g., for accessing ``.dtype`` and ``.shape``. This
>    should be the subject of a future NEP, but until then we should be
>    reluctant to rely on these properties.
>
> Given these concerns, we encourage relying on this approach only in
> limited cases.
>
> Coercion to a NumPy array as a catch-all fallback
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> With the current design, classes that implement ``__array_function__``
> to overload at least one function implicitly declare an intent to
> implement the entire NumPy API. It's not possible to implement *only*
> ``np.concatenate()`` on a type while falling back to NumPy's default
> behavior of casting with ``np.asarray()`` for all other functions.
>
> This could present a backwards compatibility concern that would
> discourage libraries from adopting ``__array_function__`` in an
> incremental fashion. For example, currently most numpy functions will
> implicitly convert ``pandas.Series`` objects into NumPy arrays, behavior
> that assuredly many pandas users rely on. If pandas implemented
> ``__array_function__`` only for ``np.concatenate``, unrelated NumPy
> functions like ``np.nanmean`` would suddenly break on pandas objects by
> raising TypeError.
>
> With ``__array_ufunc__``, it's possible to alleviate this concern by
> casting all arguments to numpy arrays and re-calling the ufunc, but the
> heterogeneous function signatures supported by ``__array_function__``
> make it impossible to implement this generic fallback behavior for
> ``__array_function__``.
>
> We could resolve this issue by changing the handling of return values in
> ``__array_function__`` in either of two possible ways:
>
> 1. Change the meaning of all arguments returning ``NotImplemented`` to
>    indicate that all arguments should be coerced to NumPy arrays
>    instead. However, many array libraries (e.g., scipy.sparse) really
>    don't want implicit conversions to NumPy arrays, and often avoid
>    implementing ``__array__`` for exactly this reason. Implicit
>    conversions can result in silent bugs and performance degradation.
> 2. Use another sentinel value of some sort to indicate that a class
>    implementing part of the higher level array API is coercible as a
>    fallback, e.g., a return value of ``np.NotImplementedButCoercible``
>    from ``__array_function__``.
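As an illustration of what option 2 would enable, a library that only cares about ``np.concatenate`` might do something like the following sketch. Everything here is hypothetical: the class is invented, and ``np.NotImplementedButCoercible`` does not exist yet, so the snippet is illustrative only:

    import numpy as np

    class MySeries:
        # Hypothetical array-like container that overrides only np.concatenate
        # and opts into NumPy's default coercion for every other function.
        def __init__(self, data):
            self.data = np.asarray(data)

        def __array__(self):
            # Lets np.asarray() coerce this object whenever NumPy falls back.
            return self.data

        def __array_function__(self, func, types, args, kwargs):
            if func is not np.concatenate:
                # Proposed sentinel (does not exist yet): "I do not handle
                # this function, but feel free to coerce me with np.asarray()."
                return np.NotImplementedButCoercible
            # Cheap check on the unique argument types rather than on every
            # single argument.
            if not all(issubclass(t, (MySeries, np.ndarray)) for t in types):
                return NotImplemented
            # np.concatenate takes a sequence of arrays as its first argument.
            arrays = [np.asarray(a) for a in args[0]]
            return MySeries(np.concatenate(arrays, **kwargs))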
>
> If we take this second approach, we would need to define additional
> rules for how coercible array arguments are coerced, e.g.,
>
> - Would we try for ``__array_function__`` overloads again after coercing
>   coercible arguments?
> - If so, would we coerce coercible arguments one-at-a-time, or
>   all-at-once?
>
> These are slightly tricky design questions, so for now we propose to
> defer this issue. We can always implement
> ``np.NotImplementedButCoercible`` at some later time if it proves
> critical to the numpy community in the future. Importantly, we don't
> think this will stop critical libraries that desire to implement most of
> the high level NumPy API from adopting this proposal.
>
> NOTE: If you are reading this NEP in its draft state and disagree,
> please speak up on the mailing list!
>
> Drawbacks of this approach
> --------------------------
>
> Future difficulty extending NumPy's API
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> One downside of passing on all arguments directly on to
> ``__array_function__`` is that it makes it hard to extend the signatures
> of overloaded NumPy functions with new arguments, because adding even an
> optional keyword argument would break existing overloads.
>
> This is not a new problem for NumPy. NumPy has occasionally changed the
> signature for functions in the past, including functions like
> ``numpy.sum`` which support overloads.
>
> For adding new keyword arguments that do not change default behavior, we
> would only include these as keyword arguments when they have changed
> from default values. This is similar to `what NumPy already has
> done fromnumeric.py#L1865-L1867>`_, e.g., for the optional ``keepdims``
> argument in ``sum``:
>
> .. code:: python
>
>     def sum(array, ..., keepdims=np._NoValue):
>         kwargs = {}
>         if keepdims is not np._NoValue:
>             kwargs['keepdims'] = keepdims
>         return array.sum(..., **kwargs)
>
> In other cases, such as deprecated arguments, preserving the existing
> behavior of overloaded functions may not be possible. Libraries that use
> ``__array_function__`` should be aware of this risk: we don't propose to
> freeze NumPy's API in stone any more than it already is.
>
> Difficulty adding implementation specific arguments
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Some array implementations generally follow NumPy's API, but have
> additional optional keyword arguments (e.g., ``dask.array.sum()`` has
> ``split_every`` and ``tensorflow.reduce_sum()`` has ``name``). A generic
> dispatching library could potentially pass on all unrecognized keyword
> arguments directly to the implementation, but extending ``np.sum()`` to
> pass on ``**kwargs`` would entail public facing changes in NumPy.
> Customizing the detailed behavior of array libraries will require using
> library specific functions, which could be limiting in the case of
> libraries that consume the NumPy API such as xarray.
>
>
> Discussion
> ----------
>
> Various alternatives to this proposal were discussed in a few Github
> issues:
>
> 1. `pydata/sparse #1 `_
> 2. `numpy/numpy #11129 `_
>
> Additionally it was the subject of `a blogpost `_. Following this,
> it was discussed at a `NumPy developer sprint `_ at the
> `UC Berkeley Institute for Data Science (BIDS) `_.
>
>
> References and Footnotes
> ------------------------
>
> .. [1] Each NEP must either be explicitly labeled as placed in the public
>    domain (see this NEP as an example) or licensed under the
>    `Open Publication License`_.
>
> ..
_Open Publication License: http://www.opencontent.org/openpub/ > > > Copyright > --------- > > This document has been placed in the public domain. [1]_ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Sun Jun 3 14:00:32 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sun, 3 Jun 2018 11:00:32 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: The rules for dispatch with ``__array_function__`` match those for ``__array_ufunc__`` (see `NEP-13 `_). In particular: - NumPy will gather implementations of ``__array_function__`` from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases, this differs slightly from the `current behavior `_ of Python. - Implementations of ``__array_function__`` indicate that they can handle the operation by returning any value other than ``NotImplemented``. - If all ``__array_function__`` methods return ``NotImplemented``, NumPy will raise ``TypeError``. I?d like to propose two changes to this: - ``np.NotImplementedButCoercible`` be a part of the standard from the start. - If all implementations return this, only then should it be coerced. - In the future, it might be good to mark something as coercible to coerce it to ``ndarray`` before passing to another object?s ``__array_ufunc__``. - This is necessary if libraries want to keep old behaviour for some functions, while overriding others. - Otherwise they have to implement overloads for all functions. This seems rather like an all-or-nothing choice, which I?d like to avoid. - It isn?t too hard to implement in practice. - Objects that don?t implement ``__array_function__`` should be treated as having returned ``np.NotImplementedButCoercible``. - This has the effect of coercing ``list``, etc. - At a minimum, to maintain compatibility, if all objects don?t implement ``__array_function__``, the old behaviour should stay. Also, I?m +1 on Marten?s suggestion that ``ndarray`` itself should implement ``__array_function__``. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Jun 3 14:10:37 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 3 Jun 2018 14:10:37 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 2:00 PM, Hameer Abbasi wrote: > The rules for dispatch with ``__array_function__`` match those for > ``__array_ufunc__`` (see > `NEP-13 `_). > In particular: > > - NumPy will gather implementations of ``__array_function__`` from all > specified inputs and call them in order: subclasses before > superclasses, and otherwise left to right. Note that in some edge cases, > this differs slightly from the > `current behavior `_ of Python. > - Implementations of ``__array_function__`` indicate that they can > handle the operation by returning any value other than > ``NotImplemented``. > - If all ``__array_function__`` methods return ``NotImplemented``, > NumPy will raise ``TypeError``. 
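For concreteness, the gathering and calling described in the rules above could be sketched in pure Python along these lines (the helper name and return convention are made up for illustration; this is not the implementation proposed in the NEP):

    def try_array_function_override(func, relevant_args, args, kwargs):
        # Collect one candidate per unique type that defines
        # __array_function__, placing subclasses ahead of their superclasses
        # and otherwise keeping left-to-right argument order.
        overloaded_args = []
        for arg in relevant_args:
            if not hasattr(type(arg), '__array_function__'):
                continue
            if any(type(arg) is type(seen) for seen in overloaded_args):
                continue
            index = len(overloaded_args)
            for i, seen in enumerate(overloaded_args):
                if issubclass(type(arg), type(seen)):
                    index = i
                    break
            overloaded_args.insert(index, arg)

        types = tuple(type(arg) for arg in overloaded_args)
        for arg in overloaded_args:
            result = arg.__array_function__(func, types, args, kwargs)
            if result is not NotImplemented:
                return True, result
        # Every candidate declined (or there were none): the caller raises
        # TypeError or falls back to coercion, as discussed in this thread.
        return False, None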
> > > I?d like to propose two changes to this: > > - ``np.NotImplementedButCoercible`` be a part of the standard from the > start. > - If all implementations return this, only then should it be > coerced. > - In the future, it might be good to mark something as coercible > to coerce it to ``ndarray`` before passing to another object?s > ``__array_ufunc__``. > - This is necessary if libraries want to keep old behaviour for > some functions, while overriding others. > - Otherwise they have to implement overloads for all functions. > This seems rather like an all-or-nothing choice, which I?d like to avoid. > - It isn?t too hard to implement in practice. > > I think the issue is real but I would be slightly worried about adding multiple possible things to return - there is a benefit to an answer being either "I cannot do this" or "here's the result". I also am not sure there is an actual problem: In the scheme as proposed, implementations could just coerce themselves to array and call the routine again. (Or, in the scheme I proposed, call the routine again but with `coerce=True`.) > > - Objects that don?t implement ``__array_function__`` should be > treated as having returned ``np.NotImplementedButCoercible``. > - This has the effect of coercing ``list``, etc. > - At a minimum, to maintain compatibility, if all objects don?t > implement ``__array_function__``, the old behaviour should stay. > > I think that in the proposed scheme this is effectively what happens. -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Sun Jun 3 14:52:43 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sun, 3 Jun 2018 11:52:43 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: I also am not sure there is an actual problem: In the scheme as proposed, implementations could just coerce themselves to array and call the routine again. (Or, in the scheme I proposed, call the routine again but with `coerce=True`.) Ah, I didn?t think of the first solution. `coerce=True` may not produce the desired solution in cases where some arguments can be coerced and some can?t. However, such a design may still have some benefits. For example: - ``array1.HANDLED_TYPES = [array1]`` - ``array2.HANDLED_TYPES = [array1, array2]`` - ``array1`` is coercible. - None of these is a sub/super class of the other or of ``ndarray`` - When calling ``np.func(array1(), array2())``, ``array1`` would be coerced with your solution (because of the left-to-right rule and ``array1`` choosing to coerce itself) but not with ``np.NotImplementedButCoercible``. I think that in the proposed scheme this is effectively what happens. Not really, the current scheme is unclear on what happens if none of the arguments implement ``__array_function__`` (or at least it doesn?t explicitly state it that I can see). -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 3 19:00:08 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 16:00:08 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 8:19 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > My more general comment is one of speed: for *normal* operation > performance should be impacted as minimally as possible. 
I think this is a > serious issue and feel strongly it *has* to be possible to avoid all > arguments being checked for the `__array_function__` attribute, i.e., there > should be an obvious way to ensure no type checking dance is done. > I agree that all we should try minimize the impact of dispatching on normal operations. It would be helpful to identify examples of real workflows, so we can measure the impact of doing these checks empirically. That said, I think a small degradation in performance for code that works with small arrays should be acceptable, because performance is an already an accepted limitations of using NumPy/Python for these use cases. In most cases, I suspect that the overhead of a function call and checking several arguments for "__array_function__" will be negligible, like the situation for __array_ufunc__. I'm not strongly opposed to either of your proposed solutions, but I do think it would be a little strange to insist that we need a solution for __array_function__ when __array_ufunc__ was fine. > A. Two "namespaces", one for the undecorated base functions, and one > completely trivial one for the decorated ones. The idea would be that if > one knows one is dealing with arrays only, one would do `import > numpy.array_only as np` (i.e., the reverse of the suggestion currently in > the NEP, where the decorated ones are in their own namespace - I agree with > the reasons for discounting that one). > I will mention this as a possibility. I do think there is something to be said for clear separation of overloaded and non-overloaded APIs. But f I were to choose between adding numpy.api and numpy.array_only, I would pick numpy.api, because of the virtue of preserving the existing numpy namespace as it currently exists. > B. Automatic insertion by the decorator of an `array_only=np._NoValue` (or > `coerce` and perhaps `subok=...` if not present) in the function signature, > so that users who know that they have arrays only could pass > `array_only=True` (name to be decided). > Rather than adding another argument to every NumPy function, I would rather encourage writing np.asarray() explicitly. > Note that both A and B could also address, at least partially, the problem > of sometimes wanting to just use the old coercion methods, i.e., not having > to implement every possible numpy function in one go in a new > `__array_function__` on one's class. > Yes, agreed. > 1. I'm rather unclear about the use of `types`. It can help me decide what > to do, but I would still have to find the argument in question (e.g., for > Quantity, the unit of the relevant argument). I'd recommend passing instead > a tuple of all arguments that were inspected, in the inspection order; > after all, it is just a `arg.__class__` away from the type, and in your > example you'd only have to replace `issubclass` by `isinstance`. > The virtue of a `types` argument is that we can deduplicate arguments once, rather than in each __array_function__ check. This could result in significantly more efficient code, e.g,. when np.concatenate() is called on 10,000 arrays with only two unique types, we don't need to loop through all 10,000 again objects to check that overloading is valid. Even for Quantity, I suspect you will want two layers of checks: 1. A check to verify that every argument is a Quantity (or something coercible to a Quantity). This could use `types` and return `NotImplemented` when it fails. 2. A check to verify that units match. 
This will have custom logic for different operations and will require checking all arguments -- not just their unique types. For many Quantity functions, the second check will indeed probably be super simple (i.e., verifying that all units match). But the first check (with `types`) really is something that basically very overload should do. > 2. For subclasses, it would be very handy to have > `ndarray.__array_function__`, so one can call super after changing > arguments. (For `__array_ufunc__`, there was lots of question about whether > this was useful, but it really is!!). [I think you already agreed with > this, but want to have it in-place, as for subclasses of ndarray this is > just as useful as it would be for subclasses of dask arrays.) > Yes, indeed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Jun 3 19:09:41 2018 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 3 Jun 2018 19:09:41 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sat, Jun 2, 2018 at 3:04 PM, Robert Kern wrote: > As promised distressingly many months ago, I have written up a NEP about > relaxing the stream-compatibility policy that we currently have. > > https://github.com/numpy/numpy/pull/11229 > https://github.com/rkern/numpy/blob/nep/rng/doc/neps/ > nep-0019-rng-policy.rst > > I particularly invite comment on the two lists of methods that we still > would make strict compatibility guarantees for. > > --- > Thanks, Robert. It looks like you are neatly cutting the Gordian Knot of API versioning in numpy.random! I don't have any specific comments, except that it will be great to have *something* other than the status quo, so we can starting improving the existing numpy.random functions. Warren > ============================== > Random Number Generator Policy > ============================== > > :Author: Robert Kern > :Status: Draft > :Type: Standards Track > :Created: 2018-05-24 > > > Abstract > -------- > > For the past decade, NumPy has had a strict backwards compatibility policy > for > the number stream of all of its random number distributions. Unlike other > numerical components in ``numpy``, which are usually allowed to return > different when results when they are modified if they remain correct, we > have > obligated the random number distributions to always produce the exact same > numbers in every version. The objective of our stream-compatibility > guarantee > was to provide exact reproducibility for simulations across numpy versions > in > order to promote reproducible research. However, this policy has made it > very > difficult to enhance any of the distributions with faster or more accurate > algorithms. After a decade of experience and improvements in the > surrounding > ecosystem of scientific software, we believe that there are now better > ways to > achieve these objectives. We propose relaxing our strict > stream-compatibility > policy to remove the obstacles that are in the way of accepting > contributions > to our random number generation capabilities. > > > The Status Quo > -------------- > > Our current policy, in full: > > A fixed seed and a fixed series of calls to ``RandomState`` methods > using the > same parameters will always produce the same results up to roundoff > error > except when the values were incorrect. 
Incorrect values will be fixed > and > the NumPy version in which the fix was made will be noted in the > relevant > docstring. Extension of existing parameter ranges and the addition of > new > parameters is allowed as long the previous behavior remains unchanged. > > This policy was first instated in Nov 2008 (in essence; the full set of > weasel > words grew over time) in response to a user wanting to be sure that the > simulations that formed the basis of their scientific publication could be > reproduced years later, exactly, with whatever version of ``numpy`` that > was > current at the time. We were keen to support reproducible research, and > it was > still early in the life of ``numpy.random``. We had not seen much cause to > change the distribution methods all that much. > > We also had not thought very thoroughly about the limits of what we really > could promise (and by ?we? in this section, we really mean Robert Kern, > let?s > be honest). Despite all of the weasel words, our policy overpromises > compatibility. The same version of ``numpy`` built on different > platforms, or > just in a different way could cause changes in the stream, with varying > degrees > of rarity. The biggest is that the ``.multivariate_normal()`` method > relies on > ``numpy.linalg`` functions. Even on the same platform, if one links > ``numpy`` > with a different LAPACK, ``.multivariate_normal()`` may well return > completely > different results. More rarely, building on a different OS or CPU can > cause > differences in the stream. We use C ``long`` integers internally for > integer > distribution (it seemed like a good idea at the time), and those can vary > in > size depending on the platform. Distribution methods can overflow their > internal C ``longs`` at different breakpoints depending on the platform and > cause all of the random variate draws that follow to be different. > > And even if all of that is controlled, our policy still does not provide > exact > guarantees across versions. We still do apply bug fixes when correctness > is at > stake. And even if we didn?t do that, any nontrivial program does more > than > just draw random numbers. They do computations on those numbers, transform > those with numerical algorithms from the rest of ``numpy``, which is not > subject to so strict a policy. Trying to maintain stream-compatibility > for our > random number distributions does not help reproducible research for these > reasons. > > The standard practice now for bit-for-bit reproducible research is to pin > all > of the versions of code of your software stack, possibly down to the OS > itself. > The landscape for accomplishing this is much easier today than it was in > 2008. > We now have ``pip``. We now have virtual machines. Those who need to > reproduce simulations exactly now can (and ought to) do so by using the > exact > same version of ``numpy``. We do not need to maintain stream-compatibility > across ``numpy`` versions to help them. > > Our stream-compatibility guarantee has hindered our ability to make > improvements to ``numpy.random``. Several first-time contributors have > submitted PRs to improve the distributions, usually by implementing a > faster, > or more accurate algorithm than the one that is currently there. > Unfortunately, most of them would have required breaking the stream to do > so. > Blocked by our policy, and our inability to work around that policy, many > of > those contributors simply walked away. 
> > > Implementation > -------------- > > We propose first freezing ``RandomState`` as it is and developing a new RNG > subsystem alongside it. This allows anyone who has been relying on our old > stream-compatibility guarantee to have plenty of time to migrate. > ``RandomState`` will be considered deprecated, but with a long deprecation > cycle, at least a few years. Deprecation warnings will start silent but > become > increasingly noisy over time. Bugs in the current state of the code will > *not* > be fixed if fixing them would impact the stream. However, if changes in > the > rest of ``numpy`` would break something in the ``RandomState`` code, we > will > fix ``RandomState`` to continue working (for example, some change in the > C API). No new features will be added to ``RandomState``. Users should > migrate to the new subsystem as they are able to. > > Work on a proposed `new PRNG subsystem > `_ is already underway. The > specifics > of the new design are out of scope for this NEP and up for much > discussion, but > we will discuss general policies that will guide the evolution of whatever > code > is adopted. > > First, we will maintain API source compatibility just as we do with the > rest of > ``numpy``. If we *must* make a breaking change, we will only do so with an > appropriate deprecation period and warnings. > > Second, breaking stream-compatibility in order to introduce new features or > improve performance will be *allowed* with *caution*. Such changes will be > considered features, and as such will be no faster than the standard > release > cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness > is > not a bug. Correctness bug fixes that break stream-compatibility can > happen on > bugfix releases, per usual, but developers should consider if they can wait > until the next feature release. We encourage developers to strongly weight > user?s pain from the break in stream-compatibility against the > improvements. > One example of a worthwhile improvement would be to change algorithms for > a significant increase in performance, for example, moving from the > `Box-Muller > transform `_ > method > of Gaussian variate generation to the faster `Ziggurat algorithm > `_. An example of an > unworthy improvement would be tweaking the Ziggurat tables just a little > bit. > > Any new design for the RNG subsystem will provide a choice of different > core > uniform PRNG algorithms. We will be more strict about a select subset of > methods on these core PRNG objects. They MUST guarantee > stream-compatibility > for a minimal, specified set of methods which are chosen to make it easier > to > compose them to build other distributions. Namely, > > * ``.bytes()`` > * ``.random_uintegers()`` > * ``.random_sample()`` > > Furthermore, the new design should also provide one generator class (we > shall > call it ``StableRandom`` for discussion purposes) that provides a slightly > broader subset of distribution methods for which stream-compatibility is > *guaranteed*. The point of ``StableRandom`` is to provide something that > can > be used in unit tests so projects that currently have tests which rely on > the > precise stream can be migrated off of ``RandomState``. For the best > transition, ``StableRandom`` should use as its core uniform PRNG the > current > MT19937 algorithm. As best as possible, the API for the distribution > methods > that are provided on ``StableRandom`` should match their counterparts on > ``RandomState``. 
They should provide the same stream that the current > version > of ``RandomState`` does. Because their intended use is for unit tests, we > do > not need the performance improvements from the new algorithms that will be > introduced by the new subsystem. > > The list of ``StableRandom`` methods should be chosen to support unit > tests: > > * ``.randint()`` > * ``.uniform()`` > * ``.normal()`` > * ``.standard_normal()`` > * ``.choice()`` > * ``.shuffle()`` > * ``.permutation()`` > > > Not Versioning > -------------- > > For a long time, we considered that the way to allow algorithmic > improvements > while maintaining the stream was to apply some form of versioning. That > is, > every time we make a stream change in one of the distributions, we > increment > some version number somewhere. ``numpy.random`` would keep all past > versions > of the code, and there would be a way to get the old versions. Proposals > of > how to do this exactly varied widely, but we will not exhaustively list > them > here. We spent years going back and forth on these designs and were not > able > to find one that sufficed. Let that time lost, and more importantly, the > contributors that we lost while we dithered, serve as evidence against the > notion. > > Concretely, adding in versioning makes maintenance of ``numpy.random`` > difficult. Necessarily, we would be keeping lots of versions of the same > code > around. Adding a new algorithm safely would still be quite hard. > > But most importantly, versioning is fundamentally difficult to *use* > correctly. > We want to make it easy and straightforward to get the latest, fastest, > best > versions of the distribution algorithms; otherwise, what's the point? The > way > to make that easy is to make the latest the default. But the default will > necessarily change from release to release, so the user?s code would need > to be > altered anyway to specify the specific version that one wants to replicate. > > Adding in versioning to maintain stream-compatibility would still only > provide > the same level of stream-compatibility that we currently do, with all of > the > limitations described earlier. Given that the standard practice for such > needs > is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` > alone > is superfluous. > > > Discussion > ---------- > > - https://mail.python.org/pipermail/numpy-discussion/ > 2018-January/077608.html > - https://github.com/numpy/numpy/pull/10124#issuecomment-350876221 > > > Copyright > --------- > > This document has been placed in the public domain. > > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Jun 3 19:23:58 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 3 Jun 2018 19:23:58 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: > > In most cases, I suspect that the overhead of a function call and checking > several arguments for "__array_function__" will be negligible, like the > situation for __array_ufunc__. 
I'm not strongly opposed to either of your > proposed solutions, but I do think it would be a little strange to insist > that we need a solution for __array_function__ when __array_ufunc__ was > fine. > Ufuncs actually do try to speed-up array checks - but indeed the same can (and should) be done for `__array_ufunc__`. They also do have `subok`. This currently ignored but that is mostly because looking for it in `kwargs` is so damn slow! Anyway, my main point was that it should be explicitly mentioned as a constraint that for pure ndarray input, things should be really fast. > > A. Two "namespaces", one for the undecorated base functions, and one >> completely trivial one for the decorated ones. The idea would be that if >> one knows one is dealing with arrays only, one would do `import >> numpy.array_only as np` (i.e., the reverse of the suggestion currently in >> the NEP, where the decorated ones are in their own namespace - I agree with >> the reasons for discounting that one). >> > > I will mention this as a possibility. > > I do think there is something to be said for clear separation of > overloaded and non-overloaded APIs. But f I were to choose between adding > numpy.api and numpy.array_only, I would pick numpy.api, because of the > virtue of preserving the existing numpy namespace as it currently exists. > Good point. Overall, the separate namespaces probably is not the way to do. > > B. Automatic insertion by the decorator of an `array_only=np._NoValue` (or >> `coerce` and perhaps `subok=...` if not present) in the function signature, >> so that users who know that they have arrays only could pass >> `array_only=True` (name to be decided). >> > > Rather than adding another argument to every NumPy function, I would > rather encourage writing np.asarray() explicitly. > Good point - just as good as long as the check for all-array is very fast (which it should be - `arg.__class__ is np.ndarray` is fast!). > Note that both A and B could also address, at least partially, the problem >> of sometimes wanting to just use the old coercion methods, i.e., not having >> to implement every possible numpy function in one go in a new >> `__array_function__` on one's class. >> > > Yes, agreed. > > >> 1. I'm rather unclear about the use of `types`. It can help me decide >> what to do, but I would still have to find the argument in question (e.g., >> for Quantity, the unit of the relevant argument). I'd recommend passing >> instead a tuple of all arguments that were inspected, in the inspection >> order; after all, it is just a `arg.__class__` away from the type, and in >> your example you'd only have to replace `issubclass` by `isinstance`. >> > > The virtue of a `types` argument is that we can deduplicate arguments > once, rather than in each __array_function__ check. This could result in > significantly more efficient code, e.g,. when np.concatenate() is called on > 10,000 arrays with only two unique types, we don't need to loop through all > 10,000 again objects to check that overloading is valid. > I think one might still want to know *where* the type occurs (e.g., as an output or index would have different implications). Possibly, a solution would rely on the same structure as used for the "dance". But as a general point, I don't see the advantage of passing types rather than arguments - less information for no benefit. > Even for Quantity, I suspect you will want two layers of checks: > 1. A check to verify that every argument is a Quantity (or something > coercible to a Quantity). 
This could use `types` and return > `NotImplemented` when it fails. > 2. A check to verify that units match. This will have custom logic for > different operations and will require checking all arguments -- not just > their unique types. > Not sure. With, Quantity I generally do not worry about other types, but rather look at units attributes, assume anything without is dimensionless, cast Quantity to array with the right unit, and then defer to `ndarray`. > For many Quantity functions, the second check will indeed probably be > super simple (i.e., verifying that all units match). But the first check > (with `types`) really is something that basically very overload should do. > > >> 2. For subclasses, it would be very handy to have >> `ndarray.__array_function__`, so one can call super after changing >> arguments. (For `__array_ufunc__`, there was lots of question about whether >> this was useful, but it really is!!). [I think you already agreed with >> this, but want to have it in-place, as for subclasses of ndarray this is >> just as useful as it would be for subclasses of dask arrays.) >> > > Yes, indeed. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 3 19:28:37 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 16:28:37 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 11:12 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > On Sun, Jun 3, 2018 at 2:00 PM, Hameer Abbasi > wrote: > >> >> - Objects that don?t implement ``__array_function__`` should be >> treated as having returned ``np.NotImplementedButCoercible``. >> - This has the effect of coercing ``list``, etc. >> - At a minimum, to maintain compatibility, if all objects don?t >> implement ``__array_function__``, the old behaviour should stay. >> >> I think that in the proposed scheme this is effectively what happens. > The current proposal is to copy the behavior of __array_ufunc__. So the non-existence of an __array_function__ attribute is indeed *not* equivalent to returning NotImplemented: if no arguments implement __array_function__, then yes they will all be coerced to NumPy arrays. I do think there is elegance in defining a return value of np.NotImplementedButCoercible as equivalent to the existence of __array_function__. This resolves my design question about how coercible arguments would be coerced with NotImplementedButCoercible: we would fall back to the current behavior, which in most cases means all arguments are coerced to NumPy arrays directly. Mixed return values of NotImplementedButCoercible and NotImplemented would still result in TypeError, and there would be no second chances for overloads. This is simple enough that I am inclined to update the NEP to incorporate the suggestion (thank you!). My main question is whether we should also update __array_ufunc__ to support returning NotImplementedButCoercible for consistency. My inclination is yes: even though it's easy to implement a fallback of converting all arguments to NumPy arrays for ufuncs, it is hard to do this correctly from an __array_ufunc__ implementation, because __array_ufunc__ implementations do not know in what order they have been called. 
The counter-argument would be that it's not worth adding new features to __array_ufunc__ if use-cases haven't come up yet. But my guess is that most users/implementors of __array_ufunc__ are ignorant of these finer details, and not really worrying about them. Also, the list of binary operators in Python is short enough that most implementations are OK with supporting either all or none. Actually, a return value of NotImplementedButCoercible would probably be the right answer for some cases in xarray's current __array_ufunc__ method, when we encounter ufunc methods for which we haven't written an implementation (e.g., "outer" or "at"). -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sun Jun 3 19:33:32 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 3 Jun 2018 16:33:32 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: You make a bunch of good points refuting reproducible research as an argument for not changing the random number streams. However, there?s a second use-case you don?t address - unit tests. For better or worse, downstream, or even our own , unit tests use a seeded random number generator as a shorthand to produce some arbirary array, and then hard-code the expected output in their tests. Breaking stream compatibility will break these tests. I don?t think writing tests in this way is particularly good idea, but unfortunately they do still exist. It would be good to address this use case in the NEP, even if the conclusion is just ?changing the stream will break tests of this form? Eric On Sat, 2 Jun 2018 at 12:05 Robert Kern robert.kern at gmail.com wrote: As promised distressingly many months ago, I have written up a NEP about > relaxing the stream-compatibility policy that we currently have. > > https://github.com/numpy/numpy/pull/11229 > > https://github.com/rkern/numpy/blob/nep/rng/doc/neps/nep-0019-rng-policy.rst > > I particularly invite comment on the two lists of methods that we still > would make strict compatibility guarantees for. > > --- > > ============================== > Random Number Generator Policy > ============================== > > :Author: Robert Kern > :Status: Draft > :Type: Standards Track > :Created: 2018-05-24 > > > Abstract > -------- > > For the past decade, NumPy has had a strict backwards compatibility policy > for > the number stream of all of its random number distributions. Unlike other > numerical components in ``numpy``, which are usually allowed to return > different when results when they are modified if they remain correct, we > have > obligated the random number distributions to always produce the exact same > numbers in every version. The objective of our stream-compatibility > guarantee > was to provide exact reproducibility for simulations across numpy versions > in > order to promote reproducible research. However, this policy has made it > very > difficult to enhance any of the distributions with faster or more accurate > algorithms. After a decade of experience and improvements in the > surrounding > ecosystem of scientific software, we believe that there are now better > ways to > achieve these objectives. We propose relaxing our strict > stream-compatibility > policy to remove the obstacles that are in the way of accepting > contributions > to our random number generation capabilities. 
> [... the rest of the quoted NEP is snipped here; it repeats Robert Kern's
> original message, quoted in full earlier in this thread ...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From robert.kern at gmail.com Sun Jun 3 19:36:00 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 16:36:00 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser wrote: > You make a bunch of good points refuting reproducible research as an > argument for not changing the random number streams. > > However, there?s a second use-case you don?t address - unit tests. For > better or worse, downstream, or even our own > , > unit tests use a seeded random number generator as a shorthand to produce > some arbirary array, and then hard-code the expected output in their tests. > Breaking stream compatibility will break these tests. > > I don?t think writing tests in this way is particularly good idea, but > unfortunately they do still exist. > > It would be good to address this use case in the NEP, even if the > conclusion is just ?changing the stream will break tests of this form? > I do! Search for "unit test" or "StableRandom". :-) -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 3 19:45:54 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 16:45:54 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 4:25 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > I think one might still want to know *where* the type occurs (e.g., as an > output or index would have different implications). > This in certainly true in general, but given the complete flexibility of __array_function__ there's no way we can make every check convenient. The best we can do is make it easy to handle the common cases, where the argument position does not matter. > Possibly, a solution would rely on the same structure as used for the > "dance". But as a general point, I don't see the advantage of passing types > rather than arguments - less information for no benefit. > Maybe this is premature optimization, but there will certainly be fewer unique types than arguments to check for types. I suspect this may make for a noticeable difference in performance in use cases involving a large number of argument. For example, suppose np.concatenate() is called on a list of 10,000 dask arrays. Now dask.array.Array.__array_function__ needs to check all arguments to decide whether it can use dask.array.concatenate() or needs to return NotImplemented. By using the `types` argument, it only needs to do isinstance() checks on the single argument in `types`, rather than all 10,000 overloaded function arguments. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 3 20:18:38 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 17:18:38 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sat, Jun 2, 2018 at 12:06 PM Robert Kern wrote: > We propose first freezing ``RandomState`` as it is and developing a new RNG > subsystem alongside it. This allows anyone who has been relying on our old > stream-compatibility guarantee to have plenty of time to migrate. > ``RandomState`` will be considered deprecated, but with a long deprecation > cycle, at least a few years. Deprecation warnings will start silent but > become > increasingly noisy over time. 
Bugs in the current state of the code will > *not* > be fixed if fixing them would impact the stream. However, if changes in > the > rest of ``numpy`` would break something in the ``RandomState`` code, we > will > fix ``RandomState`` to continue working (for example, some change in the > C API). No new features will be added to ``RandomState``. Users should > migrate to the new subsystem as they are able to. > Robert, thanks for this proposal. I think it makes a lot of sense and will help maintain the long-term viability of numpy.random. The main clarification I would like to see addressed is what "freezing RandomState" means for top level functions in numpy.random. I think we could safely swap out the underlying implementation if numpy.random.seed() is not explicitly called, but how would we handle cases where a seed is explicitly set? You and I both agree that this is an anti-pattern for numpy.random, but certainly there is plenty of code that relies on the stability of random numbers when seeds are set by np.random.seed(). Similar to the case for RandomState, we would presumably need to start issuing warnings when seed() is explicitly called, which begs the question of what (if anything) we propose to replace seed() with. I suppose this will be your next NEP :). -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 3 20:21:56 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 17:21:56 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: Moving some of the Github PR comments here: Implementation > -------------- > > We propose first freezing ``RandomState`` as it is and developing a new RNG > subsystem alongside it. This allows anyone who has been relying on our old > stream-compatibility guarantee to have plenty of time to migrate. > ``RandomState`` will be considered deprecated, but with a long deprecation > cycle, at least a few years. > https://github.com/numpy/numpy/pull/11229#discussion_r192604195 @bashtage writes: > RandomState could pretty easily be spun out into a stand-alone package, if useful. It is effectively a stand-alone submodule already. Indeed. That would be a graceful forever-home for the code for anyone who needs it. However, I'd still only make that switch after at least a few years of deprecation inside numpy. And maybe a 2.0.0 release. > Any new design for the RNG subsystem will provide a choice of different > core > uniform PRNG algorithms. We will be more strict about a select subset of > methods on these core PRNG objects. They MUST guarantee > stream-compatibility > for a minimal, specified set of methods which are chosen to make it easier > to > compose them to build other distributions. Namely, > > * ``.bytes()`` > * ``.random_uintegers()`` > * ``.random_sample()`` > BTW, `random_uintegers()` is a new method in Kevin Sheppard's `randomgen`, and I am referring to its semantics here. https://github.com/bashtage/randomgen/blob/master/randomgen/generator.pyx#L191 https://github.com/numpy/numpy/pull/11229#discussion_r192604275 @bashtage writes: > One of these (bytes, uintegers) seems redundant. uintegers should probably by 64 bit. Because different core generators have different "native" outputs (MT19937, PCG32 output `uint32`s, PCG64 outputs `uint64`s, and some that I hope we never implement natively output doubles), there are some simple, but non-trivial choices to make to support each of these. 
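For example, a rough sketch of one such choice for a 32-bit-native core generator (the function names are illustrative, not a proposed API):

    import numpy as np

    def random_uint64(next_uint32, size):
        # Pack two consecutive 32-bit draws into each 64-bit output, high
        # word first -- one of several equally valid conventions that has to
        # be fixed once and for all for the stream guarantee to mean anything.
        out = np.empty(size, dtype=np.uint64)
        for i in range(size):
            hi = np.uint64(next_uint32())
            lo = np.uint64(next_uint32())
            out[i] = (hi << np.uint64(32)) | lo
        return out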
I would like the core generator's author to make those choices and maintain them. They're not hard, but they are the kind of thing that ought to be decided once and consistently. I am of the opinion that `uintegers` should support at least `uint32` and `uint64` as those are the most common native outputs among core generators. There should be a maintained way to get that native format (and yes, I'd rather have the user be explicit about it than have `random_native_uint()` in addition to `random_uint64()`). This argument extends to `.bytes()`, too, now that I think about it. A stream of bytes is a native format for some generators, too, like if we decide to hook up /dev/urandom or other file-backed interface. Hmm, what do you think about adding `random_interval()` to this list? And raising that up to the Python API level (a la what Python 3 did with exposing `secrets.randbelow()` as a primitive)? https://github.com/bashtage/randomgen/blob/master/randomgen/src/distributions/distributions.c#L1164-L1200 Many, many uses of this method would be with numbers much less than 1<<32 (e.g. Fisher-Yates shuffle), and for the 32-bit native PRNGs could mean using half as many core PRNG draws if `random_interval()` is implemented along with the core PRNG to make use of that fact. The list of ``StableRandom`` methods should be chosen to support unit tests: > > * ``.randint()`` > * ``.uniform()`` > * ``.normal()`` > * ``.standard_normal()`` > * ``.choice()`` > * ``.shuffle()`` > * ``.permutation()`` > https://github.com/numpy/numpy/pull/11229#discussion_r192604311 @bashtage writes: > standard_gamma and standard_exponential are important enough to be included here IMO. "Importance" was not my criterion, only whether they are used in unit test suites. This list was just off the top of my head for methods that I think were actually used in test suites, so I'd be happy to be shown live tests that use other methods. I'd like to be a *little* conservative about what methods we stick in here, but we don't have to be *too* conservative, since we are explicitly never going to be modifying these. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 3 20:36:58 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 17:36:58 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser wrote: > You make a bunch of good points refuting reproducible research as an > argument for not changing the random number streams. > > However, there?s a second use-case you don?t address - unit tests. For > better or worse, downstream, or even our own > , > unit tests use a seeded random number generator as a shorthand to produce > some arbirary array, and then hard-code the expected output in their tests. > Breaking stream compatibility will break these tests. > By the way, the reason that I didn't mention this use case as a motivation in the Status Quo section because, as I reviewed my mail archive, this wasn't actually a motivating use case for the policy. It's certainly a use case that developed once we did make these (*cough*extravagant*cough*) guarantees, though, as people started to rely on it, and I hope that my StableRandom proposal addresses it to your satisfaction. I could add some more details about that history if you think it would be useful. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
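For concreteness, the downstream test pattern that ``StableRandom`` is meant to keep working is roughly the following sketch (the expected value is a placeholder here rather than a real hard-coded number):

    import numpy as np

    def test_fit_regression():
        rs = np.random.RandomState(12345)   # seeded stream used as a fixture
        x = rs.standard_normal(1000)        # "arbitrary" input data
        stat = x.mean()                     # stand-in for a fitted parameter
        expected = stat                     # real tests hard-code a number from a previous run
        np.testing.assert_allclose(stat, expected, rtol=1e-12)

Any change to the underlying stream changes ``stat`` and breaks the hard-coded comparison, which is the breakage that migrating such tests onto ``StableRandom`` is meant to avoid.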
URL: From robert.kern at gmail.com Sun Jun 3 20:37:45 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 17:37:45 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 5:23 PM Stephan Hoyer wrote: > On Sat, Jun 2, 2018 at 12:06 PM Robert Kern wrote: > >> We propose first freezing ``RandomState`` as it is and developing a new >> RNG >> subsystem alongside it. This allows anyone who has been relying on our >> old >> stream-compatibility guarantee to have plenty of time to migrate. >> ``RandomState`` will be considered deprecated, but with a long deprecation >> cycle, at least a few years. Deprecation warnings will start silent but >> become >> increasingly noisy over time. Bugs in the current state of the code will >> *not* >> be fixed if fixing them would impact the stream. However, if changes in >> the >> rest of ``numpy`` would break something in the ``RandomState`` code, we >> will >> fix ``RandomState`` to continue working (for example, some change in the >> C API). No new features will be added to ``RandomState``. Users should >> migrate to the new subsystem as they are able to. >> > > Robert, thanks for this proposal. I think it makes a lot of sense and will > help maintain the long-term viability of numpy.random. > > The main clarification I would like to see addressed is what "freezing > RandomState" means for top level functions in numpy.random. I think we > could safely swap out the underlying implementation if numpy.random.seed() > is not explicitly called, but how would we handle cases where a seed is > explicitly set? > > You and I both agree that this is an anti-pattern for numpy.random, but > certainly there is plenty of code that relies on the stability of random > numbers when seeds are set by np.random.seed(). Similar to the case for > RandomState, we would presumably need to start issuing warnings when seed() > is explicitly called, which begs the question of what (if anything) we > propose to replace seed() with. > Well, *I* propose `AttributeError`, myself? > I suppose this will be your next NEP :). > I deliberately left it out of this one as it may, depending on our choices, impinge upon the design of the new PRNG subsystem, which I declared out of scope for this NEP. I have ideas (besides the glib "Let them eat AttributeErrors!"), and now that I think more about it, that does seem like it might be in scope just like the discussion of freezing RandomState and StableRandom are. But I think I'd like to hold that thought a little bit and get a little more screaming^Wfeedback on the core proposal first. I'll return to this in a few days if not sooner. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Jun 3 20:35:55 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 3 Jun 2018 20:35:55 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Although I'm still not 100% convinced by NotImplementedButCoercible, I do like the idea that this is the default for items that do not implement `__array_function__`. And it might help avoid trying to find oneself in a possibly long list. -- Marten -------------- next part -------------- An HTML attachment was scrubbed... 
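Coming back to the ``random_interval()`` idea from earlier in the thread, the saving for 32-bit core generators comes from masked rejection against the smallest covering power of two; a hypothetical helper (not the randomgen implementation) might look like:

    def random_interval(next_uint32, next_uint64, max_value):
        # When max_value fits in 32 bits, a 32-bit core generator can satisfy
        # each attempt with a single native draw instead of two.
        draw = next_uint32 if max_value < (1 << 32) else next_uint64
        mask = (1 << int(max_value).bit_length()) - 1   # masked rejection keeps it unbiased
        while True:
            candidate = draw() & mask
            if candidate <= max_value:
                return candidate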
URL: From josef.pktd at gmail.com Sun Jun 3 20:45:31 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 3 Jun 2018 20:45:31 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern wrote: > Moving some of the Github PR comments here: > > Implementation >> -------------- >> >> We propose first freezing ``RandomState`` as it is and developing a new >> RNG >> subsystem alongside it. This allows anyone who has been relying on our >> old >> stream-compatibility guarantee to have plenty of time to migrate. >> ``RandomState`` will be considered deprecated, but with a long deprecation >> cycle, at least a few years. >> > > https://github.com/numpy/numpy/pull/11229#discussion_r192604195 > @bashtage writes: > > RandomState could pretty easily be spun out into a stand-alone package, > if useful. It is effectively a stand-alone submodule already. > > Indeed. That would be a graceful forever-home for the code for anyone who > needs it. However, I'd still only make that switch after at least a few > years of deprecation inside numpy. And maybe a 2.0.0 release. > > >> Any new design for the RNG subsystem will provide a choice of different >> core >> uniform PRNG algorithms. We will be more strict about a select subset of >> methods on these core PRNG objects. They MUST guarantee >> stream-compatibility >> for a minimal, specified set of methods which are chosen to make it >> easier to >> compose them to build other distributions. Namely, >> >> * ``.bytes()`` >> * ``.random_uintegers()`` >> > * ``.random_sample()`` >> > > BTW, `random_uintegers()` is a new method in Kevin Sheppard's `randomgen`, > and I am referring to its semantics here. > https://github.com/bashtage/randomgen/blob/master/ > randomgen/generator.pyx#L191 > > https://github.com/numpy/numpy/pull/11229#discussion_r192604275 > @bashtage writes: > > One of these (bytes, uintegers) seems redundant. uintegers should > probably by 64 bit. > > Because different core generators have different "native" outputs > (MT19937, PCG32 output `uint32`s, PCG64 outputs `uint64`s, and some that I > hope we never implement natively output doubles), there are some simple, > but non-trivial choices to make to support each of these. I would like the > core generator's author to make those choices and maintain them. They're > not hard, but they are the kind of thing that ought to be decided once and > consistently. > > I am of the opinion that `uintegers` should support at least `uint32` and > `uint64` as those are the most common native outputs among core generators. > There should be a maintained way to get that native format (and yes, I'd > rather have the user be explicit about it than have `random_native_uint()` > in addition to `random_uint64()`). > > This argument extends to `.bytes()`, too, now that I think about it. A > stream of bytes is a native format for some generators, too, like if we > decide to hook up /dev/urandom or other file-backed interface. > > Hmm, what do you think about adding `random_interval()` to this list? And > raising that up to the Python API level (a la what Python 3 did with > exposing `secrets.randbelow()` as a primitive)? > https://github.com/bashtage/randomgen/blob/master/ > randomgen/src/distributions/distributions.c#L1164-L1200 > > Many, many uses of this method would be with numbers much less than 1<<32 > (e.g. 
Fisher-Yates shuffle), and for the 32-bit native PRNGs could mean > using half as many core PRNG draws if `random_interval()` is implemented > along with the core PRNG to make use of that fact. > > The list of ``StableRandom`` methods should be chosen to support unit >> tests: >> >> * ``.randint()`` >> * ``.uniform()`` >> * ``.normal()`` >> * ``.standard_normal()`` >> * ``.choice()`` >> * ``.shuffle()`` >> * ``.permutation()`` >> > > https://github.com/numpy/numpy/pull/11229#discussion_r192604311 > @bashtage writes: > > standard_gamma and standard_exponential are important enough to be > included here IMO. > > "Importance" was not my criterion, only whether they are used in unit test > suites. This list was just off the top of my head for methods that I think > were actually used in test suites, so I'd be happy to be shown live tests > that use other methods. I'd like to be a *little* conservative about what > methods we stick in here, but we don't have to be *too* conservative, since > we are explicitly never going to be modifying these. > That's one area where I thought the selection is too narrow. We should be able to get a stable stream from the uniform for some distributions. However, according to the Wikipedia description Poisson doesn't look easy. I just wrote a unit test for statsmodels using Poisson random numbers with hard coded numbers for the regression tests. I'm not sure which other distributions are common enough and not easily reproducible by transformation. E.g. negative binomial can be reproduces by a gamma-poisson mixture. On the other hand normal can be easily recreated from standard_normal. Would it be difficult to keep this list large, given that it should be frozen, low maintenance code ? Josef > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Jun 3 20:45:56 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 3 Jun 2018 20:45:56 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: This in certainly true in general, but given the complete flexibility of __array_function__ there's no way we can make every check convenient. The best we can do is make it easy to handle the common cases, where the argument position does not matter. I think those cases may not be as common as you think - most functions are not like `concatenate` & co... Indeed, it might be good to add some other examples to the NEP. Looing at the list of functions which do not work with Quantity currently: Maybe `np.dot`, `np.choose`, and `np.vectorize`? > Possibly, a solution would rely on the same structure as used for the > "dance". But as a general point, I don't see the advantage of passing types > rather than arguments - less information for no benefit. > > Maybe this is premature optimization, but there will certainly be fewer unique types than arguments to check for types. I suspect this may make for a noticeable difference in performance in use cases involving a large number of argument. One also needs to worry about the cost of contructing `types`, though I guess this could be minimal if it is a `set`. 
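For example, the scan could collect them in a single pass (a sketch only; the helper name and the exact protocol are made up):

    def collect_types_and_overloads(relevant_args):
        # Using a set keeps the later type checks proportional to the number
        # of unique types rather than the number of arguments.
        types = set()
        overloaded = []
        for arg in relevant_args:
            t = type(arg)
            if t not in types:
                types.add(t)
                if hasattr(t, '__array_function__'):
                    overloaded.append(arg)
        return types, overloaded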
Or should it be the keys of a `dict`, with the value something meaningful that has to be calculated anyway (like a list of sequence numbers); this may all depend a bit on the implementation of "dance" - the information it gathers might as well get passed on. > For example, suppose np.concatenate() is called on a list of 10,000 dask arrays. Now dask.array.Array.__array_function__ needs to check all arguments to decide whether it can use dask.array.concatenate() or needs to return NotImplemented. By using the `types` argument, it only needs to do isinstance() checks on the single argument in `types`, rather than all 10,000 overloaded function arguments It is probably a good idea to add some of these considerations to the NEP. -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jun 3 20:57:12 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 3 Jun 2018 20:57:12 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 8:36 PM, Robert Kern wrote: > On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser > wrote: > >> You make a bunch of good points refuting reproducible research as an >> argument for not changing the random number streams. >> >> However, there?s a second use-case you don?t address - unit tests. For >> better or worse, downstream, or even our own >> , >> unit tests use a seeded random number generator as a shorthand to produce >> some arbirary array, and then hard-code the expected output in their tests. >> Breaking stream compatibility will break these tests. >> > By the way, the reason that I didn't mention this use case as a motivation > in the Status Quo section because, as I reviewed my mail archive, this > wasn't actually a motivating use case for the policy. It's certainly a use > case that developed once we did make these (*cough*extravagant*cough*) > guarantees, though, as people started to rely on it, and I hope that my > StableRandom proposal addresses it to your satisfaction. I could add some > more details about that history if you think it would be useful. > I don't think that's accurate. The unit tests for stable random numbers were added when Enthought silently changed the normal random numbers and we got messages from users that the unit tests fail and they cannot reproduce our results. 6/12/10 [SciPy-Dev] seeded randn gets different values on osx (I don't find an online copy, this is from my own mail archive) AFAIR Josef > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 3 21:04:55 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 18:04:55 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 6:01 PM wrote: > > > On Sun, Jun 3, 2018 at 8:36 PM, Robert Kern wrote: > >> On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser >> wrote: >> >>> You make a bunch of good points refuting reproducible research as an >>> argument for not changing the random number streams. >>> >>> However, there?s a second use-case you don?t address - unit tests. 
For >>> better or worse, downstream, or even our own >>> , >>> unit tests use a seeded random number generator as a shorthand to produce >>> some arbirary array, and then hard-code the expected output in their tests. >>> Breaking stream compatibility will break these tests. >>> >> By the way, the reason that I didn't mention this use case as a >> motivation in the Status Quo section because, as I reviewed my mail >> archive, this wasn't actually a motivating use case for the policy. It's >> certainly a use case that developed once we did make these >> (*cough*extravagant*cough*) guarantees, though, as people started to rely >> on it, and I hope that my StableRandom proposal addresses it to your >> satisfaction. I could add some more details about that history if you >> think it would be useful. >> > > I don't think that's accurate. > The unit tests for stable random numbers were added when Enthought > silently changed the normal random numbers and we got messages from users > that the unit tests fail and they cannot reproduce our results. > > 6/12/10 > [SciPy-Dev] seeded randn gets different values on osx > > (I don't find an online copy, this is from my own mail archive) > The policy was in place Nov 2008. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 3 21:08:38 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 18:08:38 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 5:46 PM wrote: > > > On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern wrote: > >> >> The list of ``StableRandom`` methods should be chosen to support unit >>> tests: >>> >>> * ``.randint()`` >>> * ``.uniform()`` >>> * ``.normal()`` >>> * ``.standard_normal()`` >>> * ``.choice()`` >>> * ``.shuffle()`` >>> * ``.permutation()`` >>> >> >> https://github.com/numpy/numpy/pull/11229#discussion_r192604311 >> @bashtage writes: >> > standard_gamma and standard_exponential are important enough to be >> included here IMO. >> >> "Importance" was not my criterion, only whether they are used in unit >> test suites. This list was just off the top of my head for methods that I >> think were actually used in test suites, so I'd be happy to be shown live >> tests that use other methods. I'd like to be a *little* conservative about >> what methods we stick in here, but we don't have to be *too* conservative, >> since we are explicitly never going to be modifying these. >> > > That's one area where I thought the selection is too narrow. > We should be able to get a stable stream from the uniform for some > distributions. > > However, according to the Wikipedia description Poisson doesn't look easy. > I just wrote a unit test for statsmodels using Poisson random numbers with > hard coded numbers for the regression tests. > I'd really rather people do this than use StableRandom; this is best practice, as I see it, if your tests involve making precise comparisons to expected results. StableRandom is intended as a crutch so that the pain of moving existing unit tests away from the deprecated RandomState is less onerous. I'd really rather people write better unit tests! In particular, I do not want to add any of the integer-domain distributions (aside from shuffle/permutation/choice) as these are the ones that have the platform-dependency issues with respect to 32/64-bit `long` integers. They'd be unreliable for unit tests even if we kept them stable over time. 
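The platform dependence is easy to see, since numpy's default integer tracks the C ``long`` (a quick check, nothing more):

    import ctypes
    import numpy as np

    # 8 bytes on typical 64-bit Linux/macOS builds, 4 bytes on 64-bit Windows,
    # so code that accumulates in a C ``long`` internally reaches its overflow
    # breakpoints at different places on different platforms.
    print(ctypes.sizeof(ctypes.c_long))
    print(np.dtype(np.int_).itemsize)   # numpy's default integer matches C long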
> I'm not sure which other distributions are common enough and not easily > reproducible by transformation. E.g. negative binomial can be reproduces by > a gamma-poisson mixture. > > On the other hand normal can be easily recreated from standard_normal. > I was mostly motivated by making it a bit easier to mechanically replace uses of randn(), which is probably even more common than normal() and standard_normal() in unit tests. > Would it be difficult to keep this list large, given that it should be > frozen, low maintenance code ? > I admit that I had in mind non-statistical unit tests. That is, tests that didn't depend on the precise distribution of the inputs. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jun 3 21:11:30 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 3 Jun 2018 21:11:30 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sat, Jun 2, 2018 at 3:04 PM, Robert Kern wrote: > As promised distressingly many months ago, I have written up a NEP about > relaxing the stream-compatibility policy that we currently have. > > https://github.com/numpy/numpy/pull/11229 > https://github.com/rkern/numpy/blob/nep/rng/doc/neps/ > nep-0019-rng-policy.rst > > I particularly invite comment on the two lists of methods that we still > would make strict compatibility guarantees for. > > --- > > ============================== > Random Number Generator Policy > ============================== > > :Author: Robert Kern > :Status: Draft > :Type: Standards Track > :Created: 2018-05-24 > > > Abstract > -------- > > For the past decade, NumPy has had a strict backwards compatibility policy > for > the number stream of all of its random number distributions. Unlike other > numerical components in ``numpy``, which are usually allowed to return > different when results when they are modified if they remain correct, we > have > obligated the random number distributions to always produce the exact same > numbers in every version. The objective of our stream-compatibility > guarantee > was to provide exact reproducibility for simulations across numpy versions > in > order to promote reproducible research. However, this policy has made it > very > difficult to enhance any of the distributions with faster or more accurate > algorithms. After a decade of experience and improvements in the > surrounding > ecosystem of scientific software, we believe that there are now better > ways to > achieve these objectives. We propose relaxing our strict > stream-compatibility > policy to remove the obstacles that are in the way of accepting > contributions > to our random number generation capabilities. > > > The Status Quo > -------------- > > Our current policy, in full: > > A fixed seed and a fixed series of calls to ``RandomState`` methods > using the > same parameters will always produce the same results up to roundoff > error > except when the values were incorrect. Incorrect values will be fixed > and > the NumPy version in which the fix was made will be noted in the > relevant > docstring. Extension of existing parameter ranges and the addition of > new > parameters is allowed as long the previous behavior remains unchanged. 
> > This policy was first instated in Nov 2008 (in essence; the full set of > weasel > words grew over time) in response to a user wanting to be sure that the > simulations that formed the basis of their scientific publication could be > reproduced years later, exactly, with whatever version of ``numpy`` that > was > current at the time. We were keen to support reproducible research, and > it was > still early in the life of ``numpy.random``. We had not seen much cause to > change the distribution methods all that much. > > We also had not thought very thoroughly about the limits of what we really > could promise (and by ?we? in this section, we really mean Robert Kern, > let?s > be honest). Despite all of the weasel words, our policy overpromises > compatibility. The same version of ``numpy`` built on different > platforms, or > just in a different way could cause changes in the stream, with varying > degrees > of rarity. The biggest is that the ``.multivariate_normal()`` method > relies on > ``numpy.linalg`` functions. Even on the same platform, if one links > ``numpy`` > with a different LAPACK, ``.multivariate_normal()`` may well return > completely > different results. More rarely, building on a different OS or CPU can > cause > differences in the stream. > AFAIK, I have never seen this. Except for some corner cases (like singular transformation) the "noise" from different linalg packages is in the range of floating point noise which is not relevant if we unit test, for example, pvalues at rtol=1e-10. Based on the unit test that don't fail, "may well return completely different results" seems exaggerated. (There can be huge jumps in results from linalg operations like svd around the near singular/singular threshold, i.e. when floating point noise is in the range of the rcond threshold, but that's independent of np.random and can happen in many cases when we want to have reproducible numerical noise which is not possible, but doesn't affect stability of results in well defined cases.) Josef > We use C ``long`` integers internally for integer > distribution (it seemed like a good idea at the time), and those can vary > in > size depending on the platform. Distribution methods can overflow their > internal C ``longs`` at different breakpoints depending on the platform and > cause all of the random variate draws that follow to be different. > > And even if all of that is controlled, our policy still does not provide > exact > guarantees across versions. We still do apply bug fixes when correctness > is at > stake. And even if we didn?t do that, any nontrivial program does more > than > just draw random numbers. They do computations on those numbers, transform > those with numerical algorithms from the rest of ``numpy``, which is not > subject to so strict a policy. Trying to maintain stream-compatibility > for our > random number distributions does not help reproducible research for these > reasons. > > The standard practice now for bit-for-bit reproducible research is to pin > all > of the versions of code of your software stack, possibly down to the OS > itself. > The landscape for accomplishing this is much easier today than it was in > 2008. > We now have ``pip``. We now have virtual machines. Those who need to > reproduce simulations exactly now can (and ought to) do so by using the > exact > same version of ``numpy``. We do not need to maintain stream-compatibility > across ``numpy`` versions to help them. 
> > Our stream-compatibility guarantee has hindered our ability to make > improvements to ``numpy.random``. Several first-time contributors have > submitted PRs to improve the distributions, usually by implementing a > faster, > or more accurate algorithm than the one that is currently there. > Unfortunately, most of them would have required breaking the stream to do > so. > Blocked by our policy, and our inability to work around that policy, many > of > those contributors simply walked away. > > > Implementation > -------------- > > We propose first freezing ``RandomState`` as it is and developing a new RNG > subsystem alongside it. This allows anyone who has been relying on our old > stream-compatibility guarantee to have plenty of time to migrate. > ``RandomState`` will be considered deprecated, but with a long deprecation > cycle, at least a few years. Deprecation warnings will start silent but > become > increasingly noisy over time. Bugs in the current state of the code will > *not* > be fixed if fixing them would impact the stream. However, if changes in > the > rest of ``numpy`` would break something in the ``RandomState`` code, we > will > fix ``RandomState`` to continue working (for example, some change in the > C API). No new features will be added to ``RandomState``. Users should > migrate to the new subsystem as they are able to. > > Work on a proposed `new PRNG subsystem > `_ is already underway. The > specifics > of the new design are out of scope for this NEP and up for much > discussion, but > we will discuss general policies that will guide the evolution of whatever > code > is adopted. > > First, we will maintain API source compatibility just as we do with the > rest of > ``numpy``. If we *must* make a breaking change, we will only do so with an > appropriate deprecation period and warnings. > > Second, breaking stream-compatibility in order to introduce new features or > improve performance will be *allowed* with *caution*. Such changes will be > considered features, and as such will be no faster than the standard > release > cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness > is > not a bug. Correctness bug fixes that break stream-compatibility can > happen on > bugfix releases, per usual, but developers should consider if they can wait > until the next feature release. We encourage developers to strongly weight > user?s pain from the break in stream-compatibility against the > improvements. > One example of a worthwhile improvement would be to change algorithms for > a significant increase in performance, for example, moving from the > `Box-Muller > transform `_ > method > of Gaussian variate generation to the faster `Ziggurat algorithm > `_. An example of an > unworthy improvement would be tweaking the Ziggurat tables just a little > bit. > > Any new design for the RNG subsystem will provide a choice of different > core > uniform PRNG algorithms. We will be more strict about a select subset of > methods on these core PRNG objects. They MUST guarantee > stream-compatibility > for a minimal, specified set of methods which are chosen to make it easier > to > compose them to build other distributions. Namely, > > * ``.bytes()`` > * ``.random_uintegers()`` > * ``.random_sample()`` > > Furthermore, the new design should also provide one generator class (we > shall > call it ``StableRandom`` for discussion purposes) that provides a slightly > broader subset of distribution methods for which stream-compatibility is > *guaranteed*. 
The point of ``StableRandom`` is to provide something that > can > be used in unit tests so projects that currently have tests which rely on > the > precise stream can be migrated off of ``RandomState``. For the best > transition, ``StableRandom`` should use as its core uniform PRNG the > current > MT19937 algorithm. As best as possible, the API for the distribution > methods > that are provided on ``StableRandom`` should match their counterparts on > ``RandomState``. They should provide the same stream that the current > version > of ``RandomState`` does. Because their intended use is for unit tests, we > do > not need the performance improvements from the new algorithms that will be > introduced by the new subsystem. > > The list of ``StableRandom`` methods should be chosen to support unit > tests: > > * ``.randint()`` > * ``.uniform()`` > * ``.normal()`` > * ``.standard_normal()`` > * ``.choice()`` > * ``.shuffle()`` > * ``.permutation()`` > > > Not Versioning > -------------- > > For a long time, we considered that the way to allow algorithmic > improvements > while maintaining the stream was to apply some form of versioning. That > is, > every time we make a stream change in one of the distributions, we > increment > some version number somewhere. ``numpy.random`` would keep all past > versions > of the code, and there would be a way to get the old versions. Proposals > of > how to do this exactly varied widely, but we will not exhaustively list > them > here. We spent years going back and forth on these designs and were not > able > to find one that sufficed. Let that time lost, and more importantly, the > contributors that we lost while we dithered, serve as evidence against the > notion. > > Concretely, adding in versioning makes maintenance of ``numpy.random`` > difficult. Necessarily, we would be keeping lots of versions of the same > code > around. Adding a new algorithm safely would still be quite hard. > > But most importantly, versioning is fundamentally difficult to *use* > correctly. > We want to make it easy and straightforward to get the latest, fastest, > best > versions of the distribution algorithms; otherwise, what's the point? The > way > to make that easy is to make the latest the default. But the default will > necessarily change from release to release, so the user?s code would need > to be > altered anyway to specify the specific version that one wants to replicate. > > Adding in versioning to maintain stream-compatibility would still only > provide > the same level of stream-compatibility that we currently do, with all of > the > limitations described earlier. Given that the standard practice for such > needs > is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` > alone > is superfluous. > > > Discussion > ---------- > > - https://mail.python.org/pipermail/numpy-discussion/ > 2018-January/077608.html > - https://github.com/numpy/numpy/pull/10124#issuecomment-350876221 > > > Copyright > --------- > > This document has been placed in the public domain. > > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sun Jun 3 21:25:15 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 3 Jun 2018 21:25:15 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 9:04 PM, Robert Kern wrote: > On Sun, Jun 3, 2018 at 6:01 PM wrote: > >> >> >> On Sun, Jun 3, 2018 at 8:36 PM, Robert Kern >> wrote: >> >>> On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser >>> wrote: >>> >>>> You make a bunch of good points refuting reproducible research as an >>>> argument for not changing the random number streams. >>>> >>>> However, there?s a second use-case you don?t address - unit tests. For >>>> better or worse, downstream, or even our own >>>> , >>>> unit tests use a seeded random number generator as a shorthand to produce >>>> some arbirary array, and then hard-code the expected output in their tests. >>>> Breaking stream compatibility will break these tests. >>>> >>> By the way, the reason that I didn't mention this use case as a >>> motivation in the Status Quo section because, as I reviewed my mail >>> archive, this wasn't actually a motivating use case for the policy. It's >>> certainly a use case that developed once we did make these >>> (*cough*extravagant*cough*) guarantees, though, as people started to rely >>> on it, and I hope that my StableRandom proposal addresses it to your >>> satisfaction. I could add some more details about that history if you >>> think it would be useful. >>> >> >> I don't think that's accurate. >> The unit tests for stable random numbers were added when Enthought >> silently changed the normal random numbers and we got messages from users >> that the unit tests fail and they cannot reproduce our results. >> >> 6/12/10 >> [SciPy-Dev] seeded randn gets different values on osx >> >> (I don't find an online copy, this is from my own mail archive) >> > > The policy was in place Nov 2008. > only for the underlying stream, but those unit tests didn't guarantee it for the actual distributions https://github.com/numpy/numpy/commit/898e6bdc625cdd3c97865ef99f8d51c5f43eafff So maybe there was a discussion in 2008 which was mostly before my time. The guarantee for distributions was added in 2010/2011, at least in terms of unit tests in numpy in order to protect the unit tests in scipy.stats and by analogy for similar cases in other packages and across users. Josef > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 3 21:52:19 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 18:52:19 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 6:26 PM wrote: > > > On Sun, Jun 3, 2018 at 9:04 PM, Robert Kern wrote: > >> On Sun, Jun 3, 2018 at 6:01 PM wrote: >> >>> >>> >>> On Sun, Jun 3, 2018 at 8:36 PM, Robert Kern >>> wrote: >>> >>>> On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser >>>> wrote: >>>> >>>>> You make a bunch of good points refuting reproducible research as an >>>>> argument for not changing the random number streams. >>>>> >>>>> However, there?s a second use-case you don?t address - unit tests. 
For >>>>> better or worse, downstream, or even our own >>>>> , >>>>> unit tests use a seeded random number generator as a shorthand to produce >>>>> some arbirary array, and then hard-code the expected output in their tests. >>>>> Breaking stream compatibility will break these tests. >>>>> >>>> By the way, the reason that I didn't mention this use case as a >>>> motivation in the Status Quo section because, as I reviewed my mail >>>> archive, this wasn't actually a motivating use case for the policy. It's >>>> certainly a use case that developed once we did make these >>>> (*cough*extravagant*cough*) guarantees, though, as people started to rely >>>> on it, and I hope that my StableRandom proposal addresses it to your >>>> satisfaction. I could add some more details about that history if you >>>> think it would be useful. >>>> >>> >>> I don't think that's accurate. >>> The unit tests for stable random numbers were added when Enthought >>> silently changed the normal random numbers and we got messages from users >>> that the unit tests fail and they cannot reproduce our results. >>> >>> 6/12/10 >>> [SciPy-Dev] seeded randn gets different values on osx >>> >>> (I don't find an online copy, this is from my own mail archive) >>> >> >> The policy was in place Nov 2008. >> > > only for the underlying stream, but those unit tests didn't guarantee it > for the actual distributions > > https://github.com/numpy/numpy/commit/898e6bdc625cdd3c97865ef99f8d51c5f43eafff > > So maybe there was a discussion in 2008 which was mostly before my time. > The guarantee for distributions was added in 2010/2011, at least in terms > of unit tests in numpy > in order to protect the unit tests in scipy.stats and by analogy for > similar cases in other packages > and across users. > The policy existed for the distributions regardless of whether or not we had a test suite that ensured it. I cannot share internal emails, of course, but please be assured that the existence of the policy was one of my arguments for rolling back that addition to EPD (and would have been what I argued to prevent it from going out, had I been aware of it). -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jun 3 21:54:03 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 3 Jun 2018 21:54:03 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern wrote: > On Sun, Jun 3, 2018 at 5:46 PM wrote: > >> >> >> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern >> wrote: >> >>> >>> The list of ``StableRandom`` methods should be chosen to support unit >>>> tests: >>>> >>>> * ``.randint()`` >>>> * ``.uniform()`` >>>> * ``.normal()`` >>>> * ``.standard_normal()`` >>>> * ``.choice()`` >>>> * ``.shuffle()`` >>>> * ``.permutation()`` >>>> >>> >>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311 >>> @bashtage writes: >>> > standard_gamma and standard_exponential are important enough to be >>> included here IMO. >>> >>> "Importance" was not my criterion, only whether they are used in unit >>> test suites. This list was just off the top of my head for methods that I >>> think were actually used in test suites, so I'd be happy to be shown live >>> tests that use other methods. I'd like to be a *little* conservative about >>> what methods we stick in here, but we don't have to be *too* conservative, >>> since we are explicitly never going to be modifying these. 
>>> >> >> That's one area where I thought the selection is too narrow. >> We should be able to get a stable stream from the uniform for some >> distributions. >> >> However, according to the Wikipedia description Poisson doesn't look >> easy. I just wrote a unit test for statsmodels using Poisson random numbers >> with hard coded numbers for the regression tests. >> > > I'd really rather people do this than use StableRandom; this is best > practice, as I see it, if your tests involve making precise comparisons to > expected results. > I hardcoded the results not the random data. So the unit tests rely on a reproducible stream of Poisson random numbers. I don't want to save 500 (100 or 1000) observations in a csv file for every variation of the unit test that I run. > > StableRandom is intended as a crutch so that the pain of moving existing > unit tests away from the deprecated RandomState is less onerous. I'd really > rather people write better unit tests! > > In particular, I do not want to add any of the integer-domain > distributions (aside from shuffle/permutation/choice) as these are the ones > that have the platform-dependency issues with respect to 32/64-bit `long` > integers. They'd be unreliable for unit tests even if we kept them stable > over time. > > >> I'm not sure which other distributions are common enough and not easily >> reproducible by transformation. E.g. negative binomial can be reproduces by >> a gamma-poisson mixture. >> >> On the other hand normal can be easily recreated from standard_normal. >> > > I was mostly motivated by making it a bit easier to mechanically replace > uses of randn(), which is probably even more common than normal() and > standard_normal() in unit tests. > > >> Would it be difficult to keep this list large, given that it should be >> frozen, low maintenance code ? >> > > I admit that I had in mind non-statistical unit tests. That is, tests that > didn't depend on the precise distribution of the inputs. > The problem is that the unit test in `stats` rely on precise inputs (up to some numerical noise). For example p-values themselves are uniformly distributed if the hypothesis test works correctly. That mean if I don't have control over the inputs, then my p-value could be anything in (0, 1). So either we need a real dataset, save all the random numbers in a file or have a reproducible set of random numbers. 95% of the unit tests that I write are for statistics. A large fraction of them don't rely on the exact distribution, but do rely on a random numbers that are "good enough". For example, when writing unit test, then I get every once in a while or sometimes more often a "bad" stream of random numbers, for which convergence might fail or where the estimated numbers are far away from the true numbers, so test tolerance would have to be very high. If I pick one of the seeds that looks good, then I can have tighter unit test tolerance to insure results are good in a nice case. The problem is that we cannot write robust unit tests for regression tests without stable inputs. E.g. I verified my results with a Monte Carlo with 5000 replications and 1000 Poisson observations in each. Results look close to expected and won't depend much on the exact stream of random variables. But the Monte Carlo for each variant of the test took about 40 seconds. Doing this for all option combination and dataset specification takes too long to be feasible in a unit test suite. 
So I rely on numpy's stable random numbers and hard code the results for a specific random sample in the regression unit tests. Josef > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 3 22:08:34 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 19:08:34 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 5:39 PM Robert Kern wrote: > You and I both agree that this is an anti-pattern for numpy.random, but >> certainly there is plenty of code that relies on the stability of random >> numbers when seeds are set by np.random.seed(). Similar to the case for >> RandomState, we would presumably need to start issuing warnings when seed() >> is explicitly called, which begs the question of what (if anything) we >> propose to replace seed() with. >> > > Well, *I* propose `AttributeError`, myself? > > >> I suppose this will be your next NEP :). >> > > I deliberately left it out of this one as it may, depending on our > choices, impinge upon the design of the new PRNG subsystem, which I > declared out of scope for this NEP. I have ideas (besides the glib "Let > them eat AttributeErrors!"), and now that I think more about it, that does > seem like it might be in scope just like the discussion of freezing > RandomState and StableRandom are. But I think I'd like to hold that thought > a little bit and get a little more screaming^Wfeedback on the core proposal > first. I'll return to this in a few days if not sooner. > For this NEP, it might be enough here to say that the current behavior of np.random.seed() will be deprecated just like np.random.RandomState(), since the current implementation of np.random.seed() is intimately tied to RandomState. The natural of the exact replacement (if any) can be left for future discussion. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 3 22:31:13 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 19:31:13 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 5:44 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Although I'm still not 100% convinced by NotImplementedButCoercible, I do > like the idea that this is the default for items that do not implement > `__array_function__`. And it might help avoid trying to find oneself in a > possibly long list. > Another potential consideration in favor of NotImplementedButCoercible is for subclassing: we could use it to write the default implementations of ndarray.__array_ufunc__ and ndarray.__array_function__, e.g., class ndarray: def __array_ufunc__(self, *args, **kwargs): return NotIImplementedButCoercible def __array_function__(self, *args, **kwargs): return NotIImplementedButCoercible I think (not 100% sure yet) this would result in exactly equivalent behavior to what ndarray.__array_ufunc__ currently does: http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#subclass-hierarchies -------------- next part -------------- An HTML attachment was scrubbed... 
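Spelling the suggestion out a little (everything below is hypothetical, since neither the sentinel nor the dispatch helper exists yet), the defaults and one possible way a dispatcher could react to them:

    import numpy as np

    NotImplementedButCoercible = object()   # hypothetical sentinel

    class MyNdarray:
        # The suggested default, written out: "I have nothing special to say
        # about this call, but feel free to coerce me and try again."
        def __array_ufunc__(self, *args, **kwargs):
            return NotImplementedButCoercible

        def __array_function__(self, func, types, args, kwargs):
            return NotImplementedButCoercible

    def try_array_function_override(func, overloaded_args, types, args, kwargs):
        # One possible reading of how the dispatcher could treat the sentinel.
        coercion_ok = True
        for arg in overloaded_args:
            result = type(arg).__array_function__(arg, func, types, args, kwargs)
            if result is NotImplementedButCoercible:
                continue                    # no opinion; coercion stays allowed
            if result is NotImplemented:
                coercion_ok = False         # a hard refusal rules out coercion
            else:
                return result               # this argument handled the call
        if coercion_ok:
            return func(*[np.asarray(a) for a in args], **kwargs)
        raise TypeError("no implementation found for %r" % func)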
URL: From ralf.gommers at gmail.com Sun Jun 3 23:20:23 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 3 Jun 2018 20:20:23 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 6:54 PM, wrote: > > > On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern wrote: > >> On Sun, Jun 3, 2018 at 5:46 PM wrote: >> >>> >>> >>> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern >>> wrote: >>> >>>> >>>> The list of ``StableRandom`` methods should be chosen to support unit >>>>> tests: >>>>> >>>>> * ``.randint()`` >>>>> * ``.uniform()`` >>>>> * ``.normal()`` >>>>> * ``.standard_normal()`` >>>>> * ``.choice()`` >>>>> * ``.shuffle()`` >>>>> * ``.permutation()`` >>>>> >>>> >>>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311 >>>> @bashtage writes: >>>> > standard_gamma and standard_exponential are important enough to be >>>> included here IMO. >>>> >>>> "Importance" was not my criterion, only whether they are used in unit >>>> test suites. This list was just off the top of my head for methods that I >>>> think were actually used in test suites, so I'd be happy to be shown live >>>> tests that use other methods. I'd like to be a *little* conservative about >>>> what methods we stick in here, but we don't have to be *too* conservative, >>>> since we are explicitly never going to be modifying these. >>>> >>> >>> That's one area where I thought the selection is too narrow. >>> We should be able to get a stable stream from the uniform for some >>> distributions. >>> >>> However, according to the Wikipedia description Poisson doesn't look >>> easy. I just wrote a unit test for statsmodels using Poisson random numbers >>> with hard coded numbers for the regression tests. >>> >> >> I'd really rather people do this than use StableRandom; this is best >> practice, as I see it, if your tests involve making precise comparisons to >> expected results. >> > > I hardcoded the results not the random data. So the unit tests rely on a > reproducible stream of Poisson random numbers. > I don't want to save 500 (100 or 1000) observations in a csv file for > every variation of the unit test that I run. > I agree, hardcoding numbers in every place where seeded random numbers are now used is quite unrealistic. It may be worth having a look at test suites for scipy, statsmodels, scikit-learn, etc. and estimate how much work this NEP causes those projects. If the devs of those packages are forced to do large scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it? Ralf > > >> >> StableRandom is intended as a crutch so that the pain of moving existing >> unit tests away from the deprecated RandomState is less onerous. I'd really >> rather people write better unit tests! >> >> In particular, I do not want to add any of the integer-domain >> distributions (aside from shuffle/permutation/choice) as these are the ones >> that have the platform-dependency issues with respect to 32/64-bit `long` >> integers. They'd be unreliable for unit tests even if we kept them stable >> over time. >> >> >>> I'm not sure which other distributions are common enough and not easily >>> reproducible by transformation. E.g. negative binomial can be reproduces by >>> a gamma-poisson mixture. >>> >>> On the other hand normal can be easily recreated from standard_normal. 
>>> >> >> I was mostly motivated by making it a bit easier to mechanically replace >> uses of randn(), which is probably even more common than normal() and >> standard_normal() in unit tests. >> >> >>> Would it be difficult to keep this list large, given that it should be >>> frozen, low maintenance code ? >>> >> >> I admit that I had in mind non-statistical unit tests. That is, tests >> that didn't depend on the precise distribution of the inputs. >> > > The problem is that the unit test in `stats` rely on precise inputs (up to > some numerical noise). > For example p-values themselves are uniformly distributed if the > hypothesis test works correctly. That mean if I don't have control over the > inputs, then my p-value could be anything in (0, 1). So either we need a > real dataset, save all the random numbers in a file or have a reproducible > set of random numbers. > > 95% of the unit tests that I write are for statistics. A large fraction of > them don't rely on the exact distribution, but do rely on a random numbers > that are "good enough". > For example, when writing unit test, then I get every once in a while or > sometimes more often a "bad" stream of random numbers, for which > convergence might fail or where the estimated numbers are far away from the > true numbers, so test tolerance would have to be very high. > If I pick one of the seeds that looks good, then I can have tighter unit > test tolerance to insure results are good in a nice case. > > The problem is that we cannot write robust unit tests for regression tests > without stable inputs. > E.g. I verified my results with a Monte Carlo with 5000 replications and > 1000 Poisson observations in each. > Results look close to expected and won't depend much on the exact stream > of random variables. > But the Monte Carlo for each variant of the test took about 40 seconds. > Doing this for all option combination and dataset specification takes too > long to be feasible in a unit test suite. > So I rely on numpy's stable random numbers and hard code the results for a > specific random sample in the regression unit tests. > > Josef > > > >> >> -- >> Robert Kern >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jun 4 00:22:17 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 3 Jun 2018 22:22:17 -0600 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sat, Jun 2, 2018 at 1:04 PM, Robert Kern wrote: > As promised distressingly many months ago, I have written up a NEP about > relaxing the stream-compatibility policy that we currently have. > > https://github.com/numpy/numpy/pull/11229 > https://github.com/rkern/numpy/blob/nep/rng/doc/neps/ > nep-0019-rng-policy.rst > > I particularly invite comment on the two lists of methods that we still > would make strict compatibility guarantees for. 
> > --- > > ============================== > Random Number Generator Policy > ============================== > > :Author: Robert Kern > :Status: Draft > :Type: Standards Track > :Created: 2018-05-24 > > > Abstract > -------- > > For the past decade, NumPy has had a strict backwards compatibility policy > for > the number stream of all of its random number distributions. Unlike other > numerical components in ``numpy``, which are usually allowed to return > different when results when they are modified if they remain correct, we > have > obligated the random number distributions to always produce the exact same > numbers in every version. The objective of our stream-compatibility > guarantee > was to provide exact reproducibility for simulations across numpy versions > in > order to promote reproducible research. However, this policy has made it > very > difficult to enhance any of the distributions with faster or more accurate > algorithms. After a decade of experience and improvements in the > surrounding > ecosystem of scientific software, we believe that there are now better > ways to > achieve these objectives. We propose relaxing our strict > stream-compatibility > policy to remove the obstacles that are in the way of accepting > contributions > to our random number generation capabilities. > > > The Status Quo > -------------- > > Our current policy, in full: > > A fixed seed and a fixed series of calls to ``RandomState`` methods > using the > same parameters will always produce the same results up to roundoff > error > except when the values were incorrect. Incorrect values will be fixed > and > the NumPy version in which the fix was made will be noted in the > relevant > docstring. Extension of existing parameter ranges and the addition of > new > parameters is allowed as long the previous behavior remains unchanged. > > This policy was first instated in Nov 2008 (in essence; the full set of > weasel > Instituted? > words grew over time) in response to a user wanting to be sure that the > simulations that formed the basis of their scientific publication could be > reproduced years later, exactly, with whatever version of ``numpy`` that > was > current at the time. We were keen to support reproducible research, and > it was > still early in the life of ``numpy.random``. We had not seen much cause to > change the distribution methods all that much. > > We also had not thought very thoroughly about the limits of what we really > could promise (and by ?we? in this section, we really mean Robert Kern, > let?s > be honest). Despite all of the weasel words, our policy overpromises > compatibility. The same version of ``numpy`` built on different > platforms, or > just in a different way could cause changes in the stream, with varying > degrees > of rarity. The biggest is that the ``.multivariate_normal()`` method > relies on > ``numpy.linalg`` functions. Even on the same platform, if one links > ``numpy`` > with a different LAPACK, ``.multivariate_normal()`` may well return > completely > different results. More rarely, building on a different OS or CPU can > cause > differences in the stream. We use C ``long`` integers internally for > integer > distribution (it seemed like a good idea at the time), and those can vary > in > size depending on the platform. Distribution methods can overflow their > internal C ``longs`` at different breakpoints depending on the platform and > cause all of the random variate draws that follow to be different. 
> > And even if all of that is controlled, our policy still does not provide > exact > guarantees across versions. We still do apply bug fixes when correctness > is at > stake. And even if we didn?t do that, any nontrivial program does more > than > just draw random numbers. They do computations on those numbers, transform > those with numerical algorithms from the rest of ``numpy``, which is not > subject to so strict a policy. Trying to maintain stream-compatibility > for our > random number distributions does not help reproducible research for these > reasons. > > The standard practice now for bit-for-bit reproducible research is to pin > all > of the versions of code of your software stack, possibly down to the OS > itself. > The landscape for accomplishing this is much easier today than it was in > 2008. > We now have ``pip``. We now have virtual machines. Those who need to > reproduce simulations exactly now can (and ought to) do so by using the > exact > same version of ``numpy``. We do not need to maintain stream-compatibility > across ``numpy`` versions to help them. > > Our stream-compatibility guarantee has hindered our ability to make > improvements to ``numpy.random``. Several first-time contributors have > submitted PRs to improve the distributions, usually by implementing a > faster, > or more accurate algorithm than the one that is currently there. > Unfortunately, most of them would have required breaking the stream to do > so. > Blocked by our policy, and our inability to work around that policy, many > of > those contributors simply walked away. > > > Implementation > -------------- > > We propose first freezing ``RandomState`` as it is and developing a new RNG > subsystem alongside it. This allows anyone who has been relying on our old > stream-compatibility guarantee to have plenty of time to migrate. > ``RandomState`` will be considered deprecated, but with a long deprecation > cycle, at least a few years. Deprecation warnings will start silent but > become > increasingly noisy over time. Bugs in the current state of the code will > *not* > be fixed if fixing them would impact the stream. However, if changes in > the > rest of ``numpy`` would break something in the ``RandomState`` code, we > will > fix ``RandomState`` to continue working (for example, some change in the > C API). No new features will be added to ``RandomState``. Users should > migrate to the new subsystem as they are able to. > > Work on a proposed `new PRNG subsystem > `_ is already underway. The > specifics > of the new design are out of scope for this NEP and up for much > discussion, but > we will discuss general policies that will guide the evolution of whatever > code > is adopted. > > First, we will maintain API source compatibility just as we do with the > rest of > ``numpy``. If we *must* make a breaking change, we will only do so with an > appropriate deprecation period and warnings. > > Second, breaking stream-compatibility in order to introduce new features or > improve performance will be *allowed* with *caution*. Such changes will be > considered features, and as such will be no faster than the standard > release > cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness > is > not a bug. Correctness bug fixes that break stream-compatibility can > happen on > bugfix releases, per usual, but developers should consider if they can wait > until the next feature release. 
We encourage developers to strongly weight > user?s pain from the break in stream-compatibility against the > improvements. > One example of a worthwhile improvement would be to change algorithms for > a significant increase in performance, for example, moving from the > `Box-Muller > transform `_ > method > of Gaussian variate generation to the faster `Ziggurat algorithm > `_. An example of an > unworthy improvement would be tweaking the Ziggurat tables just a little > bit. > > Any new design for the RNG subsystem will provide a choice of different > core > uniform PRNG algorithms. We will be more strict about a select subset of > methods on these core PRNG objects. They MUST guarantee > stream-compatibility > for a minimal, specified set of methods which are chosen to make it easier > to > compose them to build other distributions. Namely, > > * ``.bytes()`` > * ``.random_uintegers()`` > * ``.random_sample()`` > > Furthermore, the new design should also provide one generator class (we > shall > call it ``StableRandom`` for discussion purposes) that provides a slightly > broader subset of distribution methods for which stream-compatibility is > *guaranteed*. The point of ``StableRandom`` is to provide something that > can > be used in unit tests so projects that currently have tests which rely on > the > precise stream can be migrated off of ``RandomState``. For the best > transition, ``StableRandom`` should use as its core uniform PRNG the > current > MT19937 algorithm. As best as possible, the API for the distribution > methods > that are provided on ``StableRandom`` should match their counterparts on > ``RandomState``. They should provide the same stream that the current > version > of ``RandomState`` does. Because their intended use is for unit tests, we > do > not need the performance improvements from the new algorithms that will be > introduced by the new subsystem. > > The list of ``StableRandom`` methods should be chosen to support unit > tests: > > * ``.randint()`` > * ``.uniform()`` > * ``.normal()`` > * ``.standard_normal()`` > * ``.choice()`` > * ``.shuffle()`` > * ``.permutation()`` > > > Not Versioning > -------------- > > For a long time, we considered that the way to allow algorithmic > improvements > while maintaining the stream was to apply some form of versioning. That > is, > every time we make a stream change in one of the distributions, we > increment > some version number somewhere. ``numpy.random`` would keep all past > versions > of the code, and there would be a way to get the old versions. Proposals > of > how to do this exactly varied widely, but we will not exhaustively list > them > here. We spent years going back and forth on these designs and were not > able > to find one that sufficed. Let that time lost, and more importantly, the > contributors that we lost while we dithered, serve as evidence against the > notion. > > Concretely, adding in versioning makes maintenance of ``numpy.random`` > difficult. Necessarily, we would be keeping lots of versions of the same > code > around. Adding a new algorithm safely would still be quite hard. > > But most importantly, versioning is fundamentally difficult to *use* > correctly. > We want to make it easy and straightforward to get the latest, fastest, > best > versions of the distribution algorithms; otherwise, what's the point? The > way > to make that easy is to make the latest the default. 
But the default will > necessarily change from release to release, so the user?s code would need > to be > altered anyway to specify the specific version that one wants to replicate. > > Adding in versioning to maintain stream-compatibility would still only > provide > the same level of stream-compatibility that we currently do, with all of > the > limitations described earlier. Given that the standard practice for such > needs > is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` > alone > is superfluous. > This section is a bit unclear. Would it be correct to say that the rng version is the numpy version? If so, it might be best to say that up front before justifying it. > > > Discussion > ---------- > > - https://mail.python.org/pipermail/numpy-discussion/ > 2018-January/077608.html > - https://github.com/numpy/numpy/pull/10124#issuecomment-350876221 > > > Copyright > --------- > > This document has been placed in the public domain. > > > Mostly off topic, but I note that the new module proposes integers of various lengths using the Python half open ranges. I would like to suggest that we modify that just a hair so we can specify the whole range in the integer interval specification. For instance, the full range of an 8 bit unsigned integer could be given as `(0, 0)`, i.e., (0, 255 + 1). This would be most useful for the biggest (64 bit) types, but I am more thinking of the case where sequences of ranges can be used. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Jun 4 00:23:23 2018 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 4 Jun 2018 00:23:23 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 11:20 PM, Ralf Gommers wrote: > > > On Sun, Jun 3, 2018 at 6:54 PM, wrote: > >> >> >> On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern >> wrote: >> >>> On Sun, Jun 3, 2018 at 5:46 PM wrote: >>> >>>> >>>> >>>> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern >>>> wrote: >>>> >>>>> >>>>> The list of ``StableRandom`` methods should be chosen to support unit >>>>>> tests: >>>>>> >>>>>> * ``.randint()`` >>>>>> * ``.uniform()`` >>>>>> * ``.normal()`` >>>>>> * ``.standard_normal()`` >>>>>> * ``.choice()`` >>>>>> * ``.shuffle()`` >>>>>> * ``.permutation()`` >>>>>> >>>>> >>>>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311 >>>>> @bashtage writes: >>>>> > standard_gamma and standard_exponential are important enough to be >>>>> included here IMO. >>>>> >>>>> "Importance" was not my criterion, only whether they are used in unit >>>>> test suites. This list was just off the top of my head for methods that I >>>>> think were actually used in test suites, so I'd be happy to be shown live >>>>> tests that use other methods. I'd like to be a *little* conservative about >>>>> what methods we stick in here, but we don't have to be *too* conservative, >>>>> since we are explicitly never going to be modifying these. >>>>> >>>> >>>> That's one area where I thought the selection is too narrow. >>>> We should be able to get a stable stream from the uniform for some >>>> distributions. >>>> >>>> However, according to the Wikipedia description Poisson doesn't look >>>> easy. I just wrote a unit test for statsmodels using Poisson random numbers >>>> with hard coded numbers for the regression tests. 
>>>> >>> >>> I'd really rather people do this than use StableRandom; this is best >>> practice, as I see it, if your tests involve making precise comparisons to >>> expected results. >>> >> >> I hardcoded the results not the random data. So the unit tests rely on a >> reproducible stream of Poisson random numbers. >> I don't want to save 500 (100 or 1000) observations in a csv file for >> every variation of the unit test that I run. >> > > I agree, hardcoding numbers in every place where seeded random numbers are > now used is quite unrealistic. > > It may be worth having a look at test suites for scipy, statsmodels, > scikit-learn, etc. and estimate how much work this NEP causes those > projects. If the devs of those packages are forced to do large scale > migrations from RandomState to StableState, then why not instead keep > RandomState and just add a new API next to it? > > As a quick and imperfect test, I monkey-patched numpy so that a call to numpy.random.seed(m) actually uses m+1000 as the seed. I ran the tests using the `runtests.py` script: *seed+1000, using 'python runtests.py -n' in the source directory:* 236 failed, 12881 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed Most of the failures are in scipy.stats: *seed+1000, using 'python runtests.py -n -s stats' in the source directory:* 203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed, 1 xpassed Changing the amount added to the seed or running the tests using the function `scipy.test("full")` gives different (but similar magnitude) results: *seed+1000, using 'import scipy; scipy.test("full")' in an ipython shell:* 269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed *seed+1, using 'python runtests.py -n' in the source directory:* 305 failed, 12812 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed I suspect many of the tests will be easy to update, so fixing 300 or so tests does not seem like a monumental task. I haven't looked into why there are 585 deselected tests; maybe there are many more tests lurking there that will have to be updated. Warren Ralf > > > >> >> >>> >>> StableRandom is intended as a crutch so that the pain of moving existing >>> unit tests away from the deprecated RandomState is less onerous. I'd really >>> rather people write better unit tests! >>> >>> In particular, I do not want to add any of the integer-domain >>> distributions (aside from shuffle/permutation/choice) as these are the ones >>> that have the platform-dependency issues with respect to 32/64-bit `long` >>> integers. They'd be unreliable for unit tests even if we kept them stable >>> over time. >>> >>> >>>> I'm not sure which other distributions are common enough and not easily >>>> reproducible by transformation. E.g. negative binomial can be reproduces by >>>> a gamma-poisson mixture. >>>> >>>> On the other hand normal can be easily recreated from standard_normal. >>>> >>> >>> I was mostly motivated by making it a bit easier to mechanically replace >>> uses of randn(), which is probably even more common than normal() and >>> standard_normal() in unit tests. >>> >>> >>>> Would it be difficult to keep this list large, given that it should be >>>> frozen, low maintenance code ? >>>> >>> >>> I admit that I had in mind non-statistical unit tests. That is, tests >>> that didn't depend on the precise distribution of the inputs. >>> >> >> The problem is that the unit test in `stats` rely on precise inputs (up >> to some numerical noise). 
>> For example p-values themselves are uniformly distributed if the >> hypothesis test works correctly. That mean if I don't have control over the >> inputs, then my p-value could be anything in (0, 1). So either we need a >> real dataset, save all the random numbers in a file or have a reproducible >> set of random numbers. >> >> 95% of the unit tests that I write are for statistics. A large fraction >> of them don't rely on the exact distribution, but do rely on a random >> numbers that are "good enough". >> For example, when writing unit test, then I get every once in a while or >> sometimes more often a "bad" stream of random numbers, for which >> convergence might fail or where the estimated numbers are far away from the >> true numbers, so test tolerance would have to be very high. >> If I pick one of the seeds that looks good, then I can have tighter unit >> test tolerance to insure results are good in a nice case. >> >> The problem is that we cannot write robust unit tests for regression >> tests without stable inputs. >> E.g. I verified my results with a Monte Carlo with 5000 replications and >> 1000 Poisson observations in each. >> Results look close to expected and won't depend much on the exact stream >> of random variables. >> But the Monte Carlo for each variant of the test took about 40 seconds. >> Doing this for all option combination and dataset specification takes too >> long to be feasible in a unit test suite. >> So I rely on numpy's stable random numbers and hard code the results for >> a specific random sample in the regression unit tests. >> >> Josef >> >> >> >>> >>> -- >>> Robert Kern >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Mon Jun 4 00:47:15 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sun, 3 Jun 2018 21:47:15 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Mixed return values of NotImplementedButCoercible and NotImplemented would still result in TypeError, and there would be no second chances for overloads. I would like to differ with you here: It can be quite useful to have second chances for overloads. Think ``np.func(list, custom_array))``: If second rounds did not exist, custom_array would need to have a list of coercible types (which is not nice IMO). It can also help in cases where performance/feature degradation isn?t an issue, so coercing all arguments that returned ``NotImplementedButCoercible`` would allow ``__array_function__`` to succeed where it wouldn?t normally. I mean, that?s one of the major uses of this sentinel right? If done in a for loop, it wouldn?t even slow down the nominal cases. It would have the adverse effect of not allowing for a default implementation to be as simple as you stated, though. 
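To make the for-loop idea concrete, here is a rough sketch of the two-round dispatch I have in mind. Everything in it is illustrative only: the ``NotImplementedButCoercible`` sentinel, the simplified ``__array_function__`` signature and the ``dispatch`` helper are stand-ins for what the NEP proposes, not an existing NumPy API.

    import numpy as np

    # Illustrative stand-in for the proposed sentinel; not a real NumPy object.
    NotImplementedButCoercible = object()

    def dispatch(default_impl, args):
        """Two-round dispatch sketch (signature simplified relative to the NEP)."""
        results = []
        for arg in args:
            override = getattr(type(arg), '__array_function__', None)
            if override is None:
                # Plain objects (lists, scalars, ndarrays) count as coercible.
                results.append(NotImplementedButCoercible)
            else:
                results.append(override(arg, default_impl, args, {}))
        # First round: any overload that produced a real result wins.
        for result in results:
            if result is not NotImplemented and result is not NotImplementedButCoercible:
                return result
        # Second round: if nothing refused outright, coerce everything and
        # fall back to the default implementation.
        if all(result is NotImplementedButCoercible for result in results):
            return default_impl(*[np.asarray(arg) for arg in args])
        raise TypeError("no implementation found for the given argument types")

In the nominal cases (no overloads at all, or one overload that handles the call) the second round never runs, so it costs nothing.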
One thing we could do is manually (inside ``__array_function__``) coerce anything that didn?t implement ``__array_function__``, and that?s acceptable to me too. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon Jun 4 00:53:25 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 21:53:25 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers wrote: > It may be worth having a look at test suites for scipy, statsmodels, > scikit-learn, etc. and estimate how much work this NEP causes those > projects. If the devs of those packages are forced to do large scale > migrations from RandomState to StableState, then why not instead keep > RandomState and just add a new API next to it? > Tests that explicitly create RandomState objects would not be difficult to migrate. The goal of "StableState" is that it could be used directly in cases where RandomState is current used in tests, so I would guess that "RandomState" could be almost mechanistically replaced by "StableState". The challenging case are calls to np.random.seed(). If no replacement API is planned, then these would need to be manually converted to use StableState instead. This is probably not too onerous (and is a good cleanup to do anyways) but it would be a bit of work. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 4 01:03:28 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 22:03:28 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 9:24 PM Charles R Harris wrote: > > On Sat, Jun 2, 2018 at 1:04 PM, Robert Kern wrote: >> >> This policy was first instated in Nov 2008 (in essence; the full set of >> weasel >> > > Instituted? > I meant "instated"; c.f. for another usage: https://www.youredm.com/2018/06/01/spotify-new-policy-update/ But "instituted" would work just as well. It may be that "instated a policy" is just an idiosyncratic back-formation of "reinstated a policy", which even to me feels more right. Not Versioning >> -------------- >> >> For a long time, we considered that the way to allow algorithmic >> improvements >> while maintaining the stream was to apply some form of versioning. That >> is, >> every time we make a stream change in one of the distributions, we >> increment >> some version number somewhere. ``numpy.random`` would keep all past >> versions >> of the code, and there would be a way to get the old versions. Proposals >> of >> how to do this exactly varied widely, but we will not exhaustively list >> them >> here. We spent years going back and forth on these designs and were not >> able >> to find one that sufficed. Let that time lost, and more importantly, the >> contributors that we lost while we dithered, serve as evidence against the >> notion. >> >> Concretely, adding in versioning makes maintenance of ``numpy.random`` >> difficult. Necessarily, we would be keeping lots of versions of the same >> code >> around. Adding a new algorithm safely would still be quite hard. >> >> But most importantly, versioning is fundamentally difficult to *use* >> correctly. >> We want to make it easy and straightforward to get the latest, fastest, >> best >> versions of the distribution algorithms; otherwise, what's the point? >> The way >> to make that easy is to make the latest the default. 
But the default will >> necessarily change from release to release, so the user?s code would need >> to be >> altered anyway to specify the specific version that one wants to >> replicate. >> >> Adding in versioning to maintain stream-compatibility would still only >> provide >> the same level of stream-compatibility that we currently do, with all of >> the >> limitations described earlier. Given that the standard practice for such >> needs >> is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` >> alone >> is superfluous. >> > > This section is a bit unclear. Would it be correct to say that the rng > version is the numpy version? If so, it might be best to say that up front > before justifying it. > I'm sorry, I'm unclear on what you are asking me to make clearer. There is currently no such thing as "the rng version". The thrust of this section of the NEP is to reject the previously floated idea of introducing the concept at all. So I would certainly not say anything along the lines that "the rng version is the numpy version". I do say, here and earlier, that the way to get the same RNG code is to get the same version of numpy. Mostly off topic, but I note that the new module proposes integers of > various lengths using the Python half open ranges. I would like to suggest > that we modify that just a hair so we can specify the whole range in the > integer interval specification. For instance, the full range of an 8 bit > unsigned integer could be given as `(0, 0)`, i.e., (0, 255 + 1). This would > be most useful for the biggest (64 bit) types, but I am more thinking of > the case where sequences of ranges can be used. > That is indeed something out of scope for this NEP discussion. Feel free to open an issue on the randomgen Github. But suffice it to say that I intend to make sure that the new subsystem has at least feature parity with the current code, and that is one of the features in the current code. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jun 4 01:25:53 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 4 Jun 2018 01:25:53 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Mon, Jun 4, 2018 at 12:53 AM, Stephan Hoyer wrote: > On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers > wrote: > >> It may be worth having a look at test suites for scipy, statsmodels, >> scikit-learn, etc. and estimate how much work this NEP causes those >> projects. If the devs of those packages are forced to do large scale >> migrations from RandomState to StableState, then why not instead keep >> RandomState and just add a new API next to it? >> > > Tests that explicitly create RandomState objects would not be difficult to > migrate. The goal of "StableState" is that it could be used directly in > cases where RandomState is current used in tests, so I would guess that > "RandomState" could be almost mechanistically replaced by "StableState". > > The challenging case are calls to np.random.seed(). If no replacement API > is planned, then these would need to be manually converted to use > StableState instead. This is probably not too onerous (and is a good > cleanup to do anyways) but it would be a bit of work. > I agree with this. Statsmodels uses mostly np.random.seed. That cleanup is planned, but postponed so far as not high priority. We will have to do it eventually. 
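For most tests the edit itself is mechanical, something like the sketch below (``StableRandom`` is only the name proposed in the NEP, it does not exist yet, and its constructor signature is my assumption):

    import numpy as np

    # current pattern in many of our unit tests:
    np.random.seed(12345)
    x = np.random.poisson(lam=5, size=100)

    # after migration: an explicit generator object threaded through the test
    rs = np.random.RandomState(12345)      # works today
    # rs = np.random.StableRandom(12345)   # the NEP's proposal, once it exists
    x = rs.poisson(lam=5, size=100)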
The main work will come when StableState doesn't include specific distribution, Poisson, NegativeBinomial, Gamma, ... and distributions that we don't even use yet, like Beta. I don't want to migrate random number generation for the distributions abandoned by numpy Stable to statsmodels. Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jun 4 01:26:08 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 3 Jun 2018 23:26:08 -0600 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 11:03 PM, Robert Kern wrote: > On Sun, Jun 3, 2018 at 9:24 PM Charles R Harris > wrote: > >> >> On Sat, Jun 2, 2018 at 1:04 PM, Robert Kern >> wrote: >>> >>> This policy was first instated in Nov 2008 (in essence; the full set of >>> weasel >>> >> >> Instituted? >> > > I meant "instated"; c.f. for another usage: https://www.youredm.com/2018/ > 06/01/spotify-new-policy-update/ > > But "instituted" would work just as well. It may be that "instated a > policy" is just an idiosyncratic back-formation of "reinstated a policy", > which even to me feels more right. > > Not Versioning >>> -------------- >>> >>> For a long time, we considered that the way to allow algorithmic >>> improvements >>> while maintaining the stream was to apply some form of versioning. That >>> is, >>> every time we make a stream change in one of the distributions, we >>> increment >>> some version number somewhere. ``numpy.random`` would keep all past >>> versions >>> of the code, and there would be a way to get the old versions. >>> Proposals of >>> how to do this exactly varied widely, but we will not exhaustively list >>> them >>> here. We spent years going back and forth on these designs and were not >>> able >>> to find one that sufficed. Let that time lost, and more importantly, the >>> contributors that we lost while we dithered, serve as evidence against >>> the >>> notion. >>> >>> Concretely, adding in versioning makes maintenance of ``numpy.random`` >>> difficult. Necessarily, we would be keeping lots of versions of the >>> same code >>> around. Adding a new algorithm safely would still be quite hard. >>> >>> But most importantly, versioning is fundamentally difficult to *use* >>> correctly. >>> We want to make it easy and straightforward to get the latest, fastest, >>> best >>> versions of the distribution algorithms; otherwise, what's the point? >>> The way >>> to make that easy is to make the latest the default. But the default >>> will >>> necessarily change from release to release, so the user?s code would >>> need to be >>> altered anyway to specify the specific version that one wants to >>> replicate. >>> >>> Adding in versioning to maintain stream-compatibility would still only >>> provide >>> the same level of stream-compatibility that we currently do, with all of >>> the >>> limitations described earlier. Given that the standard practice for >>> such needs >>> is to pin the release of ``numpy`` as a whole, versioning >>> ``RandomState`` alone >>> is superfluous. >>> >> >> This section is a bit unclear. Would it be correct to say that the rng >> version is the numpy version? If so, it might be best to say that up front >> before justifying it. 
>> > > I'm sorry, I'm unclear on what you are asking me to make clearer. There is > currently no such thing as "the rng version". The thrust of this section of > the NEP is to reject the previously floated idea of introducing the concept > at all. So I would certainly not say anything along the lines that "the rng > version is the numpy version". I do say, here and earlier, that the way to > get the same RNG code is to get the same version of numpy. > Just so, and you could make that clearer, as you do here. > > Mostly off topic, but I note that the new module proposes integers of >> various lengths using the Python half open ranges. I would like to suggest >> that we modify that just a hair so we can specify the whole range in the >> integer interval specification. For instance, the full range of an 8 bit >> unsigned integer could be given as `(0, 0)`, i.e., (0, 255 + 1). This would >> be most useful for the biggest (64 bit) types, but I am more thinking of >> the case where sequences of ranges can be used. >> > > That is indeed something out of scope for this NEP discussion. Feel free > to open an issue on the randomgen Github. But suffice it to say that I > intend to make sure that the new subsystem has at least feature parity with > the current code, and that is one of the features in the current code. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 4 01:47:34 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 22:47:34 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 10:29 PM Charles R Harris wrote: > > > On Sun, Jun 3, 2018 at 11:03 PM, Robert Kern > wrote: > >> On Sun, Jun 3, 2018 at 9:24 PM Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> On Sat, Jun 2, 2018 at 1:04 PM, Robert Kern >>> wrote: >>>> >>>> This policy was first instated in Nov 2008 (in essence; the full set of >>>> weasel >>>> >>> >>> Instituted? >>> >> >> I meant "instated"; c.f. for another usage: >> https://www.youredm.com/2018/06/01/spotify-new-policy-update/ >> >> But "instituted" would work just as well. It may be that "instated a >> policy" is just an idiosyncratic back-formation of "reinstated a policy", >> which even to me feels more right. >> >> Not Versioning >>>> -------------- >>>> >>>> For a long time, we considered that the way to allow algorithmic >>>> improvements >>>> while maintaining the stream was to apply some form of versioning. >>>> That is, >>>> every time we make a stream change in one of the distributions, we >>>> increment >>>> some version number somewhere. ``numpy.random`` would keep all past >>>> versions >>>> of the code, and there would be a way to get the old versions. >>>> Proposals of >>>> how to do this exactly varied widely, but we will not exhaustively list >>>> them >>>> here. We spent years going back and forth on these designs and were >>>> not able >>>> to find one that sufficed. Let that time lost, and more importantly, >>>> the >>>> contributors that we lost while we dithered, serve as evidence against >>>> the >>>> notion. >>>> >>>> Concretely, adding in versioning makes maintenance of ``numpy.random`` >>>> difficult. 
Necessarily, we would be keeping lots of versions of the >>>> same code >>>> around. Adding a new algorithm safely would still be quite hard. >>>> >>>> But most importantly, versioning is fundamentally difficult to *use* >>>> correctly. >>>> We want to make it easy and straightforward to get the latest, fastest, >>>> best >>>> versions of the distribution algorithms; otherwise, what's the point? >>>> The way >>>> to make that easy is to make the latest the default. But the default >>>> will >>>> necessarily change from release to release, so the user?s code would >>>> need to be >>>> altered anyway to specify the specific version that one wants to >>>> replicate. >>>> >>>> Adding in versioning to maintain stream-compatibility would still only >>>> provide >>>> the same level of stream-compatibility that we currently do, with all >>>> of the >>>> limitations described earlier. Given that the standard practice for >>>> such needs >>>> is to pin the release of ``numpy`` as a whole, versioning >>>> ``RandomState`` alone >>>> is superfluous. >>>> >>> >>> This section is a bit unclear. Would it be correct to say that the rng >>> version is the numpy version? If so, it might be best to say that up front >>> before justifying it. >>> >> >> I'm sorry, I'm unclear on what you are asking me to make clearer. There >> is currently no such thing as "the rng version". The thrust of this section >> of the NEP is to reject the previously floated idea of introducing the >> concept at all. So I would certainly not say anything along the lines that >> "the rng version is the numpy version". I do say, here and earlier, that >> the way to get the same RNG code is to get the same version of numpy. >> > > Just so, and you could make that clearer, as you do here. > I don't understand. All I did was repeat what I already said twice. If you'd like to provide some text that would have clarified things for you, I'll see about inserting it, but I'm at a loss for writing that text. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Mon Jun 4 01:55:17 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sun, 3 Jun 2018 22:55:17 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: How about this: "There will be no concept of a separate RNG version. In order to get consistent or reproducible results from the RNG, it will be necessary to specify the NumPy version that was used to generate those results. Results from the RNG may change across different releases of Num Py." Sent from Astro for Mac On 4. Jun 2018 at 10:47, Robert Kern wrote: On Sun, Jun 3, 2018 at 10:29 PM Charles R Harris wrote: > > > On Sun, Jun 3, 2018 at 11:03 PM, Robert Kern > wrote: > >> On Sun, Jun 3, 2018 at 9:24 PM Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> On Sat, Jun 2, 2018 at 1:04 PM, Robert Kern >>> wrote: >>>> >>>> This policy was first instated in Nov 2008 (in essence; the full set of >>>> weasel >>>> >>> >>> Instituted? >>> >> >> I meant "instated"; c.f. for another usage: >> https://www.youredm.com/2018/06/01/spotify-new-policy-update/ >> >> But "instituted" would work just as well. It may be that "instated a >> policy" is just an idiosyncratic back-formation of "reinstated a policy", >> which even to me feels more right. 
>> >> Not Versioning >>>> -------------- >>>> >>>> For a long time, we considered that the way to allow algorithmic >>>> improvements >>>> while maintaining the stream was to apply some form of versioning. >>>> That is, >>>> every time we make a stream change in one of the distributions, we >>>> increment >>>> some version number somewhere. ``numpy.random`` would keep all past >>>> versions >>>> of the code, and there would be a way to get the old versions. >>>> Proposals of >>>> how to do this exactly varied widely, but we will not exhaustively list >>>> them >>>> here. We spent years going back and forth on these designs and were >>>> not able >>>> to find one that sufficed. Let that time lost, and more importantly, >>>> the >>>> contributors that we lost while we dithered, serve as evidence against >>>> the >>>> notion. >>>> >>>> Concretely, adding in versioning makes maintenance of ``numpy.random`` >>>> difficult. Necessarily, we would be keeping lots of versions of the >>>> same code >>>> around. Adding a new algorithm safely would still be quite hard. >>>> >>>> But most importantly, versioning is fundamentally difficult to *use* >>>> correctly. >>>> We want to make it easy and straightforward to get the latest, fastest, >>>> best >>>> versions of the distribution algorithms; otherwise, what's the point? >>>> The way >>>> to make that easy is to make the latest the default. But the default >>>> will >>>> necessarily change from release to release, so the user?s code would >>>> need to be >>>> altered anyway to specify the specific version that one wants to >>>> replicate. >>>> >>>> Adding in versioning to maintain stream-compatibility would still only >>>> provide >>>> the same level of stream-compatibility that we currently do, with all >>>> of the >>>> limitations described earlier. Given that the standard practice for >>>> such needs >>>> is to pin the release of ``numpy`` as a whole, versioning >>>> ``RandomState`` alone >>>> is superfluous. >>>> >>> >>> This section is a bit unclear. Would it be correct to say that the rng >>> version is the numpy version? If so, it might be best to say that up front >>> before justifying it. >>> >> >> I'm sorry, I'm unclear on what you are asking me to make clearer. There >> is currently no such thing as "the rng version". The thrust of this section >> of the NEP is to reject the previously floated idea of introducing the >> concept at all. So I would certainly not say anything along the lines that >> "the rng version is the numpy version". I do say, here and earlier, that >> the way to get the same RNG code is to get the same version of numpy. >> > > Just so, and you could make that clearer, as you do here. > I don't understand. All I did was repeat what I already said twice. If you'd like to provide some text that would have clarified things for you, I'll see about inserting it, but I'm at a loss for writing that text. -- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.k.sheppard at gmail.com Mon Jun 4 02:05:56 2018 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Mon, 4 Jun 2018 07:05:56 +0100 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: <5b14d6c4.1c69fb81.21ba2.13ed@mx.google.com> The seed() discussion seems unnecessary. 
StableRandom will need to have a method to set/get state which can be used by any project that needs to get reproducible numbers from the module-level generator. While this is an implementation detail, many generators have much smaller states than MT19937 (a few uint64s). So this is easy enough to hard code where needed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 4 02:18:21 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 23:18:21 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: <5b14d6c4.1c69fb81.21ba2.13ed@mx.google.com> References: <5b14d6c4.1c69fb81.21ba2.13ed@mx.google.com> Message-ID: On Sun, Jun 3, 2018 at 11:07 PM Kevin Sheppard wrote: > The seed() discussion seems unnecessary. StableRandom will need to have a > method to set/get state > > which can be used by any project that needs to get reproducible numbers > from the module-level generator. > > > > While this is an implementation detail, many generators have much smaller > states than MT19937 > > (a few uint64s). So this is easy enough to hard code where needed. > The question isn't about what .seed() methods look like on the new generators. Rather, it's about the behavior when code calls numpy.random.seed() then numpy.random.uniform() (or one of the other convenience aliases). Specifically, there will be a period of time when RandomState is merely deprecated but is still expected to be there and be fully backwards-compatible to give reproducible streams. Does that expectation extend to code that uses numpy.random.seed() to get that reproducibility? What happens with code that just calls numpy.random.uniform(): does it use RandomState or the new code? These questions are probably in-scope for this NEP, but I'd like to get some kind of consensus on the rest first, as the higher level decisions will tell us more about what we want to do for numpy.random.seed(). -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon Jun 4 02:19:51 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 3 Jun 2018 23:19:51 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 9:54 PM Hameer Abbasi wrote: > Mixed return values of NotImplementedButCoercible and NotImplemented would > still result in TypeError, and there would be no second chances for > overloads. > > > I would like to differ with you here: It can be quite useful to have > second chances for overloads. Think ``np.func(list, custom_array))``: If > second rounds did not exist, custom_array would need to have a list of > coercible types (which is not nice IMO). > Even if we did this, we would still want to preserve the equivalence between: 1. Returning NotImplementedButCoercible from __array_ufunc__ or __array_function__, and 2. Not implementing __array_ufunc__ or __array_function__ at all. Changing __array_ufunc__ to do multiple rounds of checks could indeed be useful in some cases, and you're right that it would not change existing behavior (in these cases we currently raise TypeError). But I'd rather leave that for a separate discussion, because it's orthogonal to our proposal here for __array_function__. (Personally, I don't think it would be worth the additional complexity.) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Mon Jun 4 02:22:57 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jun 2018 23:22:57 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 10:27 PM wrote: > > > On Mon, Jun 4, 2018 at 12:53 AM, Stephan Hoyer wrote: > >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers >> wrote: >> >>> It may be worth having a look at test suites for scipy, statsmodels, >>> scikit-learn, etc. and estimate how much work this NEP causes those >>> projects. If the devs of those packages are forced to do large scale >>> migrations from RandomState to StableState, then why not instead keep >>> RandomState and just add a new API next to it? >>> >> >> Tests that explicitly create RandomState objects would not be difficult >> to migrate. The goal of "StableState" is that it could be used directly in >> cases where RandomState is current used in tests, so I would guess that >> "RandomState" could be almost mechanistically replaced by "StableState". >> >> The challenging case are calls to np.random.seed(). If no replacement API >> is planned, then these would need to be manually converted to use >> StableState instead. This is probably not too onerous (and is a good >> cleanup to do anyways) but it would be a bit of work. >> > > I agree with this. Statsmodels uses mostly np.random.seed. That cleanup is > planned, but postponed so far as not high priority. We will have to do it > eventually. > > The main work will come when StableState doesn't include specific > distribution, Poisson, NegativeBinomial, Gamma, ... and distributions that > we don't even use yet, like Beta. > I would posit that it is probably very rare that one uses the full breadth of distributions in unit tests. You may be the only one. :-) > I don't want to migrate random number generation for the distributions > abandoned by numpy Stable to statsmodels. > What if we followed Kevin's suggestion and forked off RandomState into its own forever-frozen package sooner rather than later? It's intended use would be for people with legacy packages that cannot upgrade (other than changing some imports) and for unit tests that require precise streams for a full breadth of distributions. We would still leave it in numpy.random for a deprecation period, but maybe we would be noisy about it sooner and remove it sooner than my NEP planned for. Would that work? I'd be happy to maintain that forked-RandomState for you. I would probably still encourage most people to continue to use StableRandom for most unit testing. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From antoine at python.org Mon Jun 4 04:50:52 2018 From: antoine at python.org (Antoine Pitrou) Date: Mon, 4 Jun 2018 10:50:52 +0200 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> Message-ID: <92186546-e2b6-062a-b446-54704af065bf@python.org> Hi, Do you plan to consider trying to add PEP 574 / pickle5 support? There's an implementation ready (and a PyPI backport) that you can play with. https://www.python.org/dev/peps/pep-0574/ PEP 574 implicits targets Numpy arrays as one of its primary producers, since Numpy arrays is how large scientific or numerical data often ends up represented and where zero-copy is often desired by users. 
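For concreteness, the usage pattern the PEP targets looks like this with the PyPI backport (a sketch; until Numpy implements the corresponding ``__reduce_ex__`` support the array is still serialized in-band and the ``buffers`` list simply stays empty, which is exactly the gap I am asking about):

    import numpy as np
    import pickle5 as pickle   # PyPI backport of PEP 574 for current Pythons

    a = np.arange(10**6)

    buffers = []
    # With protocol 5, producers can hand large buffers to buffer_callback so
    # they travel out-of-band (zero-copy) instead of inside the pickle stream.
    data = pickle.dumps(a, protocol=5, buffer_callback=buffers.append)

    # The consumer supplies the same buffers back when unpickling.
    b = pickle.loads(data, buffers=buffers)
    assert (a == b).all()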
PEP 574 could certainly be useful even without Numpy arrays supporting it, but less so. So I would welcome any feedback on that front (and, given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd ideally like to have that feedback sometimes in the forthcoming months ;-)). Best regards Antoine. On Thu, 31 May 2018 16:50:02 -0700 Matti Picus wrote: > At the recent NumPy sprint at BIDS (thanks to those who made the trip) > we spent some time brainstorming about a roadmap for NumPy, in the > spirit of similar work that was done for Jupyter. The idea is that a > document with wide community acceptance can guide the work of the > full-time developer(s), and be a source of ideas for expanding > development efforts. > > I put the document up at > https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss > it at a BOF session during SciPy in the middle of July in Austin. > > Eventually it could become a NEP or formalized in another way. > > Matti From kevin.k.sheppard at gmail.com Mon Jun 4 05:54:27 2018 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Mon, 4 Jun 2018 09:54:27 +0000 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: I?m not sure if this is within the scope of the NEP or an implementation detail, but I think a new PRNG should use platform independent integer types rather than depending on the platform?s choice of 64-bit data model. This should be enough to ensure that any integer distribution that only uses integers internally should produce identical results across uarch/OS. -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Mon Jun 4 07:28:09 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Mon, 4 Jun 2018 07:28:09 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Should there be discussion of typing (pep-484) or abstract base classes in this nep? Are there any requirements on the result returned by __array_function__? On Mon, Jun 4, 2018, 2:20 AM Stephan Hoyer wrote: > > On Sun, Jun 3, 2018 at 9:54 PM Hameer Abbasi > wrote: > >> Mixed return values of NotImplementedButCoercible and NotImplemented >> would still result in TypeError, and there would be no second chances for >> overloads. >> >> >> I would like to differ with you here: It can be quite useful to have >> second chances for overloads. Think ``np.func(list, custom_array))``: If >> second rounds did not exist, custom_array would need to have a list of >> coercible types (which is not nice IMO). >> > > Even if we did this, we would still want to preserve the equivalence > between: > 1. Returning NotImplementedButCoercible from __array_ufunc__ or > __array_function__, and > 2. Not implementing __array_ufunc__ or __array_function__ at all. > > Changing __array_ufunc__ to do multiple rounds of checks could indeed be > useful in some cases, and you're right that it would not change existing > behavior (in these cases we currently raise TypeError). But I'd rather > leave that for a separate discussion, because it's orthogonal to our > proposal here for __array_function__. > > (Personally, I don't think it would be worth the additional complexity.) 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jun 4 08:29:26 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 4 Jun 2018 08:29:26 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Mon, Jun 4, 2018 at 2:22 AM, Robert Kern wrote: > On Sun, Jun 3, 2018 at 10:27 PM wrote: > >> >> >> On Mon, Jun 4, 2018 at 12:53 AM, Stephan Hoyer wrote: >> >>> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers >>> wrote: >>> >>>> It may be worth having a look at test suites for scipy, statsmodels, >>>> scikit-learn, etc. and estimate how much work this NEP causes those >>>> projects. If the devs of those packages are forced to do large scale >>>> migrations from RandomState to StableState, then why not instead keep >>>> RandomState and just add a new API next to it? >>>> >>> >>> Tests that explicitly create RandomState objects would not be difficult >>> to migrate. The goal of "StableState" is that it could be used directly in >>> cases where RandomState is current used in tests, so I would guess that >>> "RandomState" could be almost mechanistically replaced by "StableState". >>> >>> The challenging case are calls to np.random.seed(). If no replacement >>> API is planned, then these would need to be manually converted to use >>> StableState instead. This is probably not too onerous (and is a good >>> cleanup to do anyways) but it would be a bit of work. >>> >> >> I agree with this. Statsmodels uses mostly np.random.seed. That cleanup >> is planned, but postponed so far as not high priority. We will have to do >> it eventually. >> >> The main work will come when StableState doesn't include specific >> distribution, Poisson, NegativeBinomial, Gamma, ... and distributions that >> we don't even use yet, like Beta. >> > > I would posit that it is probably very rare that one uses the full breadth > of distributions in unit tests. You may be the only one. :-) > Given that I'm one of the maintainers for Statistics in Python, I wouldn't be surprised if I would use more than almost all others. However, statsmodels doesn't use a very large set, there are other packages that use Pareto and Extreme Value distributions or circular distributions like vonmises which are not yet in statsmodels. I have no idea about whether MCMC packages still rely on numpy.random. But the main "user" of numpy's random is scipy.stats which might be using almost all of the distributions. I don't have a current overview about how much scipy.stats unit tests rely on having stable streams for the available distributions. > > >> I don't want to migrate random number generation for the distributions >> abandoned by numpy Stable to statsmodels. >> > > What if we followed Kevin's suggestion and forked off RandomState into its > own forever-frozen package sooner rather than later? It's intended use > would be for people with legacy packages that cannot upgrade (other than > changing some imports) and for unit tests that require precise streams for > a full breadth of distributions. We would still leave it in numpy.random > for a deprecation period, but maybe we would be noisy about it sooner and > remove it sooner than my NEP planned for. > > Would that work? I'd be happy to maintain that forked-RandomState for you. 
> It would not be nice to have to add another dependency, but that would work for statsmodels. I'm not sure whether scipy.stats maintainers are fine with it. Given that scipy already uses RandomState instead of the global instance, the actual change if distributions are available would be to swap a StableState for a RandomState in the unit tests, AFAIK. Josef > > I would probably still encourage most people to continue to use > StableRandom for most unit testing. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Mon Jun 4 10:34:49 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 4 Jun 2018 10:34:49 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Hi Stephan, Another potential consideration in favor of NotImplementedButCoercible is > for subclassing: we could use it to write the default implementations of > ndarray.__array_ufunc__ and ndarray.__array_function__, e.g., > > class ndarray: > def __array_ufunc__(self, *args, **kwargs): > return NotIImplementedButCoercible > def __array_function__(self, *args, **kwargs): > return NotIImplementedButCoercible > > I think (not 100% sure yet) this would result in exactly equivalent > behavior to what ndarray.__array_ufunc__ currently does: > http://www.numpy.org/neps/nep-0013-ufunc-overrides.html# > subclass-hierarchies > As written would not work for ndarray subclasses, because the subclass will generically change itself before calling super. At least for Quantity, say if I add two quantities, the quantities will both be converted to arrays (with one scaled so that the units match) and then the super call is done with those modified arrays. This expects that the super call will actually return a result (which it now can because all inputs are arrays). But I think it would work to return `NotImplementedButCoercible` in the case that perhaps you had in mind in the first place, in which any of the *other* arguments had a `__array_ufunc__` implementation and `ndarray` thus does not know what to do. For those cases, `ndarray` currently returns a straight `NotImplemented`. Though I am still a bit worried: this gets back to `Quantity.__array_ufunc__`, but what does it do with it? It cannot just pass it on, since then it is effectively telling, incorrectly, that the *quantity* is coercible, which it is not. I guess at this point it would have to change it to `NotImplemented`. Looking at my current implementation, I see that if we made this change to `ndarray.__array_ufunc__`, the implementation would mostly raise an exception as it tried to view `NotImplementedButCoercible` as a quantity, except for comparisons, where the output is not viewed at all (being boolean and thus unit-less) and passed straight down. That said, we've said the __array_ufunc__ implementation is experimental, so I think such small annoyances are OK. Overall, it is an intriguing idea, and I think it should be mentioned at least in the NEP. It would be good, though, to have a few more examples of how it would work in practice. All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From m.h.vankerkwijk at gmail.com Mon Jun 4 10:37:19 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 4 Jun 2018 10:37:19 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: I agree that second rounds of overloads have to be left to the implementers of `__array_function__` - obviously, though, we should be sure that these rounds are rarely necessary... The link posted by Stephan [1] has some decent discussion for `__array_ufunc__` about when an override should re-call the function rather than try to do something itself. -- Marten [1] http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#subclass-hierarchies -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon Jun 4 11:09:35 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 4 Jun 2018 08:09:35 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning In-Reply-To: <92186546-e2b6-062a-b446-54704af065bf@python.org> References: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> <92186546-e2b6-062a-b446-54704af065bf@python.org> Message-ID: PEP-574 isn't on the roadmap (yet!), but I think we would clearly welcome it. Like all NumPy improvements, it would need to implemented by an interested party. On Mon, Jun 4, 2018 at 1:52 AM Antoine Pitrou wrote: > > Hi, > > Do you plan to consider trying to add PEP 574 / pickle5 support? There's > an implementation ready (and a PyPI backport) that you can play with. > https://www.python.org/dev/peps/pep-0574/ > > PEP 574 implicits targets Numpy arrays as one of its primary producers, > since Numpy arrays is how large scientific or numerical data often ends > up represented and where zero-copy is often desired by users. > > PEP 574 could certainly be useful even without Numpy arrays supporting > it, but less so. So I would welcome any feedback on that front (and, > given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd > ideally like to have that feedback sometimes in the forthcoming months > ;-)). > > Best regards > > Antoine. > > > On Thu, 31 May 2018 16:50:02 -0700 > Matti Picus wrote: > > At the recent NumPy sprint at BIDS (thanks to those who made the trip) > > we spent some time brainstorming about a roadmap for NumPy, in the > > spirit of similar work that was done for Jupyter. The idea is that a > > document with wide community acceptance can guide the work of the > > full-time developer(s), and be a source of ideas for expanding > > development efforts. > > > > I put the document up at > > https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss > > it at a BOF session during SciPy in the middle of July in Austin. > > > > Eventually it could become a NEP or formalized in another way. > > > > Matti > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Mon Jun 4 13:58:59 2018 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 4 Jun 2018 10:58:59 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Mon, Jun 4, 2018 at 2:55 AM Kevin Sheppard wrote: > I?m not sure if this is within the scope of the NEP or an implementation > detail, but I think a new PRNG should use platform independent integer > types rather than depending on the platform?s choice of 64-bit data model. > This should be enough to ensure that any integer distribution that only > uses integers internally should produce identical results across uarch/OS. > Probably an implementation detail (possibly one that ought to be worked out in its own NEP). I know that I would like it if the new system had all of the same distribution methods as RandomState currently does, such that we can drop in the new generator objects in places where RandomState is currently used, and everything would still work (just with a different stream). Might want to add a statement to that effect in this NEP. I think it's likely "good enough" if the integer distributions now return uint64 arrays instead of uint32 arrays on Windows. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 4 18:18:25 2018 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 4 Jun 2018 15:18:25 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers wrote: > It may be worth having a look at test suites for scipy, statsmodels, > scikit-learn, etc. and estimate how much work this NEP causes those > projects. If the devs of those packages are forced to do large scale > migrations from RandomState to StableState, then why not instead keep > RandomState and just add a new API next to it? > The problem is that we can't really have an ecosystem with two different general purpose systems. To properly use pseudorandom numbers, I need to instantiate a PRNG and thread it through all of the code in my program: both the parts that I write and the third party libraries that I don't write. Generating test data for unit tests is separable, though. That's why I propose having a StableRandom built on the new architecture. Its purpose would be well-documented, and in my proposal is limited in features such that it will be less likely to be abused outside of that purpose. If you make it fully-featured, it is more likely to be abused by building library code around it. But even if it is so abused, because it is built on the new architecture, at least I can thread the same core PRNG state through the StableRandom distributions from the abusing library and use the better distributions class elsewhere (randomgen names it "Generator"). Just keeping RandomState around can't work like that because it doesn't have a replaceable core PRNG. But that does suggest another alternative that we should explore: The new architecture separates the core uniform PRNG from the wide variety of non-uniform probability distributions. That is, the core PRNG state is encapsulated in a discrete object that can be shared between instances of different distribution-providing classes. numpy.random should provide two such distribution-providing classes. 
The main one (let us call it ``Generator``, as it is called in the prototype) will follow the new policy: distribution methods can break the stream in feature releases. There will also be a secondary distributions class (let us call it ``LegacyGenerator``) which contains distribution methods exactly as they exist in the current ``RandomState`` implementation. When one combines ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the exact same stream as ``RandomState`` for all distribution methods. The ``LegacyGenerator`` methods will be forever frozen. ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with the MT19937 core PRNG, and whatever tricks needed to make ``isinstance(prng, RandomState)`` and unpickling work should be done. This way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be deprecated, becoming progressively noisier over a number of release cycles, in favor of explicitly instantiating ``LegacyGenerator``. ``LegacyGenerator`` CAN be used during this deprecation period in library and application code until libraries and applications can migrate to the new ``Generator``. Libraries and applications SHOULD migrate but MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test data for unit tests where cross-release stability of the streams is important. Test writers SHOULD consider ways to mitigate their reliance on such stability and SHOULD limit their usage to distribution methods that have fewer cross-platform stability risks. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jun 5 10:56:42 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jun 2018 08:56:42 -0600 Subject: [Numpy-discussion] NumPy 1.14.4 release Message-ID: Hi All, The release notes for the NumPy 1.14.4 release are up. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Jun 5 14:34:18 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 5 Jun 2018 11:34:18 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Mon, Jun 4, 2018 at 5:39 AM Matthew Harrigan wrote: > Should there be discussion of typing (pep-484) or abstract base classes in > this nep? Are there any requirements on the result returned by > __array_function__? > This is a good question that should be addressed in the NEP. Currently, we impose no limitations on the types returned by __array_function__ (or __array_ufunc__, for that matter). Given the complexity of potential __array_function__ implementations, I think this would be hard/impossible to do in general. I think the best case scenario we could hope for is that type checkers would identify that result of NumPy functions as: - numpy.ndarray if all inputs are numpy.ndarray objects - Any if any non-numpy.ndarray inputs implement the __array_function__ Based on my understanding of proposed rules for typing protocols [1] and overloads [2], I think this could just work, e.g., @overload def func(array: np.ndarray) -> np.ndarray: ... @overload def func(array: ImplementsArrayFunction) -> Any: ... [1] https://www.python.org/dev/peps/pep-0544/ [2] https://github.com/python/typing/issues/253#issuecomment-389262904 -------------- next part -------------- An HTML attachment was scrubbed... 
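[A self-contained version of the typing sketch above; ``ImplementsArrayFunction`` is a name invented in that message, and ``Protocol`` assumes the typing_extensions backport (it moved into ``typing`` with Python 3.8).]

```
from typing import Any, overload
from typing_extensions import Protocol  # PEP 544
import numpy as np

class ImplementsArrayFunction(Protocol):
    # Name taken from the sketch above; purely illustrative.
    def __array_function__(self, func, types, args, kwargs) -> Any: ...

@overload
def func(array: np.ndarray) -> np.ndarray: ...
@overload
def func(array: ImplementsArrayFunction) -> Any: ...

def func(array):
    # Runtime implementation; the overloads above exist only for the type checker.
    return np.asarray(array)
```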
URL: From shoyer at gmail.com Tue Jun 5 14:49:00 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 5 Jun 2018 11:49:00 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Mon, Jun 4, 2018 at 7:35 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Hi Stephan, > > Another potential consideration in favor of NotImplementedButCoercible is >> for subclassing: we could use it to write the default implementations of >> ndarray.__array_ufunc__ and ndarray.__array_function__, e.g., >> >> class ndarray: >> def __array_ufunc__(self, *args, **kwargs): >> return NotIImplementedButCoercible >> def __array_function__(self, *args, **kwargs): >> return NotIImplementedButCoercible >> >> I think (not 100% sure yet) this would result in exactly equivalent >> behavior to what ndarray.__array_ufunc__ currently does: >> >> http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#subclass-hierarchies >> > > As written would not work for ndarray subclasses, because the subclass > will generically change itself before calling super. At least for Quantity, > say if I add two quantities, the quantities will both be converted to > arrays (with one scaled so that the units match) and then the super call is > done with those modified arrays. This expects that the super call will > actually return a result (which it now can because all inputs are arrays). > Thanks for clarifying. This is definitely trickier than I had thought. If Quantity.__array_ufunc__ implemented overrides by calling the public ufunc method again (instead of calling super), then it would still work fine with this change. But of course, in that case you would not need ndarray.__array_ufunc__ defined at all. I will say that personally, I find the complexity of the current ndarray.__array_ufunc__ implementation a little inelegant, and I would welcome simplifying it. But I also try to avoid implementation inheritance entirely [2], for exactly the same reasons why refactoring ndarray.__array_ufunc__ here would be difficult (inheritance is fragile). So I would be happy to defer to your judgment, as someone who actually uses subclassing. https://hackernoon.com/inheritance-based-on-internal-structure-is-evil-7474cc8e64dc -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Tue Jun 5 15:33:40 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 5 Jun 2018 15:33:40 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Hi Stephan, Things would, I think, make much more sense if `ndarray.__array_ufunc__` (or `*_function__`) actually *were* the implementation for array-only. But while that is something I'd like to eventually get to, it seems out of scope for the current discussion. But we should be sure that the ndarray versions return either `NotImplemented` or a result. Given that, I think that perhaps it is also best not to do `NotImplementedButCoercible` - as I think the implementers of `__array_function__` perhaps should just do that themselves. But I may well swing the other way again... Good examples of non-trivial benefits would help. All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... 
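[A sketch of what "doing it themselves" could look like for a duck array (the class and registry names are made up for illustration): unhandled functions coerce the class's own instances to ndarray and re-call the public function, which is the behavior ``NotImplementedButCoercible`` would otherwise provide automatically.]

```
import numpy as np

HANDLED = {}  # functions this class overrides; left empty in this sketch

class MyDuckArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None):
        return np.asarray(self.data, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        if func in HANDLED:
            return HANDLED[func](*args, **kwargs)
        # Fallback: coerce our own instances and re-call the function.
        # Only positional arguments are handled, to keep the sketch short.
        coerced = [np.asarray(a) if isinstance(a, MyDuckArray) else a
                   for a in args]
        return func(*coerced, **kwargs)
```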
URL: From shoyer at gmail.com Tue Jun 5 17:11:23 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 5 Jun 2018 14:11:23 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Tue, Jun 5, 2018 at 12:35 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Things would, I think, make much more sense if `ndarray.__array_ufunc__` > (or `*_function__`) actually *were* the implementation for array-only. But > while that is something I'd like to eventually get to, it seems out of > scope for the current discussion. > If this is a desirable end-state, we should at least consider it now while we are designing the __array_function__ interface. With the current proposal, I think this would be nearly impossible. The challenge is that ndarray.__array_function__ would somehow need to call the non-overloaded version of the provided function provided that no other arguments overload __array_function__. However, currently don't expose this information in any way. Some ways this could be done (including some of your prior suggestions): - Add a coerce=True argument to all NumPy functions, which could be used by non-overloaded implementations. - A separate namespace for non-overloaded functions (e.g., numpy.array_only). - Adding another argument to the __array_function__ interface to explicitly provide the non-overloaded implementation (e.g., func_impl). I don't like any of these options and I'm not sure I agree with your goal, but the NEP should make clear that we are precluding this possibility. Given that, I think that perhaps it is also best not to do > `NotImplementedButCoercible` - as I think the implementers of > `__array_function__` perhaps should just do that themselves. But I may well > swing the other way again... Good examples of non-trivial benefits would > help. > This would also be my default stance, and of course we can always add NotImplementedButCoercible later. I can think of two main use cases: 1. Libraries that only want to overload *some* NumPy functions, but want the rest of NumPy's API by coercing arguments to NumPy arrays. 2. Library that want to eventually overload all of NumPy's high level API, but need to do so incrementally, in a way that preserves backwards compatibility. I'm not sure I agree with use case 1. Arguably, libraries that only overload a limited part of NumPy's API shouldn't encourage their users their users to rely on it. This state of affairs is pretty confusing to users. However, case 2 is valid and potentially important. Consider the case of a library with existing users that would like to start implementing __array_function__ (e.g., dask, astropy, xarray, pandas). The right strategy really depends upon whether the library considers the current behavior of NumPy functions on their objects (silent coercion to numpy arrays) a feature or a bug: - If coercion is a bug and something that the library never intended to support, then perhaps it would be OK to suddenly change all existing overloads to return the correct type. - However, if coercion is a feature (which is probably the attitude of at least some users), ideally there really should be a graceful way to enable the new overloaded behavior incrementally. For example, a library might want to start issuing FutureWarning in version X, before switching over to the new overloaded behavior in version X+1. I can't think of how to do this without NotImplementedButCoercible. 
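[For use case 2, a hypothetical transition shim might look like the following, again assuming the ``NotImplementedButCoercible`` sentinel from this thread (which does not exist): warn in release X, switch to real overrides in release X+1.]

```
import warnings

# Hypothetical sentinel proposed in this thread.
NotImplementedButCoercible = object()

class LibraryArray:
    """Duck array migrating gradually to __array_function__ overrides."""

    _overloads = {}  # filled in incrementally with real implementations

    def __array_function__(self, func, types, args, kwargs):
        if func in self._overloads:
            return self._overloads[func](*args, **kwargs)
        warnings.warn(
            "numpy.{} currently coerces LibraryArray to ndarray; in the next "
            "release it will return a LibraryArray instead".format(func.__name__),
            FutureWarning, stacklevel=2)
        return NotImplementedButCoercible  # keep the old coercing behavior for now
```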
For projects like dask and xarray, the benefits of __array_function__ are so large that we will accept a hard transition that breaks some user code without warning. But this may not be the case for other projects. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Tue Jun 5 17:31:38 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 5 Jun 2018 17:31:38 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Hi Stephan, On `NotImplementedButCoercible`: don't forget that even a preliminary implementation of `__array_function__` has always the choice of coercing its own instances to ndarray and re-calling the function; that is really no different from (though probably a bit slower than) what would happen if one returned NIBC. It does require, however, a fairly efficient way of finding arguments of one's own class, which is partially why I think it is important for there to be a quick way to find instances of one's own type; we should try to avoid having people to reimplement the dance. It may still be that `types` is the right vehicle for this - it just depends on how much of the state of the dance it carries. ? On the "separate" name-space question: one thing it is not is particularly difficult, especially if one works with a decorator: effectively one already has the original function and the wrapped one; the only question is whether it would pay to keep the original one around somewhere. I do continue to think that we will get grumbling about regressions in speed and that it would help to have the undecorated versions available. Though in my ideal world those would do no coercing whatsoever, but just take arrays, i.e., they are actually faster than the current ones. All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Tue Jun 5 17:43:10 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 5 Jun 2018 14:43:10 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On 05/06/18 14:11, Stephan Hoyer wrote: > On Tue, Jun 5, 2018 at 12:35 PM Marten van Kerkwijk > > wrote: > > Things would, I think, make much more sense if > `ndarray.__array_ufunc__` (or `*_function__`) actually *were* the > implementation for array-only. But while that is something I'd > like to eventually get to, it seems out of scope for the current > discussion. > > > If this is a desirable end-state, we should at least consider it now > while we are designing the __array_function__ interface. > > With the current proposal, I think this would be nearly impossible. > The challenge is that ndarray.__array_function__ would somehow need to > call the non-overloaded version of the provided function provided that > no other arguments overload __array_function__. However, currently > don't expose this information in any way. > > Some ways this could be done (including some of your prior suggestions): > - Add?a coerce=True argument to all NumPy functions, which could be > used by non-overloaded implementations. > - A separate namespace for non-overloaded functions (e.g., > numpy.array_only). > - Adding another argument to the __array_function__ interface to > explicitly provide the non-overloaded implementation (e.g., func_impl). 
> > I don't like any of these options and I'm not sure I agree with your > goal, but the NEP should make clear that we are precluding this > possibility. > What is the difference between the `func` provided as the first argument to `__array_function__` and `__array_ufunc__` and the "non-overloaded version of the provided function"? This NEP calls it an "arbitrary callable". In `__array_ufunc__` it turns out people count on it being exactly the `np.ufunc`. Matti From shoyer at gmail.com Tue Jun 5 18:03:32 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 5 Jun 2018 15:03:32 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Tue, Jun 5, 2018 at 2:47 PM Matti Picus wrote: > What is the difference between the `func` provided as the first argument > to `__array_function__` and `__array_ufunc__` and the "non-overloaded > version of the provided function"? > The ""non-overloaded version of the provided function" is entirely hypothetical at this point. If we use a decorator to implement overloads, it would be the undecorated function, e.g., the original definition of concatenate here: @overload_for_array_function(['arrays', 'out'])def concatenate(arrays, axis=0, out=None): ... # continue with the definition of concatenate This NEP calls it an "arbitrary callable". > In `__array_ufunc__` it turns out people count on it being exactly the > `np.ufunc`. Right, I think this is good guarantee to provide. Certainly it's one that people fine useful. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nelle.varoquaux at gmail.com Tue Jun 5 20:06:37 2018 From: nelle.varoquaux at gmail.com (Nelle Varoquaux) Date: Tue, 5 Jun 2018 17:06:37 -0700 Subject: [Numpy-discussion] 2018 John Hunter Excellence in Plotting Contest Message-ID: Hello everyone, Sorry about the cross-posting. There's a couple more days to submit to the John Hunter Excellence in Plotting Competition! If you have any scientific plot worth sharing, submit an entry before June 8th. For more information, see below. Thanks, Nelle In memory of John Hunter, we are pleased to be reviving the SciPy John Hunter Excellence in Plotting Competition for 2018. This open competition aims to highlight the importance of data visualization to scientific progress and showcase the capabilities of open source software. Participants are invited to submit scientific plots to be judged by a panel. The winning entries will be announced and displayed at the conference. John Hunter?s family and NumFocus are graciously sponsoring cash prizes for the winners in the following amounts: - 1st prize: $1000 - 2nd prize: $750 - 3rd prize: $500 - Entries must be submitted by June, 8th to the form at https://goo.gl/forms/7q86zgu5OYUOjODH3 . - Winners will be announced at Scipy 2018 in Austin, TX. - Participants do not need to attend the Scipy conference. - Entries may take the definition of ?visualization? rather broadly. Entries may be, for example, a traditional printed plot, an interactive visualization for the web, or an animation. - Source code for the plot must be provided, in the form of Python code and/or a Jupyter notebook, along with a rendering of the plot in a widely used format. This may be, for example, PDF for print, standalone HTML and Javascript for an interactive plot, or MPEG-4 for a video. 
If the original data can not be shared for reasons of size or licensing, "fake" data may be substituted, along with an image of the plot using real data. - Each entry must include a 300-500 word abstract describing the plot and its importance for a general scientific audience. - Entries will be judged on their clarity, innovation and aesthetics, but most importantly for their effectiveness in communicating a real-world problem. Entrants are encouraged to submit plots that were used during the course of research or work, rather than merely being hypothetical. - SciPy reserves the right to display any and all entries, whether prize-winning or not, at the conference, use in any materials or on its website, with attribution to the original author(s). SciPy John Hunter Excellence in Plotting Competition Co-Chairs Thomas Caswell Michael Droettboom Nelle Varoquaux -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Tue Jun 5 20:32:49 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 5 Jun 2018 20:32:49 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Yes, the function should definitely be the same as what the user called - i.e., the decorated function. I'm only wondering if it would also be possible to have access to the undecorated one (via `coerce` or `ndarray.__array_function__` or otherwise). -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Tue Jun 5 20:39:33 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Tue, 5 Jun 2018 19:39:33 -0500 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Hmm, does this mean the callable that gets passed into __array_ufunc__ will change? I'm pretty sure that will break the dispatch mechanism I'm using in my __array_ufunc__ implementation, which directly checks whether the callable is in one of several tuples of functions that have different behavior. On Tue, Jun 5, 2018 at 7:32 PM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Yes, the function should definitely be the same as what the user called - > i.e., the decorated function. I'm only wondering if it would also be > possible to have access to the undecorated one (via `coerce` or > `ndarray.__array_function__` or otherwise). > -- Marten > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Tue Jun 5 20:41:25 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Tue, 5 Jun 2018 19:41:25 -0500 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Oh wait, since the decorated version of the ufunc will be the one in the public numpy API it won't break. It would only break if the callable that was passed in *wasn't* the decorated version, so it kinda *has* to pass in the decorated function to preserve backward compatibility. Apologies for the noise. On Tue, Jun 5, 2018 at 7:39 PM, Nathan Goldbaum wrote: > Hmm, does this mean the callable that gets passed into __array_ufunc__ > will change? 
I'm pretty sure that will break the dispatch mechanism I'm > using in my __array_ufunc__ implementation, which directly checks whether > the callable is in one of several tuples of functions that have different > behavior. > > On Tue, Jun 5, 2018 at 7:32 PM, Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> Yes, the function should definitely be the same as what the user called - >> i.e., the decorated function. I'm only wondering if it would also be >> possible to have access to the undecorated one (via `coerce` or >> `ndarray.__array_function__` or otherwise). >> -- Marten >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Wed Jun 6 06:20:03 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Wed, 6 Jun 2018 12:20:03 +0200 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On 6. Jun 2018 at 05:41, Nathan Goldbaum wrote: Oh wait, since the decorated version of the ufunc will be the one in the public numpy API it won't break. It would only break if the callable that was passed in *wasn't* the decorated version, so it kinda *has* to pass in the decorated function to preserve backward compatibility. Apologies for the noise. On Tue, Jun 5, 2018 at 7:39 PM, Nathan Goldbaum wrote: > Hmm, does this mean the callable that gets passed into __array_ufunc__ > will change? I'm pretty sure that will break the dispatch mechanism I'm > using in my __array_ufunc__ implementation, which directly checks whether > the callable is in one of several tuples of functions that have different > behavior. > Section ?Non-Goals? states that Ufuncs will not be part of this protocol, __array_ufunc__ will be used to override those as usual. Sent from Astro for Mac -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jun 6 14:06:48 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Jun 2018 12:06:48 -0600 Subject: [Numpy-discussion] NumPy 1.14.4 released. Message-ID: Hi All, On behalf of the NumPy team, I am pleased to announce the release of NumPy 1.14.4. This is a bugfix release for bugs reported following the 1.14.3 release. The most significant fixes are: * fixes for compiler instruction reordering that resulted in NaN's not being properly propagated in `np.max` and `np.min`, * fixes for bus faults on SPARC and older ARM due to incorrect alignment checks. There are also improvements to printing of long doubles on PPC platforms. All is not yet perfect on that platform, the whitespace padding is still incorrect and is to be fixed in numpy 1.15, consequently NumPy still fails some printing-related (and other) unit tests on ppc systems. However, the printed values are now correct. Note that NumPy will error on import if it detects incorrect float32 `dot` results. This problem has been seen on the Mac when working in the Anaconda enviroment and is due to a subtle interaction between MKL and PyQt5. It is not strictly a NumPy problem, but it is best that users be aware of it. See the gh-8577 NumPy issue for more information. The Python versions supported in this release are 2.7 and 3.4 - 3.6. 
Wheels for all supported versions are available from PIP and source releases are available on github . The source releases were cythonized with Cython 0.28.2 and should be compatible with the upcoming Python 3.7. Contributors ============ A total of 7 people contributed to this release. People with a "+" by their names contributed a patch for the first time. * Allan Haldane * Charles Harris * Marten van Kerkwijk * Matti Picus * Pauli Virtanen * Ryan Soklaski + * Sebastian Berg Pull requests merged ==================== A total of 11 pull requests were merged for this release. * #11104: BUG: str of DOUBLE_DOUBLE format wrong on ppc64 * #11170: TST: linalg: add regression test for gh-8577 * #11174: MAINT: add sanity-checks to be run at import time * #11181: BUG: void dtype setup checked offset not actual pointer for alignment * #11194: BUG: Python2 doubles don't print correctly in interactive shell. * #11198: BUG: optimizing compilers can reorder call to npy_get_floatstatus * #11199: BUG: reduce using SSE only warns if inside SSE loop * #11203: BUG: Bytes delimiter/comments in genfromtxt should be decoded Cheers, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Jun 8 11:57:18 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 8 Jun 2018 11:57:18 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: Hi Stephan, I think we're getting to the stage where an updated text would be useful. For that, you may want to consider an actual implementation of, e.g., a very simple function like `np.reshape` as well as a more complicated one like `np.concatenate`, and in particular how the implementation finds out where its own instances are located. ?All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Jun 8 12:39:49 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 8 Jun 2018 09:39:49 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: On Fri, Jun 8, 2018 at 8:58 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > I think we're getting to the stage where an updated text would be useful. > Yes, I plan to work on this over the weekend. Stay tuned! > For that, you may want to consider an actual implementation of, e.g., a > very simple function like `np.reshape` as well as a more complicated one > like `np.concatenate` > Yes, I agree that actual implementation (in Python rather than C for now) would be useful. > and in particular how the implementation finds out where its own instances > are located. > I think we've discussed this before, but I don't think this is feasible to solve in general given the diversity of wrapped APIs. If you want to find the arguments in which a class' own instances appear, you will need to do that in your overloaded function. That said, if merely pulling out the flat list of arguments that are checked for and/or implement __array_function__ would be enough, we can probably figure out a way to expose that information. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From m.h.vankerkwijk at gmail.com Fri Jun 8 19:49:13 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 8 Jun 2018 19:49:13 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: > and in particular how the implementation finds out where its own instances >> are located. >> > > I think we've discussed this before, but I don't think this is feasible to > solve in general given the diversity of wrapped APIs. If you want to find > the arguments in which a class' own instances appear, you will need to do > that in your overloaded function. > > That said, if merely pulling out the flat list of arguments that are > checked for and/or implement __array_function__ would be enough, we can > probably figure out a way to expose that information. > In the end, somewhere inside the "dance", you are checking for `__array_function` - it would seem to me that at that point you know exactly where you are, and it would not be difficult to something like ``` types[new_type] += [where_i_am] ``` (where here I assume types is a defaultdict(list)) - has the set of types in keys and locations as values. But easier to discuss whether this is easy with some sample code to look at! -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Jun 8 20:10:13 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 8 Jun 2018 17:10:13 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: (offlist) To clarify, by "where_i_am" you mean something like the name of the argument where it was found? On Fri, Jun 8, 2018 at 4:49 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > and in particular how the implementation finds out where its own instances >>> are located. >>> >> >> I think we've discussed this before, but I don't think this is feasible >> to solve in general given the diversity of wrapped APIs. If you want to >> find the arguments in which a class' own instances appear, you will need to >> do that in your overloaded function. >> >> That said, if merely pulling out the flat list of arguments that are >> checked for and/or implement __array_function__ would be enough, we can >> probably figure out a way to expose that information. >> > > In the end, somewhere inside the "dance", you are checking for > `__array_function` - it would seem to me that at that point you know > exactly where you are, and it would not be difficult to something like > ``` > types[new_type] += [where_i_am] > ``` > (where here I assume types is a defaultdict(list)) - has the set of types > in keys and locations as values. > > But easier to discuss whether this is easy with some sample code to look > at! > > -- Marten > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
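[One possible shape for that information, expanding the ``defaultdict(list)`` snippet above into a standalone sketch; this is not what any NumPy implementation does, and the argument flattening is assumed to have happened already.]

```
from collections import defaultdict

def collect_overloaded_types(relevant_args):
    """Map each type that defines __array_function__ to the positions at
    which its instances occur in the (already flattened) argument list."""
    types = defaultdict(list)
    for position, arg in enumerate(relevant_args):
        if hasattr(type(arg), '__array_function__'):
            types[type(arg)].append(position)
    return types
```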
URL: From m.h.vankerkwijk at gmail.com Fri Jun 8 21:51:07 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 8 Jun 2018 21:51:07 -0400 Subject: [Numpy-discussion] =?utf-8?q?NEP=3A_Dispatch_Mechanism_for_NumPy?= =?utf-8?q?=E2=80=99s_high_level_API?= In-Reply-To: References: Message-ID: I meant whatever the state of the dance routine is, e.g., the way the arguments are enumerated by the decorator ?(this is partially why some example code for the dance routine is needed -- I am not 100% how this should work, just seems logical that if the dance routine can understand it, so can __array_function__ implementations). -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Jun 10 12:27:32 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 10 Jun 2018 12:27:32 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: OK, I spent my Sunday morning writing a NEP. I hope this can lead to some closure... See https://github.com/numpy/numpy/pull/11297 -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sun Jun 10 19:02:35 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 10 Jun 2018 16:02:35 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Rendered here: https://github.com/mhvk/numpy/blob/nep-gufunc-signature-enhancement/doc/neps/nep-0020-gufunc-signature-enhancement.rst Eric On Sun, 10 Jun 2018 at 09:37 Marten van Kerkwijk wrote: > OK, I spent my Sunday morning writing a NEP. I hope this can lead to some > closure... > See https://github.com/numpy/numpy/pull/11297 > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sun Jun 10 19:31:41 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 10 Jun 2018 16:31:41 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Thanks for the writeup Marten, Nathaniel: Output shape feels very similar to output dtype to me, so maybe the general way to handle this would be to make the first callback take the input shapes+dtypes and return the desired output shapes+dtypes? This hits on an interesting alternative to frozen dimensions - np.cross could just become a regular ufunc with signature np.dtype((float64, 3)), np.dtype((float64, 3)) → np.dtype((float64, 3)) Furthermore, the expansion quickly becomes cumbersome. For instance, for the all_equal signature of (n|1),(n|1)->() ? I think this is only a good argument when used in conjunction with the broadcasting syntax. I don?t think it?s a reason for matmul not to have multiple signatures. Having multiple signatures is an disincentive to introduced too many overloads of the same function, which seems like a good thing to me Summarizing my overall opinions: - I?m +0.5 on frozen dimensions. The use-cases seem reasonable, and it seems like an easy-ish way to get them. 
Allowing ufuncs to natively support subarray types might be a tidier solution, but that could come down the road - I?m -1 on optional dimensions: they seem to legitimize creating many overloads of gufuncs. I?m already not a fan of how matmul has special cases for lower dimensions that don?t generalize well. To me, the best way to handle matmul would be to use the proposed __array_function__ to handle the shape-based special-case dispatching, either by: - Inserting dimensions, and calling the true gufunc np.linalg.matmul_2d (which is a function I?d like direct access to anyway). - Dispatching to one of four ufuncs - Broadcasting dimensions: - I know you?re not suggesting this but: enabling broadcasting unconditionally for all gufuncs would be a bad idea, masking linalg bugs. (although einsum does support broadcasting?) - Does it really need a per-dimension flag, rather than a global one? Can you give a case where that?s useful? - If we?d already made all_equal a gufunc, I?d be +1 on adding broadcasting support to it - I?m -0.5 on the all_equal path in the first place. I think we either should have a more generic approach to combined ufuncs, or just declare them numbas job. - Can you come up with a broadcasting use-case that isn?t just chaining a reduction with a broadcasting ufunc? Eric On Sun, 10 Jun 2018 at 16:02 Eric Wieser wrote: Rendered here: > https://github.com/mhvk/numpy/blob/nep-gufunc-signature-enhancement/doc/neps/nep-0020-gufunc-signature-enhancement.rst > > > Eric > > On Sun, 10 Jun 2018 at 09:37 Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> OK, I spent my Sunday morning writing a NEP. I hope this can lead to some >> closure... >> See https://github.com/numpy/numpy/pull/11297 >> -- Marten >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jun 10 20:26:35 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 10 Jun 2018 17:26:35 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern wrote: > On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers > wrote: > >> It may be worth having a look at test suites for scipy, statsmodels, >> scikit-learn, etc. and estimate how much work this NEP causes those >> projects. If the devs of those packages are forced to do large scale >> migrations from RandomState to StableState, then why not instead keep >> RandomState and just add a new API next to it? >> > > The problem is that we can't really have an ecosystem with two different > general purpose systems. > Can't = prefer not to. But yes, that's true. That's not what I was saying though. We want one generic one, and one meant for unit testing only. You can achieve that in two ways: 1. Change the current np.random API to new generic, and add a new RandomStable for unit tests. 2. Add a new generic API, and document the current np.random API as being meant for unit tests only, for other usage should be preferred. (2) has a couple of pros: - you're not forcing almost every library and end user out there to migrate their unit tests. - more design freedom for the new generic API. The current one is clearly sub-optimal; in a new one you wouldn't have to expose all the global state/functions that np.random exposes now. 
You could even restrict it to a single class and put that in the main numpy namespace. Ralf To properly use pseudorandom numbers, I need to instantiate a PRNG and > thread it through all of the code in my program: both the parts that I > write and the third party libraries that I don't write. > > Generating test data for unit tests is separable, though. That's why I > propose having a StableRandom built on the new architecture. Its purpose > would be well-documented, and in my proposal is limited in features such > that it will be less likely to be abused outside of that purpose. If you > make it fully-featured, it is more likely to be abused by building library > code around it. But even if it is so abused, because it is built on the new > architecture, at least I can thread the same core PRNG state through the > StableRandom distributions from the abusing library and use the better > distributions class elsewhere (randomgen names it "Generator"). Just > keeping RandomState around can't work like that because it doesn't have a > replaceable core PRNG. > > But that does suggest another alternative that we should explore: > > The new architecture separates the core uniform PRNG from the wide variety > of non-uniform probability distributions. That is, the core PRNG state is > encapsulated in a discrete object that can be shared between instances of > different distribution-providing classes. numpy.random should provide two > such distribution-providing classes. The main one (let us call it > ``Generator``, as it is called in the prototype) will follow the new > policy: distribution methods can break the stream in feature releases. > There will also be a secondary distributions class (let us call it > ``LegacyGenerator``) which contains distribution methods exactly as they > exist in the current ``RandomState`` implementation. When one combines > ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the > exact same stream as ``RandomState`` for all distribution methods. The > ``LegacyGenerator`` methods will be forever frozen. > ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with > the MT19937 core PRNG, and whatever tricks needed to make > ``isinstance(prng, RandomState)`` and unpickling work should be done. This > way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be > deprecated, becoming progressively noisier over a number of release cycles, > in favor of explicitly instantiating ``LegacyGenerator``. > > ``LegacyGenerator`` CAN be used during this deprecation period in library > and application code until libraries and applications can migrate to the > new ``Generator``. Libraries and applications SHOULD migrate but MUST NOT > be forced to. ``LegacyGenerator`` CAN be used to generate test data for > unit tests where cross-release stability of the streams is important. Test > writers SHOULD consider ways to mitigate their reliance on such stability > and SHOULD limit their usage to distribution methods that have fewer > cross-platform stability risks. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Sun Jun 10 20:46:36 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 10 Jun 2018 17:46:36 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser wrote: > > > On Sun, Jun 3, 2018 at 11:20 PM, Ralf Gommers > wrote: > >> >> >> On Sun, Jun 3, 2018 at 6:54 PM, wrote: >> >>> >>> >>> On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern >>> wrote: >>> >>>> On Sun, Jun 3, 2018 at 5:46 PM wrote: >>>> >>>>> >>>>> >>>>> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern >>>>> wrote: >>>>> >>>>>> >>>>>> The list of ``StableRandom`` methods should be chosen to support unit >>>>>>> tests: >>>>>>> >>>>>>> * ``.randint()`` >>>>>>> * ``.uniform()`` >>>>>>> * ``.normal()`` >>>>>>> * ``.standard_normal()`` >>>>>>> * ``.choice()`` >>>>>>> * ``.shuffle()`` >>>>>>> * ``.permutation()`` >>>>>>> >>>>>> >>>>>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311 >>>>>> @bashtage writes: >>>>>> > standard_gamma and standard_exponential are important enough to be >>>>>> included here IMO. >>>>>> >>>>>> "Importance" was not my criterion, only whether they are used in unit >>>>>> test suites. This list was just off the top of my head for methods that I >>>>>> think were actually used in test suites, so I'd be happy to be shown live >>>>>> tests that use other methods. I'd like to be a *little* conservative about >>>>>> what methods we stick in here, but we don't have to be *too* conservative, >>>>>> since we are explicitly never going to be modifying these. >>>>>> >>>>> >>>>> That's one area where I thought the selection is too narrow. >>>>> We should be able to get a stable stream from the uniform for some >>>>> distributions. >>>>> >>>>> However, according to the Wikipedia description Poisson doesn't look >>>>> easy. I just wrote a unit test for statsmodels using Poisson random numbers >>>>> with hard coded numbers for the regression tests. >>>>> >>>> >>>> I'd really rather people do this than use StableRandom; this is best >>>> practice, as I see it, if your tests involve making precise comparisons to >>>> expected results. >>>> >>> >>> I hardcoded the results not the random data. So the unit tests rely on a >>> reproducible stream of Poisson random numbers. >>> I don't want to save 500 (100 or 1000) observations in a csv file for >>> every variation of the unit test that I run. >>> >> >> I agree, hardcoding numbers in every place where seeded random numbers >> are now used is quite unrealistic. >> >> It may be worth having a look at test suites for scipy, statsmodels, >> scikit-learn, etc. and estimate how much work this NEP causes those >> projects. If the devs of those packages are forced to do large scale >> migrations from RandomState to StableState, then why not instead keep >> RandomState and just add a new API next to it? >> >> > > As a quick and imperfect test, I monkey-patched numpy so that a call to > numpy.random.seed(m) actually uses m+1000 as the seed. 
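[For reference, the monkey-patch described above fits in a few lines; this is a rough sketch, and the exact way it was hooked into the test run is not shown in the thread.]

```
import numpy as np

_original_seed = np.random.seed

def _shifted_seed(seed=None):
    # Shift every explicit integer seed so that tests relying on the exact
    # stream (rather than on statistical properties) will fail.
    _original_seed(seed if seed is None else seed + 1000)

np.random.seed = _shifted_seed
```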
I ran the tests > using the `runtests.py` script: > > *seed+1000, using 'python runtests.py -n' in the source directory:* > > 236 failed, 12881 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 > xpassed > > > Most of the failures are in scipy.stats: > > *seed+1000, using 'python runtests.py -n -s stats' in the source > directory:* > > 203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed, 1 xpassed > > > Changing the amount added to the seed or running the tests using the > function `scipy.test("full")` gives different (but similar magnitude) > results: > > *seed+1000, using 'import scipy; scipy.test("full")' in an ipython shell:* > > 269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed > > *seed+1, using 'python runtests.py -n' in the source directory:* > > 305 failed, 12812 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 > xpassed > > > I suspect many of the tests will be easy to update, so fixing 300 or so > tests does not seem like a monumental task. > It's all not monumental, but it adds up quickly. In addition to changing tests, one will also need compatibility code when supporting multiple numpy versions (e.g. scipy when get a copy of RandomStable in scipy/_lib/_numpy_compat.py). A quick count of just np.random.seed occurrences with ``$ grep -roh --include \*.py np.random.seed . | wc -w`` for some packages: numpy: 77 scipy: 462 matplotlib: 204 statsmodels: 461 pymc3: 36 scikit-image: 63 scikit-learn: 69 keras: 46 pytorch: 0 tensorflow: 368 astropy: 24 And note, these are *not* incorrect/broken usages, this is code that works and has done so for years. Conclusion: the current proposal will cause work for the vast majority of libraries that depends on numpy. The total amount of that work will certainly not be counted in person-days/weeks, and more likely in years than months. So I'm not convinced yet that the current proposal is the best way forward. Ralf I haven't looked into why there are 585 deselected tests; maybe there are > many more tests lurking there that will have to be updated. > > Warren > > > > Ralf >> >> >> >>> >>> >>>> >>>> StableRandom is intended as a crutch so that the pain of moving >>>> existing unit tests away from the deprecated RandomState is less onerous. >>>> I'd really rather people write better unit tests! >>>> >>>> In particular, I do not want to add any of the integer-domain >>>> distributions (aside from shuffle/permutation/choice) as these are the ones >>>> that have the platform-dependency issues with respect to 32/64-bit `long` >>>> integers. They'd be unreliable for unit tests even if we kept them stable >>>> over time. >>>> >>>> >>>>> I'm not sure which other distributions are common enough and not >>>>> easily reproducible by transformation. E.g. negative binomial can be >>>>> reproduces by a gamma-poisson mixture. >>>>> >>>>> On the other hand normal can be easily recreated from standard_normal. >>>>> >>>> >>>> I was mostly motivated by making it a bit easier to mechanically >>>> replace uses of randn(), which is probably even more common than normal() >>>> and standard_normal() in unit tests. >>>> >>>> >>>>> Would it be difficult to keep this list large, given that it should be >>>>> frozen, low maintenance code ? >>>>> >>>> >>>> I admit that I had in mind non-statistical unit tests. That is, tests >>>> that didn't depend on the precise distribution of the inputs. >>>> >>> >>> The problem is that the unit test in `stats` rely on precise inputs (up >>> to some numerical noise). 
>>> For example p-values themselves are uniformly distributed if the >>> hypothesis test works correctly. That mean if I don't have control over the >>> inputs, then my p-value could be anything in (0, 1). So either we need a >>> real dataset, save all the random numbers in a file or have a reproducible >>> set of random numbers. >>> >>> 95% of the unit tests that I write are for statistics. A large fraction >>> of them don't rely on the exact distribution, but do rely on a random >>> numbers that are "good enough". >>> For example, when writing unit test, then I get every once in a while or >>> sometimes more often a "bad" stream of random numbers, for which >>> convergence might fail or where the estimated numbers are far away from the >>> true numbers, so test tolerance would have to be very high. >>> If I pick one of the seeds that looks good, then I can have tighter unit >>> test tolerance to insure results are good in a nice case. >>> >>> The problem is that we cannot write robust unit tests for regression >>> tests without stable inputs. >>> E.g. I verified my results with a Monte Carlo with 5000 replications and >>> 1000 Poisson observations in each. >>> Results look close to expected and won't depend much on the exact stream >>> of random variables. >>> But the Monte Carlo for each variant of the test took about 40 seconds. >>> Doing this for all option combination and dataset specification takes too >>> long to be feasible in a unit test suite. >>> So I rely on numpy's stable random numbers and hard code the results for >>> a specific random sample in the regression unit tests. >>> >>> Josef >>> >>> >>> >>>> >>>> -- >>>> Robert Kern >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 10 20:52:50 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 10 Jun 2018 17:52:50 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: In Sun, Jun 10, 2018 at 4:31 PM Eric Wieser wrote: > Thanks for the writeup Marten, > Indeed, thank you Marten! > This hits on an interesting alternative to frozen dimensions - np.cross > could just become a regular ufunc with signature np.dtype((float64, 3)), > np.dtype((float64, 3)) → np.dtype((float64, 3)) > > Another alternative to mention is returning multiple arrays, e.g., two arrays for a fixed dimension of size 2. That said, I still think frozen dimension are a better proposal than either of these. > - I?m -1 on optional dimensions: they seem to legitimize creating many > overloads of gufuncs. I?m already not a fan of how matmul has special cases > for lower dimensions that don?t generalize well. 
To me, the best way to > handle matmul would be to use the proposed __array_function__ to > handle the shape-based special-case dispatching, either by: > - Inserting dimensions, and calling the true gufunc > np.linalg.matmul_2d (which is a function I?d like direct access to > anyway). > - Dispatching to one of four ufuncs > > I don't understand your alternative here. If we overload np.matmul using __array_function__, then it would not use *ether* of these options for writing the operation in terms of other gufuncs. It would simply look for an __array_function__ attribute, and call that method instead. My concern with either inserting dimensions or dispatching to one of four ufuncs is that some objects (e.g., xarray.DataArray) define matrix multiplication, but in an incompatible way with NumPy (e.g., xarray sums over axes with the same name, instead of last / second-to-last axes). NumPy really ought to provide a way overload the either operation, without either inserting/removing dummy dimensions or inspecting input shapes to dispatch to other gufuncs. That said, if you don't want to make np.matmul a gufunc, then I would much rather use Python's standard overloading rules with __matmul__/__rmatmul__ than use __array_function__, for two reasons: 1. You *already* need to use __matmul__/__rmatmul__ if you want to support matrix multiplication with @ on your class, so __array_function__ would be additional and redundant. __array_function__ is really intended as a fall-back, for cases where there is no other alternative. 2. With the current __array_function__ proposal, this would imply that calling other unimplemented NumPy functions on your object would raise TypeError rather than doing coercion. This sort of additional coupled behavior is probably not what an implementor of operator.matmul/@ is looking for. In summary, I would either support: 1. (This proposal) Adding additional optional dimensions to gufuncs for np.matmul/operator.matmul, or 2. Making operator.matmul a special case for mathematical operators that always checks overloads with __matmul__/__rmatmul__ even if __array_ufunc__ is defined. Either way, matrix-multiplication becomes somewhat of a special case. It's just a matter of whether it's a special case for gufuncs (using optional dimensions) or a special case for arithmetic overloads in NumPy (not using __array_ufunc__). Given that I think optional dimensions have other conceivable uses in gufuncs (for row/column vectors), I think that's the better option. I would not support either expand dimensions or dispatch to multiple gufuncs in NumPy's implementation of operator.matmul (i.e., ndarray.__matmul__). We could potentially only do this for numpy.matmul rather than operator.matmul/@, but that opens the door to potential inconsistency between the NumPy version of an operator and Python's version of an operator, which is something we tried very hard to avoid with __arary_ufunc__. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 10 20:57:24 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Jun 2018 17:57:24 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 5:47 PM Ralf Gommers wrote: > > On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: >> I suspect many of the tests will be easy to update, so fixing 300 or so tests does not seem like a monumental task. 
> > It's all not monumental, but it adds up quickly. In addition to changing tests, one will also need compatibility code when supporting multiple numpy versions (e.g. scipy when get a copy of RandomStable in scipy/_lib/_numpy_compat.py). > > A quick count of just np.random.seed occurrences with ``$ grep -roh --include \*.py np.random.seed . | wc -w`` for some packages: > numpy: 77 > scipy: 462 > matplotlib: 204 > statsmodels: 461 > pymc3: 36 > scikit-image: 63 > scikit-learn: 69 > keras: 46 > pytorch: 0 > tensorflow: 368 > astropy: 24 > > And note, these are *not* incorrect/broken usages, this is code that works and has done so for years. Yes, some of them are incorrect and broken. Failure can be difficult to detect. This module from keras is particularly problematic: https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image.py > Conclusion: the current proposal will cause work for the vast majority of libraries that depends on numpy. The total amount of that work will certainly not be counted in person-days/weeks, and more likely in years than months. So I'm not convinced yet that the current proposal is the best way forward. The mere usage of np.random.seed() doesn't imply that these packages actually require stream-compatibility. Some might, for sure, like where they are used in the unit tests, but that's not what you counted. At best, these numbers just mean that we can't eliminate np.random.seed() in a new system right away. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 10 21:08:47 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Jun 2018 18:08:47 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers wrote: > > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern wrote: >> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers wrote: >>> >>> It may be worth having a look at test suites for scipy, statsmodels, scikit-learn, etc. and estimate how much work this NEP causes those projects. If the devs of those packages are forced to do large scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it? >> >> The problem is that we can't really have an ecosystem with two different general purpose systems. > > Can't = prefer not to. I meant what I wrote. :-) > But yes, that's true. That's not what I was saying though. We want one generic one, and one meant for unit testing only. You can achieve that in two ways: > 1. Change the current np.random API to new generic, and add a new RandomStable for unit tests. > 2. Add a new generic API, and document the current np.random API as being meant for unit tests only, for other usage should be preferred. > > (2) has a couple of pros: > - you're not forcing almost every library and end user out there to migrate their unit tests. But it has the cons that I talked about. RandomState *is* a fully functional general purpose PRNG system. After all, that's its current use. Documenting it as intended to be something else will not change that fact. Documentation alone provides no real impetus to move to the new system outside of the unit tests. 
And the community does need to move together to the new system in their library code, or else we won't be able to combine libraries together; these PRNG objects need to thread all the way through between code from different authors if we are to write programs with a controlled seed. The failure mode when people don't pay attention to the documentation is that I can no longer write programs that compose these libraries together. That's why I wrote "can't". It's not a mere preference for not having two systems to maintain. It has binary Go/No Go implications for building reproducible programs. > - more design freedom for the new generic API. The current one is clearly sub-optimal; in a new one you wouldn't have to expose all the global state/functions that np.random exposes now. You could even restrict it to a single class and put that in the main numpy namespace. I'm not sure why you are talking about the global state and np.random.* convenience functions. What we do with those functions is out of scope for this NEP and would be talked about it another NEP fully introducing the new system. >> To properly use pseudorandom numbers, I need to instantiate a PRNG and thread it through all of the code in my program: both the parts that I write and the third party libraries that I don't write. >> >> Generating test data for unit tests is separable, though. That's why I propose having a StableRandom built on the new architecture. Its purpose would be well-documented, and in my proposal is limited in features such that it will be less likely to be abused outside of that purpose. If you make it fully-featured, it is more likely to be abused by building library code around it. But even if it is so abused, because it is built on the new architecture, at least I can thread the same core PRNG state through the StableRandom distributions from the abusing library and use the better distributions class elsewhere (randomgen names it "Generator"). Just keeping RandomState around can't work like that because it doesn't have a replaceable core PRNG. >> >> But that does suggest another alternative that we should explore: >> >> The new architecture separates the core uniform PRNG from the wide variety of non-uniform probability distributions. That is, the core PRNG state is encapsulated in a discrete object that can be shared between instances of different distribution-providing classes. numpy.random should provide two such distribution-providing classes. The main one (let us call it ``Generator``, as it is called in the prototype) will follow the new policy: distribution methods can break the stream in feature releases. There will also be a secondary distributions class (let us call it ``LegacyGenerator``) which contains distribution methods exactly as they exist in the current ``RandomState`` implementation. When one combines ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the exact same stream as ``RandomState`` for all distribution methods. The ``LegacyGenerator`` methods will be forever frozen. ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with the MT19937 core PRNG, and whatever tricks needed to make ``isinstance(prng, RandomState)`` and unpickling work should be done. This way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be deprecated, becoming progressively noisier over a number of release cycles, in favor of explicitly instantiating ``LegacyGenerator``. 
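A minimal, self-contained sketch of that separation may help. All class names below are stand-ins for the proposal (they are not an existing numpy API), and the toy core reuses RandomState internally only so the sketch actually runs; the point is the composition: one core uniform PRNG object whose state is shared by two distribution-providing front ends.

    import numpy as np

    class CorePRNG:                        # stand-in for e.g. an MT19937 core object
        def __init__(self, seed):
            self._rs = np.random.RandomState(seed)
        def uniform(self, size):           # the only primitive the front ends consume
            return self._rs.random_sample(size)

    class LegacyGenerator:                 # frozen algorithms; the stream never changes
        def __init__(self, core):
            self.core = core
        def standard_exponential(self, size):
            return -np.log(1.0 - self.core.uniform(size))   # inversion method

    class Generator:                       # algorithms allowed to improve in feature releases
        def __init__(self, core):
            self.core = core
        def standard_exponential(self, size):
            return -np.log1p(-self.core.uniform(size))      # e.g. a more accurate variant

    core = CorePRNG(12345)                                # one state object...
    legacy, new = LegacyGenerator(core), Generator(core)  # ...two distribution layers
    print(legacy.standard_exponential(3), new.standard_exponential(3))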
>> >> ``LegacyGenerator`` CAN be used during this deprecation period in library and application code until libraries and applications can migrate to the new ``Generator``. Libraries and applications SHOULD migrate but MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test data for unit tests where cross-release stability of the streams is important. Test writers SHOULD consider ways to mitigate their reliance on such stability and SHOULD limit their usage to distribution methods that have fewer cross-platform stability risks. I would appreciate your consideration of this proposal. Does it address your concerns? It addresses my concerns with keeping around a fully-functional RandomState implementation. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jun 10 22:45:11 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 10 Jun 2018 22:45:11 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 9:08 PM, Robert Kern wrote: > On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers > wrote: > > > > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern > wrote: > >> > >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers > wrote: > >>> > >>> It may be worth having a look at test suites for scipy, statsmodels, > scikit-learn, etc. and estimate how much work this NEP causes those > projects. If the devs of those packages are forced to do large scale > migrations from RandomState to StableState, then why not instead keep > RandomState and just add a new API next to it? > >> > >> The problem is that we can't really have an ecosystem with two > different general purpose systems. > > > > Can't = prefer not to. > > I meant what I wrote. :-) > > > But yes, that's true. That's not what I was saying though. We want one > generic one, and one meant for unit testing only. You can achieve that in > two ways: > > 1. Change the current np.random API to new generic, and add a new > RandomStable for unit tests. > > 2. Add a new generic API, and document the current np.random API as > being meant for unit tests only, for other usage should be > preferred. > > > > (2) has a couple of pros: > > - you're not forcing almost every library and end user out there to > migrate their unit tests. > > But it has the cons that I talked about. RandomState *is* a fully > functional general purpose PRNG system. After all, that's its current use. > Documenting it as intended to be something else will not change that fact. > Documentation alone provides no real impetus to move to the new system > outside of the unit tests. And the community does need to move together to > the new system in their library code, or else we won't be able to combine > libraries together; these PRNG objects need to thread all the way through > between code from different authors if we are to write programs with a > controlled seed. The failure mode when people don't pay attention to the > documentation is that I can no longer write programs that compose these > libraries together. That's why I wrote "can't". It's not a mere preference > for not having two systems to maintain. It has binary Go/No Go implications > for building reproducible programs. > I don't understand this part. For example, scipy.stats and scikit-learn allow the user to provide a RandomState instance to the functions. I don't see why you want to force down stream libraries to change this. 
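(For reference, that pattern looks roughly like the sketch below; the helper is modeled on scikit-learn's check_random_state, and the function and argument names are only illustrative.)

    import numbers
    import numpy as np

    def check_random_state(seed):
        # Accept None, an int, or a RandomState and return a RandomState,
        # in the spirit of sklearn.utils.check_random_state.
        if seed is None:
            return np.random.mtrand._rand           # the hidden global instance
        if isinstance(seed, (numbers.Integral, np.integer)):
            return np.random.RandomState(seed)
        if isinstance(seed, np.random.RandomState):
            return seed
        raise ValueError("%r cannot be used to seed a RandomState" % seed)

    def bootstrap_ci(data, n_boot=1000, random_state=None):
        # library function: the caller controls reproducibility by passing
        # either an integer seed or an already-threaded RandomState
        rng = check_random_state(random_state)
        idx = rng.randint(0, len(data), size=(n_boot, len(data)))
        means = np.asarray(data)[idx].mean(axis=1)
        return np.percentile(means, [2.5, 97.5])

Callers that pass an int or a RandomState instance today keep working unchanged.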
A random state argument should be (essentially) compatible with whatever the user uses, and there is no reason to force packages to update there internal use like in unit tests if they don't want to, e.g. because of the instability. Aside to statsmodels: We currently have very few user facing random functions, those are just in maybe 3 to 5 places where we have simulated or bootstrap values. Most of the other uses of np.random are in unit tests and some in the documentation examples. Josef > > > - more design freedom for the new generic API. The current one is > clearly sub-optimal; in a new one you wouldn't have to expose all the > global state/functions that np.random exposes now. You could even restrict > it to a single class and put that in the main numpy namespace. > > I'm not sure why you are talking about the global state and np.random.* > convenience functions. What we do with those functions is out of scope for > this NEP and would be talked about it another NEP fully introducing the new > system. > > >> To properly use pseudorandom numbers, I need to instantiate a PRNG and > thread it through all of the code in my program: both the parts that I > write and the third party libraries that I don't write. > >> > >> Generating test data for unit tests is separable, though. That's why I > propose having a StableRandom built on the new architecture. Its purpose > would be well-documented, and in my proposal is limited in features such > that it will be less likely to be abused outside of that purpose. If you > make it fully-featured, it is more likely to be abused by building library > code around it. But even if it is so abused, because it is built on the new > architecture, at least I can thread the same core PRNG state through the > StableRandom distributions from the abusing library and use the better > distributions class elsewhere (randomgen names it "Generator"). Just > keeping RandomState around can't work like that because it doesn't have a > replaceable core PRNG. > >> > >> But that does suggest another alternative that we should explore: > >> > >> The new architecture separates the core uniform PRNG from the wide > variety of non-uniform probability distributions. That is, the core PRNG > state is encapsulated in a discrete object that can be shared between > instances of different distribution-providing classes. numpy.random should > provide two such distribution-providing classes. The main one (let us call > it ``Generator``, as it is called in the prototype) will follow the new > policy: distribution methods can break the stream in feature releases. > There will also be a secondary distributions class (let us call it > ``LegacyGenerator``) which contains distribution methods exactly as they > exist in the current ``RandomState`` implementation. When one combines > ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the > exact same stream as ``RandomState`` for all distribution methods. The > ``LegacyGenerator`` methods will be forever frozen. > ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with > the MT19937 core PRNG, and whatever tricks needed to make > ``isinstance(prng, RandomState)`` and unpickling work should be done. This > way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be > deprecated, becoming progressively noisier over a number of release cycles, > in favor of explicitly instantiating ``LegacyGenerator``. 
> >> > >> ``LegacyGenerator`` CAN be used during this deprecation period in > library and application code until libraries and applications can migrate > to the new ``Generator``. Libraries and applications SHOULD migrate but > MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test > data for unit tests where cross-release stability of the streams is > important. Test writers SHOULD consider ways to mitigate their reliance on > such stability and SHOULD limit their usage to distribution methods that > have fewer cross-platform stability risks. > > I would appreciate your consideration of this proposal. Does it address > your concerns? It addresses my concerns with keeping around a > fully-functional RandomState implementation. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jun 10 23:01:20 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 10 Jun 2018 20:01:20 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 6:08 PM, Robert Kern wrote: > On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers > wrote: > > > > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern > wrote: > >> > >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers > wrote: > >>> > >>> It may be worth having a look at test suites for scipy, statsmodels, > scikit-learn, etc. and estimate how much work this NEP causes those > projects. If the devs of those packages are forced to do large scale > migrations from RandomState to StableState, then why not instead keep > RandomState and just add a new API next to it? > >> > >> The problem is that we can't really have an ecosystem with two > different general purpose systems. > > > > Can't = prefer not to. > > I meant what I wrote. :-) > > > But yes, that's true. That's not what I was saying though. We want one > generic one, and one meant for unit testing only. You can achieve that in > two ways: > > 1. Change the current np.random API to new generic, and add a new > RandomStable for unit tests. > > 2. Add a new generic API, and document the current np.random API as > being meant for unit tests only, for other usage should be > preferred. > > > > (2) has a couple of pros: > > - you're not forcing almost every library and end user out there to > migrate their unit tests. > > But it has the cons that I talked about. RandomState *is* a fully > functional general purpose PRNG system. After all, that's its current use. > Documenting it as intended to be something else will not change that fact. > Documentation alone provides no real impetus to move to the new system > outside of the unit tests. And the community does need to move together to > the new system in their library code, or else we won't be able to combine > libraries together; these PRNG objects need to thread all the way through > between code from different authors if we are to write programs with a > controlled seed. The failure mode when people don't pay attention to the > documentation is that I can no longer write programs that compose these > libraries together. That's why I wrote "can't". It's not a mere preference > for not having two systems to maintain. It has binary Go/No Go implications > for building reproducible programs. 
> I strongly suspect you are right, but only because you're asserting "can't" so heavily. I have trouble formulating what would go wrong in case there's two PRNGs used in a single program. It's not described in the NEP, nor in the numpy.random docs (those don't even have any recommendations for best practices listed as far as I can tell - that needs fixing). All you explain in the NEP is that reproducible research isn't helped by the current stream-compat guarantee. So a bit of (probably incorrect) devil's advocate reasoning: - If there's no stream-compat guarantee, all a user can rely on is the properties of drawing from a seeded PRNG. - Any use of a PRNG in library code can also only rely on properties - So now whether in a user's program libraries draw from one or two seeded PRNGs doesn't matter for reproducibility, because those properties don't change. Also, if there is to be a multi-year transitioning to the new API, would there be two PRNG systems anyway during those years? > > - more design freedom for the new generic API. The current one is > clearly sub-optimal; in a new one you wouldn't have to expose all the > global state/functions that np.random exposes now. You could even restrict > it to a single class and put that in the main numpy namespace. > > I'm not sure why you are talking about the global state and np.random.* > convenience functions. What we do with those functions is out of scope for > this NEP and would be talked about it another NEP fully introducing the new > system. > To quote you from one of the first emails in this thread: " I deliberately left it out of this one as it may, depending on our choices, impinge upon the design of the new PRNG subsystem, which I declared out of scope for this NEP. I have ideas (besides the glib "Let them eat AttributeErrors!"), and now that I think more about it, that does seem like it might be in scope just like the discussion of freezing RandomState and StableRandom are. But I think I'd like to hold that thought a little bit and get a little more screaming^Wfeedback on the core proposal first. I'll return to this in a few days if not sooner. " So consider this some screaming^Wfeedback:) > > >> To properly use pseudorandom numbers, I need to instantiate a PRNG and > thread it through all of the code in my program: both the parts that I > write and the third party libraries that I don't write. > >> > >> Generating test data for unit tests is separable, though. That's why I > propose having a StableRandom built on the new architecture. Its purpose > would be well-documented, and in my proposal is limited in features such > that it will be less likely to be abused outside of that purpose. If you > make it fully-featured, it is more likely to be abused by building library > code around it. But even if it is so abused, because it is built on the new > architecture, at least I can thread the same core PRNG state through the > StableRandom distributions from the abusing library and use the better > distributions class elsewhere (randomgen names it "Generator"). Just > keeping RandomState around can't work like that because it doesn't have a > replaceable core PRNG. > >> > >> But that does suggest another alternative that we should explore: > >> > >> The new architecture separates the core uniform PRNG from the wide > variety of non-uniform probability distributions. That is, the core PRNG > state is encapsulated in a discrete object that can be shared between > instances of different distribution-providing classes. 
numpy.random should > provide two such distribution-providing classes. The main one (let us call > it ``Generator``, as it is called in the prototype) will follow the new > policy: distribution methods can break the stream in feature releases. > There will also be a secondary distributions class (let us call it > ``LegacyGenerator``) which contains distribution methods exactly as they > exist in the current ``RandomState`` implementation. When one combines > ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the > exact same stream as ``RandomState`` for all distribution methods. The > ``LegacyGenerator`` methods will be forever frozen. > ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with > the MT19937 core PRNG, and whatever tricks needed to make > ``isinstance(prng, RandomState)`` and unpickling work should be done. This > way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be > deprecated, becoming progressively noisier over a number of release cycles, > in favor of explicitly instantiating ``LegacyGenerator``. > >> > >> ``LegacyGenerator`` CAN be used during this deprecation period in > library and application code until libraries and applications can migrate > to the new ``Generator``. Libraries and applications SHOULD migrate but > MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test > data for unit tests where cross-release stability of the streams is > important. Test writers SHOULD consider ways to mitigate their reliance on > such stability and SHOULD limit their usage to distribution methods that > have fewer cross-platform stability risks. > > I would appreciate your consideration of this proposal. Does it address > your concerns? It addresses my concerns with keeping around a > fully-functional RandomState implementation. > My concerns are: 1. The amount of work caused by making libraries and end users migrate. 2. That this is a backwards compatibility break, which will cause problems for users who relied on the old guarantees (the arguments in the NEP that the old guarantees weren't 100% watertight don't mean that backcompat doesn't matter at all). As far as I can tell, this new proposal doesn't deal with those concerns directly. What it does seem to do is making transitioning a bit easier for users that were already using RandomState instances. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jun 10 23:10:16 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 10 Jun 2018 20:10:16 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 5:57 PM, Robert Kern wrote: > On Sun, Jun 10, 2018 at 5:47 PM Ralf Gommers > wrote: > > > > On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > > >> I suspect many of the tests will be easy to update, so fixing 300 or so > tests does not seem like a monumental task. > > > > It's all not monumental, but it adds up quickly. In addition to changing > tests, one will also need compatibility code when supporting multiple numpy > versions (e.g. scipy when get a copy of RandomStable in > scipy/_lib/_numpy_compat.py). > > > > A quick count of just np.random.seed occurrences with ``$ grep -roh > --include \*.py np.random.seed . 
| wc -w`` for some packages: > > numpy: 77 > > scipy: 462 > > matplotlib: 204 > > statsmodels: 461 > > pymc3: 36 > > scikit-image: 63 > > scikit-learn: 69 > > keras: 46 > > pytorch: 0 > > tensorflow: 368 > > astropy: 24 > > > > And note, these are *not* incorrect/broken usages, this is code that > works and has done so for years. > > Yes, some of them are incorrect and broken. Failure can be difficult to > detect. This module from keras is particularly problematic: > > https://github.com/keras-team/keras-preprocessing/blob/ > master/keras_preprocessing/image.py > You have to appreciate that we're not all thinking at lightning speed and in the same direction. If there is a difficult to detect problem, it may be useful to give a brief code example (or even line of reasoning) of how this actually breaks something. > > > Conclusion: the current proposal will cause work for the vast majority > of libraries that depends on numpy. The total amount of that work will > certainly not be counted in person-days/weeks, and more likely in years > than months. So I'm not convinced yet that the current proposal is the best > way forward. > > The mere usage of np.random.seed() doesn't imply that these packages > actually require stream-compatibility. Some might, for sure, like where > they are used in the unit tests, but that's not what you counted. At best, > these numbers just mean that we can't eliminate np.random.seed() in a new > system right away. > Well, mere usage has been called an antipattern (also on your behalf), plus for scipy over half of the usages do give test failures (Warren's quick test). So I'd say that counting usages is a decent proxy for the work that has to be done. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Jun 10 23:38:50 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 10 Jun 2018 20:38:50 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 8:10 PM Ralf Gommers wrote: > On Sun, Jun 10, 2018 at 5:57 PM, Robert Kern > wrote: > >> > Conclusion: the current proposal will cause work for the vast majority >> of libraries that depends on numpy. The total amount of that work will >> certainly not be counted in person-days/weeks, and more likely in years >> than months. So I'm not convinced yet that the current proposal is the best >> way forward. >> > >> The mere usage of np.random.seed() doesn't imply that these packages >> actually require stream-compatibility. Some might, for sure, like where >> they are used in the unit tests, but that's not what you counted. At best, >> these numbers just mean that we can't eliminate np.random.seed() in a new >> system right away. >> > > Well, mere usage has been called an antipattern (also on your behalf), > plus for scipy over half of the usages do give test failures (Warren's > quick test). So I'd say that counting usages is a decent proxy for the work > that has to be done. > Let me suggest another possible concession for backwards compatibility. We should make a dedicated module, e.g., "numpy.random.stable" that contains functions implemented as methods on StableRandom. These functions should include "seed", which is too pervasive to justify removing. Transitioning to the new module should be as simple as mechanistically replacing all uses of "numpy.random" with "numpy.random.stable". 
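A sketch of how thin such a module could be (the module path and the StableRandom name are from the proposal, not an existing numpy API; RandomState stands in for StableRandom here only so the sketch is concrete):

    # hypothetical numpy/random/stable.py
    import numpy as np

    _stable = np.random.RandomState()    # stand-in for the proposed StableRandom()

    # re-export the instance's methods as module-level functions, mirroring
    # how numpy.random exposes its hidden global RandomState today
    seed = _stable.seed
    random_sample = _stable.random_sample
    randint = _stable.randint
    standard_normal = _stable.standard_normal
    shuffle = _stable.shuffle
    # ... one line per supported method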
This module would add virtually no maintenance overhead, because the implementations would be entirely contained on StableRandom, and would simply involve creating a single top-level StableRandom object (like what is currently done in numpy.random). -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 11 01:06:11 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Jun 2018 22:06:11 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 7:46 PM wrote: > > On Sun, Jun 10, 2018 at 9:08 PM, Robert Kern wrote: >> >> On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers wrote: >> > >> > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern wrote: >> >> >> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers wrote: >> >>> >> >>> It may be worth having a look at test suites for scipy, statsmodels, scikit-learn, etc. and estimate how much work this NEP causes those projects. If the devs of those packages are forced to do large scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it? >> >> >> >> The problem is that we can't really have an ecosystem with two different general purpose systems. >> > >> > Can't = prefer not to. >> >> I meant what I wrote. :-) >> >> > But yes, that's true. That's not what I was saying though. We want one generic one, and one meant for unit testing only. You can achieve that in two ways: >> > 1. Change the current np.random API to new generic, and add a new RandomStable for unit tests. >> > 2. Add a new generic API, and document the current np.random API as being meant for unit tests only, for other usage should be preferred. >> > >> > (2) has a couple of pros: >> > - you're not forcing almost every library and end user out there to migrate their unit tests. >> >> But it has the cons that I talked about. RandomState *is* a fully functional general purpose PRNG system. After all, that's its current use. Documenting it as intended to be something else will not change that fact. Documentation alone provides no real impetus to move to the new system outside of the unit tests. And the community does need to move together to the new system in their library code, or else we won't be able to combine libraries together; these PRNG objects need to thread all the way through between code from different authors if we are to write programs with a controlled seed. The failure mode when people don't pay attention to the documentation is that I can no longer write programs that compose these libraries together. That's why I wrote "can't". It's not a mere preference for not having two systems to maintain. It has binary Go/No Go implications for building reproducible programs. > > I don't understand this part. > For example, scipy.stats and scikit-learn allow the user to provide a RandomState instance to the functions. I don't see why you want to force down stream libraries to change this. A random state argument should be (essentially) compatible with whatever the user uses, and there is no reason to force packages to update there internal use like in unit tests if they don't want to, e.g. because of the instability. > > Aside to statsmodels: We currently have very few user facing random functions, those are just in maybe 3 to 5 places where we have simulated or bootstrap values. > Most of the other uses of np.random are in unit tests and some in the documentation examples. 
Please consider my alternative proposal. Your feedback has convinced me that that's a better approach than the StableRandom as laid out in the NEP. I'm even willing to not deprecate the name RandomState. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 11 01:36:29 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Jun 2018 22:36:29 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 8:04 PM Ralf Gommers wrote: > > On Sun, Jun 10, 2018 at 6:08 PM, Robert Kern wrote: >> >> On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers wrote: >> > >> > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern wrote: >> >> >> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers wrote: >> >>> >> >>> It may be worth having a look at test suites for scipy, statsmodels, scikit-learn, etc. and estimate how much work this NEP causes those projects. If the devs of those packages are forced to do large scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it? >> >> >> >> The problem is that we can't really have an ecosystem with two different general purpose systems. >> > >> > Can't = prefer not to. >> >> I meant what I wrote. :-) >> >> > But yes, that's true. That's not what I was saying though. We want one generic one, and one meant for unit testing only. You can achieve that in two ways: >> > 1. Change the current np.random API to new generic, and add a new RandomStable for unit tests. >> > 2. Add a new generic API, and document the current np.random API as being meant for unit tests only, for other usage should be preferred. >> > >> > (2) has a couple of pros: >> > - you're not forcing almost every library and end user out there to migrate their unit tests. >> >> But it has the cons that I talked about. RandomState *is* a fully functional general purpose PRNG system. After all, that's its current use. Documenting it as intended to be something else will not change that fact. Documentation alone provides no real impetus to move to the new system outside of the unit tests. And the community does need to move together to the new system in their library code, or else we won't be able to combine libraries together; these PRNG objects need to thread all the way through between code from different authors if we are to write programs with a controlled seed. The failure mode when people don't pay attention to the documentation is that I can no longer write programs that compose these libraries together. That's why I wrote "can't". It's not a mere preference for not having two systems to maintain. It has binary Go/No Go implications for building reproducible programs. > > I strongly suspect you are right, but only because you're asserting "can't" so heavily. I have trouble formulating what would go wrong in case there's two PRNGs used in a single program. It's not described in the NEP, nor in the numpy.random docs (those don't even have any recommendations for best practices listed as far as I can tell - that needs fixing). All you explain in the NEP is that reproducible research isn't helped by the current stream-compat guarantee. So a bit of (probably incorrect) devil's advocate reasoning: > - If there's no stream-compat guarantee, all a user can rely on is the properties of drawing from a seeded PRNG. 
> - Any use of a PRNG in library code can also only rely on properties > - So now whether in a user's program libraries draw from one or two seeded PRNGs doesn't matter for reproducibility, because those properties don't change. Correctly making a stochastic program reproducible while retaining good statistical properties is difficult. People don't do it well in the best of circumstances. The best way that we've found to manage that difficulty is to instantiate a single stream and use it all throughout your code. Every new stream requires the management of more seeds (unless if we use the fancy new algorithms that have settable stream IDs, but by stipulation, we don't have these in this case). And now I have to thread both of these objects through my code, and pass the right object to each third-party library. These third-party libraries don't know anything about this weird 2-stream workaround that you are doing, so we now have libraries that can't build on each other unless if they are using the same compatible API, even if I can make workarounds to build a program that combines two libraries side-to-side. So yeah, people "can" do this. "It's just a matter of code" as my boss likes to say. But it's making an already-difficult task more difficult. > Also, if there is to be a multi-year transitioning to the new API, would there be two PRNG systems anyway during those years? Sure, but with a deadline and not-just-documentation to motivate transitioning. But if we follow my alternative proposal, there'll be no need for deprecation! You've convinced me to not deprecate RandomState. I just want to change some of its internal implementation details, add a less-stable set of distributions on the side, and a framework of core uniform PRNGs that can be shared by both. >> > - more design freedom for the new generic API. The current one is clearly sub-optimal; in a new one you wouldn't have to expose all the global state/functions that np.random exposes now. You could even restrict it to a single class and put that in the main numpy namespace. >> >> I'm not sure why you are talking about the global state and np.random.* convenience functions. What we do with those functions is out of scope for this NEP and would be talked about it another NEP fully introducing the new system. > > To quote you from one of the first emails in this thread: " > I deliberately left it out of this one as it may, depending on our choices, impinge upon the design of the new PRNG subsystem, which I declared out of scope for this NEP. I have ideas (besides the glib "Let them eat AttributeErrors!"), and now that I think more about it, that does seem like it might be in scope just like the discussion of freezing RandomState and StableRandom are. But I think I'd like to hold that thought a little bit and get a little more screaming^Wfeedback on the core proposal first. I'll return to this in a few days if not sooner. > " > > So consider this some screaming^Wfeedback:) Ahem. Yes, I just remembered I said that. :-) But still, there will be lots of options about what to do with np.random.*, whatever proposal we go with. It doesn't really impose constraints on the core proposals. >> >> To properly use pseudorandom numbers, I need to instantiate a PRNG and thread it through all of the code in my program: both the parts that I write and the third party libraries that I don't write. >> >> >> >> Generating test data for unit tests is separable, though. That's why I propose having a StableRandom built on the new architecture. 
Its purpose would be well-documented, and in my proposal is limited in features such that it will be less likely to be abused outside of that purpose. If you make it fully-featured, it is more likely to be abused by building library code around it. But even if it is so abused, because it is built on the new architecture, at least I can thread the same core PRNG state through the StableRandom distributions from the abusing library and use the better distributions class elsewhere (randomgen names it "Generator"). Just keeping RandomState around can't work like that because it doesn't have a replaceable core PRNG. >> >> >> >> But that does suggest another alternative that we should explore: >> >> >> >> The new architecture separates the core uniform PRNG from the wide variety of non-uniform probability distributions. That is, the core PRNG state is encapsulated in a discrete object that can be shared between instances of different distribution-providing classes. numpy.random should provide two such distribution-providing classes. The main one (let us call it ``Generator``, as it is called in the prototype) will follow the new policy: distribution methods can break the stream in feature releases. There will also be a secondary distributions class (let us call it ``LegacyGenerator``) which contains distribution methods exactly as they exist in the current ``RandomState`` implementation. When one combines ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the exact same stream as ``RandomState`` for all distribution methods. The ``LegacyGenerator`` methods will be forever frozen. ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with the MT19937 core PRNG, and whatever tricks needed to make ``isinstance(prng, RandomState)`` and unpickling work should be done. This way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be deprecated, becoming progressively noisier over a number of release cycles, in favor of explicitly instantiating ``LegacyGenerator``. >> >> >> >> ``LegacyGenerator`` CAN be used during this deprecation period in library and application code until libraries and applications can migrate to the new ``Generator``. Libraries and applications SHOULD migrate but MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test data for unit tests where cross-release stability of the streams is important. Test writers SHOULD consider ways to mitigate their reliance on such stability and SHOULD limit their usage to distribution methods that have fewer cross-platform stability risks. >> >> I would appreciate your consideration of this proposal. Does it address your concerns? It addresses my concerns with keeping around a fully-functional RandomState implementation. > > My concerns are: > 1. The amount of work caused by making libraries and end users migrate. > 2. That this is a backwards compatibility break, which will cause problems for users who relied on the old guarantees (the arguments in the NEP that the old guarantees weren't 100% watertight don't mean that backcompat doesn't matter at all). > > As far as I can tell, this new proposal doesn't deal with those concerns directly. What it does seem to do is making transitioning a bit easier for users that were already using RandomState instances. Let me drop the deprecation of the name RandomState. RandomState(int_seed) will forever and always create a backwards- and stream-compatible object. No one will have to migrate. How does that strike you? 
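(For concreteness, this is the kind of existing code that keeps working unchanged under that commitment, and it is also the single-stream pattern argued for above: one seeded RandomState threaded through the whole program. The function names are only illustrative.)

    import numpy as np

    prng = np.random.RandomState(12345)      # the one stream for the whole program

    def simulate_path(n, random_state):
        return random_state.standard_normal(n).cumsum()

    # the same object is handed to everything that needs randomness,
    # including third-party code that accepts a random_state argument
    path = simulate_path(1000, prng)
    noise = prng.poisson(5.0, size=3)
    print(path[-1], noise)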
-- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 11 02:15:44 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 10 Jun 2018 23:15:44 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 8:11 PM Ralf Gommers wrote: > > On Sun, Jun 10, 2018 at 5:57 PM, Robert Kern wrote: >> >> On Sun, Jun 10, 2018 at 5:47 PM Ralf Gommers wrote: >> > >> > On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: >> >> >> I suspect many of the tests will be easy to update, so fixing 300 or so tests does not seem like a monumental task. >> > >> > It's all not monumental, but it adds up quickly. In addition to changing tests, one will also need compatibility code when supporting multiple numpy versions (e.g. scipy when get a copy of RandomStable in scipy/_lib/_numpy_compat.py). >> > >> > A quick count of just np.random.seed occurrences with ``$ grep -roh --include \*.py np.random.seed . | wc -w`` for some packages: >> > numpy: 77 >> > scipy: 462 >> > matplotlib: 204 >> > statsmodels: 461 >> > pymc3: 36 >> > scikit-image: 63 >> > scikit-learn: 69 >> > keras: 46 >> > pytorch: 0 >> > tensorflow: 368 >> > astropy: 24 >> > >> > And note, these are *not* incorrect/broken usages, this is code that works and has done so for years. >> >> Yes, some of them are incorrect and broken. Failure can be difficult to detect. This module from keras is particularly problematic: >> >> https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image.py > > You have to appreciate that we're not all thinking at lightning speed and in the same direction. If there is a difficult to detect problem, it may be useful to give a brief code example (or even line of reasoning) of how this actually breaks something. Ahem. Sorry. That wasn't the code I was thinking of. It's merely hazardous, not broken by itself. However, if you used any of the `seed=` arguments that are helpfully(?) provided, you are almost certainly writing broken code. If you must use np.random.seed() to get reproducibility, you need to call it exactly once at the start of your code (or maybe once for each process) and let it ride. This is the impossible-to-use-correctly code that I was thinking of, which got partially fixed after I pointed out the problem. https://github.com/keras-team/keras/pull/8325/files The intention of this code is to shuffle two same-length sequences in the same way. So now if I write my code well to call np.random.seed() once at the start of my program, this function comes along and obliterates that with a fixed seed just so it can reuse the seed again to replicate the shuffle. Puzzlingly, the root sin of unconditionally and unavoidably reseeding for some of these functions is still there even though I showed how and why to avoid it. This is one reason why I was skeptical that merely documenting RandomState or StableRandom to only be used for unit tests would work. :-) >> > Conclusion: the current proposal will cause work for the vast majority of libraries that depends on numpy. The total amount of that work will certainly not be counted in person-days/weeks, and more likely in years than months. So I'm not convinced yet that the current proposal is the best way forward. >> >> The mere usage of np.random.seed() doesn't imply that these packages actually require stream-compatibility. 
Some might, for sure, like where they are used in the unit tests, but that's not what you counted. At best, these numbers just mean that we can't eliminate np.random.seed() in a new system right away. > > Well, mere usage has been called an antipattern (also on your behalf), plus for scipy over half of the usages do give test failures (Warren's quick test). So I'd say that counting usages is a decent proxy for the work that has to be done. Sure. But with my new proposal, we don't have to change it (as much as I'd like to!). I'll draft up a PR to modify my NEP accordingly. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jun 11 02:43:33 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 10 Jun 2018 23:43:33 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 10:36 PM, Robert Kern wrote: > On Sun, Jun 10, 2018 at 8:04 PM Ralf Gommers > wrote: > > > > On Sun, Jun 10, 2018 at 6:08 PM, Robert Kern > wrote: > >> > >> On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers > wrote: > >> > > >> > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern > wrote: > >> >> > >> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers > wrote: > >> >>> > >> >>> It may be worth having a look at test suites for scipy, > statsmodels, scikit-learn, etc. and estimate how much work this NEP causes > those projects. If the devs of those packages are forced to do large scale > migrations from RandomState to StableState, then why not instead keep > RandomState and just add a new API next to it? > >> >> > >> >> The problem is that we can't really have an ecosystem with two > different general purpose systems. > >> > > >> > Can't = prefer not to. > >> > >> I meant what I wrote. :-) > >> > >> > But yes, that's true. That's not what I was saying though. We want > one generic one, and one meant for unit testing only. You can achieve that > in two ways: > >> > 1. Change the current np.random API to new generic, and add a new > RandomStable for unit tests. > >> > 2. Add a new generic API, and document the current np.random API as > being meant for unit tests only, for other usage should be > preferred. > >> > > >> > (2) has a couple of pros: > >> > - you're not forcing almost every library and end user out there to > migrate their unit tests. > >> > >> But it has the cons that I talked about. RandomState *is* a fully > functional general purpose PRNG system. After all, that's its current use. > Documenting it as intended to be something else will not change that fact. > Documentation alone provides no real impetus to move to the new system > outside of the unit tests. And the community does need to move together to > the new system in their library code, or else we won't be able to combine > libraries together; these PRNG objects need to thread all the way through > between code from different authors if we are to write programs with a > controlled seed. The failure mode when people don't pay attention to the > documentation is that I can no longer write programs that compose these > libraries together. That's why I wrote "can't". It's not a mere preference > for not having two systems to maintain. It has binary Go/No Go implications > for building reproducible programs. > > > > I strongly suspect you are right, but only because you're asserting > "can't" so heavily. I have trouble formulating what would go wrong in case > there's two PRNGs used in a single program. 
It's not described in the NEP, > nor in the numpy.random docs (those don't even have any recommendations for > best practices listed as far as I can tell - that needs fixing). All you > explain in the NEP is that reproducible research isn't helped by the > current stream-compat guarantee. So a bit of (probably incorrect) devil's > advocate reasoning: > > - If there's no stream-compat guarantee, all a user can rely on is the > properties of drawing from a seeded PRNG. > > - Any use of a PRNG in library code can also only rely on properties > > - So now whether in a user's program libraries draw from one or two > seeded PRNGs doesn't matter for reproducibility, because those properties > don't change. > > Correctly making a stochastic program reproducible while retaining good > statistical properties is difficult. People don't do it well in the best of > circumstances. The best way that we've found to manage that difficulty is > to instantiate a single stream and use it all throughout your code. Every > new stream requires the management of more seeds (unless if we use the > fancy new algorithms that have settable stream IDs, but by stipulation, we > don't have these in this case). And now I have to thread both of these > objects through my code, and pass the right object to each third-party > library. These third-party libraries don't know anything about this weird > 2-stream workaround that you are doing, so we now have libraries that can't > build on each other unless if they are using the same compatible API, even > if I can make workarounds to build a program that combines two libraries > side-to-side. > > So yeah, people "can" do this. "It's just a matter of code" as my boss > likes to say. But it's making an already-difficult task more difficult. > Okay, that makes more sense to me now. It would be really useful to document such best practices and rationales. Note that scipy.stats distributions allow passing in either a RandomState instance or an integer as seed (which will be used for seeding a new instance, not for np.random.seed) [1]. That seems like a fine design pattern as well, and passing on a seed that way is fairly easy and as good for reproducibility as passing in a single PRNG. [1] https://github.com/scipy/scipy/blob/master/scipy/stats/_distn_infrastructure.py#L612 > > Also, if there is to be a multi-year transitioning to the new API, would > there be two PRNG systems anyway during those years? > > Sure, but with a deadline and not-just-documentation to motivate > transitioning. > > But if we follow my alternative proposal, there'll be no need for > deprecation! You've convinced me to not deprecate RandomState. > That's not how I had read it, but great to hear that! I just want to change some of its internal implementation details, add a > less-stable set of distributions on the side, and a framework of core > uniform PRNGs that can be shared by both. > > >> > - more design freedom for the new generic API. The current one is > clearly sub-optimal; in a new one you wouldn't have to expose all the > global state/functions that np.random exposes now. You could even restrict > it to a single class and put that in the main numpy namespace. > >> > >> I'm not sure why you are talking about the global state and np.random.* > convenience functions. What we do with those functions is out of scope for > this NEP and would be talked about it another NEP fully introducing the new > system. 
> > > > To quote you from one of the first emails in this thread: " > > I deliberately left it out of this one as it may, depending on our > choices, impinge upon the design of the new PRNG subsystem, which I > declared out of scope for this NEP. I have ideas (besides the glib "Let > them eat AttributeErrors!"), and now that I think more about it, that does > seem like it might be in scope just like the discussion of freezing > RandomState and StableRandom are. But I think I'd like to hold that thought > a little bit and get a little more screaming^Wfeedback on the core proposal > first. I'll return to this in a few days if not sooner. > > " > > > > So consider this some screaming^Wfeedback:) > > Ahem. Yes, I just remembered I said that. :-) But still, there will be > lots of options about what to do with np.random.*, whatever proposal we go > with. It doesn't really impose constraints on the core proposals. > > >> >> To properly use pseudorandom numbers, I need to instantiate a PRNG > and thread it through all of the code in my program: both the parts that I > write and the third party libraries that I don't write. > >> >> > >> >> Generating test data for unit tests is separable, though. That's why > I propose having a StableRandom built on the new architecture. Its purpose > would be well-documented, and in my proposal is limited in features such > that it will be less likely to be abused outside of that purpose. If you > make it fully-featured, it is more likely to be abused by building library > code around it. But even if it is so abused, because it is built on the new > architecture, at least I can thread the same core PRNG state through the > StableRandom distributions from the abusing library and use the better > distributions class elsewhere (randomgen names it "Generator"). Just > keeping RandomState around can't work like that because it doesn't have a > replaceable core PRNG. > >> >> > >> >> But that does suggest another alternative that we should explore: > >> >> > >> >> The new architecture separates the core uniform PRNG from the wide > variety of non-uniform probability distributions. That is, the core PRNG > state is encapsulated in a discrete object that can be shared between > instances of different distribution-providing classes. numpy.random should > provide two such distribution-providing classes. The main one (let us call > it ``Generator``, as it is called in the prototype) will follow the new > policy: distribution methods can break the stream in feature releases. > There will also be a secondary distributions class (let us call it > ``LegacyGenerator``) which contains distribution methods exactly as they > exist in the current ``RandomState`` implementation. When one combines > ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the > exact same stream as ``RandomState`` for all distribution methods. The > ``LegacyGenerator`` methods will be forever frozen. > ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with > the MT19937 core PRNG, and whatever tricks needed to make > ``isinstance(prng, RandomState)`` and unpickling work should be done. This > way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be > deprecated, becoming progressively noisier over a number of release cycles, > in favor of explicitly instantiating ``LegacyGenerator``. 
> >> >> > >> >> ``LegacyGenerator`` CAN be used during this deprecation period in > library and application code until libraries and applications can migrate > to the new ``Generator``. Libraries and applications SHOULD migrate but > MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test > data for unit tests where cross-release stability of the streams is > important. Test writers SHOULD consider ways to mitigate their reliance on > such stability and SHOULD limit their usage to distribution methods that > have fewer cross-platform stability risks. > >> > >> I would appreciate your consideration of this proposal. Does it address > your concerns? It addresses my concerns with keeping around a > fully-functional RandomState implementation. > > > > My concerns are: > > 1. The amount of work caused by making libraries and end users migrate. > > 2. That this is a backwards compatibility break, which will cause > problems for users who relied on the old guarantees (the arguments in the > NEP that the old guarantees weren't 100% watertight don't mean that > backcompat doesn't matter at all). > > > > As far as I can tell, this new proposal doesn't deal with those concerns > directly. What it does seem to do is making transitioning a bit easier for > users that were already using RandomState instances. > > Let me drop the deprecation of the name RandomState. RandomState(int_seed) > will forever and always create a backwards- and stream-compatible object. > No one will have to migrate. > > How does that strike you? > Sounds good. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jun 11 02:53:07 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 10 Jun 2018 23:53:07 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 11:15 PM, Robert Kern wrote: > On Sun, Jun 10, 2018 at 8:11 PM Ralf Gommers > wrote: > > > > On Sun, Jun 10, 2018 at 5:57 PM, Robert Kern > wrote: > >> > >> On Sun, Jun 10, 2018 at 5:47 PM Ralf Gommers > wrote: > >> > > >> > On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> > >> >> I suspect many of the tests will be easy to update, so fixing 300 or > so tests does not seem like a monumental task. > >> > > >> > It's all not monumental, but it adds up quickly. In addition to > changing tests, one will also need compatibility code when supporting > multiple numpy versions (e.g. scipy when get a copy of RandomStable in > scipy/_lib/_numpy_compat.py). > >> > > >> > A quick count of just np.random.seed occurrences with ``$ grep -roh > --include \*.py np.random.seed . | wc -w`` for some packages: > >> > numpy: 77 > >> > scipy: 462 > >> > matplotlib: 204 > >> > statsmodels: 461 > >> > pymc3: 36 > >> > scikit-image: 63 > >> > scikit-learn: 69 > >> > keras: 46 > >> > pytorch: 0 > >> > tensorflow: 368 > >> > astropy: 24 > >> > > >> > And note, these are *not* incorrect/broken usages, this is code that > works and has done so for years. > >> > >> Yes, some of them are incorrect and broken. Failure can be difficult to > detect. This module from keras is particularly problematic: > >> > >> https://github.com/keras-team/keras-preprocessing/blob/ > master/keras_preprocessing/image.py > > > > You have to appreciate that we're not all thinking at lightning speed > and in the same direction. 
If there is a difficult to detect problem, it > may be useful to give a brief code example (or even line of reasoning) of > how this actually breaks something. > > Ahem. Sorry. That wasn't the code I was thinking of. It's merely > hazardous, not broken by itself. However, if you used any of the `seed=` > arguments that are helpfully(?) provided, you are almost certainly writing > broken code. If you must use np.random.seed() to get reproducibility, you > need to call it exactly once at the start of your code (or maybe once for > each process) and let it ride. > > This is the impossible-to-use-correctly code that I was thinking of, which > got partially fixed after I pointed out the problem. > > https://github.com/keras-team/keras/pull/8325/files > > The intention of this code is to shuffle two same-length sequences in the > same way. So now if I write my code well to call np.random.seed() once at > the start of my program, this function comes along and obliterates that > with a fixed seed just so it can reuse the seed again to replicate the > shuffle. > Yes, that's a big no-no. There are situations conceivable where a library has to set a seed, but I think the right pattern in that case would be something like old_state = np.random.get_state() np.random.seed(some_int) do_stuff() np.random.set_state(**old._state) > Puzzlingly, the root sin of unconditionally and unavoidably reseeding for > some of these functions is still there even though I showed how and why to > avoid it. This is one reason why I was skeptical that merely documenting > RandomState or StableRandom to only be used for unit tests would work. :-) > Well, no matter what we do, I'm sure that there'll be lots of people who will still get it wrong:) > >> > Conclusion: the current proposal will cause work for the vast > majority of libraries that depends on numpy. The total amount of that work > will certainly not be counted in person-days/weeks, and more likely in > years than months. So I'm not convinced yet that the current proposal is > the best way forward. > >> > >> The mere usage of np.random.seed() doesn't imply that these packages > actually require stream-compatibility. Some might, for sure, like where > they are used in the unit tests, but that's not what you counted. At best, > these numbers just mean that we can't eliminate np.random.seed() in a new > system right away. > > > > Well, mere usage has been called an antipattern (also on your behalf), > plus for scipy over half of the usages do give test failures (Warren's > quick test). So I'd say that counting usages is a decent proxy for the work > that has to be done. > > Sure. But with my new proposal, we don't have to change it (as much as I'd > like to!). I'll draft up a PR to modify my NEP accordingly. > Sounds good! Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.k.sheppard at gmail.com Mon Jun 11 03:02:54 2018 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Mon, 11 Jun 2018 08:02:54 +0100 Subject: [Numpy-discussion] NEP: Random Number Generator Policy (Robert Kern) In-Reply-To: References: Message-ID: <5b1e1e9e.1c69fb81.7976a.7aed@mx.google.com> Maybe a good place for a stable, testing focused generator would be in numpy.random.testing. This could host a default implementation of StableGenerator, although a better name might be TestingGenerator. 
It would also help users decide that this is not the generator they are looking for (I think many people might think StableGenerator is a good thing, after all, who wants an UnstableGenerator). -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 11 03:29:33 2018 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Jun 2018 00:29:33 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 11:44 PM Ralf Gommers wrote: > Note that scipy.stats distributions allow passing in either a RandomState instance or an integer as seed (which will be used for seeding a new instance, not for np.random.seed) [1]. That seems like a fine design pattern as well, and passing on a seed that way is fairly easy and as good for reproducibility as passing in a single PRNG. > > [1] https://github.com/scipy/scipy/blob/master/scipy/stats/_distn_infrastructure.py#L612 Well, carefully. You wouldn't want to pass on the same integer seed to multiple functions. Accepting an integer seed is super-convenient at the command line/notebooks, though, or docstrings or in tests or other situations where your "reproducibility horizon" is small. These utilities are good for scaling from these small use cases to up to large ones. scikit-learn is also a good example of good PRNG hygiene: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L715 >> > Also, if there is to be a multi-year transitioning to the new API, would there be two PRNG systems anyway during those years? >> >> Sure, but with a deadline and not-just-documentation to motivate transitioning. >> >> But if we follow my alternative proposal, there'll be no need for deprecation! You've convinced me to not deprecate RandomState. > > That's not how I had read it, but great to hear that! Indeed, I did deprecate the name RandomState in that drafting, but it's not really necessary, and you've convinced me that we shouldn't do it. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 11 03:33:08 2018 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Jun 2018 00:33:08 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 11:54 PM Ralf Gommers wrote: > > On Sun, Jun 10, 2018 at 11:15 PM, Robert Kern wrote: >> Puzzlingly, the root sin of unconditionally and unavoidably reseeding for some of these functions is still there even though I showed how and why to avoid it. This is one reason why I was skeptical that merely documenting RandomState or StableRandom to only be used for unit tests would work. :-) > > Well, no matter what we do, I'm sure that there'll be lots of people who will still get it wrong:) Exactly! This is why I objected to leaving RandomState completely alone and just documenting it for use to generate test data. Inevitably, people will "get it wrong", so we need to design in anticipation of these failure modes and provide ways to work around them. >> Sure. But with my new proposal, we don't have to change it (as much as I'd like to!). I'll draft up a PR to modify my NEP accordingly. > > Sounds good! Thanks! Your and Josef's feedback on these points has been very helpful. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
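A minimal sketch of the seed-handling pattern referenced in the scikit-learn link above (the helper is paraphrased, not the actual scikit-learn implementation, and ``shuffle_in_unison`` is a hypothetical library function used only to illustrate passing the PRNG through instead of reseeding the global state):

    import numbers
    import numpy as np

    def check_random_state(seed):
        # Accept None, an int, or an existing RandomState and always hand
        # back a RandomState instance (paraphrase of the sklearn helper).
        if seed is None or seed is np.random:
            return np.random.mtrand._rand          # the global instance
        if isinstance(seed, (numbers.Integral, np.integer)):
            return np.random.RandomState(seed)
        if isinstance(seed, np.random.RandomState):
            return seed
        raise ValueError("%r cannot be used to seed a RandomState" % seed)

    def shuffle_in_unison(a, b, random_state=None):
        # Hypothetical library function: shuffle two same-length arrays the
        # same way without ever calling np.random.seed(), so the caller's
        # global state is never clobbered.
        rs = check_random_state(random_state)
        perm = rs.permutation(len(a))
        return a[perm], b[perm]

    prng = np.random.RandomState(12345)             # one stream for the whole program
    x, y = shuffle_in_unison(np.arange(10), np.arange(10) * 2, random_state=prng)

Passing the single ``prng`` instance (rather than the same integer seed) to every function that needs randomness keeps the whole program on one reproducible stream.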
URL: From njs at pobox.com Mon Jun 11 03:45:39 2018 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 11 Jun 2018 00:45:39 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sun, Jun 10, 2018 at 11:53 PM, Ralf Gommers wrote: > > On Sun, Jun 10, 2018 at 11:15 PM, Robert Kern wrote: >> >> The intention of this code is to shuffle two same-length sequences in the >> same way. So now if I write my code well to call np.random.seed() once at >> the start of my program, this function comes along and obliterates that with >> a fixed seed just so it can reuse the seed again to replicate the shuffle. > > > Yes, that's a big no-no. There are situations conceivable where a library > has to set a seed, but I think the right pattern in that case would be > something like > > old_state = np.random.get_state() > np.random.seed(some_int) > do_stuff() > np.random.set_state(**old._state) This will seem to work fine in testing, and then when someone tries to use your library in a multithreaded program everything will break in complicated and subtle ways :-(. I really don't think there's any conceivable situation where a library (as opposed to an application) can correctly use the global random state. -n -- Nathaniel J. Smith -- https://vorpus.org From josef.pktd at gmail.com Mon Jun 11 10:26:04 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 11 Jun 2018 10:26:04 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Mon, Jun 11, 2018 at 2:43 AM, Ralf Gommers wrote: > > > On Sun, Jun 10, 2018 at 10:36 PM, Robert Kern > wrote: > >> On Sun, Jun 10, 2018 at 8:04 PM Ralf Gommers >> wrote: >> > >> > On Sun, Jun 10, 2018 at 6:08 PM, Robert Kern >> wrote: >> >> >> >> On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers >> wrote: >> >> > >> >> > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern >> wrote: >> >> >> >> >> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers >> wrote: >> >> >>> >> >> >>> It may be worth having a look at test suites for scipy, >> statsmodels, scikit-learn, etc. and estimate how much work this NEP causes >> those projects. If the devs of those packages are forced to do large scale >> migrations from RandomState to StableState, then why not instead keep >> RandomState and just add a new API next to it? >> >> >> >> >> >> The problem is that we can't really have an ecosystem with two >> different general purpose systems. >> >> > >> >> > Can't = prefer not to. >> >> >> >> I meant what I wrote. :-) >> >> >> >> > But yes, that's true. That's not what I was saying though. We want >> one generic one, and one meant for unit testing only. You can achieve that >> in two ways: >> >> > 1. Change the current np.random API to new generic, and add a new >> RandomStable for unit tests. >> >> > 2. Add a new generic API, and document the current np.random API as >> being meant for unit tests only, for other usage should be >> preferred. >> >> > >> >> > (2) has a couple of pros: >> >> > - you're not forcing almost every library and end user out there to >> migrate their unit tests. >> >> >> >> But it has the cons that I talked about. RandomState *is* a fully >> functional general purpose PRNG system. After all, that's its current use. >> Documenting it as intended to be something else will not change that fact. >> Documentation alone provides no real impetus to move to the new system >> outside of the unit tests. 
And the community does need to move together to >> the new system in their library code, or else we won't be able to combine >> libraries together; these PRNG objects need to thread all the way through >> between code from different authors if we are to write programs with a >> controlled seed. The failure mode when people don't pay attention to the >> documentation is that I can no longer write programs that compose these >> libraries together. That's why I wrote "can't". It's not a mere preference >> for not having two systems to maintain. It has binary Go/No Go implications >> for building reproducible programs. >> > >> > I strongly suspect you are right, but only because you're asserting >> "can't" so heavily. I have trouble formulating what would go wrong in case >> there's two PRNGs used in a single program. It's not described in the NEP, >> nor in the numpy.random docs (those don't even have any recommendations for >> best practices listed as far as I can tell - that needs fixing). All you >> explain in the NEP is that reproducible research isn't helped by the >> current stream-compat guarantee. So a bit of (probably incorrect) devil's >> advocate reasoning: >> > - If there's no stream-compat guarantee, all a user can rely on is the >> properties of drawing from a seeded PRNG. >> > - Any use of a PRNG in library code can also only rely on properties >> > - So now whether in a user's program libraries draw from one or two >> seeded PRNGs doesn't matter for reproducibility, because those properties >> don't change. >> >> Correctly making a stochastic program reproducible while retaining good >> statistical properties is difficult. People don't do it well in the best of >> circumstances. The best way that we've found to manage that difficulty is >> to instantiate a single stream and use it all throughout your code. Every >> new stream requires the management of more seeds (unless if we use the >> fancy new algorithms that have settable stream IDs, but by stipulation, we >> don't have these in this case). And now I have to thread both of these >> objects through my code, and pass the right object to each third-party >> library. These third-party libraries don't know anything about this weird >> 2-stream workaround that you are doing, so we now have libraries that can't >> build on each other unless if they are using the same compatible API, even >> if I can make workarounds to build a program that combines two libraries >> side-to-side. >> >> So yeah, people "can" do this. "It's just a matter of code" as my boss >> likes to say. But it's making an already-difficult task more difficult. >> > > Okay, that makes more sense to me now. It would be really useful to > document such best practices and rationales. > > Note that scipy.stats distributions allow passing in either a RandomState > instance or an integer as seed (which will be used for seeding a new > instance, not for np.random.seed) [1]. That seems like a fine design > pattern as well, and passing on a seed that way is fairly easy and as good > for reproducibility as passing in a single PRNG. > > [1] https://github.com/scipy/scipy/blob/master/scipy/stats/ > _distn_infrastructure.py#L612 > > >> > Also, if there is to be a multi-year transitioning to the new API, >> would there be two PRNG systems anyway during those years? >> >> Sure, but with a deadline and not-just-documentation to motivate >> transitioning. >> >> But if we follow my alternative proposal, there'll be no need for >> deprecation! 
You've convinced me to not deprecate RandomState. >> > > That's not how I had read it, but great to hear that! > > I just want to change some of its internal implementation details, add a >> less-stable set of distributions on the side, and a framework of core >> uniform PRNGs that can be shared by both. >> >> >> > - more design freedom for the new generic API. The current one is >> clearly sub-optimal; in a new one you wouldn't have to expose all the >> global state/functions that np.random exposes now. You could even restrict >> it to a single class and put that in the main numpy namespace. >> >> >> >> I'm not sure why you are talking about the global state and >> np.random.* convenience functions. What we do with those functions is out >> of scope for this NEP and would be talked about it another NEP fully >> introducing the new system. >> > >> > To quote you from one of the first emails in this thread: " >> > I deliberately left it out of this one as it may, depending on our >> choices, impinge upon the design of the new PRNG subsystem, which I >> declared out of scope for this NEP. I have ideas (besides the glib "Let >> them eat AttributeErrors!"), and now that I think more about it, that does >> seem like it might be in scope just like the discussion of freezing >> RandomState and StableRandom are. But I think I'd like to hold that thought >> a little bit and get a little more screaming^Wfeedback on the core proposal >> first. I'll return to this in a few days if not sooner. >> > " >> > >> > So consider this some screaming^Wfeedback:) >> >> Ahem. Yes, I just remembered I said that. :-) But still, there will be >> lots of options about what to do with np.random.*, whatever proposal we go >> with. It doesn't really impose constraints on the core proposals. >> >> >> >> To properly use pseudorandom numbers, I need to instantiate a PRNG >> and thread it through all of the code in my program: both the parts that I >> write and the third party libraries that I don't write. >> >> >> >> >> >> Generating test data for unit tests is separable, though. That's >> why I propose having a StableRandom built on the new architecture. Its >> purpose would be well-documented, and in my proposal is limited in features >> such that it will be less likely to be abused outside of that purpose. If >> you make it fully-featured, it is more likely to be abused by building >> library code around it. But even if it is so abused, because it is built on >> the new architecture, at least I can thread the same core PRNG state >> through the StableRandom distributions from the abusing library and use the >> better distributions class elsewhere (randomgen names it "Generator"). Just >> keeping RandomState around can't work like that because it doesn't have a >> replaceable core PRNG. >> >> >> >> >> >> But that does suggest another alternative that we should explore: >> >> >> >> >> >> The new architecture separates the core uniform PRNG from the wide >> variety of non-uniform probability distributions. That is, the core PRNG >> state is encapsulated in a discrete object that can be shared between >> instances of different distribution-providing classes. numpy.random should >> provide two such distribution-providing classes. The main one (let us call >> it ``Generator``, as it is called in the prototype) will follow the new >> policy: distribution methods can break the stream in feature releases. 
>> There will also be a secondary distributions class (let us call it >> ``LegacyGenerator``) which contains distribution methods exactly as they >> exist in the current ``RandomState`` implementation. When one combines >> ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the >> exact same stream as ``RandomState`` for all distribution methods. The >> ``LegacyGenerator`` methods will be forever frozen. >> ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with >> the MT19937 core PRNG, and whatever tricks needed to make >> ``isinstance(prng, RandomState)`` and unpickling work should be done. This >> way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be >> deprecated, becoming progressively noisier over a number of release cycles, >> in favor of explicitly instantiating ``LegacyGenerator``. >> >> >> >> >> >> ``LegacyGenerator`` CAN be used during this deprecation period in >> library and application code until libraries and applications can migrate >> to the new ``Generator``. Libraries and applications SHOULD migrate but >> MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test >> data for unit tests where cross-release stability of the streams is >> important. Test writers SHOULD consider ways to mitigate their reliance on >> such stability and SHOULD limit their usage to distribution methods that >> have fewer cross-platform stability risks. >> >> >> >> I would appreciate your consideration of this proposal. Does it >> address your concerns? It addresses my concerns with keeping around a >> fully-functional RandomState implementation. >> > >> > My concerns are: >> > 1. The amount of work caused by making libraries and end users migrate. >> > 2. That this is a backwards compatibility break, which will cause >> problems for users who relied on the old guarantees (the arguments in the >> NEP that the old guarantees weren't 100% watertight don't mean that >> backcompat doesn't matter at all). >> > >> > As far as I can tell, this new proposal doesn't deal with those >> concerns directly. What it does seem to do is making transitioning a bit >> easier for users that were already using RandomState instances. >> >> Let me drop the deprecation of the name RandomState. >> RandomState(int_seed) will forever and always create a backwards- and >> stream-compatible object. No one will have to migrate. >> >> How does that strike you? >> > > Sounds good. > I'm trying to catch up here but I'm not sure what the latest version of the proposal is. IMO we need a stable stream of random numbers for the various distribution forever. Talking about deprecation misses the point that we don't want to have to migrate our unit tests to a "non-stable" stream of random numbers. In terms of user API we need some instance of a random state or random generator that can be used with scikit-learn's check_random_state (which was copied to scipy and will be copied to statsmodels when we get around to it.) IMO naming or pure API changes are fine, with deprecation of the old "style", as long as the changes can be done mechanically, e.g. adding "legacy" somewhere in the names or options. E.g. for scikit-learn and scipy.stats it might be just a small change in the check_random_state function, but maybe more changes in the unit tests that actually use and create a Random stream. Implementation I don't know or didn't pay enough attention to the details. The proposal sounds now like separating the distribution rvs generation from the underlying random stream. 
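A rough pseudocode sketch of that separation, using the class names from the quoted proposal and its prototype; none of these exist in a released NumPy, so this is illustration rather than working code:

    core = MT19937(12345)              # core uniform PRNG; owns the stream state
    new = Generator(core)              # distributions may improve between releases
    legacy = LegacyGenerator(core)     # distributions frozen to the RandomState streams

    new.standard_normal(5)             # latest/fastest algorithms
    legacy.standard_normal(5)          # bit-for-bit what RandomState produces

    # Both objects draw from the same underlying state, so code that needs the
    # frozen legacy distributions can still share one stream with code that has
    # moved to the new Generator.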
I had thought that was already in the proposal. If I were writing this for statsmodels, then I would hand a `method` keyword around that defaults to `method=None` which uses the latest and greatest available method independent of backwards compatibility, and method='stable' or method='legacy' as alternative. And maybe some distribution specific methods. This is separate from the option which underlying PRNG method to use. IIUC, the choices in the proposal are now 3 combinations - legacy: MT19937 core PRNG + distribution_method='stable' - mixed: MT19937 core PRNG + distribution_method=None - new: ??? core PRNG + distribution_method=None where the second might be just a special case of the third option, so it reduces to binary choice. aside to > The best way that we've found to manage that difficulty is to instantiate a single stream and use it all throughout your code. First, with check_random_state option it's up to a user As a user I have cases where I the use cases are independent and it doesn't matter if it uses the same seed. As a user I have cases where I would prefer if two methods use the same random stream. (e.g. bootstrap confidence intervals computed with two different methods where I wouldn't want the difference to come from different random streams when I compare them.) Also as a user, in some cases I used two different RandomState instances to get random numbers for different parts of the simulation. (Example: I generate y and x for a regression simulation separately, so that when I increase the number of observations, the initial sample stays the same, i.e. is recreated each time.) Some of this might make it into library code when statsmodels gets more simulation and bootstrap methods. Josef > > Cheers, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Mon Jun 11 10:59:49 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 11 Jun 2018 10:59:49 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: > > Nathaniel: > > Output shape feels very similar to > output dtype to me, so maybe the general way to handle this would be > to make the first callback take the input shapes+dtypes and return the > desired output shapes+dtypes? > > This hits on an interesting alternative to frozen dimensions - np.cross > could just become a regular ufunc with signature np.dtype((float64, 3)), > np.dtype((float64, 3)) → np.dtype((float64, 3)) > As you note further down, the present proposal of just using numbers has the advantage of being clear and easy. Another (small?) advantage is that I can use `axis` to tell where my three coordinates are, rather than be stuck with having them as the last dimension. Indeed, in my trials for wrapping the Standards Of Fundamental Astronomy routines, I started with just making every 3-vector and 3x3-matrix structured arrays with the relevant single sub-array entry. That worked, but I ended up disliking the casting to and fro. > Furthermore, the expansion quickly becomes cumbersome. For instance, for > the all_equal signature of (n|1),(n|1)->() ? > > I think this is only a good argument when used in conjunction with the > broadcasting syntax. I don?t think it?s a reason for matmul not to have > multiple signatures. 
Having multiple signatures is an disincentive to > introduced too many overloads of the same function, which seems like a good > thing to me > But implementation for matmul is actually considerably trickier, since the internal loop now has to check the number of distinct dimensions. > Summarizing my overall opinions: > > - I?m +0.5 on frozen dimensions. The use-cases seem reasonable, and it > seems like an easy-ish way to get them. Allowing ufuncs to natively support > subarray types might be a tidier solution, but that could come down the road > > Indeed, they are not mutually exclusive. My guess would be that the use cases would be somewhat different. > > - I?m -1 on optional dimensions: they seem to legitimize creating many > overloads of gufuncs. I?m already not a fan of how matmul has special cases > for lower dimensions that don?t generalize well. To me, the best way to > handle matmul would be to use the proposed __array_function__ to > handle the shape-based special-case dispatching, either by: > - Inserting dimensions, and calling the true gufunc > np.linalg.matmul_2d (which is a function I?d like direct access to > anyway). > - Dispatching to one of four ufuncs > > I must admit I wish that `@` was just pure matrix multiplication... But otherwise agree with Stephan as optional dimensions being the least-bad solution. Aside: do agree we should think about how to expose the `linalg` gufuncs. > > - Broadcasting dimensions: > - I know you?re not suggesting this but: enabling broadcasting > unconditionally for all gufuncs would be a bad idea, masking linalg bugs. > (although einsum does support broadcasting?) > > Indeed, definitely *not* suggesting that! > > - > - Does it really need a per-dimension flag, rather than a global > one? Can you give a case where that?s useful? > > Mostly simply that the implementation is easier given the optional dimensions... Also, it has the benefit of being clear what the function can handle by inspection of the signature, i.e., it self-documents better (one of my main arguments in favour of frozen dimensions...). > > - > - If we?d already made all_equal a gufunc, I?d be +1 on adding > broadcasting support to it > - I?m -0.5 on the all_equal path in the first place. I think we > either should have a more generic approach to combined ufuncs, or just > declare them numbas job. > > I am working on and off on a way to generically chain ufuncs (goal would be to auto-create an inner loop that calls all the chained ufuncs loops in turn). Not sure that short-circuiting will be all that easy. I actually quite like the all_equal ufunc, but it is in part because I remember discovering how painfully slow (a==b).all() was (and still have a place where I would use it if it existed). And it does fit in the (admittedly vague) plans to try to make `.reduce` a gufunc. > > - > - Can you come up with a broadcasting use-case that isn?t just > chaining a reduction with a broadcasting ufunc? > > Perhaps the use is that it allows people to write gufuncs that are like such functions... Absent a mechanism to chain ufuncs, more complicated gufuncs are currently the easiest way to get fast more complicated algebra. But perhaps a putative weighted_mean(y, sigma) -> mean, sigma_mean is a decent example? Its signature would be (n),(n)->(),() but then you're forced to give individual sigmas for each point. With (n|1),(n|1)->(),() you are no longer forced to do that (though the case of all y being the same is less than useful here... 
I did at some point have an implementation that worked by core dimension of each argument, but ended up feeling it was not worth the extra complication) -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jun 11 11:00:52 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 11 Jun 2018 11:00:52 -0400 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Mon, Jun 11, 2018 at 10:26 AM, wrote: > > > On Mon, Jun 11, 2018 at 2:43 AM, Ralf Gommers > wrote: > >> >> >> On Sun, Jun 10, 2018 at 10:36 PM, Robert Kern >> wrote: >> >>> On Sun, Jun 10, 2018 at 8:04 PM Ralf Gommers >>> wrote: >>> > >>> > On Sun, Jun 10, 2018 at 6:08 PM, Robert Kern >>> wrote: >>> >> >>> >> On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers >>> wrote: >>> >> > >>> >> > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern >>> wrote: >>> >> >> >>> >> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers < >>> ralf.gommers at gmail.com> wrote: >>> >> >>> >>> >> >>> It may be worth having a look at test suites for scipy, >>> statsmodels, scikit-learn, etc. and estimate how much work this NEP causes >>> those projects. If the devs of those packages are forced to do large scale >>> migrations from RandomState to StableState, then why not instead keep >>> RandomState and just add a new API next to it? >>> >> >> >>> >> >> The problem is that we can't really have an ecosystem with two >>> different general purpose systems. >>> >> > >>> >> > Can't = prefer not to. >>> >> >>> >> I meant what I wrote. :-) >>> >> >>> >> > But yes, that's true. That's not what I was saying though. We want >>> one generic one, and one meant for unit testing only. You can achieve that >>> in two ways: >>> >> > 1. Change the current np.random API to new generic, and add a new >>> RandomStable for unit tests. >>> >> > 2. Add a new generic API, and document the current np.random API as >>> being meant for unit tests only, for other usage should be >>> preferred. >>> >> > >>> >> > (2) has a couple of pros: >>> >> > - you're not forcing almost every library and end user out there to >>> migrate their unit tests. >>> >> >>> >> But it has the cons that I talked about. RandomState *is* a fully >>> functional general purpose PRNG system. After all, that's its current use. >>> Documenting it as intended to be something else will not change that fact. >>> Documentation alone provides no real impetus to move to the new system >>> outside of the unit tests. And the community does need to move together to >>> the new system in their library code, or else we won't be able to combine >>> libraries together; these PRNG objects need to thread all the way through >>> between code from different authors if we are to write programs with a >>> controlled seed. The failure mode when people don't pay attention to the >>> documentation is that I can no longer write programs that compose these >>> libraries together. That's why I wrote "can't". It's not a mere preference >>> for not having two systems to maintain. It has binary Go/No Go implications >>> for building reproducible programs. >>> > >>> > I strongly suspect you are right, but only because you're asserting >>> "can't" so heavily. I have trouble formulating what would go wrong in case >>> there's two PRNGs used in a single program. 
It's not described in the NEP, >>> nor in the numpy.random docs (those don't even have any recommendations for >>> best practices listed as far as I can tell - that needs fixing). All you >>> explain in the NEP is that reproducible research isn't helped by the >>> current stream-compat guarantee. So a bit of (probably incorrect) devil's >>> advocate reasoning: >>> > - If there's no stream-compat guarantee, all a user can rely on is the >>> properties of drawing from a seeded PRNG. >>> > - Any use of a PRNG in library code can also only rely on properties >>> > - So now whether in a user's program libraries draw from one or two >>> seeded PRNGs doesn't matter for reproducibility, because those properties >>> don't change. >>> >>> Correctly making a stochastic program reproducible while retaining good >>> statistical properties is difficult. People don't do it well in the best of >>> circumstances. The best way that we've found to manage that difficulty is >>> to instantiate a single stream and use it all throughout your code. Every >>> new stream requires the management of more seeds (unless if we use the >>> fancy new algorithms that have settable stream IDs, but by stipulation, we >>> don't have these in this case). And now I have to thread both of these >>> objects through my code, and pass the right object to each third-party >>> library. These third-party libraries don't know anything about this weird >>> 2-stream workaround that you are doing, so we now have libraries that can't >>> build on each other unless if they are using the same compatible API, even >>> if I can make workarounds to build a program that combines two libraries >>> side-to-side. >>> >>> So yeah, people "can" do this. "It's just a matter of code" as my boss >>> likes to say. But it's making an already-difficult task more difficult. >>> >> >> Okay, that makes more sense to me now. It would be really useful to >> document such best practices and rationales. >> >> Note that scipy.stats distributions allow passing in either a RandomState >> instance or an integer as seed (which will be used for seeding a new >> instance, not for np.random.seed) [1]. That seems like a fine design >> pattern as well, and passing on a seed that way is fairly easy and as good >> for reproducibility as passing in a single PRNG. >> >> [1] https://github.com/scipy/scipy/blob/master/scipy/stats/_ >> distn_infrastructure.py#L612 >> >> >>> > Also, if there is to be a multi-year transitioning to the new API, >>> would there be two PRNG systems anyway during those years? >>> >>> Sure, but with a deadline and not-just-documentation to motivate >>> transitioning. >>> >>> But if we follow my alternative proposal, there'll be no need for >>> deprecation! You've convinced me to not deprecate RandomState. >>> >> >> That's not how I had read it, but great to hear that! >> >> I just want to change some of its internal implementation details, add a >>> less-stable set of distributions on the side, and a framework of core >>> uniform PRNGs that can be shared by both. >>> >>> >> > - more design freedom for the new generic API. The current one is >>> clearly sub-optimal; in a new one you wouldn't have to expose all the >>> global state/functions that np.random exposes now. You could even restrict >>> it to a single class and put that in the main numpy namespace. >>> >> >>> >> I'm not sure why you are talking about the global state and >>> np.random.* convenience functions. 
What we do with those functions is out >>> of scope for this NEP and would be talked about it another NEP fully >>> introducing the new system. >>> > >>> > To quote you from one of the first emails in this thread: " >>> > I deliberately left it out of this one as it may, depending on our >>> choices, impinge upon the design of the new PRNG subsystem, which I >>> declared out of scope for this NEP. I have ideas (besides the glib "Let >>> them eat AttributeErrors!"), and now that I think more about it, that does >>> seem like it might be in scope just like the discussion of freezing >>> RandomState and StableRandom are. But I think I'd like to hold that thought >>> a little bit and get a little more screaming^Wfeedback on the core proposal >>> first. I'll return to this in a few days if not sooner. >>> > " >>> > >>> > So consider this some screaming^Wfeedback:) >>> >>> Ahem. Yes, I just remembered I said that. :-) But still, there will be >>> lots of options about what to do with np.random.*, whatever proposal we go >>> with. It doesn't really impose constraints on the core proposals. >>> >>> >> >> To properly use pseudorandom numbers, I need to instantiate a PRNG >>> and thread it through all of the code in my program: both the parts that I >>> write and the third party libraries that I don't write. >>> >> >> >>> >> >> Generating test data for unit tests is separable, though. That's >>> why I propose having a StableRandom built on the new architecture. Its >>> purpose would be well-documented, and in my proposal is limited in features >>> such that it will be less likely to be abused outside of that purpose. If >>> you make it fully-featured, it is more likely to be abused by building >>> library code around it. But even if it is so abused, because it is built on >>> the new architecture, at least I can thread the same core PRNG state >>> through the StableRandom distributions from the abusing library and use the >>> better distributions class elsewhere (randomgen names it "Generator"). Just >>> keeping RandomState around can't work like that because it doesn't have a >>> replaceable core PRNG. >>> >> >> >>> >> >> But that does suggest another alternative that we should explore: >>> >> >> >>> >> >> The new architecture separates the core uniform PRNG from the wide >>> variety of non-uniform probability distributions. That is, the core PRNG >>> state is encapsulated in a discrete object that can be shared between >>> instances of different distribution-providing classes. numpy.random should >>> provide two such distribution-providing classes. The main one (let us call >>> it ``Generator``, as it is called in the prototype) will follow the new >>> policy: distribution methods can break the stream in feature releases. >>> There will also be a secondary distributions class (let us call it >>> ``LegacyGenerator``) which contains distribution methods exactly as they >>> exist in the current ``RandomState`` implementation. When one combines >>> ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the >>> exact same stream as ``RandomState`` for all distribution methods. The >>> ``LegacyGenerator`` methods will be forever frozen. >>> ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with >>> the MT19937 core PRNG, and whatever tricks needed to make >>> ``isinstance(prng, RandomState)`` and unpickling work should be done. 
This >>> way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be >>> deprecated, becoming progressively noisier over a number of release cycles, >>> in favor of explicitly instantiating ``LegacyGenerator``. >>> >> >> >>> >> >> ``LegacyGenerator`` CAN be used during this deprecation period in >>> library and application code until libraries and applications can migrate >>> to the new ``Generator``. Libraries and applications SHOULD migrate but >>> MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test >>> data for unit tests where cross-release stability of the streams is >>> important. Test writers SHOULD consider ways to mitigate their reliance on >>> such stability and SHOULD limit their usage to distribution methods that >>> have fewer cross-platform stability risks. >>> >> >>> >> I would appreciate your consideration of this proposal. Does it >>> address your concerns? It addresses my concerns with keeping around a >>> fully-functional RandomState implementation. >>> > >>> > My concerns are: >>> > 1. The amount of work caused by making libraries and end users migrate. >>> > 2. That this is a backwards compatibility break, which will cause >>> problems for users who relied on the old guarantees (the arguments in the >>> NEP that the old guarantees weren't 100% watertight don't mean that >>> backcompat doesn't matter at all). >>> > >>> > As far as I can tell, this new proposal doesn't deal with those >>> concerns directly. What it does seem to do is making transitioning a bit >>> easier for users that were already using RandomState instances. >>> >>> Let me drop the deprecation of the name RandomState. >>> RandomState(int_seed) will forever and always create a backwards- and >>> stream-compatible object. No one will have to migrate. >>> >>> How does that strike you? >>> >> >> Sounds good. >> > > > I'm trying to catch up here but I'm not sure what the latest version of > the proposal is. > > IMO we need a stable stream of random numbers for the various distribution > forever. Talking about deprecation misses the point that we don't want to > have to migrate our unit tests to a "non-stable" stream of random numbers. > > In terms of user API we need some instance of a random state or random > generator that can be used with scikit-learn's check_random_state (which > was copied to scipy and will be copied to statsmodels when we get around to > it.) > > IMO naming or pure API changes are fine, with deprecation of the old > "style", as long as the changes can be done mechanically, e.g. adding > "legacy" somewhere in the names or options. > E.g. for scikit-learn and scipy.stats it might be just a small change in > the check_random_state function, but maybe more changes in the unit tests > that actually use and create a Random stream. > > Implementation > I don't know or didn't pay enough attention to the details. > > The proposal sounds now like separating the distribution rvs generation > from the underlying random stream. I had thought that was already in the > proposal. > > If I were writing this for statsmodels, then I would hand a `method` > keyword around that defaults to `method=None` which uses the latest and > greatest available method independent of backwards compatibility, and > method='stable' or method='legacy' as alternative. And maybe some > distribution specific methods. > This is separate from the option which underlying PRNG method to use. 
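A small sketch of what such a ``method`` keyword could look like (a hypothetical function, not an actual statsmodels or NumPy API; the inversion branch simply stands in for a frozen "legacy" algorithm):

    import numpy as np

    def standard_exponential(prng, size, method=None):
        # method='legacy': a hard-coded algorithm that never changes, so the
        # stream stays reproducible across releases.
        if method == 'legacy':
            return -np.log(1.0 - prng.random_sample(size))   # inverse-CDF draw
        # method=None: defer to whatever the generator currently implements,
        # which is allowed to improve over time.
        return prng.standard_exponential(size)

    prng = np.random.RandomState(0)
    stable = standard_exponential(prng, 1000, method='legacy')
    latest = standard_exponential(prng, 1000)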
> > IIUC, the choices in the proposal are now 3 combinations > > - legacy: MT19937 core PRNG + distribution_method='stable' > - mixed: MT19937 core PRNG + distribution_method=None > - new: ??? core PRNG + distribution_method=None > > where the second might be just a special case of the third option, so it > reduces to binary choice. > > aside to > > The best way that we've found to manage that difficulty is to > instantiate a single stream and use it all throughout your code. > > First, with check_random_state option it's up to a user > > As a user I have cases where I the use cases are independent and it > doesn't matter if it uses the same seed. > As a user I have cases where I would prefer if two methods use the same > random stream. (e.g. bootstrap confidence intervals computed with two > different methods where I wouldn't want the difference to come from > different random streams when I compare them.) > Also as a user, in some cases I used two different RandomState instances > to get random numbers for different parts of the simulation. (Example: I > generate y and x for a regression simulation separately, so that when I > increase the number of observations, the initial sample stays the same, > i.e. is recreated each time.) > > Some of this might make it into library code when statsmodels gets more > simulation and bootstrap methods. > > Test writers SHOULD consider ways to mitigate their reliance on such stability and SHOULD limit their usage to distribution methods that have fewer cross-platform stability risks. Is there somewhere a list on what might be unstable across platforms? In statsmodels we struggle quite a bit with cross-platform problems in the unit tests. But most of them are because many test tolerances are pretty tight, and then e.g. linalg noise might fluctuate too much across machines and LAPACK versions. Other cases are because behavior in not nice cases differs across machines and versions, those cases are sometimes added intentionally and sometimes by accident. But I don't think we had a problem because of random number generation. E.g. for integers we are mostly limited to small numbers, either like in Poisson because exp might overflow, or because test cases are usually small for speed reasons. Josef > > Josef > > >> >> Cheers, >> Ralf >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Mon Jun 11 13:10:34 2018 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 11 Jun 2018 10:10:34 -0700 Subject: [Numpy-discussion] 1.14.5 bugfix release Message-ID: <502735bc-e86c-7060-3386-6ffda5b04731@gmail.com> An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jun 11 14:13:22 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 Jun 2018 12:13:22 -0600 Subject: [Numpy-discussion] 1.14.5 bugfix release In-Reply-To: <502735bc-e86c-7060-3386-6ffda5b04731@gmail.com> References: <502735bc-e86c-7060-3386-6ffda5b04731@gmail.com> Message-ID: On Mon, Jun 11, 2018 at 11:10 AM, Matti Picus wrote: > If there is a desire to do a bug-fix release 1.14.5 I would like to try my > hand at releasing it, using doc/RELEASE_WALKTHROUGH.rst.txt. There were a > few issues around compiling 1.14.4 on alpine and NetBSD. 
> Since 1.15 will probably be released soon, do we continue to push these > kind of bug fixes releases to 1.14.x? > Matti > We only need to make the release to fix the regressions. I was going to do it today/tomorrow as I think we have now covered all paths through the ifs. Usually it takes about 2-4 weeks for bug reports to settle out, but a think we can be a bit sooner here and the next release will be 1.15. If you want to give it a shot, go ahead. We need more people with some experience in the process, not to mention new perspectives on the walkthrough. I expect most of your time will be spent getting set up. I think you will also need commit privileges on `MacPython/numpy-wheels`, ping Matthew Brett for those. If you run into problems, let me know. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Tue Jun 12 02:35:56 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 11 Jun 2018 23:35:56 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Frozen dimensions: I started with just making every 3-vector and 3x3-matrix structured arrays with the relevant single sub-array entry I was actually suggesting omitting the structured dtype (ie, field names) altogether, and just using the subarray dtypes (which exist alone, but not in arrays). Another (small?) advantage is that I can use `axis This is a fair argument against my proposal - at any rate, I think we?d need a better story for subarray dtypes before trying to add support to them for ufuncs ------------------------------ Broadcasting dimensions But perhaps a putative weighted_mean ? is a decent example That?s fairly convincing as a non-chained ufunc case. Can you add an example like that to the NEP? Also, it has the benefit of being clear what the function can handle by inspection of the signature Is broadcasting (n),(n)->(),() less clear that (n|1),(n|1)->(),()? Can you come up with an example where only some dimensions make sense to broadcast? ------------------------------ Eric ? On Mon, 11 Jun 2018 at 08:04 Marten van Kerkwijk wrote: > Nathaniel: >> >> Output shape feels very similar to >> output dtype to me, so maybe the general way to handle this would be >> to make the first callback take the input shapes+dtypes and return the >> desired output shapes+dtypes? >> >> This hits on an interesting alternative to frozen dimensions - np.cross >> could just become a regular ufunc with signature np.dtype((float64, 3)), >> np.dtype((float64, 3)) → np.dtype((float64, 3)) >> > As you note further down, the present proposal of just using numbers has > the advantage of being clear and easy. Another (small?) advantage is that I > can use `axis` to tell where my three coordinates are, rather than be stuck > with having them as the last dimension. > > Indeed, in my trials for wrapping the Standards Of Fundamental Astronomy > routines, I started with just making every 3-vector and 3x3-matrix > structured arrays with the relevant single sub-array entry. That worked, > but I ended up disliking the casting to and fro. > > >> Furthermore, the expansion quickly becomes cumbersome. For instance, for >> the all_equal signature of (n|1),(n|1)->() ? >> >> I think this is only a good argument when used in conjunction with the >> broadcasting syntax. I don?t think it?s a reason for matmul not to have >> multiple signatures. 
Having multiple signatures is an disincentive to >> introduced too many overloads of the same function, which seems like a good >> thing to me >> > But implementation for matmul is actually considerably trickier, since the > internal loop now has to check the number of distinct dimensions. > > >> Summarizing my overall opinions: >> >> - I?m +0.5 on frozen dimensions. The use-cases seem reasonable, and >> it seems like an easy-ish way to get them. Allowing ufuncs to natively >> support subarray types might be a tidier solution, but that could come down >> the road >> >> Indeed, they are not mutually exclusive. My guess would be that the use > cases would be somewhat different. > > >> >> - I?m -1 on optional dimensions: they seem to legitimize creating >> many overloads of gufuncs. I?m already not a fan of how matmul has special >> cases for lower dimensions that don?t generalize well. To me, the best way >> to handle matmul would be to use the proposed __array_function__ to >> handle the shape-based special-case dispatching, either by: >> - Inserting dimensions, and calling the true gufunc >> np.linalg.matmul_2d (which is a function I?d like direct access to >> anyway). >> - Dispatching to one of four ufuncs >> >> I must admit I wish that `@` was just pure matrix multiplication... But > otherwise agree with Stephan as optional dimensions being the least-bad > solution. > > Aside: do agree we should think about how to expose the `linalg` gufuncs. > >> >> - Broadcasting dimensions: >> - I know you?re not suggesting this but: enabling broadcasting >> unconditionally for all gufuncs would be a bad idea, masking linalg bugs. >> (although einsum does support broadcasting?) >> >> Indeed, definitely *not* suggesting that! > > >> >> - >> - Does it really need a per-dimension flag, rather than a global >> one? Can you give a case where that?s useful? >> >> Mostly simply that the implementation is easier given the optional > dimensions... Also, it has the benefit of being clear what the function can > handle by inspection of the signature, i.e., it self-documents better (one > of my main arguments in favour of frozen dimensions...). > > >> >> - >> - If we?d already made all_equal a gufunc, I?d be +1 on adding >> broadcasting support to it >> - I?m -0.5 on the all_equal path in the first place. I think we >> either should have a more generic approach to combined ufuncs, or just >> declare them numbas job. >> >> I am working on and off on a way to generically chain ufuncs (goal would > be to auto-create an inner loop that calls all the chained ufuncs loops in > turn). Not sure that short-circuiting will be all that easy. > > I actually quite like the all_equal ufunc, but it is in part because I > remember discovering how painfully slow (a==b).all() was (and still have a > place where I would use it if it existed). And it does fit in the > (admittedly vague) plans to try to make `.reduce` a gufunc. > >> >> - >> - Can you come up with a broadcasting use-case that isn?t just >> chaining a reduction with a broadcasting ufunc? >> >> Perhaps the use is that it allows people to write gufuncs that are like > such functions... Absent a mechanism to chain ufuncs, more complicated > gufuncs are currently the easiest way to get fast more complicated algebra. > > But perhaps a putative > > weighted_mean(y, sigma) -> mean, sigma_mean > > is a decent example? Its signature would be > > (n),(n)->(),() > > but then you're forced to give individual sigmas for each point. 
With > > (n|1),(n|1)->(),() > > you are no longer forced to do that (though the case of all y being the > same is less than useful here... I did at some point have an implementation > that worked by core dimension of each argument, but ended up feeling it was > not worth the extra complication) > > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Tue Jun 12 02:59:36 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 11 Jun 2018 23:59:36 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: I don?t understand your alternative here. If we overload np.matmul using *array_function*, then it would not use *ether* of these options for writing the operation in terms of other gufuncs. It would simply look for an *array_function* attribute, and call that method instead. Let me explain that suggestion a little more clearly. 1. There?d be a linalg.matmul2d that performs the real matrix case, which would be easy to make as a ufunc right now. 2. __matmul__ and __rmatmul__ would just call np.matmul, as they currently do (for consistency between np.matmul and operator.matmul, needed in python pre- at -operator) 3. np.matmul would be implemented as: @do_array_function_overridesdef matmul(a, b): if a.ndim != 1 and b.ndim != 1: return matmul2d(a, b) elif a.ndim != 1: return matmul2d(a, b[:,None])[...,0] elif b.ndim != 1: return matmul2d(a[None,:], b) else: # this one probably deserves its own ufunf return matmul2d(a[None,:], b[:,None])[0,0] 4. Quantity can just override __array_ufunc__ as with any other ufunc 5. DataArray, knowing the above doesn?t work, would implement something like @matmul.register_array_function(DataArray)def __array_function__(a, b): if a.ndim != 1 and b.ndim != 1: return matmul2d(a, b) else: # either: # - add/remove dummy dimensions in a dataarray-specific way # - downcast to ndarray and do the dimension juggling there Advantages of this approach: - Neither the ufunc machinery, nor __array_ufunc__, nor the inner loop, need to know about optional dimensions. - We get a matmul2d ufunc, that all subclasses support out of the box if they support matmul Eric ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jun 12 17:26:25 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Jun 2018 15:26:25 -0600 Subject: [Numpy-discussion] SciPy 2018 Message-ID: Hi All, Thought I'd raise the topic of meeting up at SciPy 2018. I wasn't planning on registering for the main conference, but would be happy to fly down for a couple of days if we plan on a meetup during sprints or some other point in the conference schedule. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Tue Jun 12 17:40:09 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 12 Jun 2018 14:40:09 -0700 Subject: [Numpy-discussion] SciPy 2018 In-Reply-To: References: Message-ID: On 12/06/18 14:26, Charles R Harris wrote: > Hi All, > > Thought I'd raise the topic of meeting up at SciPy 2018. 
I wasn't > planning on registering for the main conference, but would be happy to > fly down for a couple of days if we plan on a meetup during sprints or > some other point in the conference schedule. > > Chuck > There will be a NumPy sprint July 14-15. I have requested a BOF room. For the BOF, I hoped to continue the discussion of the NumPy roadmap https://github.com/numpy/numpy/wiki/NumPy-Roadmap as well as provide a forum to meet in person. Matti From matti.picus at gmail.com Tue Jun 12 18:22:07 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 12 Jun 2018 15:22:07 -0700 Subject: [Numpy-discussion] Permissions to upload to PyPI Message-ID: Almost ready to finish the 1.14.5 release, but it seems I need permissions to upload to PyPI (makes sense). My user name there is mattip. Can someone help out? Matti From charlesr.harris at gmail.com Tue Jun 12 18:26:18 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Jun 2018 16:26:18 -0600 Subject: [Numpy-discussion] Permissions to upload to PyPI In-Reply-To: References: Message-ID: On Tue, Jun 12, 2018 at 4:22 PM, Matti Picus wrote: > Almost ready to finish the 1.14.5 release, but it seems I need permissions > to upload to PyPI (makes sense). My user name there is mattip. Can someone > help out? > Matti > Done. Sorry I missed that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jun 12 18:38:02 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Jun 2018 16:38:02 -0600 Subject: [Numpy-discussion] NumPy 1.15.x branched. Message-ID: Hi All, NumPy 1.15.x has been branched and master is now open for 1.16 development. If there are any remaining PRs that *just have to be in 1.15*, please complain here :0 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Tue Jun 12 20:09:28 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 12 Jun 2018 17:09:28 -0700 Subject: [Numpy-discussion] NumPy 1.14.5 released Message-ID: <7466261a-7d28-da89-d7a4-c2494ebae3ce@gmail.com> Hi All, I am pleased to announce the release of NumPy 14.4.5. This is a bugfix release for bugs reported following the 1.14.4 release. The most significant fixes are: * fixes for compilation errors on alpine and NetBSD The Python versions supported in this release are 2.7 and 3.4 - 3.6. The Python 3.6 wheels available from PIP are built with Python 3.6.2 and should be compatible with all previous versions of Python 3.6. The source releases were cythonized with Cython 0.28.2 and should work for the upcoming Python 3.7. Contributors ============ A total of 1 person contributed to this release.? People with a "+" by their names contributed a patch for the first time. * Charles Harris Pull requests merged ==================== A total of 2 pull requests were merged for this release. * `#11274 `__: BUG: Correct use of NPY_UNUSED. * `#11294 `__: BUG: Remove extra trailing parentheses. 
Cheers, Matti From m.h.vankerkwijk at gmail.com Tue Jun 12 21:13:47 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 12 Jun 2018 21:13:47 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Tue, Jun 12, 2018 at 2:35 AM, Eric Wieser wrote: > Frozen dimensions: > > I started with just making every 3-vector and 3x3-matrix structured arrays > with the relevant single sub-array entry > > I was actually suggesting omitting the structured dtype (ie, field names) > altogether, and just using the subarray dtypes (which exist alone, but not > in arrays). > > Another (small?) advantage is that I can use `axis > > This is a fair argument against my proposal - at any rate, I think we?d > need a better story for subarray dtypes before trying to add support to > them for ufuncs > Yes, I've been wondering about the point of the sub-arrays... They seem interesting but in arrays just disappear. Their possible use would be to change the shape as seen by the outside world (as happens if one does define that sub-array as a 1-part structured array). Anyway, for another discussion! > ------------------------------ > > Broadcasting dimensions > > But perhaps a putative weighted_mean ? is a decent example > > That?s fairly convincing as a non-chained ufunc case. Can you add an > example like that to the NEP? > Done. > Also, it has the benefit of being clear what the function can handle by > inspection of the signature > > Is broadcasting (n),(n)->(),() less clear that (n|1),(n|1)->(),()? Can > you come up with an example where only some dimensions make sense to > broadcast? > Not a super-convincing one, though I guess one could think of a similar function for 3-vectors (which somehow must care about those being three-dimensional, because, say, it calculates the average direction of the cross product in spherical angles...), then, in the signature `(n,3),(n,3)->(),(),(),()` one would like to indicate that the `n` could be broadcast, but the `3` could not. As I now write in the NEP, part of the reason of doing it by distinct dimension is that I already need a flag for flexible, so it is easy to add one for broadcastable; similarly, in the actual code, there is quite a bit of shared stuff. -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jun 13 17:27:50 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Jun 2018 15:27:50 -0600 Subject: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16 Message-ID: Hi All, I think NumPy 1.16 would be a good time to drop Python 3.4 support. We will want to do that anyway once we drop 2.7 so that we will only be using recent Windows compilers, and with Python 3.7 due at the end of the month I think supporting 3.5-7 for 1.16 should be sufficient. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Jun 13 17:45:23 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 13 Jun 2018 14:45:23 -0700 Subject: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16 In-Reply-To: References: Message-ID: This sounds good to me. Most of the downstream projects I work with have already dropped Python 3.4 support. On Wed, Jun 13, 2018 at 2:30 PM Charles R Harris wrote: > Hi All, > > I think NumPy 1.16 would be a good time to drop Python 3.4 support. 
We > will want to do that anyway once we drop 2.7 so that we will only be using > recent Windows compilers, and with Python 3.7 due at the end of the month I > think supporting 3.5-7 for 1.16 should be sufficient. > > Thoughts? > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jun 13 17:56:06 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 13 Jun 2018 14:56:06 -0700 Subject: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16 In-Reply-To: References: Message-ID: > > I think NumPy 1.16 would be a good time to drop Python 3.4 support. >> > +1 Using python3 before 3.5 was still kinda "bleeding edge" -- so projects are more likely to be actively upgrading. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Wed Jun 13 18:15:20 2018 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 13 Jun 2018 15:15:20 -0700 Subject: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16 In-Reply-To: References: Message-ID: +1 On Wed, Jun 13, 2018 at 2:27 PM, Charles R Harris wrote: > Hi All, > > I think NumPy 1.16 would be a good time to drop Python 3.4 support. We will > want to do that anyway once we drop 2.7 so that we will only be using recent > Windows compilers, and with Python 3.7 due at the end of the month I think > supporting 3.5-7 for 1.16 should be sufficient. > > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Wed Jun 13 20:10:50 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Jun 2018 18:10:50 -0600 Subject: [Numpy-discussion] Updated 1.15.0 release notes Message-ID: Hi All, There is a PR for the updated NumPy 1.15.0 release notes . I would appreciate it if all those involved in that release would have a look and fix incorrect or missing notes. Cheers, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Wed Jun 13 20:28:12 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 13 Jun 2018 19:28:12 -0500 Subject: [Numpy-discussion] Updated 1.15.0 release notes In-Reply-To: References: Message-ID: Hi Chuck, Are you planning on doing an rc release this time? I think the NumPy 1.14 release was unusually bumpy and part of that was the lack of an rc. One example: importing h5py caused a warning under numpy 1.14 and an h5py release didn't come out with a workaround or fix for a couple months. There was also an issue with array printing that caused problems in yt (although both yt and NumPy quickly did bugfix releases that fixed that).
I guess 1.14 was particularly noisy, but still I?d really appreciate having a prerelease version to test against and some time to report issues with the prerelease so numpy and other projects can implement workarounds as needed without doing a release that might potentially break real users who happen to install right after numpy 1.x.0 comes out. Best, Nathan Goldbaum On Wed, Jun 13, 2018 at 7:11 PM Charles R Harris wrote: > Hi All, > > There is a PR for the updated NumPy 1.15.0 release notes > . I would appreciate it if > all those involved in the thatn release would have a look and fix incorrect > or missing notes. > > Cheers, > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Wed Jun 13 20:33:39 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 13 Jun 2018 20:33:39 -0400 Subject: [Numpy-discussion] Updated 1.15.0 release notes In-Reply-To: References: Message-ID: Request for a -rc seconded (although this time we should be fine for astropy, as things are working well with -dev). -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jun 13 20:42:10 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Jun 2018 18:42:10 -0600 Subject: [Numpy-discussion] Updated 1.15.0 release notes In-Reply-To: References: Message-ID: On Wed, Jun 13, 2018 at 6:28 PM, Nathan Goldbaum wrote: > Hi Chuck, > > Are you planning on doing an rc release this time? I think the NumPy 1.14 > release was unusually bumpy and part of that was the lack of an rc. One > example: importing h5py caused a warning under numpy 1.14 and an h5py > release didn?t come out with a workaround or fix for a couple months. There > was also an issue with array printing that caused problems in yt (although > both yt and NumPy quickly did bugfix releases that fixed that). > > I guess 1.14 was particularly noisy, but still I?d really appreciate > having a prerelease version to test against and some time to report issues > with the prerelease so numpy and other projects can implement workarounds > as needed without doing a release that might potentially break real users > who happen to install right after numpy 1.x.0 comes out. > There was a 1.14.0rc1 . I was too quick for the full release, just waited three weeks, so maybe four this time. Too few people actually test the candidates and give feedback, so I tend to regard the *.*.0 releases as the true rc :) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Wed Jun 13 21:16:45 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 13 Jun 2018 20:16:45 -0500 Subject: [Numpy-discussion] Updated 1.15.0 release notes In-Reply-To: References: Message-ID: OK I guess I missed that announcement. I wouldn?t mind more than one email with a reminder to test. On Wed, Jun 13, 2018 at 7:42 PM Charles R Harris wrote: > On Wed, Jun 13, 2018 at 6:28 PM, Nathan Goldbaum > wrote: > >> Hi Chuck, >> >> Are you planning on doing an rc release this time? I think the NumPy 1.14 >> release was unusually bumpy and part of that was the lack of an rc. 
One >> example: importing h5py caused a warning under numpy 1.14 and an h5py >> release didn?t come out with a workaround or fix for a couple months. There >> was also an issue with array printing that caused problems in yt (although >> both yt and NumPy quickly did bugfix releases that fixed that). >> >> I guess 1.14 was particularly noisy, but still I?d really appreciate >> having a prerelease version to test against and some time to report issues >> with the prerelease so numpy and other projects can implement workarounds >> as needed without doing a release that might potentially break real users >> who happen to install right after numpy 1.x.0 comes out. >> > > There was a 1.14.0rc1 > . I was too quick > for the full release, just waited three weeks, so maybe four this time. Too > few people actually test the candidates and give feedback, so I tend to > regard the *.*.0 releases as the true rc :) > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jun 14 04:48:14 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 14 Jun 2018 09:48:14 +0100 Subject: [Numpy-discussion] Updated 1.15.0 release notes In-Reply-To: References: Message-ID: Hi Nathan, One very helpful think you could do, is add a Travis-CI matrix entry where you are testing against the latest numpy nightly builds. I got a bit lost in your tox setup, but the basic idea is that, for one test entry, you add the following flags to pip: -f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com --pre In that case, you'll pull in the latest nightly build of Numpy. See the Scipy .travis.yml setup for an example. Cheers, Matthew On Thu, Jun 14, 2018 at 2:16 AM, Nathan Goldbaum wrote: > OK I guess I missed that announcement. > > I wouldn?t mind more than one email with a reminder to test. > > On Wed, Jun 13, 2018 at 7:42 PM Charles R Harris > wrote: >> >> On Wed, Jun 13, 2018 at 6:28 PM, Nathan Goldbaum >> wrote: >>> >>> Hi Chuck, >>> >>> Are you planning on doing an rc release this time? I think the NumPy 1.14 >>> release was unusually bumpy and part of that was the lack of an rc. One >>> example: importing h5py caused a warning under numpy 1.14 and an h5py >>> release didn?t come out with a workaround or fix for a couple months. There >>> was also an issue with array printing that caused problems in yt (although >>> both yt and NumPy quickly did bugfix releases that fixed that). >>> >>> I guess 1.14 was particularly noisy, but still I?d really appreciate >>> having a prerelease version to test against and some time to report issues >>> with the prerelease so numpy and other projects can implement workarounds as >>> needed without doing a release that might potentially break real users who >>> happen to install right after numpy 1.x.0 comes out. >> >> >> There was a 1.14.0rc1. I was too quick for the full release, just waited >> three weeks, so maybe four this time. 
Too few people actually test the >> candidates and give feedback, so I tend to regard the *.*.0 releases as the >> true rc :) >> >> Chuck >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From m.h.vankerkwijk at gmail.com Thu Jun 14 10:44:57 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 14 Jun 2018 10:44:57 -0400 Subject: [Numpy-discussion] Updated 1.15.0 release notes In-Reply-To: References: Message-ID: Indeed, we do something similar in astropy, with a pre-release failure being considered breakage (rather than ignorable as for -dev): https://github.com/astropy/astropy/blob/master/.travis.yml#L142 ?-- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu Jun 14 13:50:29 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 14 Jun 2018 13:50:29 -0400 Subject: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16 In-Reply-To: References: Message-ID: It seems everyone is in favour - anybody in for making a PR reducing the travis testing accordingly? (It seems a bit of overkill more generally - would be good to reduce the kWhr footprint a little...) -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Thu Jun 14 14:09:54 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Thu, 14 Jun 2018 13:09:54 -0500 Subject: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16 In-Reply-To: References: Message-ID: It is a welcome thing to see Python 2.7 support disappearing. Dropping 3.4 support in new releases sounds like a great idea as well. NumPy was originally pitched as a Python 3 thing... Travis On Thu, Jun 14, 2018, 12:52 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > It seems everyone is in favour - anybody in for making a PR reducing the > travis testing accordingly? (It seems a bit of overkill more generally - > would be good to reduce the kWhr footprint a little...) -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Thu Jun 14 14:13:54 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 14 Jun 2018 11:13:54 -0700 Subject: [Numpy-discussion] Circle CI moving from 1.0 to 2.0 Message-ID: I stumbled across this notice (only seems to appear in a failed build) "This project is currently running on CircleCI 1.0 which will no longer be supported after August 31, 2018. Please start migrating this project to CircleCI 2.0 ." Here is the original link https://circleci.com/gh/numpy/numpy/2080 Is this an artifact that can be ignored or do we need to migrate, if so has anyone already done it for their project? Matti From einstein.edison at gmail.com Thu Jun 14 14:35:44 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 14 Jun 2018 14:35:44 -0400 Subject: [Numpy-discussion] Circle CI moving from 1.0 to 2.0 In-Reply-To: References: Message-ID: Hi Matti, It seems the CircleCI config is already on Version 2.0. 
See here, notice the 2.0 in front of every successful build. https://circleci.com/gh/numpy/numpy I can also see that some failed builds have 1.0 in front of them... But this shouldn't happen. Most likely this is a CircleCI issue, not one with our configuration. It can be safely ignored. Regards, Hameer Abbasi On 14/06/2018 at 23:13, Matti wrote: I stumbled across this notice (only seems to appear in a failed build) "This project is currently running on CircleCI 1.0 which will no longer be supported after August 31, 2018. Please start migrating this project to CircleCI 2.0 ." Here is the original link https://circleci.com/gh/numpy/numpy/2080 Is this an artifact that can be ignored or do we need to migrate, if so has anyone already done it for their project? Matti _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Thu Jun 14 15:09:10 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 14 Jun 2018 12:09:10 -0700 Subject: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16 In-Reply-To: References: Message-ID: It was a small task. I created a PR for it here . Feel free to merge after CI passes or close. Hameer Abbasi Sent from Astro for Mac On 14. Jun 2018 at 22:50, Marten van Kerkwijk wrote: It seems everyone is in favour - anybody in for making a PR reducing the travis testing accordingly? (It seems a bit of overkill more generally - would be good to reduce the kWhr footprint a little...) -- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Jun 15 10:07:22 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 15 Jun 2018 10:07:22 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi All, The discussion on the gufunc signature enhancements seems to have stalled a bit, but while it was going I've tried to update the NEP correspondingly. The NEP is now merged, so can viewed more easily, at http://www.numpy.org/neps/nep-0020-gufunc-signature-enhancement.html My own quite possibly biased summary of the discussion so far is that: 1) Frozen dimensions are generally seen as a good idea; other implementations may be possible, but are not as clear. 2) Flexible dimensions have little use beyond matmul; the main discussion is whether there is a better way. In my opinion, the main benefit of the current proposal is that it allows operator overrides to all work the same way (via __array_ufunc__), independent of any assumptions about the object that does the override (such as that it has a shape). 3) Broadcastable dimensions had less support, but mostly for lack of examples; there now is one beyond all_equal, for which a gufunc is more clearly the proper route: a weighted average (which has obvious extensions). A general benefit of course is that there is actual code for all three; it would certainly be nice if we could fully support `matmul` and `@` in 1.16. 
So, the question would seem whether the NEP should be accepted or rejected (with part acceptance of course being possible, though I note that flexible and broadcastable share a lot of implementation, so in my opinion it is somewhat pointless to do just one of them). All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Jun 15 14:17:09 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 15 Jun 2018 11:17:09 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Mon, Jun 11, 2018 at 11:59 PM Eric Wieser wrote: > I don?t understand your alternative here. If we overload np.matmul using > *array_function*, then it would not use *ether* of these options for > writing the operation in terms of other gufuncs. It would simply look for > an *array_function* attribute, and call that method instead. > > Let me explain that suggestion a little more clearly. > > 1. There?d be a linalg.matmul2d that performs the real matrix case, > which would be easy to make as a ufunc right now. > 2. __matmul__ and __rmatmul__ would just call np.matmul, as they > currently do (for consistency between np.matmul and operator.matmul, > needed in python pre- at -operator) > 3. np.matmul would be implemented as: > > @do_array_function_overridesdef matmul(a, b): > if a.ndim != 1 and b.ndim != 1: > return matmul2d(a, b) > elif a.ndim != 1: > return matmul2d(a, b[:,None])[...,0] > elif b.ndim != 1: > return matmul2d(a[None,:], b) > else: > # this one probably deserves its own ufunf > return matmul2d(a[None,:], b[:,None])[0,0] > > 4. Quantity can just override __array_ufunc__ as with any other ufunc > 5. DataArray, knowing the above doesn?t work, would implement > something like > > @matmul.register_array_function(DataArray)def __array_function__(a, b): > if a.ndim != 1 and b.ndim != 1: > return matmul2d(a, b) > else: > # either: > # - add/remove dummy dimensions in a dataarray-specific way > # - downcast to ndarray and do the dimension juggling there > > > Advantages of this approach: > > - > > Neither the ufunc machinery, nor __array_ufunc__, nor the inner loop, > need to know about optional dimensions. > - > > We get a matmul2d ufunc, that all subclasses support out of the box if > they support matmul > > Eric > OK, this sounds pretty reasonable to me -- assuming we manage to figure out the __array_function__ proposal! There's one additional ingredient we would need to make this work well: some way to guarantee that "ndim" and indexing operations are available without casting to a base numpy array. For now, np.asanyarray() would probably suffice, but that isn't quite right (e.g., this would fail for np.matrix). In the long term, I think we need a new coercion protocol for "duck" arrays. Nathaniel Smith and I started writing a NEP on this, but it isn't quite ready yet. > ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Jun 16 03:38:57 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 Jun 2018 00:38:57 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: I have incorporated the feedback from this thread, and have significantly altered the proposal. I think this version will be more palatable to everyone. 
https://github.com/numpy/numpy/pull/11356 https://github.com/rkern/numpy/blob/nep/rng-clarification/doc/neps/nep-0019-rng-policy.rst I'm pretty sure that Kevin Sheppard's prototype already implements the broad strokes of my proposal (seriously, he thinks of everything; I'm just playing catch up), so I don't think there is any technical risk. I think it's just a matter of the fine details of shoving this into numpy.random per se rather than a third party package. https://bashtage.github.io/randomgen/devel/legacy.html --- ============================== Random Number Generator Policy ============================== :Author: Robert Kern :Status: Draft :Type: Standards Track :Created: 2018-05-24 Abstract -------- For the past decade, NumPy has had a strict backwards compatibility policy for the number stream of all of its random number distributions. Unlike other numerical components in ``numpy``, which are usually allowed to return different when results when they are modified if they remain correct, we have obligated the random number distributions to always produce the exact same numbers in every version. The objective of our stream-compatibility guarantee was to provide exact reproducibility for simulations across numpy versions in order to promote reproducible research. However, this policy has made it very difficult to enhance any of the distributions with faster or more accurate algorithms. After a decade of experience and improvements in the surrounding ecosystem of scientific software, we believe that there are now better ways to achieve these objectives. We propose relaxing our strict stream-compatibility policy to remove the obstacles that are in the way of accepting contributions to our random number generation capabilities. The Status Quo -------------- Our current policy, in full: A fixed seed and a fixed series of calls to ``RandomState`` methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged. This policy was first instated in Nov 2008 (in essence; the full set of weasel words grew over time) in response to a user wanting to be sure that the simulations that formed the basis of their scientific publication could be reproduced years later, exactly, with whatever version of ``numpy`` that was current at the time. We were keen to support reproducible research, and it was still early in the life of ``numpy.random``. We had not seen much cause to change the distribution methods all that much. We also had not thought very thoroughly about the limits of what we really could promise (and by ?we? in this section, we really mean Robert Kern, let?s be honest). Despite all of the weasel words, our policy overpromises compatibility. The same version of ``numpy`` built on different platforms, or just in a different way could cause changes in the stream, with varying degrees of rarity. The biggest is that the ``.multivariate_normal()`` method relies on ``numpy.linalg`` functions. Even on the same platform, if one links ``numpy`` with a different LAPACK, ``.multivariate_normal()`` may well return completely different results. More rarely, building on a different OS or CPU can cause differences in the stream. 
We use C ``long`` integers internally for integer distribution (it seemed like a good idea at the time), and those can vary in size depending on the platform. Distribution methods can overflow their internal C ``longs`` at different breakpoints depending on the platform and cause all of the random variate draws that follow to be different. And even if all of that is controlled, our policy still does not provide exact guarantees across versions. We still do apply bug fixes when correctness is at stake. And even if we didn?t do that, any nontrivial program does more than just draw random numbers. They do computations on those numbers, transform those with numerical algorithms from the rest of ``numpy``, which is not subject to so strict a policy. Trying to maintain stream-compatibility for our random number distributions does not help reproducible research for these reasons. The standard practice now for bit-for-bit reproducible research is to pin all of the versions of code of your software stack, possibly down to the OS itself. The landscape for accomplishing this is much easier today than it was in 2008. We now have ``pip``. We now have virtual machines. Those who need to reproduce simulations exactly now can (and ought to) do so by using the exact same version of ``numpy``. We do not need to maintain stream-compatibility across ``numpy`` versions to help them. Our stream-compatibility guarantee has hindered our ability to make improvements to ``numpy.random``. Several first-time contributors have submitted PRs to improve the distributions, usually by implementing a faster, or more accurate algorithm than the one that is currently there. Unfortunately, most of them would have required breaking the stream to do so. Blocked by our policy, and our inability to work around that policy, many of those contributors simply walked away. Implementation -------------- Work on a proposed new PRNG subsystem is already underway in the randomgen_ project. The specifics of the new design are out of scope for this NEP and up for much discussion, but we will discuss general policies that will guide the evolution of whatever code is adopted. We will also outline just a few of the requirements that such a new system must have to support the policy proposed in this NEP. First, we will maintain API source compatibility just as we do with the rest of ``numpy``. If we *must* make a breaking change, we will only do so with an appropriate deprecation period and warnings. Second, breaking stream-compatibility in order to introduce new features or improve performance will be *allowed* with *caution*. Such changes will be considered features, and as such will be no faster than the standard release cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness will not be considered a bug for this purpose. Correctness bug fixes that break stream-compatibility can happen on bugfix releases, per usual, but developers should consider if they can wait until the next feature release. We encourage developers to strongly weight user?s pain from the break in stream-compatibility against the improvements. One example of a worthwhile improvement would be to change algorithms for a significant increase in performance, for example, moving from the `Box-Muller transform `_ method of Gaussian variate generation to the faster `Ziggurat algorithm `_. An example of a discouraged improvement would be tweaking the Ziggurat tables just a little bit for a small performance improvement. 
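For reference, a minimal NumPy sketch of the Box-Muller transform mentioned above (an illustration only; it is not part of this proposal)::

    import numpy as np

    def box_muller(u1, u2):
        # Map two independent U(0, 1) draws to two independent N(0, 1) draws.
        r = np.sqrt(-2.0 * np.log(u1))
        theta = 2.0 * np.pi * u2
        return r * np.cos(theta), r * np.sin(theta)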
Any new design for the RNG subsystem will provide a choice of different core uniform PRNG algorithms. A promising design choice is to make these core uniform PRNGs their own lightweight objects with a minimal set of methods (randomgen_ calls them ?basic RNGs?). The broader set of non-uniform distributions will be its own class that holds a reference to one of these core uniform PRNG objects and simply delegates to the core uniform PRNG object when it needs uniform random numbers. To borrow an example from randomgen_, the class ``MT19937`` is a basic RNG that implements the classic Mersenne Twister algorithm. The class ``RandomGenerator`` wraps around the basic RNG to provide all of the non-uniform distribution methods:: # This is not the only way to instantiate this object. # This is just handy for demonstrating the delegation. >>> brng = MT19937(seed) >>> rg = RandomGenerator(brng) >>> x = rg.standard_normal(10) We will be more strict about a select subset of methods on these basic RNG objects. They MUST guarantee stream-compatibility for a specified set of methods which are chosen to make it easier to compose them to build other distributions and which are needed to abstract over the implementation details of the variety of core PRNG algorithms. Namely, * ``.bytes()`` * ``.random_uintegers()`` * ``.random_sample()`` The distributions class (``RandomGenerator``) SHOULD have all of the same distribution methods as ``RandomState`` with close-enough function signatures such that almost all code that currently works with ``RandomState`` instances will work with ``RandomGenerator`` instances (ignoring the precise stream values). Some variance will be allowed for integer distributions: in order to avoid some of the cross-platform problems described above, these SHOULD be rewritten to work with ``uint64`` numbers on all platforms. .. _randomgen: https://github.com/bashtage/randomgen Supporting Unit Tests ::::::::::::::::::::: Because we did make a strong stream-compatibility guarantee early in numpy?s life, reliance on stream-compatibility has grown beyond reproducible simulations. One use case that remains for stream-compatibility across numpy versions is to use pseudorandom streams to generate test data in unit tests. With care, many of the cross-platform instabilities can be avoided in the context of small unit tests. The new PRNG subsystem MUST provide a second, legacy distributions class that uses the same implementations of the distribution methods as the current version of ``numpy.random.RandomState``. The methods of this class will keep the same strict stream-compatibility guarantees. It is intended that this class will no longer be modified, except to keep it working when numpy internals change. All new development should go into the primary distributions class. The purpose of ``RandomState`` will be documented as providing certain fixed functionality for backwards compatibility and stable numbers for the limited purpose of unit testing, and not making whole programs reproducible across numpy versions. This legacy distributions class MUST be accessible under the name ``numpy.random.RandomState`` for backwards compatibility. All current ways of instantiating ``numpy.random.RandomState`` with a given state should instantiate the Mersenne Twister basic RNG with the same state. The legacy distributions class MUST be capable of accepting other basic RNGs. 
The purpose here is to ensure that one can write a program with a consistent basic RNG state with a mixture of libraries that may or may not have upgraded from ``RandomState``. Instances of the legacy distributions class MUST respond ``True`` to ``isinstance(rg, numpy.random.RandomState)`` because there is current utility code that relies on that check. Similarly, old pickles of ``numpy.random.RandomState`` instances MUST unpickle correctly. ``numpy.random.*`` :::::::::::::::::: The preferred best practice for getting reproducible pseudorandom numbers is to instantiate a generator object with a seed and pass it around. The implicit global ``RandomState`` behind the ``numpy.random.*`` convenience functions can cause problems, especially when threads or other forms of concurrency are involved. Global state is always problematic. We categorically recommend avoiding using the convenience functions when reproducibility is involved. That said, people do use them and use ``numpy.random.seed()`` to control the state underneath them. It can be hard to categorize and count API usages consistently and usefully, but a very common usage is in unit tests where many of the problems of global state are less likely. The initial release of the new PRNG subsystem MUST leave these convenience functions as aliases to the methods on a global ``RandomState`` that is initialized with a Mersenne Twister basic RNG object. A call to ``numpy.random.seed()`` will be forwarded to that basic RNG object. In order to allow certain workarounds, it MUST be possible to replace the basic RNG underneath the global ``RandomState`` with any other basic RNG object (we leave the precise API details up to the new subsystem). Calling ``numpy.random.seed()`` thereafter SHOULD just pass the given seed to the current basic RNG object and not attempt to reset the basic RNG to the Mersenne Twister. The global ``RandomState`` instance MUST be accessible by the name ``numpy.random.mtrand._rand``: Robert Kern long ago promised ``scikit-learn`` that this name would be stable. Whoops. The set of ``numpy.random.*`` convenience functions SHALL remain the same as they currently are. They SHALL be aliases to the ``RandomState`` methods and not the new less-stable distributions class (``RandomGenerator``, in the examples above). Users who want to get the fastest, best distributions can follow best practices and instantiate generator objects explicitly. After we have experience with the new PRNG subsystem, we can and should revisit these issues in future NEPs. Alternatives ------------ Versioning :::::::::: For a long time, we considered that the way to allow algorithmic improvements while maintaining the stream was to apply some form of versioning. That is, every time we make a stream change in one of the distributions, we increment some version number somewhere. ``numpy.random`` would keep all past versions of the code, and there would be a way to get the old versions. We will not be doing this. If one needs to get the exact bit-for-bit results from a given version of ``numpy``, whether one uses random numbers or not, one should use the exact version of ``numpy``. Proposals of how to do RNG versioning varied widely, and we will not exhaustively list them here. We spent years going back and forth on these designs and were not able to find one that sufficed. Let that time lost, and more importantly, the contributors that we lost while we dithered, serve as evidence against the notion. 
Concretely, adding in versioning makes maintenance of ``numpy.random`` difficult. Necessarily, we would be keeping lots of versions of the same code around. Adding a new algorithm safely would still be quite hard. But most importantly, versioning is fundamentally difficult to *use* correctly. We want to make it easy and straightforward to get the latest, fastest, best versions of the distribution algorithms; otherwise, what's the point? The way to make that easy is to make the latest the default. But the default will necessarily change from release to release, so the user?s code would need to be altered anyway to specify the specific version that one wants to replicate. Adding in versioning to maintain stream-compatibility would still only provide the same level of stream-compatibility that we currently do, with all of the limitations described earlier. Given that the standard practice for such needs is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` alone is superfluous. ``StableRandom`` :::::::::::::::: A previous version of this NEP proposed to leave ``RandomState`` completely alone for a deprecation period and build the new subsystem alongside with new names. To satisfy the unit testing use case, it proposed introducing a small distributions class nominally called ``StableRandom``. It would have provided a small subset of distribution methods that were considered most useful in unit testing, but not the full set such that it would be too likely to be used outside of the testing context. During discussion about this proposal, it became apparent that there was no satisfactory subset. At least some projects used a fairly broad selection of the ``RandomState`` methods in unit tests. Downstream project owners would have been forced to modify their code to accomodate the new PRNG subsystem. Some modifications might be simply mechanical, but the bulk of the work would have been tedious churn for no positive improvement to the downstream project, just avoiding being broken. Furthermore, under this old proposal, we would have had a quite lengthy deprecation period where ``RandomState`` existed alongside the new system of basic RNGs and distribution classes. Leaving the implementation of ``RandomState`` fixed meant that it could not use the new basic RNG state objects. Developing programs that use a mixture of libraries that have and have not upgraded would require managing two sets of PRNG states. This would notionally have been time-limited, but we intended the deprecation to be very long. The current proposal solves all of these problems. All current usages of ``RandomState`` will continue to work in perpetuity, though some may be discouraged through documentation. Unit tests can continue to use the full complement of ``RandomState`` methods. Mixed ``RandomState/RandomGenerator`` code can safely share the common basic RNG state. Unmodified ``RandomState`` code can make use of the new features of alternative basic RNGs like settable streams. Discussion ---------- - `NEP discussion < https://mail.python.org/pipermail/numpy-discussion/2018-June/078126.html>`_ - `Earlier discussion < https://mail.python.org/pipermail/numpy-discussion/2018-January/077608.html >`_ Copyright --------- This document has been placed in the public domain. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Sat Jun 16 14:01:15 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 16 Jun 2018 11:01:15 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sat, Jun 16, 2018 at 12:38 AM, Robert Kern wrote: > I have incorporated the feedback from this thread, and have significantly > altered the proposal. I think this version will be more palatable to > everyone. > > https://github.com/numpy/numpy/pull/11356 > https://github.com/rkern/numpy/blob/nep/rng-clarification/doc/neps/nep- > 0019-rng-policy.rst > > I'm pretty sure that Kevin Sheppard's prototype already implements the > broad strokes of my proposal (seriously, he thinks of everything; I'm just > playing catch up), so I don't think there is any technical risk. I think > it's just a matter of the fine details of shoving this into numpy.random > per se rather than a third party package. > > https://bashtage.github.io/randomgen/devel/legacy.html > > --- > > ============================== > Random Number Generator Policy > ============================== > > :Author: Robert Kern > :Status: Draft > :Type: Standards Track > :Created: 2018-05-24 > Thanks Robert. The whole proposal looks good to me now, just one minor comment below. > > The initial release of the new PRNG subsystem MUST leave these convenience > functions as aliases to the methods on a global ``RandomState`` that is > initialized with a Mersenne Twister basic RNG object. A call to > ``numpy.random.seed()`` will be forwarded to that basic RNG object. In > order > to allow certain workarounds, it MUST be possible to replace the basic RNG > underneath the global ``RandomState`` with any other basic RNG object (we > leave > the precise API details up to the new subsystem). Calling > ``numpy.random.seed()`` > thereafter SHOULD just pass the given seed to the current basic RNG object > and > not attempt to reset the basic RNG to the Mersenne Twister. The global > ``RandomState`` instance MUST be accessible by the name > ``numpy.random.mtrand._rand``: Robert Kern long ago promised > ``scikit-learn`` > that this name would be stable. Whoops. > This is a little weird; "mtrand" is an implementation detail already. There's exactly 3 instances of that in scikit-learn, so replacing those with a sane name (with a long timeline, say 4 numpy versions at least plus a major version number bump) doesn't seem unreasonable. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Jun 16 17:58:47 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 Jun 2018 14:58:47 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sat, Jun 16, 2018 at 11:02 AM Ralf Gommers wrote: > > > On Sat, Jun 16, 2018 at 12:38 AM, Robert Kern > wrote: > >> I have incorporated the feedback from this thread, and have significantly >> altered the proposal. I think this version will be more palatable to >> everyone. >> >> https://github.com/numpy/numpy/pull/11356 >> >> https://github.com/rkern/numpy/blob/nep/rng-clarification/doc/neps/nep-0019-rng-policy.rst >> >> I'm pretty sure that Kevin Sheppard's prototype already implements the >> broad strokes of my proposal (seriously, he thinks of everything; I'm just >> playing catch up), so I don't think there is any technical risk. 
I think >> it's just a matter of the fine details of shoving this into numpy.random >> per se rather than a third party package. >> >> https://bashtage.github.io/randomgen/devel/legacy.html >> >> --- >> >> ============================== >> Random Number Generator Policy >> ============================== >> >> :Author: Robert Kern >> :Status: Draft >> :Type: Standards Track >> :Created: 2018-05-24 >> > > Thanks Robert. The whole proposal looks good to me now, just one minor > comment below. > > >> >> The initial release of the new PRNG subsystem MUST leave these convenience >> functions as aliases to the methods on a global ``RandomState`` that is >> initialized with a Mersenne Twister basic RNG object. A call to >> ``numpy.random.seed()`` will be forwarded to that basic RNG object. In >> order >> to allow certain workarounds, it MUST be possible to replace the basic RNG >> underneath the global ``RandomState`` with any other basic RNG object (we >> leave >> the precise API details up to the new subsystem). Calling >> ``numpy.random.seed()`` >> thereafter SHOULD just pass the given seed to the current basic RNG >> object and >> not attempt to reset the basic RNG to the Mersenne Twister. The global >> ``RandomState`` instance MUST be accessible by the name >> ``numpy.random.mtrand._rand``: Robert Kern long ago promised >> ``scikit-learn`` >> that this name would be stable. Whoops. >> > > This is a little weird; "mtrand" is an implementation detail already. > There's exactly 3 instances of that in scikit-learn, so replacing those > with a sane name (with a long timeline, say 4 numpy versions at least plus > a major version number bump) doesn't seem unreasonable. > Everything in this paragraph is explicitly just about the initial release with the new subsystem. A following paragraph says that we should revisit all of these in following releases. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sat Jun 16 23:55:12 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sat, 16 Jun 2018 20:55:12 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: > > This is a little weird; "mtrand" is an implementation detail already. >> There's exactly 3 instances of that in scikit-learn, so replacing those >> with a sane name (with a long timeline, say 4 numpy versions at least plus >> a major version number bump) doesn't seem unreasonable. >> > > Everything in this paragraph is explicitly just about the initial release > with the new subsystem. A following paragraph says that we should revisit > all of these in following releases. > This already read a little strangely to me -- it sounded like an indefinite pronouncement. It would be good to clarify :). Otherwise, I am quite happy with this NEP! It avoids unnecessary churn, and opens the door to much needed improvements in numpy.random. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 17 00:34:01 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 Jun 2018 21:34:01 -0700 Subject: [Numpy-discussion] NEP: Random Number Generator Policy In-Reply-To: References: Message-ID: On Sat, Jun 16, 2018 at 8:56 PM Stephan Hoyer wrote: > This is a little weird; "mtrand" is an implementation detail already. 
>>> There's exactly 3 instances of that in scikit-learn, so replacing those >>> with a sane name (with a long timeline, say 4 numpy versions at least plus >>> a major version number bump) doesn't seem unreasonable. >>> >> >> Everything in this paragraph is explicitly just about the initial release >> with the new subsystem. A following paragraph says that we should revisit >> all of these in following releases. >> > > This already read a little strangely to me -- it sounded like an > indefinite pronouncement. It would be good to clarify :). > Fair enough. How does this language strike you? https://github.com/numpy/numpy/pull/11356/commits/15af58f7b1358d430a1af3c12f34a5024735d072 -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Sun Jun 17 02:51:03 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Sun, 17 Jun 2018 01:51:03 -0500 Subject: [Numpy-discussion] A little about XND Message-ID: Hi everyone, I'm glad I'm able to contribute back to this discussion thread. I wanted to post a quick message to this group to make sure there is no mis-information about XND which has finally reached the point where it can be experimented with (http://xnd.io) and commented on. XND came out of thoughts and conversations we had at Continuum (now Anaconda) when thinking about cross-language array computing and how to enable improved features for high-level users in many languages (including Python, R, Ruby, Node, Scala, Rust, Go, etc.). Technically there are three projects that make up XND (thus the name Plures for the Github organization). All of these projects have a C-library and then a high-level interface (right now we only have resources to develop the Python interface but would love to see support for other languages). xnd (libxnd) is the typed container. ndtypes (libndtypes) is the (datashape-like) type system with a grammar, parser, and type matcher. gumath (libgumath) are generalized ufuncs which represent the entire function system on xnd. We will be talking more about XND in the coming months and years, but for the purposes of this list, I wanted to make it clear that 1) XND is not trying to replace NumPy. XND is a low-level library and intended to be such. It would be most welcome if someday NumPy uses XND. We understand this may be a while and certainly not before NumPy 2.0 or 3.0. 2) Our initial target users are Numba, pandas, Dask, xarray, and other higher-level objects at the moment. We are eagerly searching for integration opportunities to connect more developers (or advanced users) to xnd before making more progress. 3) We do discuss array-like things in the public channels. NumPy users and developers are welcome in those channels. Everything is done in public including the weekly meeting which anyone can attend: Weekly meeting: meet.google.com/heo-fmow-omz Live discussions: https://gitter.im/Plures/xnd for the libraries themselves https://gitter.im/Plures/xnd-ml for integrations. Issues and PRs: https://github.com/plures --- under the various projects. 4) We are thinking about adding a custom-dtype to NumPy that uses xnd and would be happy for anyone's help on that project. 5) We are in the early stages of exploring a high-level array interface (using the ideas of MoA and the Psi Calculus with Lenore Mullen who worked on APL years ago). Likely the first place this will see some initial progress is in an ND Sparse array that uses XND. We welcome participation and input from all. 
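To give a feel for how the pieces fit together, here is a rough sketch of constructing a typed container from Python (based on the examples at http://xnd.io; the exact behaviour of the `xnd` and `ndt` constructors shown here is an assumption, so please check the docs):

    # Rough sketch based on the xnd.io examples; APIs may differ in detail.
    from xnd import xnd          # typed container (libxnd)
    from ndtypes import ndt      # datashape-like type system (libndtypes)

    x = xnd([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    print(x.type)                # inferred type, e.g. 2 * 3 * float64
    t = ndt("2 * 3 * float64")   # the same type written explicitly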
Stefan Krah has written the majority of the code and so we tend to respect his point of view. Pearu Peterson (of f2py and SciPy fame) has made some useful contributions recently. Stefan and I have been talking roughly weekly for a couple of years and so some of the problems currently there, I am certainly responsible for. Two of our immediate goals are to work with the Numba team to get support for ndtypes in Numba and allow Numba to use libgumath in no-python mode. I look forward to continuing the conversation with any of you who want to participate. Perhaps some of us can meet up during NumPy sprints to discuss more. XND is also currently looking for funding and time from interested parties to continue its development. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Jun 17 20:47:02 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 17 Jun 2018 20:47:02 -0400 Subject: [Numpy-discussion] A little about XND In-Reply-To: References: Message-ID: Hi Travis, More of a detailed question, but as we are currently thinking about extending the signature of gufuncs (i.e., things like `(m,n),(n,p)->(m,p)` for matrix multiplication), and as you must have thought about this for libgufunc, could you point me to how one would document the signature in your new system? (I briefly tried but there's no docs yet and I couldn't immediately find it in the code). If it is at all similar to numpy's and you have extended it, we should at least check whether we can do the same thing. Thanks, all best wishes, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Mon Jun 18 01:32:00 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Mon, 18 Jun 2018 00:32:00 -0500 Subject: [Numpy-discussion] A little about XND In-Reply-To: References: Message-ID: On Sun, Jun 17, 2018, 7:48 PM Marten van Kerkwijk wrote: > Hi Travis, > > More of a detailed question, but as we are currently thinking about > extending the signature of gufuncs (i.e., things like `(m,n),(n,p)->(m,p)` > for matrix multiplication), and as you must have thought about this for > libgufunc, could you point me to how one would document the signature in > your new system? (I briefly tried but there's no docs yet and I couldn't > immediately find it in the code). If it is at all similar to numpy's and > you have extended it, we should at least check whether we can do the same > thing. > I have been reading with interest these gufunc proposals and have pointed it out to the gumath devs. Right now, gumath doesn't go much beyond NumPy's syntax except for use of a more extensible type system. It uses the same notion of the dimension signature, though with a syntax derived from datashape which you can read more about here: http://datashape.readthedocs.io/en/latest/ Stefan Krah, Pearu, or Saul may have more comments. Thanks, -Travis > Thanks, all best wishes, > > Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Mon Jun 18 09:58:38 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 18 Jun 2018 09:58:38 -0400 Subject: [Numpy-discussion] A little about XND In-Reply-To: References: Message-ID: Interesting. 
If nothing else, it would be a nice way to mark our internal functions, including the loops. It also should not be difficult to have (g)ufunc signatures exported in that way, combining `signature` and `types`. In more detail, I see the grammar clearly allows fixed dimensions in a way that easily translates, but it isn't immediately obvious to me how one would express broadcasting or possibly missing ones, so perhaps there is room for sharing how to indicate that (although it is at a higher level; the function signature is fine). -- Marten For others, direct link to datashape grammar: http://datashape.readthedocs.io/en/latest/grammar.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From skrah at bytereef.org Mon Jun 18 10:02:18 2018 From: skrah at bytereef.org (Stefan Krah) Date: Mon, 18 Jun 2018 16:02:18 +0200 Subject: [Numpy-discussion] A little about XND In-Reply-To: References: Message-ID: <20180618140218.GA17701@bytereef.org> On Sun, Jun 17, 2018 at 08:47:02PM -0400, Marten van Kerkwijk wrote: > More of a detailed question, but as we are currently thinking about > extending the signature of gufuncs (i.e., things like `(m,n),(n,p)->(m,p)` > for matrix multiplication), and as you must have thought about this for > libgufunc, could you point me to how one would document the signature in > your new system? (I briefly tried but there's no docs yet and I couldn't > immediately find it in the code). The docs are a bit scattered across the three libraries, here is something about types and pattern matching: http://ndtypes.readthedocs.io/en/latest/ndtypes/types.html http://ndtypes.readthedocs.io/en/latest/ndtypes/pattern-matching.html A couple of example signatures: https://github.com/plures/gumath/blob/5f1f6de3d2c9a003b9dfb224fe09c63ae81bf18b/libgumath/extending/quaternion.c#L121 https://github.com/plures/gumath/blob/5f1f6de3d2c9a003b9dfb224fe09c63ae81bf18b/libgumath/extending/pdist.c#L115 The function signature for float64-specialized matrix multiplication is: "... * N * M * float64, ... * M * P * float64 -> ... * N * P * float64" The function signature for generic matrix multiplication is: "... * N * M * T, ... * M * P * T -> ... * N * P * T" A function that only accepts scalars: "... * N * M * Scalar, ... * M * P * Scalar -> ... * N * P * Scalar" A couple of observations: Functions are multimethods, so function dispatch on concrete arguments works by trying to locate a matching kernel. For example, if only the above "float64" kernel is present, all other dtypes will fail. Casting ------- It is still under debate how we handle casting. The current examples libgumath/kernels simply generate *all* signatures that allow exact casting of the input for a specific function. This is feasible for unary and binary kernels, but could lead to case explosion for functions with many arguments. The kernel writer however is always free to use the above type variable or Scalar signatures and handle casting inside the kernel. Explicit gufuncs ---------------- Gufuncs are explicit and require leading ellipses. A signature of "N * M * float64" is not a gufunc and does not allow outer dimensions. Disable broadcasting -------------------- "D... * N * M * float64, D... * M * P * float64 -> D... * N * P * float64" Dimension variables match a sequence of dimensions, so in the above example all outer dimensions must be exactly the same. Non-symbolic matches -------------------- "... * 2 * 3 * int8" only accepts "2 * 3 * int8" as the inner dimensions. 
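As a quick syntax check of the signatures above (a sketch; that the Python `ndt` constructor accepts full function types like this is an assumption based on the grammar):

    from ndtypes import ndt

    # Parse the float64-specialized matrix multiplication kernel signature given above.
    sig = ndt("... * N * M * float64, ... * M * P * float64 -> ... * N * P * float64")
    print(sig)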
Sorry for the long mail, I hope this clears up a bit what function signatures generally look like. Stefan Krah From m.h.vankerkwijk at gmail.com Mon Jun 18 12:34:03 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 18 Jun 2018 12:34:03 -0400 Subject: [Numpy-discussion] A little about XND In-Reply-To: <20180618140218.GA17701@bytereef.org> References: <20180618140218.GA17701@bytereef.org> Message-ID: Hi Stefan, That looks quite nice and expressive. In the context of a discussion we have been having about describing `matmul/@` and possibly broadcastable dimensions, I think from your description it sounds like one would describe `@` with multiple functions (the multiple dispatch we have been (are?) considering as well): "... * N * M * T, ... * M * P * T -> ... * N * P * T" "M * T, ... * M * P * T -> ... P * T" "... * N * M * T, M * T -> ... * N * T" "M * T, M * T -> T" Is there a way to describe broadcasting? The sample case we've come up with is a function that calculates a weighted mean. This might take (values, sigmas) and return (mean, sigma_mean), which would imply a signature like: "... N * T, ... N * T -> ... * T, ... * T" But would your signature allow indicating that one could pass in a single sigma? I.e., broadcast the second 1 to N if needed? I realize that this is no longer about describing precisely what the function doing the calculation expects, but rather what an upper level is allowed to do before calling the function (i.e., take a dimension of 1 and broadcast it). All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From skrah at bytereef.org Mon Jun 18 15:09:50 2018 From: skrah at bytereef.org (Stefan Krah) Date: Mon, 18 Jun 2018 21:09:50 +0200 Subject: [Numpy-discussion] A little about XND In-Reply-To: References: <20180618140218.GA17701@bytereef.org> Message-ID: <20180618190950.GA3899@bytereef.org> Hi Marten, On Mon, Jun 18, 2018 at 12:34:03PM -0400, Marten van Kerkwijk wrote: > That looks quite nice and expressive. In the context of a discussion we > have been having about describing `matmul/@` and possibly broadcastable > dimensions, I think from your description it sounds like one would describe > `@` with multiple functions (the multiple dispatch we have been (are?) > considering as well): > > > "... * N * M * T, ... * M * P * T -> ... * N * P * T" > "M * T, ... * M * P * T -> ... P * T" > "... * N * M * T, M * T -> ... * N * T" > "M * T, M * T -> T" Yes, that's the way, and the outer dimensions (the part matched by the ellipsis) are always broadcast like in NumPy. > Is there a way to describe broadcasting? The sample case we've come up > with is a function that calculates a weighted mean. This might take > (values, sigmas) and return (mean, sigma_mean), which would imply a > signature like: > > "... N * T, ... N * T -> ... * T, ... * T" > > But would your signature allow indicating that one could pass in a single > sigma? I.e., broadcast the second 1 to N if needed? Actually I came across this today when implementing optimized matching for binary functions. I wanted the faster kernel "... * N * int64, ... * N * int64 -> ... * N * int64" to also match e.g. the input "int64, 10 * int64". The generic datashape spec would forbid this, but perhaps the '?' that you propose in nep-0020 would offer a way out of this for ndtypes. It's a bit confusing for datashape, since there is already a questionmark for missing variable dimensions (that have shape==0 in the data). 
>>> ndt("var * ?var * int64") ndt("var * ?var * int64") This would be the type for e.g. [[0], None, [1,2,3]]. But for symbolic dimensions (which only match fixed dimensions) perhaps this "... * ?N * int64, ... * ?N * int64 -> ... * ?N * int64" or, as in the NEP, "... * N? * int64, ... * N? * int64 -> ... * N? * int64" should mean "At least one input has ndim >= 1, broadcast as necessary". This still means that for the "all ndim==0" case one would need an additional kernel "int64, int64 -> int64". > I realize that this is no longer about describing precisely what the > function doing the calculation expects, but rather what an upper level is > allowed to do before calling the function (i.e., take a dimension of 1 and > broadcast it). Yes, for datashape the problem is that it also allows non-broadcastable signatures like "N * float64", really the same as "double x[]" in C. But the '?' with occasionally one additional kernel for ndim==0 could solve this. Stefan Krah From charlesr.harris at gmail.com Mon Jun 18 16:20:10 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 18 Jun 2018 14:20:10 -0600 Subject: [Numpy-discussion] rackspace ssl certificates Message-ID: Hi All, I've been trying to put out the NumPy 1.15.0rc1, but cannot get `numpy-wheels` to upload the wheels to rackspace on windows, there is a certification problem. I note that that requirement was supposedly disabled: on_success: # Upload the generated wheel package to Rackspace # On Windows, Apache Libcloud cannot find a standard CA cert bundle so we # disable the ssl checks. and nothing relevant seems to have changed in our `.appveyor.yml` since the last successful run 7 days ago, 6 if we count 1.14.5, so I'm thinking a policy has changed at either at rackspace or appveyor, but that is just a guess. I'm experimenting with various changes to the script and the `apache-libcloud` version to see if I can get success, but thought I'd ask if anyone knew anything that might be helpful. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Mon Jun 18 16:22:01 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Mon, 18 Jun 2018 15:22:01 -0500 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: I think Matthew Brett needs to fix this. On Mon, Jun 18, 2018 at 3:20 PM Charles R Harris wrote: > Hi All, > > I've been trying to put out the NumPy 1.15.0rc1, but cannot get > `numpy-wheels` to upload the wheels to rackspace on windows, there is a > certification problem. I note that that requirement was supposedly disabled: > > on_success: > # Upload the generated wheel package to Rackspace > # On Windows, Apache Libcloud cannot find a standard CA cert bundle so we > # disable the ssl checks. > > and nothing relevant seems to have changed in our `.appveyor.yml` since > the last successful run 7 days ago, 6 if we count 1.14.5, so I'm thinking a > policy has changed at either at rackspace or appveyor, but that is just a > guess. I'm experimenting with various changes to the script and the > `apache-libcloud` version to see if I can get success, but thought I'd ask > if anyone knew anything that might be helpful. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Jun 18 16:42:31 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 18 Jun 2018 14:42:31 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum wrote: > I think Matthew Brett needs to fix this. > That would be nice, but I'm not convinced it is helpful :) I note that latest `apache-libcloud` does not install directly on windows, there seem to be some missing dependencies. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jun 18 17:13:21 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 18 Jun 2018 22:13:21 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: Hi, On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris wrote: > > > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum > wrote: >> >> I think Matthew Brett needs to fix this. > > > That would be nice, but I'm not convinced it is helpful :) I note that > latest `apache-libcloud` does not install directly on windows, there seem to > be some missing dependencies.> I'm happy to give it a go - Chuck - can I cancel the various builds running on my account, so I can do some debugging. Cheers, Matthew From charlesr.harris at gmail.com Mon Jun 18 19:24:41 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 18 Jun 2018 17:24:41 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Mon, Jun 18, 2018 at 3:13 PM, Matthew Brett wrote: > Hi, > > On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris > wrote: > > > > > > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum > > wrote: > >> > >> I think Matthew Brett needs to fix this. > > > > > > That would be nice, but I'm not convinced it is helpful :) I note that > > latest `apache-libcloud` does not install directly on windows, there > seem to > > be some missing dependencies.> > > I'm happy to give it a go - Chuck - can I cancel the various builds > running on my account, so I can do some debugging. > Absolutely! Nuke those suckers ... Chuck > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jun 18 19:58:28 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 19 Jun 2018 00:58:28 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Tue, Jun 19, 2018 at 12:24 AM, Charles R Harris wrote: > > > On Mon, Jun 18, 2018 at 3:13 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum >> > wrote: >> >> >> >> I think Matthew Brett needs to fix this. >> > >> > >> > That would be nice, but I'm not convinced it is helpful :) I note that >> > latest `apache-libcloud` does not install directly on windows, there >> > seem to >> > be some missing dependencies.> >> >> I'm happy to give it a go - Chuck - can I cancel the various builds >> running on my account, so I can do some debugging. > > > Absolutely! Nuke those suckers ... Hmm - I just tried installing certifi to get the SSL certificates, and removed --no-ssl-check. 
I wonder if something changed in the Rackspace protocols, or something. In case it's useful, I'm using a little repo that runs an Appveyor job then drops into an RDP server for me to log into, with the relevant bit here: https://github.com/matthew-brett/appvfutz/blob/master/appveyor.yml#L24 See: https://www.gep13.co.uk/blog/how-to-use-appveyor-remote-desktop-connection That said, maybe the fix doesn't work, let's wait on the builds. Cheers, Matthew From m.h.vankerkwijk at gmail.com Mon Jun 18 21:04:19 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 18 Jun 2018 21:04:19 -0400 Subject: [Numpy-discussion] A little about XND In-Reply-To: <20180618190950.GA3899@bytereef.org> References: <20180618140218.GA17701@bytereef.org> <20180618190950.GA3899@bytereef.org> Message-ID: Hi Stefan, Just to clarify: the ? we propose in the NEP is really for matmul - it indicates a true missing dimension (i.e., the array cannot have outer broadcast dimensions as well). For inner loop broadcasting, I'm proposing a "|1" post-fix, which means a dimension could also be missing, but can also be there and be 1, in which case it can do outer broadcast as well. So, for your function in your notation, it might look like: "... * N|1 * int64, ... * N|1 * int64 -> ... * N * int64" (Note that the output of course always has N - if both inputs have 1 then N=1; it is not meant to be absent). I think that actually looks quite clear, although perhaps one might want parentheses around it (since "|" = "or" normally does not have precedence over "*" = multiply), i.e., "... * (N|1) * int64, ... * (N|1) * int64 -> ... * N * int64" All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jun 18 21:44:16 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 18 Jun 2018 19:44:16 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Mon, Jun 18, 2018 at 5:58 PM, Matthew Brett wrote: > On Tue, Jun 19, 2018 at 12:24 AM, Charles R Harris > wrote: > > > > > > On Mon, Jun 18, 2018 at 3:13 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum < > nathan12343 at gmail.com> > >> > wrote: > >> >> > >> >> I think Matthew Brett needs to fix this. > >> > > >> > > >> > That would be nice, but I'm not convinced it is helpful :) I note that > >> > latest `apache-libcloud` does not install directly on windows, there > >> > seem to > >> > be some missing dependencies.> > >> > >> I'm happy to give it a go - Chuck - can I cancel the various builds > >> running on my account, so I can do some debugging. > > > > > > Absolutely! Nuke those suckers ... > > Hmm - I just tried installing certifi to get the SSL certificates, and > removed --no-ssl-check. I wonder if something changed in the > Rackspace protocols, or something. > > In case it's useful, I'm using a little repo that runs an Appveyor job > then drops into an RDP server for me to log into, with the relevant > bit here: > > https://github.com/matthew-brett/appvfutz/blob/master/appveyor.yml#L24 > > See: https://www.gep13.co.uk/blog/how-to-use-appveyor-remote- > desktop-connection > > That said, maybe the fix doesn't work, let's wait on the builds. > > Looks like that fixes the problem. Probably scipy-wheels will need that fix also. Do you know if new wheels with the same name will overwrite the old ones? 
ISTR that that is the case. BTW, there don't seem to be any nightly builds, does something need reconfiguration? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jun 19 06:57:05 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 19 Jun 2018 11:57:05 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: Hi, On Tue, Jun 19, 2018 at 2:44 AM, Charles R Harris wrote: > > > On Mon, Jun 18, 2018 at 5:58 PM, Matthew Brett > wrote: >> >> On Tue, Jun 19, 2018 at 12:24 AM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Jun 18, 2018 at 3:13 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum >> >> > >> >> > wrote: >> >> >> >> >> >> I think Matthew Brett needs to fix this. >> >> > >> >> > >> >> > That would be nice, but I'm not convinced it is helpful :) I note >> >> > that >> >> > latest `apache-libcloud` does not install directly on windows, there >> >> > seem to >> >> > be some missing dependencies.> >> >> >> >> I'm happy to give it a go - Chuck - can I cancel the various builds >> >> running on my account, so I can do some debugging. >> > >> > >> > Absolutely! Nuke those suckers ... >> >> Hmm - I just tried installing certifi to get the SSL certificates, and >> removed --no-ssl-check. I wonder if something changed in the >> Rackspace protocols, or something. >> >> In case it's useful, I'm using a little repo that runs an Appveyor job >> then drops into an RDP server for me to log into, with the relevant >> bit here: >> >> https://github.com/matthew-brett/appvfutz/blob/master/appveyor.yml#L24 >> >> See: >> https://www.gep13.co.uk/blog/how-to-use-appveyor-remote-desktop-connection >> >> That said, maybe the fix doesn't work, let's wait on the builds. >> > > Looks like that fixes the problem. Probably scipy-wheels will need that fix > also. I put it in. > Do you know if new wheels with the same name will overwrite the old > ones? ISTR that that is the case. Right - they overwrite the old ones. > BTW, there don't seem to be any nightly builds, does something need > reconfiguration? For Appveyor? You need a cron-enabled account. My account is enabled, I just emailed the appveyor support with my username, and an explanation. Maybe worth doing the same for the numpy account? Thereafter, you can just enter the cron time string in the settings, to enable daily builds. Cheers, Matthew From cimrman3 at ntc.zcu.cz Tue Jun 19 07:52:31 2018 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 19 Jun 2018 13:52:31 +0200 Subject: [Numpy-discussion] ANN: SfePy 2018.2 Message-ID: <31b88a31-b853-4a00-415a-55f935472ab3@ntc.zcu.cz> I am pleased to announce release 2018.2 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method or by the isogeometric analysis (limited support). It is distributed under the new BSD license. 
Home page: http://sfepy.org Mailing list: https://mail.python.org/mm3/mailman3/lists/sfepy.python.org/ Git (source) repository, issue tracker: https://github.com/sfepy/sfepy Highlights of this release -------------------------- - generalized-alpha and velocity Verlet elastodynamics solvers - terms for dispersion in fluids - caching of reference coordinates for faster repeated use of probes - new wrapper of MUMPS linear solver for parallel runs For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Cheers, Robert Cimrman --- Contributors to this release in alphabetical order: Robert Cimrman Lubos Kejzlar Vladimir Lukes Matyas Novak From charlesr.harris at gmail.com Tue Jun 19 09:46:08 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jun 2018 07:46:08 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Tue, Jun 19, 2018 at 4:57 AM, Matthew Brett wrote: > Hi, > > On Tue, Jun 19, 2018 at 2:44 AM, Charles R Harris > wrote: > > > > > > On Mon, Jun 18, 2018 at 5:58 PM, Matthew Brett > > wrote: > >> > >> On Tue, Jun 19, 2018 at 12:24 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Mon, Jun 18, 2018 at 3:13 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris > >> >> wrote: > >> >> > > >> >> > > >> >> > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> I think Matthew Brett needs to fix this. > >> >> > > >> >> > > >> >> > That would be nice, but I'm not convinced it is helpful :) I note > >> >> > that > >> >> > latest `apache-libcloud` does not install directly on windows, > there > >> >> > seem to > >> >> > be some missing dependencies.> > >> >> > >> >> I'm happy to give it a go - Chuck - can I cancel the various builds > >> >> running on my account, so I can do some debugging. > >> > > >> > > >> > Absolutely! Nuke those suckers ... > >> > >> Hmm - I just tried installing certifi to get the SSL certificates, and > >> removed --no-ssl-check. I wonder if something changed in the > >> Rackspace protocols, or something. > >> > >> In case it's useful, I'm using a little repo that runs an Appveyor job > >> then drops into an RDP server for me to log into, with the relevant > >> bit here: > >> > >> https://github.com/matthew-brett/appvfutz/blob/master/appveyor.yml#L24 > >> > >> See: > >> https://www.gep13.co.uk/blog/how-to-use-appveyor-remote- > desktop-connection > >> > >> That said, maybe the fix doesn't work, let's wait on the builds. > >> > > > > Looks like that fixes the problem. Probably scipy-wheels will need that > fix > > also. > > I put it in. > > > Do you know if new wheels with the same name will overwrite the old > > ones? ISTR that that is the case. > > Right - they overwrite the old ones. > > > BTW, there don't seem to be any nightly builds, does something need > > reconfiguration? > > For Appveyor? You need a cron-enabled account. My account is > enabled, I just emailed the appveyor support with my username, and an > explanation. Maybe worth doing the same for the numpy account? > Thereafter, you can just enter the cron time string in the settings, > to enable daily builds. > > What I was curious about is that there were no more "daily" builds of master. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Tue Jun 19 12:36:29 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 19 Jun 2018 17:36:29 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Tue, Jun 19, 2018 at 2:46 PM, Charles R Harris wrote: > > > On Tue, Jun 19, 2018 at 4:57 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Jun 19, 2018 at 2:44 AM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Jun 18, 2018 at 5:58 PM, Matthew Brett >> > wrote: >> >> >> >> On Tue, Jun 19, 2018 at 12:24 AM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Mon, Jun 18, 2018 at 3:13 PM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris >> >> >> wrote: >> >> >> > >> >> >> > >> >> >> > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> I think Matthew Brett needs to fix this. >> >> >> > >> >> >> > >> >> >> > That would be nice, but I'm not convinced it is helpful :) I note >> >> >> > that >> >> >> > latest `apache-libcloud` does not install directly on windows, >> >> >> > there >> >> >> > seem to >> >> >> > be some missing dependencies.> >> >> >> >> >> >> I'm happy to give it a go - Chuck - can I cancel the various builds >> >> >> running on my account, so I can do some debugging. >> >> > >> >> > >> >> > Absolutely! Nuke those suckers ... >> >> >> >> Hmm - I just tried installing certifi to get the SSL certificates, and >> >> removed --no-ssl-check. I wonder if something changed in the >> >> Rackspace protocols, or something. >> >> >> >> In case it's useful, I'm using a little repo that runs an Appveyor job >> >> then drops into an RDP server for me to log into, with the relevant >> >> bit here: >> >> >> >> https://github.com/matthew-brett/appvfutz/blob/master/appveyor.yml#L24 >> >> >> >> See: >> >> >> >> https://www.gep13.co.uk/blog/how-to-use-appveyor-remote-desktop-connection >> >> >> >> That said, maybe the fix doesn't work, let's wait on the builds. >> >> >> > >> > Looks like that fixes the problem. Probably scipy-wheels will need that >> > fix >> > also. >> >> I put it in. >> >> > Do you know if new wheels with the same name will overwrite the old >> > ones? ISTR that that is the case. >> >> Right - they overwrite the old ones. >> >> > BTW, there don't seem to be any nightly builds, does something need >> > reconfiguration? >> >> For Appveyor? You need a cron-enabled account. My account is >> enabled, I just emailed the appveyor support with my username, and an >> explanation. Maybe worth doing the same for the numpy account? >> Thereafter, you can just enter the cron time string in the settings, >> to enable daily builds. >> > > What I was curious about is that there were no more "daily" builds of > master. Is that right? That there were daily builds of master, on Appveyor? I don't know how those worked, I only recently got cron permission ... 
Cheers, Matthew From charlesr.harris at gmail.com Tue Jun 19 12:58:03 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jun 2018 10:58:03 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Tue, Jun 19, 2018 at 10:36 AM, Matthew Brett wrote: > On Tue, Jun 19, 2018 at 2:46 PM, Charles R Harris > wrote: > > > > > > On Tue, Jun 19, 2018 at 4:57 AM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Tue, Jun 19, 2018 at 2:44 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Mon, Jun 18, 2018 at 5:58 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> On Tue, Jun 19, 2018 at 12:24 AM, Charles R Harris > >> >> wrote: > >> >> > > >> >> > > >> >> > On Mon, Jun 18, 2018 at 3:13 PM, Matthew Brett > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Hi, > >> >> >> > >> >> >> On Mon, Jun 18, 2018 at 9:42 PM, Charles R Harris > >> >> >> wrote: > >> >> >> > > >> >> >> > > >> >> >> > On Mon, Jun 18, 2018 at 2:22 PM, Nathan Goldbaum > >> >> >> > > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> I think Matthew Brett needs to fix this. > >> >> >> > > >> >> >> > > >> >> >> > That would be nice, but I'm not convinced it is helpful :) I > note > >> >> >> > that > >> >> >> > latest `apache-libcloud` does not install directly on windows, > >> >> >> > there > >> >> >> > seem to > >> >> >> > be some missing dependencies.> > >> >> >> > >> >> >> I'm happy to give it a go - Chuck - can I cancel the various > builds > >> >> >> running on my account, so I can do some debugging. > >> >> > > >> >> > > >> >> > Absolutely! Nuke those suckers ... > >> >> > >> >> Hmm - I just tried installing certifi to get the SSL certificates, > and > >> >> removed --no-ssl-check. I wonder if something changed in the > >> >> Rackspace protocols, or something. > >> >> > >> >> In case it's useful, I'm using a little repo that runs an Appveyor > job > >> >> then drops into an RDP server for me to log into, with the relevant > >> >> bit here: > >> >> > >> >> https://github.com/matthew-brett/appvfutz/blob/master/ > appveyor.yml#L24 > >> >> > >> >> See: > >> >> > >> >> https://www.gep13.co.uk/blog/how-to-use-appveyor-remote- > desktop-connection > >> >> > >> >> That said, maybe the fix doesn't work, let's wait on the builds. > >> >> > >> > > >> > Looks like that fixes the problem. Probably scipy-wheels will need > that > >> > fix > >> > also. > >> > >> I put it in. > >> > >> > Do you know if new wheels with the same name will overwrite the old > >> > ones? ISTR that that is the case. > >> > >> Right - they overwrite the old ones. > >> > >> > BTW, there don't seem to be any nightly builds, does something need > >> > reconfiguration? > >> > >> For Appveyor? You need a cron-enabled account. My account is > >> enabled, I just emailed the appveyor support with my username, and an > >> explanation. Maybe worth doing the same for the numpy account? > >> Thereafter, you can just enter the cron time string in the settings, > >> to enable daily builds. > >> > > > > What I was curious about is that there were no more "daily" builds of > > master. > > Is that right? That there were daily builds of master, on Appveyor? > I don't know how those worked, I only recently got cron permission ... > No, but there used to be daily builds on travis. They stopped 8 days ago, https://travis-ci.org/MacPython/numpy-wheels/builds. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matti.picus at gmail.com Tue Jun 19 13:27:39 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 19 Jun 2018 10:27:39 -0700 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jun 19 13:57:31 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 19 Jun 2018 18:57:31 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: Hi, On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus wrote: > On 19/06/18 09:58, Charles R Harris wrote: >> >> > What I was curious about is that there were no more "daily" builds of >> > master. >> >> Is that right? That there were daily builds of master, on Appveyor? >> I don't know how those worked, I only recently got cron permission ... > > > No, but there used to be daily builds on travis. They stopped 8 days ago, > https://travis-ci.org/MacPython/numpy-wheels/builds. Oops - yes - sorry - I retired the 'daily' branch, in favor of 'master', but forgot to update the Travis-CI settings. Done now. Cheers, Matthew From sidky at uchicago.edu Tue Jun 19 13:51:12 2018 From: sidky at uchicago.edu (Emil Sidky) Date: Tue, 19 Jun 2018 12:51:12 -0500 Subject: [Numpy-discussion] question about array slicing and element assignment Message-ID: <88fa73eb-4ef9-c0f8-09f1-32b967b79b15@uchicago.edu> Hello, The following is an example where an array element assignment didn't work as I expected. Create a 6 x 3 matrix: In [70]: a = randn(6,3) In [71]: a Out[71]: array([[ 1.73266816, 0.948849 , 0.69188222], [-0.61840161, -0.03449826, 0.15032552], [ 0.4963306 , 0.77028209, -0.63076396], [-1.92273602, -1.03146536, 0.27744612], [ 0.70736325, 1.54687964, -0.75573888], [ 0.16316043, -0.34814532, 0.3683143 ]]) Create a 3x3 boolean array: In [72]: mask = randn(3,3)>0. In [73]: mask Out[73]: array([[ True, True, True], [False, True, True], [ True, False, True]], dtype=bool) Try to modify elements of "a" with the following line: In [74]: a[(2,3,5),][mask] = 1. No elements are changed in "a": In [75]: a Out[75]: array([[ 1.73266816, 0.948849 , 0.69188222], [-0.61840161, -0.03449826, 0.15032552], [ 0.4963306 , 0.77028209, -0.63076396], [-1.92273602, -1.03146536, 0.27744612], [ 0.70736325, 1.54687964, -0.75573888], [ 0.16316043, -0.34814532, 0.3683143 ]]) Instead try to modify elements of "a" with this line: In [76]: a[::2,][mask] = 1. This time it works: In [77]: a Out[77]: array([[ 1. , 1. , 1. ], [-0.61840161, -0.03449826, 0.15032552], [ 0.4963306 , 1. , 1. ], [-1.92273602, -1.03146536, 0.27744612], [ 1. , 1.54687964, 1. ], [ 0.16316043, -0.34814532, 0.3683143 ]]) Is there a way where I can modify the elements of "a" selected by an expression like "a[(2,3,5),][mask]" ? Thanks , Emil From shoyer at gmail.com Tue Jun 19 17:09:24 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 19 Jun 2018 14:09:24 -0700 Subject: [Numpy-discussion] question about array slicing and element assignment In-Reply-To: <88fa73eb-4ef9-c0f8-09f1-32b967b79b15@uchicago.edu> References: <88fa73eb-4ef9-c0f8-09f1-32b967b79b15@uchicago.edu> Message-ID: You will need to convert "a[(2,3,5),][mask]" into a single indexing expression, e.g., by using utility functions like np.nonzero() on mask. NumPy can't support assignment in chained indexing.
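For example, a rough sketch with the arrays below: since mask selects within the sub-array a[[2, 3, 5], :], the row/column positions of its True entries can be mapped back onto "a" and used in one vectorized assignment:

rows = np.array([2, 3, 5])
i, j = np.nonzero(mask)    # positions of the True entries of mask
a[rows[i], j] = 1.         # a single indexing operation, so the assignment sticks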
On Tue, Jun 19, 2018 at 1:25 PM Emil Sidky wrote: > Hello, > The following is an example where an array element assignment didn't work > as I expected. > Create a 6 x 3 matrix: > > In [70]: a = randn(6,3) > > In [71]: a > Out[71]: > array([[ 1.73266816, 0.948849 , 0.69188222], > [-0.61840161, -0.03449826, 0.15032552], > [ 0.4963306 , 0.77028209, -0.63076396], > [-1.92273602, -1.03146536, 0.27744612], > [ 0.70736325, 1.54687964, -0.75573888], > [ 0.16316043, -0.34814532, 0.3683143 ]]) > > Create a 3x3 boolean array: > In [72]: mask = randn(3,3)>0. > > In [73]: mask > Out[73]: > array([[ True, True, True], > [False, True, True], > [ True, False, True]], dtype=bool) > > Try to modify elements of "a" with the following line: > In [74]: a[(2,3,5),][mask] = 1. > No elements are changed in "a": > In [75]: a > Out[75]: > array([[ 1.73266816, 0.948849 , 0.69188222], > [-0.61840161, -0.03449826, 0.15032552], > [ 0.4963306 , 0.77028209, -0.63076396], > [-1.92273602, -1.03146536, 0.27744612], > [ 0.70736325, 1.54687964, -0.75573888], > [ 0.16316043, -0.34814532, 0.3683143 ]]) > > Instead try to modify elements of "a" with this line: > In [76]: a[::2,][mask] = 1. > > This time it works: > In [77]: a > Out[77]: > array([[ 1. , 1. , 1. ], > [-0.61840161, -0.03449826, 0.15032552], > [ 0.4963306 , 1. , 1. ], > [-1.92273602, -1.03146536, 0.27744612], > [ 1. , 1.54687964, 1. ], > [ 0.16316043, -0.34814532, 0.3683143 ]]) > > > Is there a way where I can modify the elements of "a" selected by an > expression like "a[(2,3,5),][mask]" ? > > Thanks , Emil > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From diagonaldevice at gmail.com Tue Jun 19 19:37:21 2018 From: diagonaldevice at gmail.com (Michael Lamparski) Date: Tue, 19 Jun 2018 19:37:21 -0400 Subject: [Numpy-discussion] Forcing new dimensions to appear at front in advanced indexing Message-ID: Hi all, So, in advanced indexing, numpy decides where to put new axes based on whether the "advanced indices" are all next to each other. >>> np.random.random((3,4,5,6,7,8))[:, [[0,0],[0,0]], 1, :].shape (3, 2, 2, 6, 7, 8) >>> np.random.random((3,4,5,6,7,8))[:, [[0,0],[0,0]], :, 1].shape (2, 2, 3, 5, 7, 8) In creating a wrapper type around arrays, I'm finding myself needing to suppress this behavior, so that the new axes consistently appear in the front. I thought of a dumb hat trick: def index(x, indices): return x[(True, None) + indices] Which certainly gets the new dimensions where I want them, but it introduces a ghost dimension of 1 (and sometimes two such dimensions!) in a place where I'm not sure I can easily find it. >>> np.random.random((3,4,5,6,7,8))[True, None, 1].shape (1, 1, 4, 5, 6, 7, 8) >>> np.random.random((3,4,5,6,7,8))[True, None, :, [[0,0],[0,0]], 1, :].shape (2, 2, 1, 3, 6, 7, 8) >>> np.random.random((3,4,5,6,7,8))[True, None, :, [[0,0],[0,0]], :, 1].shape (2, 2, 1, 3, 5, 7, 8) any better ideas? --- Michael -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Wed Jun 20 05:34:42 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 20 Jun 2018 11:34:42 +0200 Subject: [Numpy-discussion] Forcing new dimensions to appear at front in advanced indexing In-Reply-To: References: Message-ID: On Tue, 2018-06-19 at 19:37 -0400, Michael Lamparski wrote: > Hi all, > > So, in advanced indexing, numpy decides where to put new axes based > on whether the "advanced indices" are all next to each other. > > >>> np.random.random((3,4,5,6,7,8))[:, [[0,0],[0,0]], 1, :].shape > (3, 2, 2, 6, 7, 8) > >>> np.random.random((3,4,5,6,7,8))[:, [[0,0],[0,0]], :, 1].shape > (2, 2, 3, 5, 7, 8) > > In creating a wrapper type around arrays, I'm finding myself needing > to suppress this behavior, so that the new axes consistently appear > in the front. I thought of a dumb hat trick: > > def index(x, indices): > return x[(True, None) + indices] > > Which certainly gets the new dimensions where I want them, but it > introduces a ghost dimension of 1 (and sometimes two such > dimensions!) in a place where I'm not sure I can easily find it. > > >>> np.random.random((3,4,5,6,7,8))[True, None, 1].shape > (1, 1, 4, 5, 6, 7, 8) > >>> np.random.random((3,4,5,6,7,8))[True, None, :, [[0,0],[0,0]], 1, > :].shape > (2, 2, 1, 3, 6, 7, 8) > >>> np.random.random((3,4,5,6,7,8))[True, None, :, [[0,0],[0,0]], :, > 1].shape > (2, 2, 1, 3, 5, 7, 8) > > any better ideas? > We have proposed `arr.vindex[...]` to do this and there are is a pure python implementation of it out there, I think it may be linked here somewhere: https://github.com/numpy/numpy/pull/6256 There is a way that will generally work using triple indexing: arr[..., None, None][orig_indx * (slice(None), np.array(0))][..., 0] The first and last indexing operation is just a view creation, so it is basically a no-op. Now doing this gives me the shiver, but it will work always. If you want to have a no-copy behaviour in case your original index is ont an advanced indexing operation, you should replace the np.array(0) with just 0. - Sebastian > --- > > Michael > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From diagonaldevice at gmail.com Wed Jun 20 09:15:27 2018 From: diagonaldevice at gmail.com (Michael Lamparski) Date: Wed, 20 Jun 2018 09:15:27 -0400 Subject: [Numpy-discussion] Forcing new dimensions to appear at front in advanced indexing In-Reply-To: References: Message-ID: > There is a way that will generally work using triple indexing: > > arr[..., None, None][orig_indx + (slice(None), np.array(0))][..., 0] Impressive! (note: I fixed the * typo in the quote) > The first and last indexing operation is just a view creation, so it is > basically a no-op. Now doing this gives me the shiver, but it will work > always. If you want to have a no-copy behaviour in case your original > index is ont an advanced indexing operation, you should replace the > np.array(0) with just 0. I agree about the shivers, but any workaround is good to have nonetheless. If the index is not an advanced indexing operation, does it not suffice to simply apply the index tuple as-is? Michael -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Wed Jun 20 09:30:49 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 20 Jun 2018 15:30:49 +0200 Subject: [Numpy-discussion] Forcing new dimensions to appear at front in advanced indexing In-Reply-To: References: Message-ID: On Wed, 2018-06-20 at 09:15 -0400, Michael Lamparski wrote: > > There is a way that will generally work using triple indexing: > > > > arr[..., None, None][orig_indx + (slice(None), np.array(0))][..., > 0] > > Impressive! (note: I fixed the * typo in the quote) > > > The first and last indexing operation is just a view creation, so > it is > > basically a no-op. Now doing this gives me the shiver, but it will > work > > always. If you want to have a no-copy behaviour in case your > original > > index is ont an advanced indexing operation, you should replace the > > np.array(0) with just 0. > > I agree about the shivers, but any workaround is good to have > nonetheless. > > If the index is not an advanced indexing operation, does it not > suffice to simply apply the index tuple as-is? Yes, with the `np.array(0)` however, the result will forced to be a copy and not a view into the original array, when writing the line first I thought of "force advanced indexing", which there is likely no reason for though. If you replace it with 0, the result will be an identical view when the index is not advanced (with only a tiny bit of call overhead). So it might be nice to just use 0 instead, since if your index is advanced indexing, there is no difference between the two. But then you do not have to check if there is advanced indexing going on at all. Btw. if you want to use it for an object, I might suggest to actually use: object.vindex[...] notation for this logic (requires a slightly annoying helper class). The NEP is basically just a draft/proposal status, but xarray is already using that indexing method/property IIRC, so that name is relatively certain by now. I frankly am not sure right now if the vindex proposal was with a forced copy or not, probably it was. - Sebastian > > Michael > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From matti.picus at gmail.com Thu Jun 21 12:25:31 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 21 Jun 2018 09:25:31 -0700 Subject: [Numpy-discussion] Remove sctypeNA and typeNA from numpy core Message-ID: <04fb0382-9a42-f4e8-bf32-baf215df09c6@gmail.com> numpy.core has many ways to catalogue dtype names: sctypeDict, typeDict (which is precisely sctypeDict), typecodes, and typename. We also generate sctypeNA and typeNA but, as issue 11241 shows, it is sometimes wrong. They are also not documented and never used inside numpy. Instead of fixing it, I propose to remove sctypeNA and typeNA. Any thoughts or objections? 
Matti From matti.picus at gmail.com Thu Jun 21 13:31:50 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 21 Jun 2018 10:31:50 -0700 Subject: [Numpy-discussion] Remove sctypeNA and typeNA from numpy core In-Reply-To: <04fb0382-9a42-f4e8-bf32-baf215df09c6@gmail.com> References: <04fb0382-9a42-f4e8-bf32-baf215df09c6@gmail.com> Message-ID: <10d9d518-5075-8b9c-2383-792a1589a9e0@gmail.com> On 21/06/18 09:25, Matti Picus wrote: > numpy.core has many ways to catalogue dtype names: sctypeDict, > typeDict (which is precisely sctypeDict), typecodes, and typename. We > also generate sctypeNA and typeNA but, as issue 11241 shows, it is > sometimes wrong. They are also not documented and never used inside > numpy. Instead of fixing it, I propose to remove sctypeNA and typeNA. > > Any thoughts or objections? > Matti Whoops? 11340 (not 11241) which has been merged. Matti From charlesr.harris at gmail.com Thu Jun 21 13:34:08 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Jun 2018 11:34:08 -0600 Subject: [Numpy-discussion] NumPy 1.15.0rc1 released Message-ID: Hi All, On behalf of the NumPy team I'm pleased to announce the release of NumPy 1.15.0rc1. This release has an unusual number of cleanups, many deprecations of old functions, and improvements to many existing functions. A total of 423 pull reguests were merged for this release, please look at the release notes for details. Some highlights are: - NumPy has switched to pytest for testing. - A new `numpy.printoptions` context manager. - Many improvements to the histogram functions. - Support for unicode field names in python 2.7. - Improved support for PyPy. The Python versions supported by this release are 2.7, 3.4-3.6. The wheels are linked with OpenBLAS 3.0, which should fix some of the linalg problems reported for NumPy 1.14, and the source archives were created using Cython 0.28.2 and should work with the upcoming Python 3.7. Wheels for this release can be downloaded from PyPI , source archives are available from Github . A total of 128 people contributed to this release. People with a "+" by their names contributed a patch for the first time. - Aaron Critchley + - Aarthi + - Aarthi Agurusa + - Alex Thomas + - Alexander Belopolsky - Allan Haldane - Anas Khan + - Andras Deak - Andrey Portnoy + - Anna Chiara - Aurelien Jarno + - Baurzhan Muftakhidinov - Berend Kapelle + - Bernhard M. Wiedemann - Bjoern Thiel + - Bob Eldering - Cenny Wenner + - Charles Harris - ChloeColeongco + - Chris Billington + - Christopher + - Chun-Wei Yuan + - Claudio Freire + - Daniel Smith - Darcy Meyer + - David Abdurachmanov + - David Freese - Deepak Kumar Gouda + - Dennis Weyland + - Derrick Williams + - Dmitriy Shalyga + - Eric Cousineau + - Eric Larson - Eric Wieser - Evgeni Burovski - Frederick Lefebvre + - Gaspar Karm + - Geoffrey Irving - Gerhard Hobler + - Gerrit Holl - Guo Ci + - Hameer Abbasi + - Han Shen - Hiroyuki V. Yamazaki + - Hong Xu - Ihor Melnyk + - Jaime Fernandez - Jake VanderPlas + - James Tocknell + - Jarrod Millman - Jeff VanOss + - John Kirkham - Jonas Rauber + - Jonathan March + - Joseph Fox-Rabinovitz - Julian Taylor - Junjie Bai + - Juris Bogusevs + - J?rg D?pfert - Kenichi Maehashi + - Kevin Sheppard - Kimikazu Kato + - Kirit Thadaka + - Kritika Jalan + - Lakshay Garg + - Lars G + - Licht Takeuchi - Louis Potok + - Luke Zoltan Kelley - MSeifert04 + - Mads R. B. Kristensen + - Malcolm Smith + - Mark Harfouche + - Marten H. 
van Kerkwijk + - Marten van Kerkwijk - Matheus Vieira Portela + - Mathieu Lamarre - Mathieu Sornay + - Matthew Brett - Matthew Rocklin + - Matthias Bussonnier - Matti Picus - Michael Droettboom - Miguel S?nchez de Le?n Peque + - Mike Toews + - Milo + - Nathaniel J. Smith - Nelle Varoquaux - Nicholas Nadeau + - Nick Minkyu Lee + - Nikita + - Nikita Kartashov + - Nils Becker + - Oleg Zabluda - Orestis Floros + - Pat Gunn + - Paul van Mulbregt + - Pauli Virtanen - Pierre Chanial + - Ralf Gommers - Raunak Shah + - Robert Kern - Russell Keith-Magee + - Ryan Soklaski + - Samuel Jackson + - Sebastian Berg - Siavash Eliasi + - Simon Conseil - Simon Gibbons - Stefan Krah + - Stefan van der Walt - Stephan Hoyer - Subhendu + - Subhendu Ranjan Mishra + - Tai-Lin Wu + - Tobias Fischer + - Toshiki Kataoka + - Tyler Reddy + - Varun Nayyar - Victor Rodriguez + - Warren Weckesser - Zane Bradley + - fo40225 - lumbric + - luzpaz + - mamrehn + - tynn + - xoviat Cheers Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Jun 21 14:07:04 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 21 Jun 2018 20:07:04 +0200 Subject: [Numpy-discussion] Remove sctypeNA and typeNA from numpy core In-Reply-To: <04fb0382-9a42-f4e8-bf32-baf215df09c6@gmail.com> References: <04fb0382-9a42-f4e8-bf32-baf215df09c6@gmail.com> Message-ID: <955d87a8233a004338d0fa571dae23cfbe4c44cc.camel@sipsolutions.net> On Thu, 2018-06-21 at 09:25 -0700, Matti Picus wrote: > numpy.core has many ways to catalogue dtype names: sctypeDict, > typeDict > (which is precisely sctypeDict), typecodes, and typename. We also > generate sctypeNA and typeNA but, as issue 11241 shows, it is > sometimes > wrong. They are also not documented and never used inside numpy. > Instead > of fixing it, I propose to remove sctypeNA and typeNA. > Sounds like a good idea, we have too much stuff in there, and this one is not even useful (I bet the NA is for the missing value support that never happened). Might be good to do a quick deprecation anyway though, mostly out of principle. - Sebastian > Any thoughts or objections? > Matti > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From wieser.eric+numpy at gmail.com Thu Jun 21 14:22:16 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 21 Jun 2018 11:22:16 -0700 Subject: [Numpy-discussion] Remove sctypeNA and typeNA from numpy core In-Reply-To: <955d87a8233a004338d0fa571dae23cfbe4c44cc.camel@sipsolutions.net> References: <04fb0382-9a42-f4e8-bf32-baf215df09c6@gmail.com> <955d87a8233a004338d0fa571dae23cfbe4c44cc.camel@sipsolutions.net> Message-ID: > I bet the NA is for the missing value support thatnever happened Nope - NA stands for NumArray Eric On Thu, 21 Jun 2018 at 11:07 Sebastian Berg wrote: > On Thu, 2018-06-21 at 09:25 -0700, Matti Picus wrote: > > numpy.core has many ways to catalogue dtype names: sctypeDict, > > typeDict > > (which is precisely sctypeDict), typecodes, and typename. We also > > generate sctypeNA and typeNA but, as issue 11241 shows, it is > > sometimes > > wrong. They are also not documented and never used inside numpy. 
> > Instead > > of fixing it, I propose to remove sctypeNA and typeNA. > > > > Sounds like a good idea, we have too much stuff in there, and this one > is not even useful (I bet the NA is for the missing value support that > never happened). > > Might be good to do a quick deprecation anyway though, mostly out of > principle. > > - Sebastian > > > Any thoughts or objections? > > Matti > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon Jun 25 17:30:02 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 25 Jun 2018 17:30:02 -0400 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing Message-ID: Sebastian and I have revised a Numpy Enhancement Proposal that he started three years ago for overhauling NumPy's advanced indexing. We'd now like to present it for official consideration. Minor inline comments (e.g., typos) can be added to the latest pull request (https://github.com/numpy/numpy/pull/11414/files), but otherwise let's keep discussion on the mailing list. The NumPy website should update shortly with a rendered version ( http://www.numpy.org/neps/nep-0021-advanced-indexing.html), but until then please see the full text below. Cheers, Stephan ========================================= Simplified and explicit advanced indexing ========================================= :Author: Sebastian Berg :Author: Stephan Hoyer :Status: Draft :Type: Standards Track :Created: 2015-08-27 Abstract -------- NumPy's "advanced" indexing support for indexing arrays with other arrays is one of its most powerful and popular features. Unfortunately, the existing rules for advanced indexing with multiple array indices are typically confusing to both new, and in many cases even old, users of NumPy. Here we propose an overhaul and simplification of advanced indexing, including two new "indexer" attributes ``oindex`` and ``vindex`` to facilitate explicit indexing. Background ---------- Existing indexing operations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NumPy arrays currently support a flexible range of indexing operations: - "Basic" indexing involving only slices, integers, ``np.newaxis`` and ellipsis (``...``), e.g., ``x[0, :3, np.newaxis]`` for selecting the first element from the 0th axis, the first three elements from the 1st axis and inserting a new axis of size 1 at the end. Basic indexing always return a view of the indexed array's data. - "Advanced" indexing, also called "fancy" indexing, includes all cases where arrays are indexed by other arrays. Advanced indexing always makes a copy: - "Boolean" indexing by boolean arrays, e.g., ``x[x > 0]`` for selecting positive elements. - "Vectorized" indexing by one or more integer arrays, e.g., ``x[[0, 1]]`` for selecting the first two elements along the first axis. With multiple arrays, vectorized indexing uses broadcasting rules to combine indices along multiple dimensions. This allows for producing a result of arbitrary shape with arbitrary elements from the original arrays. - "Mixed" indexing involving any combinations of the other advancing types. This is no more powerful than vectorized indexing, but is sometimes more convenient. 
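A brief illustrative sketch of these three kinds of indexing on a small 2D array (the `Examples` section below covers the details and corner cases):

x = np.arange(9).reshape(3, 3)

x[0, :2]           # basic indexing: a view of the first two elements of row 0
x[x > 4]           # boolean indexing: the 1D copy array([5, 6, 7, 8])
x[[0, 2], [1, 2]]  # vectorized indexing: picks x[0, 1] and x[2, 2], i.e. array([1, 8])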
For clarity, we will refer to these existing rules as "legacy indexing". This is only a high-level summary; for more details, see NumPy's documentation and `Examples` below. Outer indexing ~~~~~~~~~~~~~~ One broadly useful class of indexing operations is not supported: - "Outer" or orthogonal indexing treats one-dimensional arrays equivalently to slices for determining output shapes. The rule for outer indexing is that the result should be equivalent to independently indexing along each dimension with integer or boolean arrays as if both the indexed and indexing arrays were one-dimensional. This form of indexing is familiar to many users of other programming languages such as MATLAB, Fortran and R. The reason why NumPy omits support for outer indexing is that the rules for outer and vectorized indexing conflict. Consider indexing a 2D array by two 1D integer arrays, e.g., ``x[[0, 1], [0, 1]]``: - Outer indexing is equivalent to combining multiple integer indices with ``itertools.product()``. The result in this case is another 2D array with all combinations of indexed elements, e.g., ``np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]])`` - Vectorized indexing is equivalent to combining multiple integer indices with ``zip()``. The result in this case is a 1D array containing the diagonal elements, e.g., ``np.array([x[0, 0], x[1, 1]])``. This difference is a frequent stumbling block for new NumPy users. The outer indexing model is easier to understand, and is a natural generalization of slicing rules. But NumPy instead chose to support vectorized indexing, because it is strictly more powerful. It is always possible to emulate outer indexing by vectorized indexing with the right indices. To make this easier, NumPy includes utility objects and functions such as ``np.ogrid`` and ``np.ix_``, e.g., ``x[np.ix_([0, 1], [0, 1])]``. However, there are no utilities for emulating fully general/mixed outer indexing, which could unambiguously allow for slices, integers, and 1D boolean and integer arrays. Mixed indexing ~~~~~~~~~~~~~~ NumPy's existing rules for combining multiple types of indexing in the same operation are quite complex, involving a number of edge cases. One reason why mixed indexing is particularly confusing is that at first glance the result works deceptively like outer indexing. Returning to our example of a 2D array, both ``x[:2, [0, 1]]`` and ``x[[0, 1], :2]`` return 2D arrays with axes in the same order as the original array. However, as soon as two or more non-slice objects (including integers) are introduced, vectorized indexing rules apply. The axes introduced by the array indices are at the front, unless all array indices are consecutive, in which case NumPy deduces where the user "expects" them to be. Consider indexing a 3D array ``arr`` with shape ``(X, Y, Z)``: 1. ``arr[:, [0, 1], 0]`` has shape ``(X, 2)``. 2. ``arr[[0, 1], 0, :]`` has shape ``(2, Z)``. 3. ``arr[0, :, [0, 1]]`` has shape ``(2, Y)``, not ``(Y, 2)``! These first two cases are intuitive and consistent with outer indexing, but this last case is quite surprising, even to many highly experienced NumPy users. Mixed cases involving multiple array indices are also surprising, and only less problematic because the current behavior is so useless that it is rarely encountered in practice. When a boolean array index is mixed with another boolean or integer array, the boolean array is converted to integer array indices (equivalent to ``np.nonzero()``) and then broadcast.
For example, indexing a 2D array of size ``(2, 2)`` like ``x[[True, False], [True, False]]`` produces a 1D vector with shape ``(1,)``, not a 2D sub-matrix with shape ``(1, 1)``. Mixed indexing seems so tricky that it is tempting to say that it never should be used. However, it is not easy to avoid, because NumPy implicitly adds full slices if there are fewer indices than the full dimensionality of the indexed array. This means that indexing a 2D array like ``x[[0, 1]]`` is equivalent to ``x[[0, 1], :]``. These cases are not surprising, but they constrain the behavior of mixed indexing. Indexing in other Python array libraries ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Indexing is a useful and widely recognized mechanism for accessing multi-dimensional array data, so it is no surprise that many other libraries in the scientific Python ecosystem also support array indexing. Unfortunately, the full complexity of NumPy's indexing rules means that it is both challenging and undesirable for other libraries to copy its behavior in all of its nuance. The only full implementation of NumPy-style indexing is NumPy itself. This includes projects like dask.array and h5py, which support *most* types of array indexing in some form, and otherwise attempt to copy NumPy's API exactly. Vectorized indexing in particular can be challenging to implement with array storage backends not based on NumPy. In contrast, indexing by 1D arrays along at least one dimension in the style of outer indexing is much more achievable. This has led many libraries (including dask and h5py) to attempt to define a safe subset of NumPy-style indexing that is equivalent to outer indexing, e.g., by only allowing indexing with an array along at most one dimension. However, this is quite challenging to do correctly in a general enough way to be useful. For example, the current versions of dask and h5py both handle mixed indexing in case 3 above inconsistently with NumPy. This is quite likely to lead to bugs. These inconsistencies, in addition to the broader challenge of implementing every type of indexing logic, make it challenging to write high-level array libraries like xarray or dask.array that can interchangeably index many types of array storage. In contrast, explicit APIs for outer and vectorized indexing in NumPy would provide a model that external libraries could reliably emulate, even if they don't support every type of indexing. High level changes ------------------ Inspired by multiple "indexer" attributes for controlling different types of indexing behavior in pandas, we propose to: 1. Introduce ``arr.oindex[indices]`` which allows array indices, but uses outer indexing logic. 2. Introduce ``arr.vindex[indices]`` which uses the current "vectorized"/broadcasted logic but with two differences from legacy indexing: * Boolean indices are not supported. All indices must be integers, integer arrays or slices. * The integer index result dimensions are always the first axes of the result array. No transpose is done, even for a single integer array index. 3. Plain indexing on arrays will start to give warnings and eventually errors in cases where one of the explicit indexers should be preferred: * First, in all cases where legacy and outer indexing would give different results. * Later, potentially in all cases involving an integer array. These constraints are sufficient for making indexing generally consistent with expectations and providing a less surprising learning curve with ``oindex``.
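As a rough sketch of the intended semantics, emulated here with existing NumPy operations (``oindex`` and ``vindex`` do not exist yet; the attribute forms in the comments are only what this proposal would provide):

x = np.arange(12).reshape(3, 4)
rows, cols = [0, 2], [0, 3]

x[np.ix_(rows, cols)]  # what x.oindex[rows, cols] would return: all combinations, shape (2, 2)
x[rows, cols]          # what x.vindex[rows, cols] would return: paired elements, shape (2,)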
Note that all things mentioned here apply both for assignment as well as
subscription.

Understanding these details is *not* easy. The `Examples` section in the
discussion gives code examples.
And the hopefully easier `Motivational Example` provides some motivational
use-cases for the general ideas and is likely a good start for anyone not
intimately familiar with advanced indexing.


Detailed Description
--------------------

Proposed rules
~~~~~~~~~~~~~~

From the three problems noted above some expectations for NumPy can
be deduced:

1. There should be a prominent outer/orthogonal indexing method such as
   ``arr.oindex[indices]``.

2. Considering how confusing vectorized/fancy indexing can be, it should
   be possible to make it more explicit (e.g. ``arr.vindex[indices]``).

3. A new ``arr.vindex[indices]`` method would not be tied to the
   confusing transpose rules of fancy indexing, which is for example
   needed for the simple case of a single advanced index. Thus,
   no transposing should be done. The axes created by the integer array
   indices are always inserted at the front, even for a single index.

4. Boolean indexing is conceptually outer indexing. Broadcasting
   together with other advanced indices in the manner of legacy
   indexing is generally not helpful or well defined.
   A user who wants the "``nonzero``" plus broadcast behaviour can thus
   be expected to do this manually. Thus, ``vindex`` does not need to
   support boolean index arrays.

5. An ``arr.legacy_index`` attribute should be implemented to support
   legacy indexing. This gives a simple way to update existing codebases
   using legacy indexing, which will make the deprecation of plain indexing
   behavior easier. The longer name ``legacy_index`` is intentionally chosen
   to be explicit and discourage its use in new code.

6. Plain indexing ``arr[...]`` should return an error for ambiguous cases.
   To begin with, this probably means cases where ``arr[ind]`` and
   ``arr.oindex[ind]`` return different results give deprecation warnings.
   This includes every use of vectorized indexing with multiple integer arrays.
   Due to the transposing behaviour, this means that ``arr[0, :, index_arr]``
   will be deprecated, but ``arr[:, 0, index_arr]`` will not for the time being.

7. To ensure that existing subclasses of `ndarray` that override indexing
   do not inadvertently revert to default behavior for indexing attributes,
   these attributes should have explicit checks that disable them if
   ``__getitem__`` or ``__setitem__`` has been overridden.

Unlike plain indexing, the new indexing attributes are explicitly aimed
at higher dimensional indexing, so several additional changes should be
implemented:

* The indexing attributes will enforce exact dimension and indexing match.
  This means that no implicit ellipsis (``...``) will be added. Unless
  an ellipsis is present the indexing expression will thus only work for
  an array with a specific number of dimensions.
  This makes the expression more explicit and safeguards against wrong
  dimensionality of arrays.
  There should be no implications for "duck typing" compatibility with
  builtin Python sequences, because Python sequences only support a limited
  form of "basic indexing" with integers and slices.

* The current plain indexing allows for the use of non-tuples for
  multi-dimensional indexing such as ``arr[[slice(None), 2]]``.
  This creates some inconsistencies and thus the indexing attributes
  should only allow plain python tuples for this purpose.
  (Whether or not this should be the case for plain indexing is a
  different issue.)
* The new attributes should not use getitem to implement setitem,
  since it is a kludge and not useful for vectorized
  indexing. (not implemented yet)


Open Questions
~~~~~~~~~~~~~~

* The names ``oindex``, ``vindex`` and ``legacy_index`` are just suggestions at
  the time of writing this; another name NumPy has used for something like
  ``oindex`` is ``np.ix_``. See also below.

* ``oindex`` and ``vindex`` could always return copies, even when no array
  operation occurs. One argument for allowing a view return is that this way
  ``oindex`` can be used as a general index replacement.
  However, there is one argument for returning copies. It is possible for
  ``arr.vindex[array_scalar, ...]``, where ``array_scalar`` should be
  a 0-D array but is not, since 0-D arrays tend to be converted.
  Copying always "fixes" this possible inconsistency.

* The final state into which plain indexing should morph is not fixed in this
  NEP. It is for example possible that ``arr[index]`` will be equivalent to
  ``arr.oindex`` at some point in the future.
  Since such a change will take years, it seems unnecessary to make
  specific decisions at this time.

* The proposed changes to plain indexing could be postponed indefinitely or
  not taken in order to not break or force major fixes to existing code bases.


Alternative Names
~~~~~~~~~~~~~~~~~

Possible names suggested (more suggestions will be added).

============== ============ ========
**Orthogonal** oindex       oix
**Vectorized** vindex       vix
**Legacy**     legacy_index l/findex
============== ============ ========


Subclasses
~~~~~~~~~~

Subclasses are a bit problematic in the light of these changes. There are
some possible solutions for this. For most subclasses (those which do not
provide ``__getitem__`` or ``__setitem__``) the special attributes should
just work. Subclasses that *do* provide it must be updated accordingly
and should preferably not subclass working versions of these attributes.

All subclasses will inherit the attributes, however, the implementation
of ``__getitem__`` on these attributes should test
``subclass.__getitem__ is ndarray.__getitem__``. If not, the
subclass has special handling for indexing and ``NotImplementedError``
should be raised, requiring that the indexing attributes are also explicitly
overridden. Likewise, implementations of ``__setitem__`` should check to see
if ``__setitem__`` is overridden.

A further question is how to facilitate implementing the special attributes.
Also there is the weird functionality where ``__setitem__`` calls
``__getitem__`` for non-advanced indices. It might be good to avoid it for
the new attributes, but on the other hand, that may make it even more
confusing.

To facilitate implementations we could provide functions similar to
``operator.itemgetter`` and ``operator.setitem`` for the attributes.
Possibly a mixin could be provided to help implementation. These improvements
are not essential to the initial implementation, so they are saved for
future work.

Implementation
--------------

Implementation would start with writing special indexing objects available
through ``arr.oindex``, ``arr.vindex``, and ``arr.legacy_index`` to allow these
indexing operations. Also, we would need to start to deprecate those plain index
operations which are ambiguous.
Furthermore, the NumPy code base will need to use the new attributes and
tests will have to be adapted.


Backward compatibility
----------------------

As a new feature, no backward compatibility issues with the new ``vindex``
and ``oindex`` attributes would arise.
To facilitate backwards compatibility
as much as possible, we expect a long deprecation cycle for legacy indexing
behavior and propose the new ``legacy_index`` attribute.
Some forward compatibility issues with subclasses that do not specifically
implement the new methods may arise.


Alternatives
------------

NumPy may not choose to offer these different types of indexing methods, or
choose to only offer them through specific functions instead of the proposed
notation above.

We don't think that new functions are a good alternative, because indexing
notation ``[]`` offers some syntactic advantages in Python (i.e., direct
creation of slice objects) compared to functions.

A more reasonable alternative would be to write new wrapper objects for
alternative indexing with functions rather than methods (e.g.,
``np.oindex(arr)[indices]`` instead of ``arr.oindex[indices]``). Functionally,
this would be equivalent, but indexing is such a common operation that we think
it is important to minimize syntax and worth implementing it directly on
`ndarray` objects themselves. Indexing attributes also define a clear interface
that is easier for alternative array implementations to copy, notwithstanding
ongoing efforts to make it easier to override NumPy functions [2]_.

Discussion
----------

The original discussion about vectorized vs outer/orthogonal indexing arose
on the NumPy mailing list:

* https://mail.python.org/pipermail/numpy-discussion/2015-April/072550.html

Some discussion can be found on the original pull request for this NEP:

* https://github.com/numpy/numpy/pull/6256

Python implementations of the indexing operations can be found at:

* https://github.com/numpy/numpy/pull/5749
* https://gist.github.com/shoyer/c700193625347eb68fee4d1f0dc8c0c8


Examples
~~~~~~~~

Since the various kinds of indexing are hard to grasp in many cases, these
examples hopefully give some more insight. Note that they are all in terms
of shape.
In the examples, all original dimensions have 5 or more elements;
advanced indexing inserts smaller dimensions.
These examples may be hard to grasp without working knowledge of advanced
indexing as of NumPy 1.9.

Example array::

    >>> arr = np.ones((5, 6, 7, 8))


Legacy fancy indexing
---------------------

Note that the same result can be achieved with ``arr.legacy_index``, but the
"future error" will still work in this case.

A single index is transposed (this is the same for all indexing types)::

    >>> arr[[0], ...].shape
    (1, 6, 7, 8)
    >>> arr[:, [0], ...].shape
    (5, 1, 7, 8)


Multiple indices are transposed *if* consecutive::

    >>> arr[:, [0], [0], :].shape  # future error
    (5, 1, 8)
    >>> arr[:, [0], :, [0]].shape  # future error
    (1, 5, 7)


It is important to note that a scalar *is* an integer array index in this sense
(and gets broadcast with the other advanced index)::

    >>> arr[:, [0], 0, :].shape
    (5, 1, 8)
    >>> arr[:, [0], :, 0].shape  # future error (scalar is "fancy")
    (1, 5, 7)


A single boolean index can act on multiple dimensions (especially the whole
array). It has to match the dimensions (as of 1.10 this gives a deprecation
warning).
The boolean index is otherwise identical to (multiple consecutive) integer
array indices::

    >>> # Create a boolean index with one True value for the last two dimensions:
    >>> bindx = np.zeros((7, 8), dtype=np.bool_)
    >>> bindx[0, 0] = True
    >>> arr[:, 0, bindx].shape
    (5, 1)
    >>> arr[0, :, bindx].shape
    (1, 6)


The combination with anything that is not a scalar is confusing, e.g.::

    >>> arr[[0], :, bindx].shape  # bindx result broadcasts with [0]
    (1, 6)
    >>> arr[:, [0, 1], bindx].shape  # IndexError


Outer indexing
--------------

Multiple indices are "orthogonal" and their result axes are inserted
at the same place (they are not broadcast)::

    >>> arr.oindex[:, [0], [0, 1], :].shape
    (5, 1, 2, 8)
    >>> arr.oindex[:, [0], :, [0, 1]].shape
    (5, 1, 7, 2)
    >>> arr.oindex[:, [0], 0, :].shape
    (5, 1, 8)
    >>> arr.oindex[:, [0], :, 0].shape
    (5, 1, 7)


Boolean index results are always inserted where the index is::

    >>> # Create a boolean index with one True value for the last two dimensions:
    >>> bindx = np.zeros((7, 8), dtype=np.bool_)
    >>> bindx[0, 0] = True
    >>> arr.oindex[:, 0, bindx].shape
    (5, 1)
    >>> arr.oindex[0, :, bindx].shape
    (6, 1)


Nothing changes in the presence of other advanced indices::

    >>> arr.oindex[[0], :, bindx].shape
    (1, 6, 1)
    >>> arr.oindex[:, [0, 1], bindx].shape
    (5, 2, 1)


Vectorized/inner indexing
-------------------------

Multiple indices are broadcast and iterated as one like fancy indexing,
but the new axes are always inserted at the front::

    >>> arr.vindex[:, [0], [0, 1], :].shape
    (2, 5, 8)
    >>> arr.vindex[:, [0], :, [0, 1]].shape
    (2, 5, 7)
    >>> arr.vindex[:, [0], 0, :].shape
    (1, 5, 8)
    >>> arr.vindex[:, [0], :, 0].shape
    (1, 5, 7)


Boolean index results are always inserted where the index is, exactly
as in ``oindex``, given how specific they are to the axes they operate on::

    >>> # Create a boolean index with one True value for the last two dimensions:
    >>> bindx = np.zeros((7, 8), dtype=np.bool_)
    >>> bindx[0, 0] = True
    >>> arr.vindex[:, 0, bindx].shape
    (5, 1)
    >>> arr.vindex[0, :, bindx].shape
    (6, 1)


But other advanced indices are again transposed to the front::

    >>> arr.vindex[[0], :, bindx].shape
    (1, 6, 1)
    >>> arr.vindex[:, [0, 1], bindx].shape
    (2, 5, 1)


Motivational Example
~~~~~~~~~~~~~~~~~~~~

Imagine data acquisition software storing ``D`` channels and ``N`` datapoints
along time into an ``(N, D)`` shaped array. During data analysis, we need to
fetch a pool of channels, for example to calculate a mean over them.

This data can be faked using::

    >>> arr = np.random.random((100, 10))

Now one may remember indexing with an integer array and find the correct code::

    >>> group = arr[:, [2, 5]]
    >>> mean_value = group.mean()

However, assume that there were some specific time points (first dimension
of the data) that need to be specially considered. These time points are
already known and given by::

    >>> interesting_times = np.array([1, 5, 8, 10], dtype=np.intp)

Now to fetch them, we may try to modify the previous code::

    >>> group_at_it = arr[interesting_times, [2, 5]]
    IndexError: Ambiguous index, use `.oindex` or `.vindex`

An error such as this will point the user to the indexing documentation, which
should make it clear that ``oindex`` behaves more like slicing.
So, out of the different methods it is the obvious choice
(for now, this is a shape mismatch, but that could possibly also mention
``oindex``)::

    >>> group_at_it = arr.oindex[interesting_times, [2, 5]]

Now of course one could also have used ``vindex``, but it is much less
obvious how to achieve the right thing!::

    >>> reshaped_times = interesting_times[:, np.newaxis]
    >>> group_at_it = arr.vindex[reshaped_times, [2, 5]]


One may find that, for example, our data is corrupt in some places.
So, we need to replace these values by zero (or anything else) for these
times. The first column may for example give the necessary information,
so that changing the values becomes easy once one remembers boolean indexing::

    >>> bad_data = arr[:, 0] > 0.5
    >>> arr[bad_data, :] = 0  # (corrupts further examples)

Again, however, the columns may need to be handled more individually (but in
groups), and the ``oindex`` attribute works well::

    >>> arr.oindex[bad_data, [2, 5]] = 0

Note that it would be very hard to do this using legacy fancy indexing.
The only way would be to create an integer array first::

    >>> bad_data_indx = np.nonzero(bad_data)[0]
    >>> bad_data_indx_reshaped = bad_data_indx[:, np.newaxis]
    >>> arr[bad_data_indx_reshaped, [2, 5]] = 0

In any case we can use only ``oindex`` to do all of this without getting
into any trouble or getting confused by the whole complexity of advanced
indexing.

But, some new features are added to the data acquisition. Different sensors
have to be used depending on the times. Let us assume we have already
created an array of indices::

    >>> correct_sensors = np.random.randint(10, size=(100, 2))

which lists, for each time, the two correct sensors in an ``(N, 2)`` array.

A first try to achieve this may be ``arr[:, correct_sensors]``, but this does
not work. It should be clear quickly that slicing cannot achieve the desired
thing. But hopefully users will remember that there is ``vindex`` as a more
powerful and flexible approach to advanced indexing.
One may, if trying ``vindex`` at random, be confused by::

    >>> new_arr = arr.vindex[:, correct_sensors]

which is neither the same, nor the correct result (see transposing rules)!
This is because slicing still works the same in ``vindex``. However, reading
the documentation and examples, one can hopefully quickly find the desired
solution::

    >>> rows = np.arange(len(arr))
    >>> rows = rows[:, np.newaxis]  # make shape fit with correct_sensors
    >>> new_arr = arr.vindex[rows, correct_sensors]

At this point we have left the straightforward world of ``oindex`` but can
do random picking of any element from the array. Note that in the last example
a method such as mentioned in the ``Related Questions`` section could be more
straightforward. But this approach is even more flexible, since ``rows``
does not have to be a simple ``arange``, but could be ``interesting_times``::

    >>> interesting_times = np.array([0, 4, 8, 9, 10])
    >>> correct_sensors_at_it = correct_sensors[interesting_times, :]
    >>> interesting_times_reshaped = interesting_times[:, np.newaxis]
    >>> new_arr_it = arr[interesting_times_reshaped, correct_sensors_at_it]

A truly complex situation would arise if, for example, you pooled ``L``
experiments into an array shaped ``(L, N, D)``. But for ``oindex`` this should
not result in surprises. ``vindex``, being more powerful, will quite
certainly create some confusion in this case but also cover pretty much all
eventualities.


Copyright
---------

This document is placed under the CC0 1.0 Universal (CC0 1.0) Public Domain
Dedication [1]_.
References and Footnotes
------------------------

.. [1] To the extent possible under law, the person who associated CC0
   with this work has waived all copyright and related or neighboring
   rights to this work. The CC0 license may be found at
   https://creativecommons.org/publicdomain/zero/1.0/
.. [2] e.g., see NEP 18,
   http://www.numpy.org/neps/nep-0018-array-function-protocol.html

-------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Mon Jun 25 23:06:42 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 25 Jun 2018 20:06:42 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID:

Generally +1 on this, but I don't think we need

    To ensure that existing subclasses of ndarray that override indexing
    do not inadvertently revert to default behavior for indexing attributes,
    these attributes should have explicit checks that disable them if
    __getitem__ or __setitem__ has been overridden.

Repeating my proposal from github, I think we should introduce some internal
indexing objects - something simple like::

    # np.core.*
    class Indexer(object):  # importantly not iterable
        def __init__(self, value):
            self.value = value

    class OrthogonalIndexer(Indexer):
        pass

    class VectorizedIndexer(Indexer):
        pass

Keeping the proposed syntax, we'd implement:

- ``arr.oindex[ind]`` as ``arr[np.core.OrthogonalIndexer(ind)]``
- ``arr.vindex[ind]`` as ``arr[np.core.VectorizedIndexer(ind)]``

This means that subclasses like the following::

    class LoggingIndexer(np.ndarray):
        def __getitem__(self, ind):
            ret = super().__getitem__(ind)
            print("Got an index")
            return ret

will continue to work without issues. This includes np.ma.MaskedArray and
np.memmap, so this already has value internally.

For classes like np.matrix which inspect the index object itself, an error
will still be raised from __getitem__, since it looks nothing like the values
normally passed - most likely of the form::

    TypeError: 'numpy.core.VectorizedIndexer' object does not support indexing
    TypeError: 'numpy.core.VectorizedIndexer' object is not iterable

This could potentially be caught in oindex.__getitem__ and converted into a
more useful error message.

So to summarize the benefits of the above tweaks:

- Pass-through subclasses get the new behavior for free
- No additional descriptor helpers are needed to let non-passthrough
  subclasses implement the new indexable attributes - only a change to
  __getitem__ is needed

And the costs:

- A less clear error message when new indexing is used on old types
  (can chain with a more useful exception on python 3)
- Class construction overhead for indexing via the attributes
  (skippable for base ndarray if significant)

Eric

On Mon, 25 Jun 2018 at 14:30 Stephan Hoyer wrote:

> Sebastian and I have revised a Numpy Enhancement Proposal that he started
> three years ago for overhauling NumPy's advanced indexing. We'd now like to
> present it for official consideration.
>
> Minor inline comments (e.g., typos) can be added to the latest pull
> request (https://github.com/numpy/numpy/pull/11414/files), but otherwise
> let's keep discussion on the mailing list. The NumPy website should update
> shortly with a rendered version (
> http://www.numpy.org/neps/nep-0021-advanced-indexing.html), but until
> then please see the full text below.
> [full NEP text snipped]
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Tue Jun 26 02:24:13 2018 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Tue, 26 Jun 2018 16:24:13 +1000 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com>

> Plain indexing arr[...] should return an error for ambiguous cases.
> [...] This includes every use of vectorized indexing with multiple
> integer arrays.

This line concerns me. In scikit-image, we often do:

    rr, cc = coords.T  # coords is an (n, 2) array of integer coordinates
    values = image[rr, cc]

Are you saying that this use is deprecated? Because we love it at
scikit-image. I would be very very very sad to lose this syntax.

> The current plain indexing allows for the use of non-tuples for
> multi-dimensional indexing.

I believe this paragraph is itself deprecated? Didn't non-tuple indexing just
get deprecated with 1.15?

Other general comments:

- oindex in general seems very intuitive and I'm :+1:
- I would much prefer some extremely compact notation such as arr.ox[] and
  arr.vx.
- Depending on the above concern I am either -1 or (-1/0) on the deprecation.
  Deprecating (all) old vindex behaviour doesn't seem to bring many benefits
  while potentially causing a lot of pain to downstream libraries.

Juan.

-------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Tue Jun 26 02:28:22 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Tue, 26 Jun 2018 16:28:22 +1000 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID:

On Tue, 26 Jun 2018 at 16:24, Juan Nunez-Iglesias wrote:

> > Plain indexing arr[...] should return an error for ambiguous cases.
> [...] This includes every use of vectorized indexing with multiple integer
> arrays.
>
> This line concerns me. In scikit-image, we often do:
>
> rr, cc = coords.T  # coords is an (n, 2) array of integer coordinates
> values = image[rr, cc]
>
> Are you saying that this use is deprecated? Because we love it at
> scikit-image. I would be very very very sad to lose this syntax.

I second Juan's sentiments wholeheartedly here.

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From robert.kern at gmail.com Tue Jun 26 02:45:34 2018 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 25 Jun 2018 23:45:34 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Mon, Jun 25, 2018 at 11:29 PM Andrew Nelson wrote: > On Tue, 26 Jun 2018 at 16:24, Juan Nunez-Iglesias > wrote: > >> > Plain indexing arr[...] should return an error for ambiguous cases. >> [...] This includes every use of vectorized indexing with multiple integer >> arrays. >> >> This line concerns me. In scikit-image, we often do: >> >> rr, cc = coords.T # coords is an (n, 2) array of integer coordinates >> values = image[rr, cc] >> >> Are you saying that this use is deprecated? Because we love it at >> scikit-image. I would be very very very sad to lose this syntax. >> > > I second Juan's sentiments wholeheartedly here. > And thirded. This should not be considered deprecated or discouraged. As I mentioned in the previous iteration of this discussion, this is the behavior I want more often than the orthogonal indexing. It's a really common way to work with images and other kinds of raster data, so I don't think it should be relegated to the "officially discouraged" ghetto of `.legacy_index`. It should not issue warnings or (eventual) errors. I would reserve warnings for the cases where the current behavior is something no one really wants, like mixing slices and integer arrays. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Tue Jun 26 03:11:57 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Tue, 26 Jun 2018 00:11:57 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: > I don't think it should be relegated to the "officially discouraged" ghetto of `.legacy_index` The way I read it, the new spelling lof that would be the explicit but not discouraged `image.vindex[rr, cc]`. > I would reserve warnings for the cases where the current behavior is something no one really wants, like mixing slices and integer arrays. These are the cases that would only be available under `legacy_index`. Eric On Mon, 25 Jun 2018 at 23:54 Robert Kern wrote: > On Mon, Jun 25, 2018 at 11:29 PM Andrew Nelson wrote: > >> On Tue, 26 Jun 2018 at 16:24, Juan Nunez-Iglesias >> wrote: >> >>> > Plain indexing arr[...] should return an error for ambiguous cases. >>> [...] This includes every use of vectorized indexing with multiple integer >>> arrays. >>> >>> This line concerns me. In scikit-image, we often do: >>> >>> rr, cc = coords.T # coords is an (n, 2) array of integer coordinates >>> values = image[rr, cc] >>> >>> Are you saying that this use is deprecated? Because we love it at >>> scikit-image. I would be very very very sad to lose this syntax. >>> >> >> I second Juan's sentiments wholeheartedly here. >> > > And thirded. This should not be considered deprecated or discouraged. As I > mentioned in the previous iteration of this discussion, this is the > behavior I want more often than the orthogonal indexing. It's a really > common way to work with images and other kinds of raster data, so I don't > think it should be relegated to the "officially discouraged" ghetto of > `.legacy_index`. It should not issue warnings or (eventual) errors. 
I would > reserve warnings for the cases where the current behavior is something no > one really wants, like mixing slices and integer arrays. > > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Tue Jun 26 03:30:01 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Tue, 26 Jun 2018 17:30:01 +1000 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, 26 Jun 2018 at 17:12, Eric Wieser wrote: > > I don't think it should be relegated to the "officially discouraged" > ghetto of `.legacy_index` > > The way I read it, the new spelling lof that would be the explicit but not > discouraged `image.vindex[rr, cc]`. > If I'm understanding correctly what can be achieved now by `arr[rr, cc]` would have to be modified to use `arr.vindex[rr, cc]`, which is a very large change in behaviour. I suspect that there a lot of situations out there which use `arr[idxs]` where `idxs` can mean one of a range of things depending on the code path followed. If any of those change, or a mix of nomenclatures are required to access the different cases, then havoc will probably ensue. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jun 26 03:46:02 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 00:46:02 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser wrote: > > I don't think it should be relegated to the "officially discouraged" > ghetto of `.legacy_index` > > The way I read it, the new spelling lof that would be the explicit but not > discouraged `image.vindex[rr, cc]`. > Okay, I missed that the first time through. I think having more self-contained descriptions of the semantics of each of these would be a good idea. The current description of `.vindex` spends more time talking about what it doesn't do, compared to the other methods, than what it does. Some more typical, less-exotic examples would be a good idea. > I would reserve warnings for the cases where the current behavior is > something no one really wants, like mixing slices and integer arrays. > > These are the cases that would only be available under `legacy_index`. > I'm still leaning towards not warning on current, unproblematic common uses. It's unnecessary churn for currently working, understandable code. I would still reserve warnings and deprecation for the cases where the current behavior gives us something that no one wants. Those are the real traps that people need to be warned away from. If someone is mixing slices and integer indices, that's a really good sign that they thought indexing behaved in a different way (e.g. orthogonal indexing). If someone is just using multiple index arrays that would currently not give an error, that's actually a really good sign that they are using it correctly and are getting the semantics that they desired. If they wanted orthogonal indexing, it is *really* likely that their index arrays would *not* broadcast together. 
And even if they did, the wrong shape of the result is one of the more easily noticed things. These are not silent errors that would motivate adding a new warning. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jun 26 03:54:43 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 00:54:43 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, Jun 26, 2018 at 12:46 AM Robert Kern wrote: > On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser > wrote: > >> > I would reserve warnings for the cases where the current behavior is >> something no one really wants, like mixing slices and integer arrays. >> >> These are the cases that would only be available under `legacy_index`. >> > > I'm still leaning towards not warning on current, unproblematic common > uses. It's unnecessary churn for currently working, understandable code. I > would still reserve warnings and deprecation for the cases where the > current behavior gives us something that no one wants. Those are the real > traps that people need to be warned away from. > > If someone is mixing slices and integer indices, that's a really good sign > that they thought indexing behaved in a different way (e.g. orthogonal > indexing). > > If someone is just using multiple index arrays that would currently not > give an error, that's actually a really good sign that they are using it > correctly and are getting the semantics that they desired. If they wanted > orthogonal indexing, it is *really* likely that their index arrays would > *not* broadcast together. And even if they did, the wrong shape of the > result is one of the more easily noticed things. These are not silent > errors that would motivate adding a new warning. > Of course, I would definitely support adding more information to the various IndexError messages to point people to `.oindex` and `.vindex`. I think that would guide more people to correct their code than adding a new warning to code that currently executes (which is likely not erroneous), and it would cause no churn. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jun 26 03:57:07 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 26 Jun 2018 09:57:07 +0200 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, 2018-06-26 at 17:30 +1000, Andrew Nelson wrote: > On Tue, 26 Jun 2018 at 17:12, Eric Wieser m> wrote: > > > I don't think it should be relegated to the "officially > > discouraged" ghetto of `.legacy_index` > > > > The way I read it, the new spelling lof that would be the explicit > > but not discouraged `image.vindex[rr, cc]`. > > > > If I'm understanding correctly what can be achieved now by `arr[rr, > cc]` would have to be modified to use `arr.vindex[rr, cc]`, which is > a very large change in behaviour. I suspect that there a lot of > situations out there which use `arr[idxs]` where `idxs` can mean one > of a range of things depending on the code path followed. If any of > those change, or a mix of nomenclatures are required to access the > different cases, then havoc will probably ensue. 
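To make the two behaviours under discussion concrete, here is a minimal sketch using only current NumPy (the orthogonal case is emulated with np.ix_, since `.oindex` and `.vindex` are still only proposals):

import numpy as np

image = np.arange(12).reshape(3, 4)
rr = np.array([0, 1, 2])
cc = np.array([1, 3, 0])

# Today's plain indexing, which the NEP would also spell image.vindex[rr, cc]:
# rr and cc are broadcast against each other and pick out the individual
# points (0, 1), (1, 3) and (2, 0), giving shape (3,).
points = image[rr, cc]

# Orthogonal/outer indexing, which the NEP would spell image.oindex[rr, cc]:
# every row in rr is combined with every column in cc, giving shape (3, 3).
# In current NumPy this has to be written via np.ix_:
block = image[np.ix_(rr, cc)]
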
Yes, that is true, but I doubt you will find a lot of code path that need the current indexing as opposed to vindex here, and the idea was to have a method to get the old behaviour indefinitely. You will need to add the `.vindex`, but that should be the only code change needed, and it would be easy to find where with errors/warnings. I see a possible problem with code that has to work on different numpy versions, but only in meaning we need to delay deprecations. The only thing I could imagine where this might happen is if you forward someone elses indexing objects and different users are used to different results. Otherwise, there is mostly one case which would get annoying, and that is `arr[:, rr, cc]` since `arr.vindex[:, rr, cc]` would not be exactly the same. Because, yes, in some cases the current logic is convenient, just incredibly surprising as well. - Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Tue Jun 26 04:01:24 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 26 Jun 2018 04:01:24 -0400 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: I second this design. If we were to consider the general case of a tuple `idx`, then we?d not be moving forward at all. Design changes would be impossible. I?d argue that this newer model would be easier for library maintainers overall (who are the kind of people using this), reducing maintenance cost in the long run because it?d lead to simpler code. I would also that the ?internal? classes expressing outer as vectorised indexing etc. should be exposed, for maintainers of duck arrays to use. God knows how many utility functions I?ve had to write to avoid relying on undocumented NumPy internals for pydata/sparse, fearing that I?d have to rewrite/modify them when behaviour changes or I find other corner cases. Best Regards, Hameer Abbasi Sent from Astro for Mac On 26. Jun 2018 at 09:46, Robert Kern wrote: On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser wrote: > > I don't think it should be relegated to the "officially discouraged" > ghetto of `.legacy_index` > > The way I read it, the new spelling lof that would be the explicit but not > discouraged `image.vindex[rr, cc]`. > Okay, I missed that the first time through. I think having more self-contained descriptions of the semantics of each of these would be a good idea. The current description of `.vindex` spends more time talking about what it doesn't do, compared to the other methods, than what it does. Some more typical, less-exotic examples would be a good idea. > I would reserve warnings for the cases where the current behavior is > something no one really wants, like mixing slices and integer arrays. > > These are the cases that would only be available under `legacy_index`. > I'm still leaning towards not warning on current, unproblematic common uses. It's unnecessary churn for currently working, understandable code. I would still reserve warnings and deprecation for the cases where the current behavior gives us something that no one wants. Those are the real traps that people need to be warned away from. 
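One classic trap of this kind, sketched for concreteness with current NumPy (the values are arbitrary, only the shapes matter):

import numpy as np

x = np.zeros((2, 3, 4))

# An integer and an integer array separated by a slice: the broadcast
# dimension of the advanced indices is moved to the front of the result,
# so this comes out with shape (2, 3), not the (3, 2) most readers expect.
x[0, :, [0, 1]].shape
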
If someone is mixing slices and integer indices, that's a really good sign that they thought indexing behaved in a different way (e.g. orthogonal indexing). If someone is just using multiple index arrays that would currently not give an error, that's actually a really good sign that they are using it correctly and are getting the semantics that they desired. If they wanted orthogonal indexing, it is *really* likely that their index arrays would *not* broadcast together. And even if they did, the wrong shape of the result is one of the more easily noticed things. These are not silent errors that would motivate adding a new warning. -- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jun 26 04:21:00 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 01:21:00 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg wrote: > On Tue, 2018-06-26 at 17:30 +1000, Andrew Nelson wrote: > > On Tue, 26 Jun 2018 at 17:12, Eric Wieser > m> wrote: > > > > I don't think it should be relegated to the "officially > > > discouraged" ghetto of `.legacy_index` > > > > > > The way I read it, the new spelling lof that would be the explicit > > > but not discouraged `image.vindex[rr, cc]`. > > > > > > > If I'm understanding correctly what can be achieved now by `arr[rr, > > cc]` would have to be modified to use `arr.vindex[rr, cc]`, which is > > a very large change in behaviour. I suspect that there a lot of > > situations out there which use `arr[idxs]` where `idxs` can mean one > > of a range of things depending on the code path followed. If any of > > those change, or a mix of nomenclatures are required to access the > > different cases, then havoc will probably ensue. > > Yes, that is true, but I doubt you will find a lot of code path that > need the current indexing as opposed to vindex here, That's probably true! But I think it's besides the point. I'd wager that most code paths that will use .vindex would work perfectly well with current indexing, too. Most of the time, people aren't getting into the hairy corners of advanced indexing. Adding to the toolbox is great, but I don't see a good reason to take out the ones that are commonly used quite safely. > and the idea was > to have a method to get the old behaviour indefinitely. You will need > to add the `.vindex`, but that should be the only code change needed, > and it would be easy to find where with errors/warnings. > It's not necessarily hard; it's just churn for no benefit to the downstream code. They didn't get a new feature; they just have to run faster to stay in the same place. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Tue Jun 26 04:23:23 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 26 Jun 2018 04:23:23 -0400 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: > Boolean indices are not supported. All indices must be integers, integer arrays or slices. I would hope that there?s at least some way to do boolean indexing. 
I often find myself needing it. I realise that `arr.vindex[np.nonzero(boolean_idx)]` works, but it is slightly too verbose for my liking. Maybe we can have `arr.bindex[boolean_index]` as an alias to exactly that? Or is boolean indexing preserved as-is n the newest proposal? If so, great! Another thing I?d say is `arr.?index` should be replaced with `arr.?idx`. I personally prefer `arr.?x` for my fingers but I realise that for someone not super into NumPy indexing, this is kind of opaque to read, so I propose this less verbose but hopefully equally clear version, for my (and others?) brains. Best Regards, Hameer Abbasi Sent from Astro for Mac -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Tue Jun 26 04:24:06 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Tue, 26 Jun 2018 02:24:06 -0600 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: I like the proposal generally. NumPy could use a good orthogonal indexing method and a vectorized-indexing method is fine too. Robert Kern is spot on with his concerns as well. Please do not change what arr[idx] does except to provide warnings and perhaps point people to new .oix and .vix methods. What indexing does is documented (if hard to understand and surprising in a particular sub-case). There is one specific place in the code where I would make a change to raise an error rather than change the order of the axes of the output to provide a consistent subspace. Even then, it should be done as a deprecation warning and then raise the error. Otherwise, just add the new methods and don't make any other changes until a major release. -Travis On Tue, Jun 26, 2018 at 2:03 AM Hameer Abbasi wrote: > I second this design. If we were to consider the general case of a tuple > `idx`, then we?d not be moving forward at all. Design changes would be > impossible. I?d argue that this newer model would be easier for library > maintainers overall (who are the kind of people using this), reducing > maintenance cost in the long run because it?d lead to simpler code. > > I would also that the ?internal? classes expressing outer as vectorised > indexing etc. should be exposed, for maintainers of duck arrays to use. God > knows how many utility functions I?ve had to write to avoid relying on > undocumented NumPy internals for pydata/sparse, fearing that I?d have to > rewrite/modify them when behaviour changes or I find other corner cases. > > Best Regards, > Hameer Abbasi > Sent from Astro for Mac > > On 26. Jun 2018 at 09:46, Robert Kern wrote: > > > On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser > wrote: > >> > I don't think it should be relegated to the "officially discouraged" >> ghetto of `.legacy_index` >> >> The way I read it, the new spelling lof that would be the explicit but >> not discouraged `image.vindex[rr, cc]`. >> > > Okay, I missed that the first time through. I think having more > self-contained descriptions of the semantics of each of these would be a > good idea. The current description of `.vindex` spends more time talking > about what it doesn't do, compared to the other methods, than what it does. > > Some more typical, less-exotic examples would be a good idea. > > > I would reserve warnings for the cases where the current behavior is >> something no one really wants, like mixing slices and integer arrays. >> >> These are the cases that would only be available under `legacy_index`. 
>> > > I'm still leaning towards not warning on current, unproblematic common > uses. It's unnecessary churn for currently working, understandable code. I > would still reserve warnings and deprecation for the cases where the > current behavior gives us something that no one wants. Those are the real > traps that people need to be warned away from. > > If someone is mixing slices and integer indices, that's a really good sign > that they thought indexing behaved in a different way (e.g. orthogonal > indexing). > > If someone is just using multiple index arrays that would currently not > give an error, that's actually a really good sign that they are using it > correctly and are getting the semantics that they desired. If they wanted > orthogonal indexing, it is *really* likely that their index arrays would > *not* broadcast together. And even if they did, the wrong shape of the > result is one of the more easily noticed things. These are not silent > errors that would motivate adding a new warning. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Tue Jun 26 04:28:09 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Tue, 26 Jun 2018 01:28:09 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: Another thing I?d say is arr.?index should be replaced with arr.?idx. Or perhaps arr.o_[] and arr.v_[], to match the style of our existing np.r_, np.c_, np.s_, etc? From robert.kern at gmail.com Tue Jun 26 04:33:15 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 01:33:15 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: On Tue, Jun 26, 2018 at 1:26 AM Travis Oliphant wrote: > I like the proposal generally. NumPy could use a good orthogonal indexing > method and a vectorized-indexing method is fine too. > > Robert Kern is spot on with his concerns as well. Please do not change > what arr[idx] does except to provide warnings and perhaps point people to > new .oix and .vix methods. What indexing does is documented (if hard to > understand and surprising in a particular sub-case). > > There is one specific place in the code where I would make a change to > raise an error rather than change the order of the axes of the output to > provide a consistent subspace. Even then, it should be done as a > deprecation warning and then raise the error. > > Otherwise, just add the new methods and don't make any other changes until > a major release. > I'd suggest that the NEP explicitly disclaim deprecating current behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues. 
-- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Tue Jun 26 04:34:37 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 26 Jun 2018 04:34:37 -0400 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: I actually had to think a lot, read docs, use SO and so on to realise what those meant the first time around, I didn?t understand them on sight. And I had to keep coming back to the docs from time to time as I wasn?t exactly using them too much (for exactly this reason, when some problems could be solved more simply by doing just that). I?d prefer something that sticks in your head and ?underscore? for ?indexing? didn't do that for me. Of course, this was my experience as a first-timer. I?d prefer not to up the learning curve for others in the same situation. An experienced user might disagree. :-) Best Regards, Hameer Abbasi Sent from Astro for Mac On 26. Jun 2018 at 10:28, Eric Wieser wrote: Another thing I?d say is arr.?index should be replaced with arr.?idx. Or perhaps arr.o_[] and arr.v_[], to match the style of our existing np.r_, np.c_, np.s_, etc? _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jun 26 04:35:20 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 26 Jun 2018 10:35:20 +0200 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> On Tue, 2018-06-26 at 01:21 -0700, Robert Kern wrote: > On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg > wrote: > > > > Yes, that is true, but I doubt you will find a lot of code path > > that > > need the current indexing as opposed to vindex here, > > That's probably true! But I think it's besides the point. I'd wager > that most code paths that will use .vindex would work perfectly well > with current indexing, too. Most of the time, people aren't getting > into the hairy corners of advanced indexing. > Right, the proposal was to have DeprecationWarnings when they differ, now I also thought DeprecationWarnings on two advanced indexes in general is good, because it is good for new users. I have to agree with your argument that most of the confused should be running into broadcast errors (if they expect oindex vs. fancy). So I see this as a point that we likely should just limit ourselves at least for now to the cases for example with sudden transposing going on. However, I would like to point out that the reason for the more broad warnings is that it could allow warping normal indexing at some point. Also it decreases traps with array-likes that behave differently. > Adding to the toolbox is great, but I don't see a good reason to take > out the ones that are commonly used quite safely. > > > and the idea was > > to have a method to get the old behaviour indefinitely. You will > > need > > to add the `.vindex`, but that should be the only code change > > needed, > > and it would be easy to find where with errors/warnings. > > It's not necessarily hard; it's just churn for no benefit to the > downstream code. 
They didn't get a new feature; they just have to run > faster to stay in the same place. > So, yes, it is annoying for quite a few projects that correctly use fancy indexing, but if we choose to not annoy you a little, we will have much less long term options which also includes such projects compatibility to new/current array-likes. So basically one point is: if we annoy scikit-image now, their code will work better for dask arrays in the future hopefully. - Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Tue Jun 26 04:41:24 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 26 Jun 2018 10:41:24 +0200 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: <7ce86e31ffa2ad107c4a77b35b9b817562c54193.camel@sipsolutions.net> On Tue, 2018-06-26 at 04:23 -0400, Hameer Abbasi wrote: > > Boolean indices are not supported. All indices must be integers, > integer arrays or slices. > > I would hope that there?s at least some way to do boolean indexing. I > often find myself needing it. I realise that > `arr.vindex[np.nonzero(boolean_idx)]` works, but it is slightly too > verbose for my liking. Maybe we can have `arr.bindex[boolean_index]` > as an alias to exactly that? > That part is limited to `vindex` only. A single boolean index would always work in plain indexing and you can mix it all up inside of `oindex`. But with fancy indexing mixing boolean + integer seems currently pretty much useless (and thus the same is true for `vindex`, in `oindex` things make sense). Now you could invent some new logic for such a mixing case in `vindex`, but it seems easier to just ignore it for the moment. - Sebastian > Or is boolean indexing preserved as-is n the newest proposal? If so, > great! > > Another thing I?d say is `arr.?index` should be replaced with > `arr.?idx`. I personally prefer `arr.?x` for my fingers but I realise > that for someone not super into NumPy indexing, this is kind of > opaque to read, so I propose this less verbose but hopefully equally > clear version, for my (and others?) brains. > > Best Regards, > Hameer Abbasi > Sent from Astro for Mac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Tue Jun 26 04:48:22 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 26 Jun 2018 04:48:22 -0400 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: I would disagree here. For libraries like Dask, XArray, pydata/sparse, XND, etc., it would be bad for them if there was continued use of ?weird? indexing behaviour (no warnings means more code written that?s? well? not exactly the best design). Of course, we could just choose to not support it. But that means a lot of code won?t support us, or support us later than we desire. 
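On the boolean-indexing question raised a little earlier in the thread, the workaround and the suggested alias amount to the following (`.vindex` is still a proposal and `bindex` is only a suggestion, not an existing attribute):

import numpy as np

arr = np.arange(12).reshape(3, 4)
mask = arr > 5                     # a boolean index over the whole array

flat = arr[mask]                   # plain boolean indexing today, shape (6,)

# The same selection written with integer arrays, which is what
# arr.vindex[np.nonzero(mask)] would do under the NEP; a hypothetical
# arr.bindex[mask] would simply be shorthand for this.
rows, cols = np.nonzero(mask)
also_flat = arr[rows, cols]        # identical six elements
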
I agree with your design of ?let?s limit the number of warnings/deprecations to cases that make very little sense? but there should be warnings. Specifically, I recommend warnings for mixed slices and fancy indexes, and warnings followed by errors for cases where the transposing behaviour occurs. Best Regards, Hameer Abbasi Sent from Astro for Mac On 26. Jun 2018 at 10:33, Robert Kern wrote: On Tue, Jun 26, 2018 at 1:26 AM Travis Oliphant wrote: > I like the proposal generally. NumPy could use a good orthogonal indexing > method and a vectorized-indexing method is fine too. > > Robert Kern is spot on with his concerns as well. Please do not change > what arr[idx] does except to provide warnings and perhaps point people to > new .oix and .vix methods. What indexing does is documented (if hard to > understand and surprising in a particular sub-case). > > There is one specific place in the code where I would make a change to > raise an error rather than change the order of the axes of the output to > provide a consistent subspace. Even then, it should be done as a > deprecation warning and then raise the error. > > Otherwise, just add the new methods and don't make any other changes until > a major release. > I'd suggest that the NEP explicitly disclaim deprecating current behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues. -- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jun 26 04:57:54 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 01:57:54 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: On Tue, Jun 26, 2018 at 1:49 AM Hameer Abbasi wrote: > > On 26. Jun 2018 at 10:33, Robert Kern wrote: > > > I'd suggest that the NEP explicitly disclaim deprecating current behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues. > > I would disagree here. For libraries like Dask, XArray, pydata/sparse, XND, etc., it would be bad for them if there was continued use of ?weird? indexing behaviour (no warnings means more code written that?s? well? not exactly the best design). Of course, we could just choose to not support it. But that means a lot of code won?t support us, or support us later than we desire. > > I agree with your design of ?let?s limit the number of warnings/deprecations to cases that make very little sense? but there should be warnings. I'm still in favor of warnings in these cases. I didn't mean to suggest excluding those from the NEP. 
I just don't think they should be deprecations; we shouldn't suggest that they will eventually turn into errors. At least until we get these features out there, get some experience with them, then have a new NEP at that time just proposing deprecation. P.S. Would you mind bottom-posting? It helps maintain the context of what you are commenting on and my reply to those comments. I tried writing this reply without it, and it felt like it was missing context. Thanks! -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jun 26 05:27:22 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 02:27:22 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: On Tue, Jun 26, 2018 at 1:36 AM Sebastian Berg wrote: > On Tue, 2018-06-26 at 01:21 -0700, Robert Kern wrote: > > On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg > > wrote: > > > > > > > > > Yes, that is true, but I doubt you will find a lot of code path > > > that > > > need the current indexing as opposed to vindex here, > > > > That's probably true! But I think it's besides the point. I'd wager > > that most code paths that will use .vindex would work perfectly well > > with current indexing, too. Most of the time, people aren't getting > > into the hairy corners of advanced indexing. > > > > Right, the proposal was to have DeprecationWarnings when they differ, > now I also thought DeprecationWarnings on two advanced indexes in > general is good, because it is good for new users. > I have to agree with your argument that most of the confused should be > running into broadcast errors (if they expect oindex vs. fancy). So I > see this as a point that we likely should just limit ourselves at least > for now to the cases for example with sudden transposing going on. > > However, I would like to point out that the reason for the more broad > warnings is that it could allow warping normal indexing at some point. > I don't really understand this. You would discourage the "normal" syntax in favor of these more specific named syntaxes, so you can introduce different behavior for the "normal" syntax and encourage everyone to use it again? Just add more named syntaxes if you want new behavior! That's the beauty of the design underlying this NEP. > Also it decreases traps with array-likes that behave differently. > If we were to take this seriously, then no one should use a bare [] ever. I'll go on record as saying that array-likes should respond to `a[rr, cc]`, as in Juan's example, with the current behavior. And if they don't, they don't deserve to be operated on by skimage functions. If I'm reading the NEP correctly, the main thrust of the issue with array-likes is that it is difficult for some of them to implement the full spectrum of indexing possibilities. This NEP does not actually make it *easier* for those array-likes to implement every possibility. It just offers some APIs that more naturally express common use cases which can sometimes be implemented more naturally than if expressed in the current indexing. 
For instance, you can achieve the same effect as orthogonal indexing with the current implementation, but you have to manipulate the indices before you pass them over to __getitem__(), losing information along the way that could be used to make a more efficient lookup in some array-likes. The NEP design is essentially more of a way to give these array-likes standard places to raise NotImplementedError than it is to help them get rid of all of their NotImplementedErrors. More specifically, if these array-likes can't implement `a[rr, cc]`, they're not going to implement `a.vindex[rr, cc]`, either. I think most of the problems that caused these libraries to make different choices in their __getitem__() implementation are due to the fact that these expressive APIs didn't exist, so they had to shoehorn them into __getitem__(); orthogonal indexing was too useful and efficient not to implement! I think that once we have .oindex and .vindex out there, they will be able to clean up their __getitem__()s to consistently support whatever of the current behavior that they can and raise NotImplementedError where they can't. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jun 26 06:48:11 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 26 Jun 2018 12:48:11 +0200 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: On Tue, 2018-06-26 at 02:27 -0700, Robert Kern wrote: > On Tue, Jun 26, 2018 at 1:36 AM Sebastian Berg s.net> wrote: > > On Tue, 2018-06-26 at 01:21 -0700, Robert Kern wrote: > > > On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg > > > wrote: > > > > > > > > > > > > > > Yes, that is true, but I doubt you will find a lot of code path > > > > that > > > > need the current indexing as opposed to vindex here, > > > > > > That's probably true! But I think it's besides the point. I'd > > wager > > > that most code paths that will use .vindex would work perfectly > > well > > > with current indexing, too. Most of the time, people aren't > > getting > > > into the hairy corners of advanced indexing. > > > > > > > Right, the proposal was to have DeprecationWarnings when they > > differ, > > now I also thought DeprecationWarnings on two advanced indexes in > > general is good, because it is good for new users. > > I have to agree with your argument that most of the confused should > > be > > running into broadcast errors (if they expect oindex vs. fancy). So > > I > > see this as a point that we likely should just limit ourselves at > > least > > for now to the cases for example with sudden transposing going on. > > > > However, I would like to point out that the reason for the more > > broad > > warnings is that it could allow warping normal indexing at some > > point. > > > > I don't really understand this. You would discourage the "normal" > syntax in favor of these more specific named syntaxes, so you can > introduce different behavior for the "normal" syntax and encourage > everyone to use it again? Just add more named syntaxes if you want > new behavior! That's the beauty of the design underlying this NEP. > > > Also it decreases traps with array-likes that behave differently. > > If we were to take this seriously, then no one should use a bare [] > ever. 
> > I'll go on record as saying that array-likes should respond to `a[rr, > cc]`, as in Juan's example, with the current behavior. And if they > don't, they don't deserve to be operated on by skimage functions. > > If I'm reading the NEP correctly, the main thrust of the issue with > array-likes is that it is difficult for some of them to implement the > full spectrum of indexing possibilities. This NEP does not actually > make it *easier* for those array-likes to implement every > possibility. It just offers some APIs that more naturally express > common use cases which can sometimes be implemented more naturally > than if expressed in the current indexing. For instance, you can > achieve the same effect as orthogonal indexing with the current > implementation, but you have to manipulate the indices before you > pass them over to __getitem__(), losing information along the way > that could be used to make a more efficient lookup in some array- > likes. > > The NEP design is essentially more of a way to give these array-likes > standard places to raise NotImplementedError than it is to help them > get rid of all of their NotImplementedErrors. More specifically, if > these array-likes can't implement `a[rr, cc]`, they're not going to > implement `a.vindex[rr, cc]`, either. > > I think most of the problems that caused these libraries to make > different choices in their __getitem__() implementation are due to > the fact that these expressive APIs didn't exist, so they had to > shoehorn them into __getitem__(); orthogonal indexing was too useful > and efficient not to implement! I think that once we have .oindex and > .vindex out there, they will be able to clean up their __getitem__()s > to consistently support whatever of the current behavior that they > can and raise NotImplementedError where they can't. > Right, it helps mostly to be clear about what an object can and cannot do. So h5py or whatever could error out for plain indexing and only support `.oindex`, and we have all options cleanly available. And yes, I agree that in itself is a big step forward. The thing is there are also very strong opinions that the fancy indexing behaviour is so confusing that it would ideally not be the default since it breaks comparing analogy slice objects. So, personally, I would argue that if we were to start over from scratch, fancy indexing (multiple indexes), would not be the default plain indexing behaviour. Now, maybe the pain of a few warnings is too high, but if we wish to move, no matter how slowly, in such regard, we will have to swallow it eventually. The suggestion was to make that as easy as possible with adding an attribute indefinitely. Otherwise, even a possible numpy replacement might have difficulties to chose a different default for indexing for years to come... Practically, I guess some warnings might have to wait a longer while, just because it could be almost impossible to avoid them in code working with different numpy versions. - Sebastian > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From kevin.k.sheppard at gmail.com Tue Jun 26 06:50:20 2018 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Tue, 26 Jun 2018 10:50:20 +0000 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: This seems like progress and a clear method to outer indexing will help many users. As for names, I prefer .ox and .vx for shorthand of .oindex and .vindex. I don?t like the .ox_ or .o_ syntax. Before any deprecation warnings or any other warnings are added it would be helpful to have some way to set a flag on Python to show some sort of HiddenDeprecationWarning (or OnlyShowIfFlagPassesDeprecationWarning) that would automatically be filtered by default but could be shown if someone was interested. This will allow library writers to see problems before any start showing up for users. These could then be promoted to Visible or Future later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jun 26 11:03:19 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 26 Jun 2018 17:03:19 +0200 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: On Tue, 2018-06-26 at 04:01 -0400, Hameer Abbasi wrote: > I second this design. If we were to consider the general case of a > tuple `idx`, then we?d not be moving forward at all. Design changes > would be impossible. I?d argue that this newer model would be easier > for library maintainers overall (who are the kind of people using > this), reducing maintenance cost in the long run because it?d lead to > simpler code. > > I would also that the ?internal? classes expressing outer as > vectorised indexing etc. should be exposed, for maintainers of duck > arrays to use. God knows how many utility functions I?ve had to write > to avoid relying on undocumented NumPy internals for pydata/sparse, > fearing that I?d have to rewrite/modify them when behaviour changes > or I find other corner cases. Could you list some examples what you would need? We can expose some of the internals, or maybe even provide funcs to map e.g. oindex to vindex or vindex to plain indexing, etc. but it would be helpful to know what downstream actually might need. For all I know the things that you are thinking of may not even exist... - Sebastian > > Best Regards, > Hameer Abbasi > Sent from Astro for Mac > > > On 26. Jun 2018 at 09:46, Robert Kern > > wrote: > > > > On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser > il.com> wrote: > > > > I don't think it should be relegated to the "officially > > > discouraged" ghetto of `.legacy_index` > > > > > > The way I read it, the new spelling lof that would be the > > > explicit but not discouraged `image.vindex[rr, cc]`. > > > > > > > Okay, I missed that the first time through. I think having more > > self-contained descriptions of the semantics of each of these would > > be a good idea. The current description of `.vindex` spends more > > time talking about what it doesn't do, compared to the other > > methods, than what it does. > > > > Some more typical, less-exotic examples would be a good idea. > > > > > > I would reserve warnings for the cases where the current > > > behavior is something no one really wants, like mixing slices and > > > integer arrays. 
> > > > > > These are the cases that would only be available under > > > `legacy_index`. > > > > > > > I'm still leaning towards not warning on current, unproblematic > > common uses. It's unnecessary churn for currently working, > > understandable code. I would still reserve warnings and deprecation > > for the cases where the current behavior gives us something that no > > one wants. Those are the real traps that people need to be warned > > away from. > > > > If someone is mixing slices and integer indices, that's a really > > good sign that they thought indexing behaved in a different way > > (e.g. orthogonal indexing). > > > > If someone is just using multiple index arrays that would currently > > not give an error, that's actually a really good sign that they are > > using it correctly and are getting the semantics that they desired. > > If they wanted orthogonal indexing, it is *really* likely that > > their index arrays would *not* broadcast together. And even if they > > did, the wrong shape of the result is one of the more easily > > noticed things. These are not silent errors that would motivate > > adding a new warning. > > > > -- > > Robert Kern > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From wieser.eric+numpy at gmail.com Tue Jun 26 12:36:39 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Tue, 26 Jun 2018 09:36:39 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: We can expose some of the internals These could be expressed as methods on the internal indexing objects I proposed in the first reply to this thread, which has seen no responses. I think Hameer Abbasi is looking for something like OrthogonalIndexer(...).to_vindex() -> VectorizedIndexer such that arr.oindex[ind] selects the same elements as arr.vindex[OrthogonalIndexer(ind).to_vindex()] Eric ? On Tue, 26 Jun 2018 at 08:04 Sebastian Berg wrote: > On Tue, 2018-06-26 at 04:01 -0400, Hameer Abbasi wrote: > > I second this design. If we were to consider the general case of a > > tuple `idx`, then we?d not be moving forward at all. Design changes > > would be impossible. I?d argue that this newer model would be easier > > for library maintainers overall (who are the kind of people using > > this), reducing maintenance cost in the long run because it?d lead to > > simpler code. > > > > I would also that the ?internal? classes expressing outer as > > vectorised indexing etc. should be exposed, for maintainers of duck > > arrays to use. God knows how many utility functions I?ve had to write > > to avoid relying on undocumented NumPy internals for pydata/sparse, > > fearing that I?d have to rewrite/modify them when behaviour changes > > or I find other corner cases. > > Could you list some examples what you would need? We can expose some of > the internals, or maybe even provide funcs to map e.g. oindex to vindex > or vindex to plain indexing, etc. but it would be helpful to know what > downstream actually might need. 
For all I know the things that you are > thinking of may not even exist... > > - Sebastian > > > > > > > Best Regards, > > Hameer Abbasi > > Sent from Astro for Mac > > > > > On 26. Jun 2018 at 09:46, Robert Kern > > > wrote: > > > > > > On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser > > il.com> wrote: > > > > > I don't think it should be relegated to the "officially > > > > discouraged" ghetto of `.legacy_index` > > > > > > > > The way I read it, the new spelling lof that would be the > > > > explicit but not discouraged `image.vindex[rr, cc]`. > > > > > > > > > > Okay, I missed that the first time through. I think having more > > > self-contained descriptions of the semantics of each of these would > > > be a good idea. The current description of `.vindex` spends more > > > time talking about what it doesn't do, compared to the other > > > methods, than what it does. > > > > > > Some more typical, less-exotic examples would be a good idea. > > > > > > > > I would reserve warnings for the cases where the current > > > > behavior is something no one really wants, like mixing slices and > > > > integer arrays. > > > > > > > > These are the cases that would only be available under > > > > `legacy_index`. > > > > > > > > > > I'm still leaning towards not warning on current, unproblematic > > > common uses. It's unnecessary churn for currently working, > > > understandable code. I would still reserve warnings and deprecation > > > for the cases where the current behavior gives us something that no > > > one wants. Those are the real traps that people need to be warned > > > away from. > > > > > > If someone is mixing slices and integer indices, that's a really > > > good sign that they thought indexing behaved in a different way > > > (e.g. orthogonal indexing). > > > > > > If someone is just using multiple index arrays that would currently > > > not give an error, that's actually a really good sign that they are > > > using it correctly and are getting the semantics that they desired. > > > If they wanted orthogonal indexing, it is *really* likely that > > > their index arrays would *not* broadcast together. And even if they > > > did, the wrong shape of the result is one of the more easily > > > noticed things. These are not silent errors that would motivate > > > adding a new warning. > > > > > > -- > > > Robert Kern > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Tue Jun 26 14:25:25 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 26 Jun 2018 14:25:25 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi All, Matti asked me to make a PR accepting my own NEP - https://github.com/numpy/numpy/pull/11429 Any objections? 
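For those who did not follow the earlier thread: the centrepiece of the proposal is letting a gufunc signature mark dimensions as optional, so that matmul can finally be expressed as a single gufunc. If I remember the proposed notation correctly, its signature would read

    (n?,k),(k,m?)->(n?,m?)

where a dimension marked with "?" may be absent from an input and is then also dropped from the output, so the one signature covers matrix @ matrix, matrix @ vector, vector @ matrix and vector @ vector.
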
As noted in my earlier summary of the discussion, in principle we can choose to accept only parts, although I think it became clear that the most contentious is also the one arguably most needed, the flexible dimensions for matmul. Moving forward has the advantage that in 1.16 we will actually be able to deal with matmul. All the best, Marten On Fri, Jun 15, 2018 at 2:17 PM, Stephan Hoyer wrote: > On Mon, Jun 11, 2018 at 11:59 PM Eric Wieser > wrote: > >> I don?t understand your alternative here. If we overload np.matmul using >> *array_function*, then it would not use *ether* of these options for >> writing the operation in terms of other gufuncs. It would simply look for >> an *array_function* attribute, and call that method instead. >> >> Let me explain that suggestion a little more clearly. >> >> 1. There?d be a linalg.matmul2d that performs the real matrix case, >> which would be easy to make as a ufunc right now. >> 2. __matmul__ and __rmatmul__ would just call np.matmul, as they >> currently do (for consistency between np.matmul and operator.matmul, >> needed in python pre- at -operator) >> 3. np.matmul would be implemented as: >> >> @do_array_function_overridesdef matmul(a, b): >> if a.ndim != 1 and b.ndim != 1: >> return matmul2d(a, b) >> elif a.ndim != 1: >> return matmul2d(a, b[:,None])[...,0] >> elif b.ndim != 1: >> return matmul2d(a[None,:], b) >> else: >> # this one probably deserves its own ufunf >> return matmul2d(a[None,:], b[:,None])[0,0] >> >> 4. Quantity can just override __array_ufunc__ as with any other ufunc >> 5. DataArray, knowing the above doesn?t work, would implement >> something like >> >> @matmul.register_array_function(DataArray)def __array_function__(a, b): >> if a.ndim != 1 and b.ndim != 1: >> return matmul2d(a, b) >> else: >> # either: >> # - add/remove dummy dimensions in a dataarray-specific way >> # - downcast to ndarray and do the dimension juggling there >> >> >> Advantages of this approach: >> >> - >> >> Neither the ufunc machinery, nor __array_ufunc__, nor the inner loop, >> need to know about optional dimensions. >> - >> >> We get a matmul2d ufunc, that all subclasses support out of the box >> if they support matmul >> >> Eric >> > OK, this sounds pretty reasonable to me -- assuming we manage to figure > out the __array_function__ proposal! > > There's one additional ingredient we would need to make this work well: > some way to guarantee that "ndim" and indexing operations are available > without casting to a base numpy array. > > For now, np.asanyarray() would probably suffice, but that isn't quite > right (e.g., this would fail for np.matrix). > > In the long term, I think we need a new coercion protocol for "duck" > arrays. Nathaniel Smith and I started writing a NEP on this, but it isn't > quite ready yet. > >> ? >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Tue Jun 26 17:43:30 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 26 Jun 2018 14:43:30 -0700 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On 19/06/18 10:57, Matthew Brett wrote: > Hi, > > On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus wrote: >> On 19/06/18 09:58, Charles R Harris wrote: >>>> What I was curious about is that there were no more "daily" builds of >>>> master. 
>>> Is that right? That there were daily builds of master, on Appveyor? >>> I don't know how those worked, I only recently got cron permission ... >> >> No, but there used to be daily builds on travis. They stopped 8 days ago, >> https://travis-ci.org/MacPython/numpy-wheels/builds. > Oops - yes - sorry - I retired the 'daily' branch, in favor of > 'master', but forgot to update the Travis-CI settings. > > Done now. > > Cheers, > > Matthew > FWIW, still no daily builds at https://travis-ci.org/MacPython/numpy-wheels/builds Matti From matthew.brett at gmail.com Tue Jun 26 17:55:13 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Jun 2018 22:55:13 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: Hi, On Tue, Jun 26, 2018 at 10:43 PM, Matti Picus wrote: > On 19/06/18 10:57, Matthew Brett wrote: >> >> Hi, >> >> On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus >> wrote: >>> >>> On 19/06/18 09:58, Charles R Harris wrote: >>>>> >>>>> What I was curious about is that there were no more "daily" builds of >>>>> master. >>>> >>>> Is that right? That there were daily builds of master, on Appveyor? >>>> I don't know how those worked, I only recently got cron permission ... >>> >>> >>> No, but there used to be daily builds on travis. They stopped 8 days ago, >>> https://travis-ci.org/MacPython/numpy-wheels/builds. >> >> Oops - yes - sorry - I retired the 'daily' branch, in favor of >> 'master', but forgot to update the Travis-CI settings. >> >> Done now. >> >> Cheers, >> >> Matthew >> > FWIW, still no daily builds at > https://travis-ci.org/MacPython/numpy-wheels/builds You mean, some days there appears to be no build? The build matrix does show Cron-triggered jobs, the last of which was a few hours ago: https://travis-ci.org/MacPython/numpy-wheels/builds/397008012 Cheers, Matthew From robert.kern at gmail.com Tue Jun 26 19:32:21 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 16:32:21 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: On Tue, Jun 26, 2018 at 3:50 AM Sebastian Berg wrote: > > On Tue, 2018-06-26 at 02:27 -0700, Robert Kern wrote: > > On Tue, Jun 26, 2018 at 1:36 AM Sebastian Berg > s.net> wrote: > > > On Tue, 2018-06-26 at 01:21 -0700, Robert Kern wrote: > > > > On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg > > > > wrote: > > > > > > > > > > > > > > > > > > > Yes, that is true, but I doubt you will find a lot of code path > > > > > that > > > > > need the current indexing as opposed to vindex here, > > > > > > > > That's probably true! But I think it's besides the point. I'd > > > wager > > > > that most code paths that will use .vindex would work perfectly > > > well > > > > with current indexing, too. Most of the time, people aren't > > > getting > > > > into the hairy corners of advanced indexing. > > > > > > > > > > Right, the proposal was to have DeprecationWarnings when they > > > differ, > > > now I also thought DeprecationWarnings on two advanced indexes in > > > general is good, because it is good for new users. > > > I have to agree with your argument that most of the confused should > > > be > > > running into broadcast errors (if they expect oindex vs. fancy). 
So > > > I > > > see this as a point that we likely should just limit ourselves at > > > least > > > for now to the cases for example with sudden transposing going on. > > > > > > However, I would like to point out that the reason for the more > > > broad > > > warnings is that it could allow warping normal indexing at some > > > point. > > > > > > > I don't really understand this. You would discourage the "normal" > > syntax in favor of these more specific named syntaxes, so you can > > introduce different behavior for the "normal" syntax and encourage > > everyone to use it again? Just add more named syntaxes if you want > > new behavior! That's the beauty of the design underlying this NEP. > > > > > Also it decreases traps with array-likes that behave differently. > > > > If we were to take this seriously, then no one should use a bare [] > > ever. > > > > I'll go on record as saying that array-likes should respond to `a[rr, > > cc]`, as in Juan's example, with the current behavior. And if they > > don't, they don't deserve to be operated on by skimage functions. > > > > If I'm reading the NEP correctly, the main thrust of the issue with > > array-likes is that it is difficult for some of them to implement the > > full spectrum of indexing possibilities. This NEP does not actually > > make it *easier* for those array-likes to implement every > > possibility. It just offers some APIs that more naturally express > > common use cases which can sometimes be implemented more naturally > > than if expressed in the current indexing. For instance, you can > > achieve the same effect as orthogonal indexing with the current > > implementation, but you have to manipulate the indices before you > > pass them over to __getitem__(), losing information along the way > > that could be used to make a more efficient lookup in some array- > > likes. > > > > The NEP design is essentially more of a way to give these array-likes > > standard places to raise NotImplementedError than it is to help them > > get rid of all of their NotImplementedErrors. More specifically, if > > these array-likes can't implement `a[rr, cc]`, they're not going to > > implement `a.vindex[rr, cc]`, either. > > > > I think most of the problems that caused these libraries to make > > different choices in their __getitem__() implementation are due to > > the fact that these expressive APIs didn't exist, so they had to > > shoehorn them into __getitem__(); orthogonal indexing was too useful > > and efficient not to implement! I think that once we have .oindex and > > .vindex out there, they will be able to clean up their __getitem__()s > > to consistently support whatever of the current behavior that they > > can and raise NotImplementedError where they can't. > > > > Right, it helps mostly to be clear about what an object can and cannot > do. So h5py or whatever could error out for plain indexing and only > support `.oindex`, and we have all options cleanly available. > > And yes, I agree that in itself is a big step forward. Okay, great. Before we move on to your next point, can we agree that the array-likes aren't a motivating factor for deprecating the current behavior of __getitem__()? > The thing is there are also very strong opinions that the fancy > indexing behaviour is so confusing that it would ideally not be the > default since it breaks comparing analogy slice objects. 
> > So, personally, I would argue that if we were to start over from > scratch, fancy indexing (multiple indexes), would not be the default > plain indexing behaviour. > Now, maybe the pain of a few warnings is too high, but if we wish to > move, no matter how slowly, in such regard, we will have to swallow it > eventually. > The suggestion was to make that as easy as possible with adding an > attribute indefinitely. > Otherwise, even a possible numpy replacement might have difficulties to > chose a different default for indexing for years to come... So I think we've moved past the technical objections. In the post-NEP .oindex/.vindex order, everyone can get the behavior that they want. Your argument for deprecation is now just about what the default is, the semantics that get pride of place with the shortest spelling. I am sympathetic to the feeling like you wish you had a time machine to go fix a design with your new insight. But it seems to me that just changing which semantics are the default has relatively attenuated value while breaking compatibility for a fundamental feature of numpy has significant costs. Just introducing .oindex is the bulk of the value of this NEP. Everything else is window dressing. You have my sympathies, but not enough for me to consent to deprecation. You might get more of my sympathy a year or two from now when the community has had a chance to work with .oindex. It's entirely possible that everyone will leap to using .oindex (and .vindex only rarely), and we will be flooded with complaints that "I only use .oindex, but the name is so long it messes up the readability of my lengthy expressions". But it's also possible that it sort of fizzles: people use it, but maybe use .vindex more, or about the same. Or just keep on happily using neither. We don't know which of those futures are going to be true. Anecdatally, you want .oindex semantics most often; I would almost exclusively use .vindex. I don't know which of us is more representative. Probably neither. I maintain that considering deprecation is premature at this time. Please take it out of this NEP. Let us get a feel for how people actually use .oindex/.vindex. Then we can talk about deprecation. This NEP gets my enthusiastic approval, except for the deprecation. I will be happy to talk about deprecation with an open mind in a few years. With some more actual experience under our belt, rather than prediction and theory, we can be more confident about the approach we want to take. Deprecation is not a fundamental part of this NEP and can be decided independently at a later time. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Jun 26 21:13:25 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 26 Jun 2018 18:13:25 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: On Tue, Jun 26, 2018 at 4:34 PM Robert Kern wrote: > I maintain that considering deprecation is premature at this time. Please > take it out of this NEP. Let us get a feel for how people actually use > .oindex/.vindex. Then we can talk about deprecation. This NEP gets my > enthusiastic approval, except for the deprecation. I will be happy to talk > about deprecation with an open mind in a few years. 
With some more actual > experience under our belt, rather than prediction and theory, we can be > more confident about the approach we want to take. Deprecation is not a > fundamental part of this NEP and can be decided independently at a later > time. > I agree, we should scale back most of the deprecations proposed in this NEP, leaving them for possible future work. In particular, you're not convinced yet that "outer indexing" is a more intuitive default indexing mode than "vectorized indexing", so it is premature to deprecate vectorized indexing behavior that conflicts with outer indexing. OK, fair enough. I would still like to include at least two more limited forms of deprecation that I hope will be less controversial: - Mixed boolean/integer array indexing. This is neither intuitive nor useful, and I don't think I've ever seen it used. Usually "outer indexing" behavior is what is desired here. - Mixed array/slice indexing, for cases with arrays separated by slices so NumPy can't do the "intuitive" transpose on the output. As noted in the NEP, this is a common source of bugs. Users who want this should really switch to vindex. In the long term, although I agree with Sebastian that "outer indexing" is more intuitive for default indexing behavior, I would really like to eliminate the "dimension reordering" behavior of mixed array/slice indexing altogether. This is a weird special case that behaves differently for array[...] and array.vindex[...]. So if we don't choose to deprecate all cases where [] and oindex[] are different, I would at least like to deprecate all cases where [] and vindex[] are different. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Jun 26 21:22:24 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 26 Jun 2018 18:22:24 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: Message-ID: On Tue, Jun 26, 2018 at 9:38 AM Eric Wieser wrote: > We can expose some of the internals > > These could be expressed as methods on the internal indexing objects I > proposed in the first reply to this thread, which has seen no responses. > > I think Hameer Abbasi is looking for something like OrthogonalIndexer(...).to_vindex() > -> VectorizedIndexer such that arr.oindex[ind] selects the same elements > as arr.vindex[OrthogonalIndexer(ind).to_vindex()] > > Eric > It is probably worth noting that xarray already uses very similar classes internally for keeping track of indexing operations. See BasicIndexer, OuterIndexer and VectorizedIndexer: https://github.com/pydata/xarray/blob/v0.10.7/xarray/core/indexing.py#L295-L428 This turns out to be a pretty convenient model even when not using subclassing. In xarray, we use them internally in various "partial duck array" classes that do some lazy computation upon indexing with __getitem__. It's nice to simply be able to forward on Indexer objects rather than implement separate vindex/oindex methods. We also have utility functions for converting between different forms, e.g., from OuterIndexer to VectorizedIndexer: https://github.com/pydata/xarray/blob/v0.10.7/xarray/core/indexing.py#L654 I guess this is a case for using such classes internally in NumPy, and possibly for exposing them publicly as well. -------------- next part -------------- An HTML attachment was scrubbed... 
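For concreteness, the OuterIndexer -> VectorizedIndexer conversion mentioned above can be sketched with plain NumPy; the helper name below is invented for illustration and this is not xarray's actual implementation:

import numpy as np

def outer_to_vectorized(key):
    # Convert a tuple of 1-D integer arrays (an outer/orthogonal index)
    # into the equivalent broadcast ("vectorized") index.  For the
    # all-array case this is exactly what np.ix_ does.
    return np.ix_(*(np.asarray(k) for k in key))

arr = np.arange(12).reshape(3, 4)
rows, cols = np.array([0, 2]), np.array([1, 3])
arr[outer_to_vectorized((rows, cols))].shape   # (2, 2) -- outer selection
arr[rows, cols].shape                          # (2,)   -- vectorized selection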
URL: From robert.kern at gmail.com Tue Jun 26 21:38:44 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 18:38:44 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: On Tue, Jun 26, 2018 at 6:14 PM Stephan Hoyer wrote: > On Tue, Jun 26, 2018 at 4:34 PM Robert Kern wrote: > >> I maintain that considering deprecation is premature at this time. Please >> take it out of this NEP. Let us get a feel for how people actually use >> .oindex/.vindex. Then we can talk about deprecation. This NEP gets my >> enthusiastic approval, except for the deprecation. I will be happy to talk >> about deprecation with an open mind in a few years. With some more actual >> experience under our belt, rather than prediction and theory, we can be >> more confident about the approach we want to take. Deprecation is not a >> fundamental part of this NEP and can be decided independently at a later >> time. >> > > I agree, we should scale back most of the deprecations proposed in this > NEP, leaving them for possible future work. In particular, you're not > convinced yet that "outer indexing" is a more intuitive default indexing > mode than "vectorized indexing", so it is premature to deprecate vectorized > indexing behavior that conflicts with outer indexing. OK, fair enough. > Actually, I do think outer indexing is more "intuitive"*, as far as that goes. It's just rarely what I actually want to accomplish. * I do not like using "intuitive" in programming. Nipples are intuitive. Everything else is learned. But in this case, I think that outer indexing is a more concordant extension of the concepts that a new numpy user would have learned earlier: integer indices and slices. I would still like to include at least two more limited form of deprecation > that I hope will be less controversial: > - Mixed boolean/integer array indexing. This is not very intuitive nor > useful, and I don't think I've ever seen it used. Usually "outer indexing" > behavior is what is desired here. > - Mixed array/slice indexing, for cases with arrays separated by slices so > NumPy can't do the "intuitive" transpose on the output. As noted in the > NEP, this is a common source of bugs. Users who want this should really > switch to vindex. > I'd still prefer not talking deprecation, per se, in this NEP (but my objection is weaker). I would definitely start adding in informative, noisy warnings in these cases, though. Along the lines of, "Hey, this is a dodgy construction that typically gives unexpected results. Here are .oindex/.vindex that might do what you actually want, but you can use .legacy_index if you just want to silence this warning". Rather than "Hey, this is going to go away at some point." -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Jun 26 21:45:49 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 26 Jun 2018 18:45:49 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: On Tue, Jun 26, 2018 at 6:39 PM Robert Kern wrote: > I'd still prefer not talking deprecation, per se, in this NEP (but my > objection is weaker). 
I would definitely start adding in informative, noisy > warnings in these cases, though. Along the lines of, "Hey, this is a dodgy > construction that typically gives unexpected results. Here are > .oindex/.vindex that might do what you actually want, but you can use > .legacy_index if you just want to silence this warning". Rather than "Hey, > this is going to go away at some point." > Yes, agreed. These will use a new warning class, perhaps numpy.IndexingWarning. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jun 26 21:54:14 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 18:54:14 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: On Tue, Jun 26, 2018 at 6:47 PM Stephan Hoyer wrote: > > On Tue, Jun 26, 2018 at 6:39 PM Robert Kern wrote: > >> I'd still prefer not talking deprecation, per se, in this NEP (but my >> objection is weaker). I would definitely start adding in informative, noisy >> warnings in these cases, though. Along the lines of, "Hey, this is a dodgy >> construction that typically gives unexpected results. Here are >> .oindex/.vindex that might do what you actually want, but you can use >> .legacy_index if you just want to silence this warning". Rather than "Hey, >> this is going to go away at some point." >> > > Yes, agreed. These will use a new warning class, perhaps > numpy.IndexingWarning. > Perfect. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Jun 27 00:48:40 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 26 Jun 2018 21:48:40 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, Jun 26, 2018 at 12:46 AM Robert Kern wrote: > I think having more self-contained descriptions of the semantics of each > of these would be a good idea. The current description of `.vindex` spends > more time talking about what it doesn't do, compared to the other methods, > than what it does. > Will do. > I'm still leaning towards not warning on current, unproblematic common > uses. It's unnecessary churn for currently working, understandable code. I > would still reserve warnings and deprecation for the cases where the > current behavior gives us something that no one wants. Those are the real > traps that people need to be warned away from. > > If someone is mixing slices and integer indices, that's a really good sign > that they thought indexing behaved in a different way (e.g. orthogonal > indexing). > I agree, but I'm still not entirely sure where to draw the line on behavior that should issue a warning. Some options, in roughly descending order of severity: 1. Warn if [] would give a different result than .oindex[]. This is the current proposal in the NEP, but based on the feedback we should hold back on it for now. 2. Warn if there is a mixture of arrays/slice objects in indices for [], even implicitly (e.g., including arr[idx] when is equivalent to arr[idx, :]). In this case, indices end up at the end both for legacy_index and vindex, but arguably that is only a happy coincidence. 3. Warn if [] would give a different result from .vindex[]. 
This is a little weaker than the previous condition, because arr[idx, :] or arr[idx, ...] would not give a warning. However, cases like arr[..., idx] or arr[:, idx, :] would still start to give warnings, even though they are arguably well defined according to either outer indexing (if idx.ndim == 1) or legacy indexing (due to dimension reordering rules that will be omitted from vindex). 4. Warn if there are multiple arrays/integer indices separated by a slice object, e.g., arr[idx1, :, idx2]. This is the edge case that really trips up users. As I said in my other response, in the long term, I would prefer to either (a) drop support for vectorized indexing in [] or (b) if we stick with supporting vectorized indexing in [], at least ensure consistent dimension ordering rules for [] and vindex[]. That would suggest using either my proposed rule 2 or 3. I also agree with you that anyone mixing slices and integers probably is confused about how indexing works, at least in edge cases. But given the lengths that legacy indexing goes to to support "outer indexing-like" behavior in the common case of a single integer array and many slices, I am hesitant to start warning in this case. The result of arr[..., idx, :] is relatively easy to understand, even though it uses its own set of rules, which happen to be more consistent with oindex[] than vindex[]. We certainly could make the conservative choice of only adopting 4 for now and leaving further cleanup for later. I guess this uncertainty about whether direct indexing should be more like vindex[] or oindex[] in the long term is a good argument for holding off on other warnings for now. But I think we are almost certainly going to want to make further warnings/deprecations of some form. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Wed Jun 27 01:19:58 2018 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Wed, 27 Jun 2018 15:19:58 +1000 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> Message-ID: <1530076798.4187189.1421749616.4AC83CBA@webmail.messagingengine.com> Let me start by thanking Robert for articulating my viewpoints far better than I could have done myself. I want to explicitly flag the following statements for endorsement: > *I would still reserve warnings and deprecation for the cases where > the current behavior gives us something that no one wants. Those are > the real traps that people need to be warned away from.* > *In the post-NEP .oindex/.vindex order, everyone can get the behavior > that they want. Your argument for deprecation is now just about what > the default is, the semantics that get pride of place with the > shortest spelling. I am sympathetic to the feeling like you wish you > had a time machine to go fix a design with your new insight. But it > seems to me that just changing which semantics are the default has > relatively attenuated value while breaking compatibility for a > fundamental feature of numpy has significant costs. Just introducing > .oindex is the bulk of the value of this NEP. Everything else is > window dressing.* > *If someone is mixing slices and integer indices, that's a really good > sign that they thought indexing behaved in a different way (e.g. 
> orthogonal indexing).* I would offer the exception of trailing slices to this statement, though: In [1]: from skimage import data In [2]: astro = data.astronaut() In [3]: astro.shape Out[3]: (512, 512, 3) In [4]: rr, cc = np.array([1, 3, 3, 3]), np.array([1, 8, 9, 10]) In [5]: astro[rr, cc].shape Out[5]: (4, 3) In [6]: astro[rr, cc, :].shape Out[6]: (4, 3) This does exactly what I would expect. Going back to the motivation for the NEP, I think this bit, emphasis mine, is crucial: >> the existing rules for advanced indexing with multiple array indices >> are typically confusing to both new, **and in many cases even old,** >> users of NumPy I think it is ok for advanced indexing to be accessible to advanced users. I remember that it took me quite a while to grok NumPy advanced indexing, but once I did I just loved it. I also like that this syntax translates perfectly from integer indices to float coordinates in `ndimage.map_coordinates`. > *I'll go on record as saying that array-likes should respond to `a[rr, > cc]`, as in Juan's example, with the current behavior. And if they > don't, they don't deserve to be operated on by skimage functions.** * (I don't think of us highly enough to use the word "deserve", but I would say that we would hesitate to support arrays that don't use this convention.) > *They didn't get a new feature; they just have to run faster to stay > in the same place.** * It is also probably true, as mentioned elsewhere, that we could go through our entire codebase and append `.vidx` to every array indexing op. Perhaps others on this list find this a reasonable request, but I don't. Aside from the churn involved, it would make our codebase significantly uglier and less readable. I should also emphasise that NumPy is really *the* foundational project for the entire Scientific Python ecosystem. Changing the meaning of [] should only be considered if it delivers an *extreme* benefit. Robert's statement would apply to a stupid number of projects. > *Once we have some experience with them for a year or three, then > let's talk about deprecating parts of the current behavior and make a > new NEP then if we want to go that route.** * :+10**6: To Sebastian's comment: > if we choose to not annoy you a little, we will > have much less long term options which also includes such projects > compatibility to new/current array-likes. > So basically one point is: if we annoy scikit-image now, their code > will work better for dask arrays in the future hopefully. Let's get rid of the hopefully. Let NumPy implement .oindex and .vindex. Let Dask arrays do the same. Let's have an announcement on the scikit-image mailing list, "hey guys, if you switch all your indexing operations to .vindex, suddenly all of your library works with dask arrays!" At that point, we have a value proposition on our hands. Currently, it amounts to gambling with others' time. To Stephan's options that were sent while I was composing this: > Some options, in roughly descending order of severity: I favour 4, or at the limit 3. (See use case above, which I would argue is totally unsurprising.) I'm happy that option 1 appears to be off the table. Hameer, > For libraries like Dask, XArray, pydata/sparse, XND, etc., it would be > bad for them if there was continued use of ?weird? indexing behaviour > (no warnings means more code written that?s? well? not exactly the > best design). Again, I think libraries should support the simple/not unintuitive vindex cases. This is not bad design. 
> *We don't know which of those futures are going to be true. > Anecdatally, you want .oindex semantics most often; I would almost > exclusively use .vindex. I don't know which of us is more > representative.* Same. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jun 27 01:21:49 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 22:21:49 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, Jun 26, 2018 at 9:50 PM Stephan Hoyer wrote: > On Tue, Jun 26, 2018 at 12:46 AM Robert Kern > wrote: > >> I think having more self-contained descriptions of the semantics of each >> of these would be a good idea. The current description of `.vindex` spends >> more time talking about what it doesn't do, compared to the other methods, >> than what it does. >> > > Will do. > > >> I'm still leaning towards not warning on current, unproblematic common >> uses. It's unnecessary churn for currently working, understandable code. I >> would still reserve warnings and deprecation for the cases where the >> current behavior gives us something that no one wants. Those are the real >> traps that people need to be warned away from. >> >> If someone is mixing slices and integer indices, that's a really good >> sign that they thought indexing behaved in a different way (e.g. orthogonal >> indexing). >> > > I agree, but I'm still not entirely sure where to draw the line on > behavior that should issue a warning. Some options, in roughly descending > order of severity: > 1. Warn if [] would give a different result than .oindex[]. This is the > current proposal in the NEP, but based on the feedback we should hold back > on it for now. > 2. Warn if there is a mixture of arrays/slice objects in indices for [], > even implicitly (e.g., including arr[idx] when is equivalent to arr[idx, > :]). In this case, indices end up at the end both for legacy_index and > vindex, but arguably that is only a happy coincidence. > I'd have to deep dive through my email archive to double check, but I'm pretty sure this is intentional design, not coincidence. There is a long-standing pattern of using the first axes as the "collection" axes when the objects that we are concerned with are vectors or matrices or more. For example, evaluate a scalar field on a grid in 3D space (nx, ny, nz), then the gradient at those points is usually represented as (nx, ny, nz, 3). It is desirable to be able to apply the same indices to the scalar grid and the vector grid to select out the scalar and vector values at the same set of points. It's why we implicitly tack on empty slices to the end of any partial index tuple (e.g. with just integer scalars). The current rules for mixing slices and integer array indices are possibly the simplest way to effect this use case; it is the behaviors for the other cases that are the unhappy coincidences. 3. Warn if [] would give a different result from .vindex[]. This is a > little weaker than the previous condition, because arr[idx, :] or arr[idx, > ...] would not give a warning. However, cases like arr[..., idx] or arr[:, > idx, :] would still start to give warnings, even though they are arguably > well defined according to either outer indexing (if idx.ndim == 1) or > legacy indexing (due to dimension reordering rules that will be omitted > from vindex). > 4. 
Warn if there are multiple arrays/integer indices separated by a slice > object, e.g., arr[idx1, :, idx2]. This is the edge case that really trips > up users. > > As I said in my other response, in the long term, I would prefer to either > (a) drop support for vectorized indexing in [] or (b) if we stick with > supporting vectorized indexing in [], at least ensure consistent dimension > ordering rules for [] and vindex[]. That would suggest using either my > proposed rule 2 or 3. > > I also agree with you that anyone mixing slices and integers probably is > confused about how indexing works, at least in edge cases. But given the > lengths that legacy indexing goes to to support "outer indexing-like" > behavior in the common case of a single integer array and many slices, I am > hesitant to start warning in this case. The result of arr[..., idx, :] is > relatively easy to understand, even though it uses its own set of rules, > which happen to be more consistent with oindex[] than vindex[]. > > We certainly could make the conservative choice of only adopting 4 for now > and leaving further cleanup for later. I guess this uncertainty about > whether direct indexing should be more like vindex[] or oindex[] in the > long term is a good argument for holding off on other warnings for now. But > I think we are almost certainly going to want to make further > warnings/deprecations of some form. > I'd prefer 4, could be talked into 3, but any higher is not a good idea, I don't think. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jun 27 01:26:44 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 Jun 2018 22:26:44 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: <1530076798.4187189.1421749616.4AC83CBA@webmail.messagingengine.com> References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> <3aa8387021a6367c7a6227e424226904631bce60.camel@sipsolutions.net> <1530076798.4187189.1421749616.4AC83CBA@webmail.messagingengine.com> Message-ID: On Tue, Jun 26, 2018 at 10:21 PM Juan Nunez-Iglesias wrote: > Let me start by thanking Robert for articulating my viewpoints far better > than I could have done myself. I want to explicitly flag the following > statements for endorsement: > > *I would still reserve warnings and deprecation for the cases where the > current behavior gives us something that no one wants. Those are the real > traps that people need to be warned away from.* > > > *In the post-NEP .oindex/.vindex order, everyone can get the behavior that > they want. Your argument for deprecation is now just about what the default > is, the semantics that get pride of place with the shortest spelling. I am > sympathetic to the feeling like you wish you had a time machine to go fix a > design with your new insight. But it seems to me that just changing which > semantics are the default has relatively attenuated value while breaking > compatibility for a fundamental feature of numpy has significant costs. > Just introducing .oindex is the bulk of the value of this NEP. Everything > else is window dressing.* > > > *If someone is mixing slices and integer indices, that's a really good > sign that they thought indexing behaved in a different way (e.g. 
orthogonal > indexing).* > > > I would offer the exception of trailing slices to this statement, though: > > In [1]: from skimage import data > In [2]: astro = data.astronaut() > In [3]: astro.shape > Out[3]: (512, 512, 3) > > In [4]: rr, cc = np.array([1, 3, 3, 3]), np.array([1, 8, 9, 10]) > In [5]: astro[rr, cc].shape > Out[5]: (4, 3) > > In [6]: astro[rr, cc, :].shape > Out[6]: (4, 3) > > This does exactly what I would expect. > Yup, sorry, I didn't mean those. I meant when there is an explicit slice in between index arrays. (And maybe when index arrays follow slices; I'll need to think more on that.) > Going back to the motivation for the NEP, I think this bit, emphasis mine, > is crucial: > > the existing rules for advanced indexing with multiple array indices are > typically confusing to both new, **and in many cases even old,** users of > NumPy > > > I think it is ok for advanced indexing to be accessible to advanced users. > I remember that it took me quite a while to grok NumPy advanced indexing, > but once I did I just loved it. > > I also like that this syntax translates perfectly from integer indices to > float coordinates in `ndimage.map_coordinates`. > > *I'll go on record as saying that array-likes should respond to `a[rr, > cc]`, as in Juan's example, with the current behavior. And if they don't, > they don't deserve to be operated on by skimage functions.* > > > (I don't think of us highly enough to use the word "deserve", but I would > say that we would hesitate to support arrays that don't use this > convention.) > Ahem, yes, I was being provocative in a moment of weakness. May the array-like authors forgive me. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Jun 27 01:34:30 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 26 Jun 2018 22:34:30 -0700 Subject: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing In-Reply-To: References: <1529994253.102107.1420482720.1911BC65@webmail.messagingengine.com> Message-ID: On Tue, Jun 26, 2018 at 10:22 PM Robert Kern wrote: > We certainly could make the conservative choice of only adopting 4 for now >> and leaving further cleanup for later. I guess this uncertainty about >> whether direct indexing should be more like vindex[] or oindex[] in the >> long term is a good argument for holding off on other warnings for now. But >> I think we are almost certainly going to want to make further >> warnings/deprecations of some form. >> > > I'd prefer 4, could be talked into 3, but any higher is not a good idea, I > don't think. > OK, I think 4 is the safe option for now. Eventually, I want either 1 or 3. But: - We don't agree yet on whether the right long-term solution would be for [] to support vectorized indexing, outer indexing or neither. - This will certainly cause some amount of churn, so let's save it for later when vindex/oindex are widely used and libraries don't need to worry about whether they're available or not they are available in all NumPy versions they support. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shoyer at gmail.com Wed Jun 27 01:48:59 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 27 Jun 2018 01:48:59 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol Message-ID: After much discussion (and the addition of three new co-authors!), I?m pleased to present a significantly revision of NumPy Enhancement Proposal 18: A dispatch mechanism for NumPy's high level array functions: http://www.numpy.org/neps/nep-0018-array-function-protocol.html The full text is also included below. Best, Stephan =========================================================== A dispatch mechanism for NumPy's high level array functions =========================================================== :Author: Stephan Hoyer :Author: Matthew Rocklin :Author: Marten van Kerkwijk :Author: Hameer Abbasi :Author: Eric Wieser :Status: Draft :Type: Standards Track :Created: 2018-05-29 Abstact ------- We propose the ``__array_function__`` protocol, to allow arguments of NumPy functions to define how that function operates on them. This will allow using NumPy as a high level API for efficient multi-dimensional array operations, even with array implementations that differ greatly from ``numpy.ndarray``. Detailed description -------------------- NumPy's high level ndarray API has been implemented several times outside of NumPy itself for different architectures, such as for GPU arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel arrays (Dask array) as well as various NumPy-like implementations in the deep learning frameworks, like TensorFlow and PyTorch. Similarly there are many projects that build on top of the NumPy API for labeled and indexed arrays (XArray), automatic differentiation (Autograd, Tangent), masked arrays (numpy.ma), physical units (astropy.units, pint, unyt), etc. that add additional functionality on top of the NumPy API. Most of these project also implement a close variation of NumPy's level high API. We would like to be able to use these libraries together, for example we would like to be able to place a CuPy array within XArray, or perform automatic differentiation on Dask array code. This would be easier to accomplish if code written for NumPy ndarrays could also be used by other NumPy-like projects. For example, we would like for the following code example to work equally well with any NumPy-like array object: .. code:: python def f(x): y = np.tensordot(x, x.T) return np.mean(np.exp(y)) Some of this is possible today with various protocol mechanisms within NumPy. - The ``np.exp`` function checks the ``__array_ufunc__`` protocol - The ``.T`` method works using Python's method dispatch - The ``np.mean`` function explicitly checks for a ``.mean`` method on the argument However other functions, like ``np.tensordot`` do not dispatch, and instead are likely to coerce to a NumPy array (using the ``__array__``) protocol, or err outright. To achieve enough coverage of the NumPy API to support downstream projects like XArray and autograd we want to support *almost all* functions within NumPy, which calls for a more reaching protocol than just ``__array_ufunc__``. We would like a protocol that allows arguments of a NumPy function to take control and divert execution to another function (for example a GPU or parallel implementation) in a way that is safe and consistent across projects. Implementation -------------- We propose adding support for a new protocol in NumPy, ``__array_function__``. 
This protocol is intended to be a catch-all for NumPy functionality that is not covered by the ``__array_ufunc__`` protocol for universal functions (like ``np.exp``). The semantics are very similar to ``__array_ufunc__``, except the operation is specified by an arbitrary callable object rather than a ufunc instance and method. A prototype implementation can be found in `this notebook < https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006 >`_. The interface ~~~~~~~~~~~~~ We propose the following signature for implementations of ``__array_function__``: .. code-block:: python def __array_function__(self, func, types, args, kwargs) - ``func`` is an arbitrary callable exposed by NumPy's public API, which was called in the form ``func(*args, **kwargs)``. - ``types`` is a ``frozenset`` of unique argument types from the original NumPy function call that implement ``__array_function__``. - The tuple ``args`` and dict ``kwargs`` are directly passed on from the original call. Unlike ``__array_ufunc__``, there are no high-level guarantees about the type of ``func``, or about which of ``args`` and ``kwargs`` may contain objects implementing the array API. As a convenience for ``__array_function__`` implementors, ``types`` provides all argument types with an ``'__array_function__'`` attribute. This allows downstream implementations to quickly determine if they are likely able to support the operation. A ``frozenset`` is used to ensure that ``__array_function__`` implementations cannot rely on the iteration order of ``types``, which would facilitate violating the well-defined "Type casting hierarchy" described in `NEP-13 `_. Example for a project implementing the NumPy API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Most implementations of ``__array_function__`` will start with two checks: 1. Is the given function something that we know how to overload? 2. Are all arguments of a type that we know how to handle? If these conditions hold, ``__array_function__`` should return the result from calling its implementation for ``func(*args, **kwargs)``. Otherwise, it should return the sentinel value ``NotImplemented``, indicating that the function is not implemented by these types. This is preferable to raising ``TypeError`` directly, because it gives *other* arguments the opportunity to define the operations. There are no general requirements on the return value from ``__array_function__``, although most sensible implementations should probably return array(s) with the same type as one of the function's arguments. If/when Python gains `typing support for protocols `_ and NumPy adds static type annotations, the ``@overload`` implementation for ``SupportsArrayFunction`` will indicate a return type of ``Any``. It may also be convenient to define a custom decorators (``implements`` below) for registering ``__array_function__`` implementations. .. code:: python HANDLED_FUNCTIONS = {} class MyArray: def __array_function__(self, func, types, args, kwargs): if func not in HANDLED_FUNCTIONS: return NotImplemented # Note: this allows subclasses that don't override # __array_function__ to handle MyArray objects if not all(issubclass(t, MyArray) for t in types): return NotImplemented return HANDLED_FUNCTIONS[func](*args, **kwargs) def implements(numpy_function): """Register an __array_function__ implementation for MyArray objects.""" def decorator(func): HANDLED_FUNCTIONS[numpy_function] = func return func return decorator @implements(np.concatenate) def concatenate(arrays, axis=0, out=None): ... 
# implementation of concatenate for MyArray objects @implements(np.broadcast_to) def broadcast_to(array, shape): ... # implementation of broadcast_to for MyArray objects Note that it is not required for ``__array_function__`` implementations to include *all* of the corresponding NumPy function's optional arguments (e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument). Optional arguments are only passed in to ``__array_function__`` if they were explicitly used in the NumPy function call. Necessary changes within the NumPy codebase itself ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This will require two changes within the NumPy codebase: 1. A function to inspect available inputs, look for the ``__array_function__`` attribute on those inputs, and call those methods appropriately until one succeeds. This needs to be fast in the common all-NumPy case, and have acceptable performance (no worse than linear time) even if the number of overloaded inputs is large (e.g., as might be the case for `np.concatenate`). This is one additional function of moderate complexity. 2. Calling this function within all relevant NumPy functions. This affects many parts of the NumPy codebase, although with very low complexity. Finding and calling the right ``__array_function__`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a NumPy function, ``*args`` and ``**kwargs`` inputs, we need to search through ``*args`` and ``**kwargs`` for all appropriate inputs that might have the ``__array_function__`` attribute. Then we need to select among those possible methods and execute the right one. Negotiating between several possible implementations can be complex. Finding arguments ''''''''''''''''' Valid arguments may be directly in the ``*args`` and ``**kwargs``, such as in the case for ``np.tensordot(left, right, out=out)``, or they may be nested within lists or dictionaries, such as in the case of ``np.concatenate([x, y, z])``. This can be problematic for two reasons: 1. Some functions are given long lists of values, and traversing them might be prohibitively expensive. 2. Some functions may have arguments that we don't want to inspect, even if they have the ``__array_function__`` method. To resolve these issues, NumPy functions should explicitly indicate which of their arguments may be overloaded, and how these arguments should be checked. As a rule, this should include all arguments documented as either ``array_like`` or ``ndarray``. We propose to do so by writing "dispatcher" functions for each overloaded NumPy function: - These functions will be called with the exact same arguments that were passed into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``), and should return an iterable of arguments to check for overrides. - Dispatcher functions are required to share the exact same positional, optional and keyword-only arguments as their corresponding NumPy functions. Otherwise, valid invocations of a NumPy function could result in an error when calling its dispatcher. - Because default *values* for keyword arguments do not have ``__array_function__`` attributes, by convention we set all default argument values to ``None``. This reduces the likelihood of signatures falling out of sync, and minimizes extraneous information in the dispatcher. The only exception should be cases where the argument value in some way effects dispatching, which should be rare. An example of the dispatcher for ``np.concatenate`` may be instructive: .. 
code:: python def _concatenate_dispatcher(arrays, axis=None, out=None): for array in arrays: yield array if out is not None: yield out The concatenate dispatcher is written as generator function, which allows it to potentially include the value of the optional ``out`` argument without needing to create a new sequence with the (potentially long) list of objects to be concatenated. Trying ``__array_function__`` methods until the right one works ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Many arguments may implement the ``__array_function__`` protocol. Some of these may decide that, given the available inputs, they are unable to determine the correct result. How do we call the right one? If several are valid then which has precedence? For the most part, the rules for dispatch with ``__array_function__`` match those for ``__array_ufunc__`` (see `NEP-13 `_). In particular: - NumPy will gather implementations of ``__array_function__`` from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases involving subclasses, this differs slightly from the `current behavior `_ of Python. - Implementations of ``__array_function__`` indicate that they can handle the operation by returning any value other than ``NotImplemented``. - If all ``__array_function__`` methods return ``NotImplemented``, NumPy will raise ``TypeError``. One deviation from the current behavior of ``__array_ufunc__`` is that NumPy will only call ``__array_function__`` on the *first* argument of each unique type. This matches Python's `rule for calling reflected methods < https://docs.python.org/3/reference/datamodel.html#object.__ror__>`_, and this ensures that checking overloads has acceptable performance even when there are a large number of overloaded arguments. To avoid long-term divergence between these two dispatch protocols, we should `also update `_ ``__array_ufunc__`` to match this behavior. Special handling of ``numpy.ndarray`` ''''''''''''''''''''''''''''''''''''' The use cases for subclasses with ``__array_function__`` are the same as those with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a ``__array_function__`` method mirroring ``ndarray.__array_ufunc__``: .. code:: python def __array_function__(self, func, types, args, kwargs): # Cannot handle items that have __array_function__ other than our own. for t in types: if (hasattr(t, '__array_function__') and t.__array_function__ is not ndarray.__array_function__): return NotImplemented # Arguments contain no overrides, so we can safely call the # overloaded function again. return func(*args, **kwargs) To avoid infinite recursion, the dispatch rules for ``__array_function__`` need also the same special case they have for ``__array_ufunc__``: any arguments with an ``__array_function__`` method that is identical to ``numpy.ndarray.__array_function__`` are not be called as ``__array_function__`` implementations. Changes within NumPy functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a function defining the above behavior, for now call it ``try_array_function_override``, we now need to call that function from within every relevant NumPy function. This is a pervasive change, but of fairly simple and innocuous code that should complete quickly and without effect if no arguments implement the ``__array_function__`` protocol. In most cases, these functions should written using the ``array_function_dispatch`` decorator, which also associates dispatcher functions: .. 
code:: python def array_function_dispatch(dispatcher): """Wrap a function for dispatch with the __array_function__ protocol.""" def decorator(func): @functools.wraps(func) def new_func(*args, **kwargs): relevant_arguments = dispatcher(*args, **kwargs) success, value = try_array_function_override( new_func, relevant_arguments, args, kwargs) if success: return value return func(*args, **kwargs) return new_func return decorator # example usage def _broadcast_to_dispatcher(array, shape, subok=None, **ignored_kwargs): return (array,) @array_function_dispatch(_broadcast_to_dispatcher) def broadcast_to(array, shape, subok=False): ... # existing definition of np.broadcast_to Using a decorator is great! We don't need to change the definitions of existing NumPy functions, and only need to write a few additional lines for the dispatcher function. We could even reuse a single dispatcher for families of functions with the same signature (e.g., ``sum`` and ``prod``). For such functions, the largest change could be adding a few lines to the docstring to note which arguments are checked for overloads. It's particularly worth calling out the decorator's use of ``functools.wraps``: - This ensures that the wrapped function has the same name and docstring as the wrapped NumPy function. - On Python 3, it also ensures that the decorator function copies the original function signature, which is important for introspection based tools such as auto-complete. If we care about preserving function signatures on Python 2, for the `short while longer < http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_ that NumPy supports Python 2.7, we do could do so by adding a vendored dependency on the (single-file, BSD licensed) `decorator library `_. - Finally, it ensures that the wrapped function `can be pickled < http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html >`_. In a few cases, it would not make sense to use the ``array_function_dispatch`` decorator directly, but override implementation in terms of ``try_array_function_override`` should still be straightforward. - Functions written entirely in C (e.g., ``np.concatenate``) can't use decorators, but they could still use a C equivalent of ``try_array_function_override``. If performance is not a concern, they could also be easily wrapped with a small Python wrapper. - The ``__call__`` method of ``np.vectorize`` can't be decorated with

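For readers following the dispatch rules above, a rough, illustrative sketch of what the ``try_array_function_override`` helper used by the decorator could look like; this is not NumPy's actual implementation and it omits the ``numpy.ndarray`` special case described earlier:

def try_array_function_override(func, relevant_arguments, args, kwargs):
    # Collect one representative argument per type that defines
    # __array_function__, keeping subclasses ahead of superclasses and
    # otherwise preserving left-to-right order.
    overloaded_args = []
    for arg in relevant_arguments:
        arg_type = type(arg)
        if (hasattr(arg_type, '__array_function__') and
                not any(arg_type is type(other) for other in overloaded_args)):
            index = len(overloaded_args)
            for i, other in enumerate(overloaded_args):
                if issubclass(arg_type, type(other)):
                    index = i
                    break
            overloaded_args.insert(index, arg)

    if not overloaded_args:
        # No overrides: the caller falls back to the default implementation.
        return False, None

    types = frozenset(type(arg) for arg in overloaded_args)
    for arg in overloaded_args:
        result = arg.__array_function__(func, types, args, kwargs)
        if result is not NotImplemented:
            return True, result

    raise TypeError('no implementation found for {!r} on types {}'
                    .format(func, list(types)))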
`_. Example for a project implementing the NumPy API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Most implementations of ``__array_function__`` will start with two checks: 1. Is the given function something that we know how to overload? 2. Are all arguments of a type that we know how to handle? If these conditions hold, ``__array_function__`` should return the result from calling its implementation for ``func(*args, **kwargs)``. Otherwise, it should return the sentinel value ``NotImplemented``, indicating that the function is not implemented by these types. This is preferable to raising ``TypeError`` directly, because it gives *other* arguments the opportunity to define the operations. There are no general requirements on the return value from ``__array_function__``, although most sensible implementations should probably return array(s) with the same type as one of the function's arguments. If/when Python gains `typing support for protocols `_ and NumPy adds static type annotations, the ``@overload`` implementation for ``SupportsArrayFunction`` will indicate a return type of ``Any``. It may also be convenient to define a custom decorators (``implements`` below) for registering ``__array_function__`` implementations. .. code:: python HANDLED_FUNCTIONS = {} class MyArray: def __array_function__(self, func, types, args, kwargs): if func not in HANDLED_FUNCTIONS: return NotImplemented # Note: this allows subclasses that don't override # __array_function__ to handle MyArray objects if not all(issubclass(t, MyArray) for t in types): return NotImplemented return HANDLED_FUNCTIONS[func](*args, **kwargs) def implements(numpy_function): """Register an __array_function__ implementation for MyArray objects.""" def decorator(func): HANDLED_FUNCTIONS[numpy_function] = func return func return decorator @implements(np.concatenate) def concatenate(arrays, axis=0, out=None): ... # implementation of concatenate for MyArray objects @implements(np.broadcast_to) def broadcast_to(array, shape): ... # implementation of broadcast_to for MyArray objects Note that it is not required for ``__array_function__`` implementations to include *all* of the corresponding NumPy function's optional arguments (e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument). Optional arguments are only passed in to ``__array_function__`` if they were explicitly used in the NumPy function call. Necessary changes within the NumPy codebase itself ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This will require two changes within the NumPy codebase: 1. A function to inspect available inputs, look for the ``__array_function__`` attribute on those inputs, and call those methods appropriately until one succeeds. This needs to be fast in the common all-NumPy case, and have acceptable performance (no worse than linear time) even if the number of overloaded inputs is large (e.g., as might be the case for `np.concatenate`). This is one additional function of moderate complexity. 2. Calling this function within all relevant NumPy functions. This affects many parts of the NumPy codebase, although with very low complexity. Finding and calling the right ``__array_function__`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a NumPy function, ``*args`` and ``**kwargs`` inputs, we need to search through ``*args`` and ``**kwargs`` for all appropriate inputs that might have the ``__array_function__`` attribute. Then we need to select among those possible methods and execute the right one. 
Negotiating between several possible implementations can be complex. Finding arguments ''''''''''''''''' Valid arguments may be directly in the ``*args`` and ``**kwargs``, such as in the case for ``np.tensordot(left, right, out=out)``, or they may be nested within lists or dictionaries, such as in the case of ``np.concatenate([x, y, z])``. This can be problematic for two reasons: 1. Some functions are given long lists of values, and traversing them might be prohibitively expensive. 2. Some functions may have arguments that we don't want to inspect, even if they have the ``__array_function__`` method. To resolve these issues, NumPy functions should explicitly indicate which of their arguments may be overloaded, and how these arguments should be checked. As a rule, this should include all arguments documented as either ``array_like`` or ``ndarray``. We propose to do so by writing "dispatcher" functions for each overloaded NumPy function: - These functions will be called with the exact same arguments that were passed into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``), and should return an iterable of arguments to check for overrides. - Dispatcher functions are required to share the exact same positional, optional and keyword-only arguments as their corresponding NumPy functions. Otherwise, valid invocations of a NumPy function could result in an error when calling its dispatcher. - Because default *values* for keyword arguments do not have ``__array_function__`` attributes, by convention we set all default argument values to ``None``. This reduces the likelihood of signatures falling out of sync, and minimizes extraneous information in the dispatcher. The only exception should be cases where the argument value in some way effects dispatching, which should be rare. An example of the dispatcher for ``np.concatenate`` may be instructive: .. code:: python def _concatenate_dispatcher(arrays, axis=None, out=None): for array in arrays: yield array if out is not None: yield out The concatenate dispatcher is written as generator function, which allows it to potentially include the value of the optional ``out`` argument without needing to create a new sequence with the (potentially long) list of objects to be concatenated. Trying ``__array_function__`` methods until the right one works ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Many arguments may implement the ``__array_function__`` protocol. Some of these may decide that, given the available inputs, they are unable to determine the correct result. How do we call the right one? If several are valid then which has precedence? For the most part, the rules for dispatch with ``__array_function__`` match those for ``__array_ufunc__`` (see `NEP-13 `_). In particular: - NumPy will gather implementations of ``__array_function__`` from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases involving subclasses, this differs slightly from the `current behavior `_ of Python. - Implementations of ``__array_function__`` indicate that they can handle the operation by returning any value other than ``NotImplemented``. - If all ``__array_function__`` methods return ``NotImplemented``, NumPy will raise ``TypeError``. One deviation from the current behavior of ``__array_ufunc__`` is that NumPy will only call ``__array_function__`` on the *first* argument of each unique type. 
This matches Python's `rule for calling reflected methods < https://docs.python.org/3/reference/datamodel.html#object.__ror__>`_, and this ensures that checking overloads has acceptable performance even when there are a large number of overloaded arguments. To avoid long-term divergence between these two dispatch protocols, we should `also update `_ ``__array_ufunc__`` to match this behavior. Special handling of ``numpy.ndarray`` ''''''''''''''''''''''''''''''''''''' The use cases for subclasses with ``__array_function__`` are the same as those with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a ``__array_function__`` method mirroring ``ndarray.__array_ufunc__``: .. code:: python def __array_function__(self, func, types, args, kwargs): # Cannot handle items that have __array_function__ other than our own. for t in types: if (hasattr(t, '__array_function__') and t.__array_function__ is not ndarray.__array_function__): return NotImplemented # Arguments contain no overrides, so we can safely call the # overloaded function again. return func(*args, **kwargs) To avoid infinite recursion, the dispatch rules for ``__array_function__`` need also the same special case they have for ``__array_ufunc__``: any arguments with an ``__array_function__`` method that is identical to ``numpy.ndarray.__array_function__`` are not be called as ``__array_function__`` implementations. Changes within NumPy functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a function defining the above behavior, for now call it ``try_array_function_override``, we now need to call that function from within every relevant NumPy function. This is a pervasive change, but of fairly simple and innocuous code that should complete quickly and without effect if no arguments implement the ``__array_function__`` protocol. In most cases, these functions should written using the ``array_function_dispatch`` decorator, which also associates dispatcher functions: .. code:: python def array_function_dispatch(dispatcher): """Wrap a function for dispatch with the __array_function__ protocol.""" def decorator(func): @functools.wraps(func) def new_func(*args, **kwargs): relevant_arguments = dispatcher(*args, **kwargs) success, value = try_array_function_override( new_func, relevant_arguments, args, kwargs) if success: return value return func(*args, **kwargs) return new_func return decorator # example usage def _broadcast_to_dispatcher(array, shape, subok=None, **ignored_kwargs): return (array,) @array_function_dispatch(_broadcast_to_dispatcher) def broadcast_to(array, shape, subok=False): ... # existing definition of np.broadcast_to Using a decorator is great! We don't need to change the definitions of existing NumPy functions, and only need to write a few additional lines for the dispatcher function. We could even reuse a single dispatcher for families of functions with the same signature (e.g., ``sum`` and ``prod``). For such functions, the largest change could be adding a few lines to the docstring to note which arguments are checked for overloads. It's particularly worth calling out the decorator's use of ``functools.wraps``: - This ensures that the wrapped function has the same name and docstring as the wrapped NumPy function. - On Python 3, it also ensures that the decorator function copies the original function signature, which is important for introspection based tools such as auto-complete. 
It's particularly worth calling out the decorator's use of
``functools.wraps``:

- This ensures that the wrapped function has the same name and docstring
  as the wrapped NumPy function.
- On Python 3, it also ensures that the decorator function copies the
  original function signature, which is important for introspection-based
  tools such as auto-complete. If we care about preserving function
  signatures on Python 2, for the `short while longer
  <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_
  that NumPy supports Python 2.7, we could do so by adding a vendored
  dependency on the (single-file, BSD licensed)
  `decorator library `_.
- Finally, it ensures that the wrapped function `can be pickled
  <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html>`_.

In a few cases, it would not make sense to use the
``array_function_dispatch`` decorator directly, but writing the override
in terms of ``try_array_function_override`` should still be
straightforward.

- Functions written entirely in C (e.g., ``np.concatenate``) can't use
  decorators, but they could still use a C equivalent of
  ``try_array_function_override``. If performance is not a concern, they
  could also be easily wrapped with a small Python wrapper.
- The ``__call__`` method of ``np.vectorize`` can't be decorated with

> would say that we would hesitate to support arrays that don't use > > this convention.) > > > > Ahem, yes, I was being provocative in a moment of weakness. May the > array-like authors forgive me. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From m.h.vankerkwijk at gmail.com Wed Jun 27 11:41:39 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 27 Jun 2018 11:41:39 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Hi Hameer, I'm confused: Isn't your reference array just `self`? All the best, Marten On Wed, Jun 27, 2018 at 2:27 AM, Hameer Abbasi wrote: > > > On 27. Jun 2018 at 07:48, Stephan Hoyer wrote: > > > After much discussion (and the addition of three new co-authors!), I?m > pleased to present a significantly revision of NumPy Enhancement Proposal > 18: A dispatch mechanism for NumPy's high level array functions: > http://www.numpy.org/neps/nep-0018-array-function-protocol.html > > The full text is also included below. > > Best, > Stephan > > =========================================================== > A dispatch mechanism for NumPy's high level array functions > =========================================================== > > :Author: Stephan Hoyer > :Author: Matthew Rocklin > :Author: Marten van Kerkwijk > :Author: Hameer Abbasi > :Author: Eric Wieser > :Status: Draft > :Type: Standards Track > :Created: 2018-05-29 > > Abstact > ------- > > We propose the ``__array_function__`` protocol, to allow arguments of NumPy > functions to define how that function operates on them. This will allow > using NumPy as a high level API for efficient multi-dimensional array > operations, even with array implementations that differ greatly from > ``numpy.ndarray``. > > Detailed description > -------------------- > > NumPy's high level ndarray API has been implemented several times > outside of NumPy itself for different architectures, such as for GPU > arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel > arrays (Dask array) as well as various NumPy-like implementations in the > deep learning frameworks, like TensorFlow and PyTorch. > > Similarly there are many projects that build on top of the NumPy API > for labeled and indexed arrays (XArray), automatic differentiation > (Autograd, Tangent), masked arrays (numpy.ma), physical units > (astropy.units, > pint, unyt), etc. that add additional functionality on top of the NumPy > API. > Most of these project also implement a close variation of NumPy's level > high > API. > > We would like to be able to use these libraries together, for example we > would like to be able to place a CuPy array within XArray, or perform > automatic differentiation on Dask array code. This would be easier to > accomplish if code written for NumPy ndarrays could also be used by > other NumPy-like projects. > > For example, we would like for the following code example to work > equally well with any NumPy-like array object: > > .. code:: python > > def f(x): > y = np.tensordot(x, x.T) > return np.mean(np.exp(y)) > > Some of this is possible today with various protocol mechanisms within > NumPy. 
> > - The ``np.exp`` function checks the ``__array_ufunc__`` protocol > - The ``.T`` method works using Python's method dispatch > - The ``np.mean`` function explicitly checks for a ``.mean`` method on > the argument > > However other functions, like ``np.tensordot`` do not dispatch, and > instead are likely to coerce to a NumPy array (using the ``__array__``) > protocol, or err outright. To achieve enough coverage of the NumPy API > to support downstream projects like XArray and autograd we want to > support *almost all* functions within NumPy, which calls for a more > reaching protocol than just ``__array_ufunc__``. We would like a > protocol that allows arguments of a NumPy function to take control and > divert execution to another function (for example a GPU or parallel > implementation) in a way that is safe and consistent across projects. > > Implementation > -------------- > > We propose adding support for a new protocol in NumPy, > ``__array_function__``. > > This protocol is intended to be a catch-all for NumPy functionality that > is not covered by the ``__array_ufunc__`` protocol for universal functions > (like ``np.exp``). The semantics are very similar to ``__array_ufunc__``, > except > the operation is specified by an arbitrary callable object rather than a > ufunc > instance and method. > > A prototype implementation can be found in > `this notebook 1f0a308a06cd96df20879a1ddb8f0006>`_. > > The interface > ~~~~~~~~~~~~~ > > We propose the following signature for implementations of > ``__array_function__``: > > .. code-block:: python > > def __array_function__(self, func, types, args, kwargs) > > - ``func`` is an arbitrary callable exposed by NumPy's public API, > which was called in the form ``func(*args, **kwargs)``. > - ``types`` is a ``frozenset`` of unique argument types from the original > NumPy > function call that implement ``__array_function__``. > - The tuple ``args`` and dict ``kwargs`` are directly passed on from the > original call. > > Unlike ``__array_ufunc__``, there are no high-level guarantees about the > type of ``func``, or about which of ``args`` and ``kwargs`` may contain > objects > implementing the array API. > > As a convenience for ``__array_function__`` implementors, ``types`` > provides all > argument types with an ``'__array_function__'`` attribute. This > allows downstream implementations to quickly determine if they are likely > able > to support the operation. A ``frozenset`` is used to ensure that > ``__array_function__`` implementations cannot rely on the iteration order > of > ``types``, which would facilitate violating the well-defined "Type casting > hierarchy" described in > `NEP-13 `_. > > Example for a project implementing the NumPy API > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Most implementations of ``__array_function__`` will start with two > checks: > > 1. Is the given function something that we know how to overload? > 2. Are all arguments of a type that we know how to handle? > > If these conditions hold, ``__array_function__`` should return > the result from calling its implementation for ``func(*args, **kwargs)``. > Otherwise, it should return the sentinel value ``NotImplemented``, > indicating > that the function is not implemented by these types. This is preferable to > raising ``TypeError`` directly, because it gives *other* arguments the > opportunity to define the operations. 
> > There are no general requirements on the return value from > ``__array_function__``, although most sensible implementations should > probably > return array(s) with the same type as one of the function's arguments. > If/when Python gains > `typing support for protocols >`_ > and NumPy adds static type annotations, the ``@overload`` implementation > for ``SupportsArrayFunction`` will indicate a return type of ``Any``. > > It may also be convenient to define a custom decorators (``implements`` > below) > for registering ``__array_function__`` implementations. > > .. code:: python > > HANDLED_FUNCTIONS = {} > > class MyArray: > def __array_function__(self, func, types, args, kwargs): > if func not in HANDLED_FUNCTIONS: > return NotImplemented > # Note: this allows subclasses that don't override > # __array_function__ to handle MyArray objects > if not all(issubclass(t, MyArray) for t in types): > return NotImplemented > return HANDLED_FUNCTIONS[func](*args, **kwargs) > > def implements(numpy_function): > """Register an __array_function__ implementation for MyArray > objects.""" > def decorator(func): > HANDLED_FUNCTIONS[numpy_function] = func > return func > return decorator > > @implements(np.concatenate) > def concatenate(arrays, axis=0, out=None): > ... # implementation of concatenate for MyArray objects > > @implements(np.broadcast_to) > def broadcast_to(array, shape): > ... # implementation of broadcast_to for MyArray objects > > Note that it is not required for ``__array_function__`` implementations to > include *all* of the corresponding NumPy function's optional arguments > (e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument). > Optional arguments are only passed in to ``__array_function__`` if they > were explicitly used in the NumPy function call. > > Necessary changes within the NumPy codebase itself > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > This will require two changes within the NumPy codebase: > > 1. A function to inspect available inputs, look for the > ``__array_function__`` attribute on those inputs, and call those > methods appropriately until one succeeds. This needs to be fast in the > common all-NumPy case, and have acceptable performance (no worse than > linear time) even if the number of overloaded inputs is large (e.g., > as might be the case for `np.concatenate`). > > This is one additional function of moderate complexity. > 2. Calling this function within all relevant NumPy functions. > > This affects many parts of the NumPy codebase, although with very low > complexity. > > Finding and calling the right ``__array_function__`` > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Given a NumPy function, ``*args`` and ``**kwargs`` inputs, we need to > search through ``*args`` and ``**kwargs`` for all appropriate inputs > that might have the ``__array_function__`` attribute. Then we need to > select among those possible methods and execute the right one. > Negotiating between several possible implementations can be complex. > > Finding arguments > ''''''''''''''''' > > Valid arguments may be directly in the ``*args`` and ``**kwargs``, such > as in the case for ``np.tensordot(left, right, out=out)``, or they may > be nested within lists or dictionaries, such as in the case of > ``np.concatenate([x, y, z])``. This can be problematic for two reasons: > > 1. Some functions are given long lists of values, and traversing them > might be prohibitively expensive. > 2. 
Some functions may have arguments that we don't want to inspect, even > if they have the ``__array_function__`` method. > > To resolve these issues, NumPy functions should explicitly indicate which > of their arguments may be overloaded, and how these arguments should be > checked. As a rule, this should include all arguments documented as either > ``array_like`` or ``ndarray``. > > We propose to do so by writing "dispatcher" functions for each overloaded > NumPy function: > > - These functions will be called with the exact same arguments that were > passed > into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``), and > should > return an iterable of arguments to check for overrides. > - Dispatcher functions are required to share the exact same positional, > optional and keyword-only arguments as their corresponding NumPy > functions. > Otherwise, valid invocations of a NumPy function could result in an > error when > calling its dispatcher. > - Because default *values* for keyword arguments do not have > ``__array_function__`` attributes, by convention we set all default > argument > values to ``None``. This reduces the likelihood of signatures falling out > of sync, and minimizes extraneous information in the dispatcher. > The only exception should be cases where the argument value in some way > effects dispatching, which should be rare. > > An example of the dispatcher for ``np.concatenate`` may be instructive: > > .. code:: python > > def _concatenate_dispatcher(arrays, axis=None, out=None): > for array in arrays: > yield array > if out is not None: > yield out > > The concatenate dispatcher is written as generator function, which allows > it > to potentially include the value of the optional ``out`` argument without > needing to create a new sequence with the (potentially long) list of > objects > to be concatenated. > > Trying ``__array_function__`` methods until the right one works > ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' > > Many arguments may implement the ``__array_function__`` protocol. Some > of these may decide that, given the available inputs, they are unable to > determine the correct result. How do we call the right one? If several > are valid then which has precedence? > > For the most part, the rules for dispatch with ``__array_function__`` > match those for ``__array_ufunc__`` (see > `NEP-13 `_). > In particular: > > - NumPy will gather implementations of ``__array_function__`` from all > specified inputs and call them in order: subclasses before > superclasses, and otherwise left to right. Note that in some edge cases > involving subclasses, this differs slightly from the > `current behavior `_ of Python. > - Implementations of ``__array_function__`` indicate that they can > handle the operation by returning any value other than > ``NotImplemented``. > - If all ``__array_function__`` methods return ``NotImplemented``, > NumPy will raise ``TypeError``. > > One deviation from the current behavior of ``__array_ufunc__`` is that > NumPy > will only call ``__array_function__`` on the *first* argument of each > unique > type. This matches Python's > `rule for calling reflected methods reference/datamodel.html#object.__ror__>`_, > and this ensures that checking overloads has acceptable performance even > when > there are a large number of overloaded arguments. To avoid long-term > divergence > between these two dispatch protocols, we should > `also update `_ > ``__array_ufunc__`` to match this behavior. 
> > Special handling of ``numpy.ndarray`` > ''''''''''''''''''''''''''''''''''''' > > The use cases for subclasses with ``__array_function__`` are the same as > those > with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a > ``__array_function__`` method mirroring ``ndarray.__array_ufunc__``: > > .. code:: python > > def __array_function__(self, func, types, args, kwargs): > # Cannot handle items that have __array_function__ other than our > own. > for t in types: > if (hasattr(t, '__array_function__') and > t.__array_function__ is not > ndarray.__array_function__): > return NotImplemented > > # Arguments contain no overrides, so we can safely call the > # overloaded function again. > return func(*args, **kwargs) > > To avoid infinite recursion, the dispatch rules for ``__array_function__`` > need > also the same special case they have for ``__array_ufunc__``: any > arguments with > an ``__array_function__`` method that is identical to > ``numpy.ndarray.__array_function__`` are not be called as > ``__array_function__`` implementations. > > Changes within NumPy functions > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Given a function defining the above behavior, for now call it > ``try_array_function_override``, we now need to call that function from > within every relevant NumPy function. This is a pervasive change, but of > fairly simple and innocuous code that should complete quickly and > without effect if no arguments implement the ``__array_function__`` > protocol. > > In most cases, these functions should written using the > ``array_function_dispatch`` decorator, which also associates dispatcher > functions: > > .. code:: python > > def array_function_dispatch(dispatcher): > """Wrap a function for dispatch with the __array_function__ > protocol.""" > def decorator(func): > @functools.wraps(func) > def new_func(*args, **kwargs): > relevant_arguments = dispatcher(*args, **kwargs) > success, value = try_array_function_override( > new_func, relevant_arguments, args, kwargs) > if success: > return value > return func(*args, **kwargs) > return new_func > return decorator > > # example usage > def _broadcast_to_dispatcher(array, shape, subok=None, > **ignored_kwargs): > return (array,) > > @array_function_dispatch(_broadcast_to_dispatcher) > def broadcast_to(array, shape, subok=False): > ... # existing definition of np.broadcast_to > > Using a decorator is great! We don't need to change the definitions of > existing NumPy functions, and only need to write a few additional lines > for the dispatcher function. We could even reuse a single dispatcher for > families of functions with the same signature (e.g., ``sum`` and ``prod``). > For such functions, the largest change could be adding a few lines to the > docstring to note which arguments are checked for overloads. > > It's particularly worth calling out the decorator's use of > ``functools.wraps``: > > - This ensures that the wrapped function has the same name and docstring as > the wrapped NumPy function. > - On Python 3, it also ensures that the decorator function copies the > original > function signature, which is important for introspection based tools > such as > auto-complete. If we care about preserving function signatures on Python > 2, > for the `short while longer nep-0014-dropping-python2.7-proposal.html>`_ > that NumPy supports Python 2.7, we do could do so by adding a vendored > dependency on the (single-file, BSD licensed) > `decorator library `_. 
> - Finally, it ensures that the wrapped function > `can be pickled python-done-right-decorating-and-pickling.html>`_. > > In a few cases, it would not make sense to use the > ``array_function_dispatch`` > decorator directly, but override implementation in terms of > ``try_array_function_override`` should still be straightforward. > > - Functions written entirely in C (e.g., ``np.concatenate``) can't use > decorators, but they could still use a C equivalent of > ``try_array_function_override``. If performance is not a concern, they > could > also be easily wrapped with a small Python wrapper. > - The ``__call__`` method of ``np.vectorize`` can't be decorated with >

From m.h.vankerkwijk at gmail.com Thu Jun 28 08:37:41 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 28 Jun 2018 08:37:41 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On Wed, Jun 27, 2018 at 3:50 PM, Stephan Hoyer wrote: > So perhaps it's worth "future proofing" the interface by passing `obj` and > `method` to __array_function__ rather than only `func`. It is slower to > call a func via func.__call__ than func, but only very marginally (~100 ns > in my tests). > That would make it more similar yet to `__array_ufunc__`, but I'm not sure how useful it is, as you cannot generically assume the methods have the same arguments and hence they need their own dispatcher. Once you're there you might as well pass them on directly (since any callable can be used as the function). Indeed, for `__array_ufunc__`, this might not have been a bad idea either... -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Thu Jun 28 08:46:02 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 28 Jun 2018 08:46:02 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: I think the usefulness of this feature is actually needed. Consider `np.random.RandomState`. If we were to add what I proposed, the two could work very nicely to (for example) do things like creating Dask random arrays, from RandomState objects. For reproducibility, Dask could generate multiple RandomState objects with a seed sequential in the job numbers. Looping in Matt Rocklin for this ? He might have some input about the design. Best Regards, Hameer Abbasi Sent from Astro for Mac On 28. Jun 2018 at 14:37, Marten van Kerkwijk wrote: On Wed, Jun 27, 2018 at 3:50 PM, Stephan Hoyer wrote: > So perhaps it's worth "future proofing" the interface by passing `obj` and > `method` to __array_function__ rather than only `func`. It is slower to > call a func via func.__call__ than func, but only very marginally (~100 ns > in my tests). > That would make it more similar yet to `__array_ufunc__`, but I'm not sure how useful it is, as you cannot generically assume the methods have the same arguments and hence they need their own dispatcher. Once you're there you might as well pass them on directly (since any callable can be used as the function). Indeed, for `__array_ufunc__`, this might not have been a bad idea either... -- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu Jun 28 14:04:19 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 28 Jun 2018 11:04:19 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On Wed, Jun 27, 2018 at 12:50 PM Stephan Hoyer wrote: > One concern this does raise is how to handle methods like those on > RandomState, even though methods like random_like() don't currently exist. > Distribution objects from scipy.stats could have similar use cases. > > So perhaps it's worth "future proofing" the interface by passing `obj` and > `method` to __array_function__ rather than only `func`. It is slower to > call a func via func.__call__ than func, but only very marginally (~100 ns > in my tests). 
> I did a little more digging, and turned up the __self__ and __func__ attributes of bound methods: https://stackoverflow.com/questions/4679592/how-to-find-instance-of-a-bound-method-in-python So we might need another decorator function, but it seems that the current interface would actually suffice just fine for overriding methods. I'll update the NEP with some examples. It will look something like: def __array_function__(self, func, types, args, kwargs): ... if isinstance(func, types.MethodType): object = func.__self__ unbound_func = func.__func__ ... Given that functions are the most common case, I think it's best to keep with `func` as the main interface, but it's good to know that this does not preclude overriding methods. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu Jun 28 16:11:07 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 28 Jun 2018 16:11:07 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: > I did a little more digging, and turned up the __self__ and __func__ > attributes of bound methods: > https://stackoverflow.com/questions/4679592/how-to-find- > instance-of-a-bound-method-in-python > > So we might need another decorator function, but it seems that the current > interface would actually suffice just fine for overriding methods. I'll > update the NEP with some examples. It will look something like: > > def __array_function__(self, func, types, args, kwargs): > ... > if isinstance(func, types.MethodType): > object = func.__self__ > unbound_func = func.__func__ > ... > > For C classes like the ufuncs, it seems `__self__` is defined for methods as well (at least, `np.add.reduce.__self__` gives `np.add`), but not a `__func__`. There is a `__name__` (="reduce"), though, which means that I think one can still retrieve what is needed (obviously, this also means `__array_ufunc__` could have been simpler...) -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu Jun 28 20:18:28 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 28 Jun 2018 17:18:28 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On Thu, Jun 28, 2018 at 1:12 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > For C classes like the ufuncs, it seems `__self__` is defined for methods > as well (at least, `np.add.reduce.__self__` gives `np.add`), but not a > `__func__`. There is a `__name__` (="reduce"), though, which means that I > think one can still retrieve what is needed (obviously, this also means > `__array_ufunc__` could have been simpler...) > Good point! I guess this means we should encourage using __name__ rather than __func__. I would not want to preclude refactoring classes from Python to C/Cython. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Thu Jun 28 20:35:17 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 28 Jun 2018 17:35:17 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Another option would be to directly compare the methods against known ones: obj = func.__self__ if isinstance(obj, np.ufunc): if func is obj.reduce: got_reduction() Eric ? 
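A rough sketch of how a downstream class might use those attributes
inside __array_function__, if bound ufunc methods were ever passed
through as `func` (MyArray and the handler registry here are
hypothetical):

    import numpy as np

    # hypothetical registry, e.g. {(np.add, 'reduce'): my_add_reduce}
    HANDLED_UFUNC_METHODS = {}

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            obj = getattr(func, '__self__', None)    # np.add for np.add.reduce
            name = getattr(func, '__name__', None)   # 'reduce' for np.add.reduce
            if isinstance(obj, np.ufunc) and name is not None:
                handler = HANDLED_UFUNC_METHODS.get((obj, name))
                if handler is not None:
                    return handler(*args, **kwargs)
            return NotImplemented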
On Thu, 28 Jun 2018 at 17:19 Stephan Hoyer wrote: > On Thu, Jun 28, 2018 at 1:12 PM Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> For C classes like the ufuncs, it seems `__self__` is defined for methods >> as well (at least, `np.add.reduce.__self__` gives `np.add`), but not a >> `__func__`. There is a `__name__` (="reduce"), though, which means that I >> think one can still retrieve what is needed (obviously, this also means >> `__array_ufunc__` could have been simpler...) >> > > Good point! > > I guess this means we should encourage using __name__ rather than > __func__. I would not want to preclude refactoring classes from Python to > C/Cython. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Thu Jun 28 20:34:32 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 28 Jun 2018 17:34:32 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On 28/06/18 17:18, Stephan Hoyer wrote: > On Thu, Jun 28, 2018 at 1:12 PM Marten van Kerkwijk > > wrote: > > For C classes like the ufuncs, it seems `__self__` is defined for > methods as well (at least, `np.add.reduce.__self__` gives > `np.add`), but not a `__func__`. There is a `__name__` > (="reduce"), though, which means that I think one can still > retrieve what is needed (obviously, this also means > `__array_ufunc__` could have been simpler...) > > > Good point! > > I guess this means we should encourage using __name__ rather than > __func__. I would not want to preclude refactoring classes from Python > to C/Cython. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion There was opposition to that in a PR I made to provide a wrapper around matmul to turn it into a ufunc. It would have left the __name__ but changed the __func__. https://github.com/numpy/numpy/pull/11061#issuecomment-387468084 From einstein.edison at gmail.com Thu Jun 28 22:48:14 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 28 Jun 2018 22:48:14 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Hi Martin, It is. The point of the proposed feature was to handle array generation mechanisms, that don't take an array as input in the standard NumPy API. Giving them a reference handles both the dispatch and the decision about which implementation to call. I'm confused: Isn't your reference array just `self`? -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jun 29 13:15:59 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jun 2018 11:15:59 -0600 Subject: [Numpy-discussion] Github down on comcast Message-ID: Hi All, Just a note for those who may be having a problem reaching Github, it is currently down for comcast users. See http://downdetector.com/status/github/map/. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniele at grinta.net Fri Jun 29 15:30:10 2018 From: daniele at grinta.net (Daniele Nicolodi) Date: Fri, 29 Jun 2018 13:30:10 -0600 Subject: [Numpy-discussion] Github down on comcast In-Reply-To: References: Message-ID: <213a3cec-bfbe-7f6d-8335-3a6a6e34d229@grinta.net> On 6/29/18 11:15 AM, Charles R Harris wrote: > Hi All, > > Just a note for those who may be having a problem reaching Github, it is > currently down for comcast users. > See?http://downdetector.com/status/github/map/. Funny enough http://dowdetector.com seems to not be reachable from this side of the Internet :-) Cheers, Dan From njs at pobox.com Fri Jun 29 18:18:20 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Jun 2018 15:18:20 -0700 Subject: [Numpy-discussion] Proposal to accept NEP 15: Merging multiarray and umath Message-ID: Hi all, I propose that we accept NEP 15: Merging multiarray and umath: http://www.numpy.org/neps/nep-0015-merge-multiarray-umath.html The core part of this proposal was uncontroversial. The main point of discussion was whether it was OK to deprecate set_numeric_ops, or whether it had some legitimate use cases. The conclusion was that in all the cases where set_numeric_ops is useful, PyUFunc_ReplaceLoopBySignature is a strictly better alternative, so there's no reason not to deprecate set_numeric_ops. So at this point I think the whole proposal is uncontroversial, and we can go ahead and accept it. If there are no substantive objections within 7 days from this email, then the NEP will be accepted; see NEP 0 for more details: http://www.numpy.org/neps/nep-0000.html -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Fri Jun 29 18:23:05 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Jun 2018 15:23:05 -0700 Subject: [Numpy-discussion] Proposal to accept NEP 15: Merging multiarray and umath In-Reply-To: References: Message-ID: Note that this is the first formal proposal to accept a NEP using our new process (yay!). While writing it I realized that the current text about this in NEP 0 is a bit terse, so I've also just submitted a PR to expand that section: https://github.com/numpy/numpy/pull/11459 -n On Fri, Jun 29, 2018 at 3:18 PM, Nathaniel Smith wrote: > Hi all, > > I propose that we accept NEP 15: Merging multiarray and umath: > > http://www.numpy.org/neps/nep-0015-merge-multiarray-umath.html > > The core part of this proposal was uncontroversial. The main point of > discussion was whether it was OK to deprecate set_numeric_ops, or > whether it had some legitimate use cases. The conclusion was that in > all the cases where set_numeric_ops is useful, > PyUFunc_ReplaceLoopBySignature is a strictly better alternative, so > there's no reason not to deprecate set_numeric_ops. So at this point I > think the whole proposal is uncontroversial, and we can go ahead and > accept it. > > If there are no substantive objections within 7 days from this email, > then the NEP will be accepted; see NEP 0 for more details: > http://www.numpy.org/neps/nep-0000.html > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org -- Nathaniel J. Smith -- https://vorpus.org From m.h.vankerkwijk at gmail.com Fri Jun 29 18:28:03 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 29 Jun 2018 18:28:03 -0400 Subject: [Numpy-discussion] Proposal to accept NEP 15: Merging multiarray and umath In-Reply-To: References: Message-ID: Agreed on accepting the NEP! 
But it is not the first proposal to accept under the new rules - that goes to the broadcasting NEP (though perhaps I wasn't sufficiently explicit in stating that I was starting a count-down...). -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jun 29 18:31:06 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jun 2018 16:31:06 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Tue, Jun 26, 2018 at 3:55 PM, Matthew Brett wrote: > Hi, > > On Tue, Jun 26, 2018 at 10:43 PM, Matti Picus > wrote: > > On 19/06/18 10:57, Matthew Brett wrote: > >> > >> Hi, > >> > >> On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus > >> wrote: > >>> > >>> On 19/06/18 09:58, Charles R Harris wrote: > >>>>> > >>>>> What I was curious about is that there were no more "daily" builds of > >>>>> master. > >>>> > >>>> Is that right? That there were daily builds of master, on Appveyor? > >>>> I don't know how those worked, I only recently got cron permission ... > >>> > >>> > >>> No, but there used to be daily builds on travis. They stopped 8 days > ago, > >>> https://travis-ci.org/MacPython/numpy-wheels/builds. > >> > >> Oops - yes - sorry - I retired the 'daily' branch, in favor of > >> 'master', but forgot to update the Travis-CI settings. > >> > >> Done now. > >> > >> Cheers, > >> > >> Matthew > >> > > FWIW, still no daily builds at > > https://travis-ci.org/MacPython/numpy-wheels/builds > > You mean, some days there appears to be no build? The build matrix > does show Cron-triggered jobs, the last of which was a few hours ago: > https://travis-ci.org/MacPython/numpy-wheels/builds/397008012 > > Cheers, > > Matthew > The cron wheels are getting built and tested, but they aren't uploading to rackspace. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jun 29 18:35:11 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 29 Jun 2018 23:35:11 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Fri, Jun 29, 2018 at 11:31 PM, Charles R Harris wrote: > > > On Tue, Jun 26, 2018 at 3:55 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Jun 26, 2018 at 10:43 PM, Matti Picus >> wrote: >> > On 19/06/18 10:57, Matthew Brett wrote: >> >> >> >> Hi, >> >> >> >> On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus >> >> wrote: >> >>> >> >>> On 19/06/18 09:58, Charles R Harris wrote: >> >>>>> >> >>>>> What I was curious about is that there were no more "daily" builds >> >>>>> of >> >>>>> master. >> >>>> >> >>>> Is that right? That there were daily builds of master, on Appveyor? >> >>>> I don't know how those worked, I only recently got cron permission >> >>>> ... >> >>> >> >>> >> >>> No, but there used to be daily builds on travis. They stopped 8 days >> >>> ago, >> >>> https://travis-ci.org/MacPython/numpy-wheels/builds. >> >> >> >> Oops - yes - sorry - I retired the 'daily' branch, in favor of >> >> 'master', but forgot to update the Travis-CI settings. >> >> >> >> Done now. >> >> >> >> Cheers, >> >> >> >> Matthew >> >> >> > FWIW, still no daily builds at >> > https://travis-ci.org/MacPython/numpy-wheels/builds >> >> You mean, some days there appears to be no build? 
The build matrix >> does show Cron-triggered jobs, the last of which was a few hours ago: >> https://travis-ci.org/MacPython/numpy-wheels/builds/397008012 >> >> Cheers, >> >> Matthew > > > The cron wheels are getting built and tested, but they aren't uploading to > rackspace. The cron wheels go to the "pre" container at https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com Cheers, Matthew From njs at pobox.com Fri Jun 29 19:50:11 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Jun 2018 16:50:11 -0700 Subject: [Numpy-discussion] Proposal to accept NEP 15: Merging multiarray and umath In-Reply-To: References: Message-ID: On Fri, Jun 29, 2018 at 3:28 PM, Marten van Kerkwijk wrote: > Agreed on accepting the NEP! But it is not the first proposal to accept > under the new rules - that goes to the broadcasting NEP (though perhaps I > wasn't sufficiently explicit in stating that I was starting a > count-down...). -- Marten Oh sorry, I missed that! (Which I guess is some evidence in favor of starting a new thread :-).) -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Fri Jun 29 19:36:53 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jun 2018 17:36:53 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Fri, Jun 29, 2018 at 4:35 PM, Matthew Brett wrote: > On Fri, Jun 29, 2018 at 11:31 PM, Charles R Harris > wrote: > > > > > > On Tue, Jun 26, 2018 at 3:55 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Tue, Jun 26, 2018 at 10:43 PM, Matti Picus > >> wrote: > >> > On 19/06/18 10:57, Matthew Brett wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus > >> >> wrote: > >> >>> > >> >>> On 19/06/18 09:58, Charles R Harris wrote: > >> >>>>> > >> >>>>> What I was curious about is that there were no more "daily" builds > >> >>>>> of > >> >>>>> master. > >> >>>> > >> >>>> Is that right? That there were daily builds of master, on > Appveyor? > >> >>>> I don't know how those worked, I only recently got cron permission > >> >>>> ... > >> >>> > >> >>> > >> >>> No, but there used to be daily builds on travis. They stopped 8 days > >> >>> ago, > >> >>> https://travis-ci.org/MacPython/numpy-wheels/builds. > >> >> > >> >> Oops - yes - sorry - I retired the 'daily' branch, in favor of > >> >> 'master', but forgot to update the Travis-CI settings. > >> >> > >> >> Done now. > >> >> > >> >> Cheers, > >> >> > >> >> Matthew > >> >> > >> > FWIW, still no daily builds at > >> > https://travis-ci.org/MacPython/numpy-wheels/builds > >> > >> You mean, some days there appears to be no build? The build matrix > >> does show Cron-triggered jobs, the last of which was a few hours ago: > >> https://travis-ci.org/MacPython/numpy-wheels/builds/397008012 > >> > >> Cheers, > >> > >> Matthew > > > > > > The cron wheels are getting built and tested, but they aren't uploading > to > > rackspace. > > The cron wheels go to the "pre" container at > https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a > 83.ssl.cf2.rackcdn.com > > Ah, there they are ... except ... you cancelled the builds I was waiting for :) I was building wheels so we could have folks test the DLL load problem, which I'm pretty sure if fixed anyway, so I suppose waiting on the daily isn't a big a deal. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shoyer at gmail.com Fri Jun 29 21:23:15 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 29 Jun 2018 18:23:15 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On Thu, Jun 28, 2018 at 5:36 PM Eric Wieser wrote: > Another option would be to directly compare the methods against known ones: > > obj = func.__self__ > if isinstance(obj, np.ufunc): > if func is obj.reduce: > got_reduction() > > I'm not quite sure why, but this doesn't seem to work with current ufunc objects: >>> np.add.reduce == np.add.reduce # OK True >>> np.add.reduce is np.add.reduce # what?!? False Maybe this is a bug? There's been some somewhat related discussion recently on python-dev: https://mail.python.org/pipermail/python-dev/2018-June/153959.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Fri Jun 29 21:54:38 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 29 Jun 2018 18:54:38 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Good catch, I think the latter failing is because np.add.reduce ends up calling np.ufunc.reduce.__get__(np.add), and builtin_function.__get__ doesn?t appear to do any caching. I suppose caching bound methods would just be a waste of time. == would work just fine in my suggestion above, it seems - irrespective of the resolution of the discussion on python-dev. Eric ? On Fri, 29 Jun 2018 at 18:24 Stephan Hoyer wrote: > On Thu, Jun 28, 2018 at 5:36 PM Eric Wieser > wrote: > >> Another option would be to directly compare the methods against known >> ones: >> >> obj = func.__self__ >> if isinstance(obj, np.ufunc): >> if func is obj.reduce: >> got_reduction() >> >> I'm not quite sure why, but this doesn't seem to work with current ufunc > objects: > > >>> np.add.reduce == np.add.reduce # OK > True > > >>> np.add.reduce is np.add.reduce # what?!? > False > > Maybe this is a bug? There's been some somewhat related discussion > recently on python-dev: > https://mail.python.org/pipermail/python-dev/2018-June/153959.html > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maifer at haverford.edu Fri Jun 29 22:21:18 2018 From: maifer at haverford.edu (Maxwell Aifer) Date: Fri, 29 Jun 2018 22:21:18 -0400 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies Message-ID: Hi, I noticed some frustrating inconsistencies in the various ways to evaluate polynomials using numpy. Numpy has three ways of evaluating polynomials (that I know of) and each of them has a different syntax: - numpy.polynomial.polynomial.Polynomial : You define a polynomial by a list of coefficients *in order of increasing degree*, and then use the class?s call() function. - np.polyval : Evaluates a polynomial at a point. *First* argument is the polynomial, or list of coefficients *in order of decreasing degree*, and the *second* argument is the point to evaluate at. - np.polynomial.polynomial.polyval : Also evaluates a polynomial at a point, but has more support for vectorization. *First* argument is the point to evaluate at, and *second* argument the list of coefficients *in order of increasing degree*. 
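For instance, evaluating the same polynomial p(x) = 3 + x + 2*x^2 at
x = 2 looks like this under each convention (each call should give 13):

    import numpy as np
    from numpy.polynomial import polynomial as npoly

    np.polynomial.Polynomial([3, 1, 2])(2)   # increasing degree, then call
    np.polyval([2, 1, 3], 2)                 # decreasing degree, polynomial first
    npoly.polyval(2, [3, 1, 2])              # point first, increasing degree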
Not only the order of arguments is changed between different methods, but the order of the coefficients is reversed as well, leading to puzzling bugs (in my experience). What could be the reason for this madness? As polyval is a shameless ripoff of Matlab?s function of the same name anyway, why not just use matlab?s syntax (polyval([c0, c1, c2...], x)) across the board? ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jun 29 23:10:16 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jun 2018 21:10:16 -0600 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: On Fri, Jun 29, 2018 at 8:21 PM, Maxwell Aifer wrote: > Hi, > I noticed some frustrating inconsistencies in the various ways to evaluate > polynomials using numpy. Numpy has three ways of evaluating polynomials > (that I know of) and each of them has a different syntax: > > - > > numpy.polynomial.polynomial.Polynomial > : > You define a polynomial by a list of coefficients *in order of > increasing degree*, and then use the class?s call() function. > - > > np.polyval > : > Evaluates a polynomial at a point. *First* argument is the polynomial, > or list of coefficients *in order of decreasing degree*, and the > *second* argument is the point to evaluate at. > - > > np.polynomial.polynomial.polyval > : > Also evaluates a polynomial at a point, but has more support for > vectorization. *First* argument is the point to evaluate at, and > *second* argument the list of coefficients *in order of increasing > degree*. > > Not only the order of arguments is changed between different methods, but > the order of the coefficients is reversed as well, leading to puzzling bugs > (in my experience). What could be the reason for this madness? As polyval > is a shameless ripoff of Matlab?s function of the same name > anyway, why not > just use matlab?s syntax (polyval([c0, c1, c2...], x)) across the board? > ? > > The polynomial package, with its various basis, deals with series, and especially with the truncated series approximations that are used in numerical work. Series are universally written in increasing order of the degree. The Polynomial class is efficient in a single variable, while the numpy.polynomial.polynomial.polyval function is intended as a building block and can also deal with multivariate polynomials or multidimensional arrays of polynomials, or a mix. See the simple implementation of polyval3d for an example. If you are just dealing with a single variable, use Polynomial, which will also track scaling and offsets for numerical stability and is generally much superior to the simple polyval function from a numerical point of view. As to the ordering of the degrees, learning that the degree matches the index is pretty easy and is a more natural fit for the implementation code, especially as the number of variables increases. I note that Matlab has ones based indexing, so that was really not an option for them. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Fri Jun 29 23:23:48 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 29 Jun 2018 20:23:48 -0700 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: Here's my take on this, but it may not be an accurate summary of the history. 
`np.poly` is part of the original matlab-style API, built around `poly1d` objects. This isn't a great design, because they represent: p(x) = c[0] * x^2 + c[1] * x^1 + c[2] * x^0 For this reason, among others, the `np.polynomial` module was created, starting with a clean slate. The core of this is `np.polynomial.Polynomial`. There, everything uses the convention p(x) = c[0] * x^0 + c[1] * x^1 + c[2] * x^2 It sounds like we might need clearer docs explaining the difference, and pointing users to the more sensible `np.polynomial.Polynomial` Eric On Fri, 29 Jun 2018 at 20:10 Charles R Harris wrote: > On Fri, Jun 29, 2018 at 8:21 PM, Maxwell Aifer > wrote: > >> Hi, >> I noticed some frustrating inconsistencies in the various ways to >> evaluate polynomials using numpy. Numpy has three ways of evaluating >> polynomials (that I know of) and each of them has a different syntax: >> >> - >> >> numpy.polynomial.polynomial.Polynomial >> : >> You define a polynomial by a list of coefficients *in order of >> increasing degree*, and then use the class?s call() function. >> - >> >> np.polyval >> : >> Evaluates a polynomial at a point. *First* argument is the >> polynomial, or list of coefficients *in order of decreasing degree*, >> and the *second* argument is the point to evaluate at. >> - >> >> np.polynomial.polynomial.polyval >> : >> Also evaluates a polynomial at a point, but has more support for >> vectorization. *First* argument is the point to evaluate at, and >> *second* argument the list of coefficients *in order of increasing >> degree*. >> >> Not only the order of arguments is changed between different methods, but >> the order of the coefficients is reversed as well, leading to puzzling bugs >> (in my experience). What could be the reason for this madness? As polyval >> is a shameless ripoff of Matlab?s function of the same name >> anyway, why not >> just use matlab?s syntax (polyval([c0, c1, c2...], x)) across the board? >> ? >> >> > The polynomial package, with its various basis, deals with series, and > especially with the truncated series approximations that are used in > numerical work. Series are universally written in increasing order of the > degree. The Polynomial class is efficient in a single variable, while the > numpy.polynomial.polynomial.polyval function is intended as a building > block and can also deal with multivariate polynomials or multidimensional > arrays of polynomials, or a mix. See the simple implementation of polyval3d > for an example. If you are just dealing with a single variable, use > Polynomial, which will also track scaling and offsets for numerical > stability and is generally much superior to the simple polyval function > from a numerical point of view. > > As to the ordering of the degrees, learning that the degree matches the > index is pretty easy and is a more natural fit for the implementation code, > especially as the number of variables increases. I note that Matlab has > ones based indexing, so that was really not an option for them. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sat Jun 30 04:32:10 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Jun 2018 09:32:10 +0100 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: On Sat, Jun 30, 2018 at 12:36 AM, Charles R Harris wrote: > > > On Fri, Jun 29, 2018 at 4:35 PM, Matthew Brett > wrote: >> >> On Fri, Jun 29, 2018 at 11:31 PM, Charles R Harris >> wrote: >> > >> > >> > On Tue, Jun 26, 2018 at 3:55 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Tue, Jun 26, 2018 at 10:43 PM, Matti Picus >> >> wrote: >> >> > On 19/06/18 10:57, Matthew Brett wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus >> >> >> wrote: >> >> >>> >> >> >>> On 19/06/18 09:58, Charles R Harris wrote: >> >> >>>>> >> >> >>>>> What I was curious about is that there were no more "daily" >> >> >>>>> builds >> >> >>>>> of >> >> >>>>> master. >> >> >>>> >> >> >>>> Is that right? That there were daily builds of master, on >> >> >>>> Appveyor? >> >> >>>> I don't know how those worked, I only recently got cron permission >> >> >>>> ... >> >> >>> >> >> >>> >> >> >>> No, but there used to be daily builds on travis. They stopped 8 >> >> >>> days >> >> >>> ago, >> >> >>> https://travis-ci.org/MacPython/numpy-wheels/builds. >> >> >> >> >> >> Oops - yes - sorry - I retired the 'daily' branch, in favor of >> >> >> 'master', but forgot to update the Travis-CI settings. >> >> >> >> >> >> Done now. >> >> >> >> >> >> Cheers, >> >> >> >> >> >> Matthew >> >> >> >> >> > FWIW, still no daily builds at >> >> > https://travis-ci.org/MacPython/numpy-wheels/builds >> >> >> >> You mean, some days there appears to be no build? The build matrix >> >> does show Cron-triggered jobs, the last of which was a few hours ago: >> >> https://travis-ci.org/MacPython/numpy-wheels/builds/397008012 >> >> >> >> Cheers, >> >> >> >> Matthew >> > >> > >> > The cron wheels are getting built and tested, but they aren't uploading >> > to >> > rackspace. >> >> The cron wheels go to the "pre" container at >> >> https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com >> > > Ah, there they are ... except ... you cancelled the builds I was waiting for > :) I was building wheels so we could have folks test the DLL load problem, > which I'm pretty sure if fixed anyway, so I suppose waiting on the daily > isn't a big a deal. Oh - sorry - I was rushing to get 1.14.5 wheels built. Can you retrigger the builds? Do you want me to? Cheers, Matthew From m.h.vankerkwijk at gmail.com Sat Jun 30 09:51:15 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 30 Jun 2018 09:51:15 -0400 Subject: [Numpy-discussion] Fwd: Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi All, In case it was missed because people have tuned out of the thread: Matti and I proposed last Tuesday to accept NEP 20 (on coming Tuesday, as per NEP 0), which introduces notation for generalized ufuncs allowing fixed, flexible and broadcastable core dimensions. For one thing, this will allow Matti to finish his work on making matmul a gufunc. 
See http://www.numpy.org/neps/nep-0020-gufunc-signature-enhancement.html All the best, Marten ---------- Forwarded message ---------- From: Marten van Kerkwijk Date: Tue, Jun 26, 2018 at 2:25 PM Subject: Re: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs To: Discussion of Numerical Python Hi All, Matti asked me to make a PR accepting my own NEP - https://github.com/numpy/numpy/pull/11429 Any objections? As noted in my earlier summary of the discussion, in principle we can choose to accept only parts, although I think it became clear that the most contentious is also the one arguably most needed, the flexible dimensions for matmul. Moving forward has the advantage that in 1.16 we will actually be able to deal with matmul. All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sat Jun 30 09:55:29 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 30 Jun 2018 09:55:29 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On Fri, Jun 29, 2018 at 9:54 PM, Eric Wieser wrote: > Good catch, > > I think the latter failing is because np.add.reduce ends up calling > np.ufunc.reduce.__get__(np.add), and builtin_function.__get__ doesn?t > appear to do any caching. I suppose caching bound methods would just be a > waste of time. > == would work just fine in my suggestion above, it seems - irrespective > of the resolution of the discussion on python-dev. > > Eric > ? > I think for implementers it might work easiest anyway to look up the ufunc itself in a dict or so and then check the name of the method. (At least, for my impementations of `__array_ufunc__`, it made a lot of sense to use the method in that way; possibly less so for the larger variety with other numpy functions). -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jun 30 09:57:44 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 30 Jun 2018 07:57:44 -0600 Subject: [Numpy-discussion] rackspace ssl certificates In-Reply-To: References: Message-ID: Not to worry, I'll just wait on the daily. On Sat, Jun 30, 2018 at 2:32 AM, Matthew Brett wrote: > On Sat, Jun 30, 2018 at 12:36 AM, Charles R Harris > wrote: > > > > > > On Fri, Jun 29, 2018 at 4:35 PM, Matthew Brett > > wrote: > >> > >> On Fri, Jun 29, 2018 at 11:31 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Tue, Jun 26, 2018 at 3:55 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Tue, Jun 26, 2018 at 10:43 PM, Matti Picus > > >> >> wrote: > >> >> > On 19/06/18 10:57, Matthew Brett wrote: > >> >> >> > >> >> >> Hi, > >> >> >> > >> >> >> On Tue, Jun 19, 2018 at 6:27 PM, Matti Picus < > matti.picus at gmail.com> > >> >> >> wrote: > >> >> >>> > >> >> >>> On 19/06/18 09:58, Charles R Harris wrote: > >> >> >>>>> > >> >> >>>>> What I was curious about is that there were no more "daily" > >> >> >>>>> builds > >> >> >>>>> of > >> >> >>>>> master. > >> >> >>>> > >> >> >>>> Is that right? That there were daily builds of master, on > >> >> >>>> Appveyor? > >> >> >>>> I don't know how those worked, I only recently got cron > permission > >> >> >>>> ... > >> >> >>> > >> >> >>> > >> >> >>> No, but there used to be daily builds on travis. They stopped 8 > >> >> >>> days > >> >> >>> ago, > >> >> >>> https://travis-ci.org/MacPython/numpy-wheels/builds. 
> >> >> >> > >> >> >> Oops - yes - sorry - I retired the 'daily' branch, in favor of > >> >> >> 'master', but forgot to update the Travis-CI settings. > >> >> >> > >> >> >> Done now. > >> >> >> > >> >> >> Cheers, > >> >> >> > >> >> >> Matthew > >> >> >> > >> >> > FWIW, still no daily builds at > >> >> > https://travis-ci.org/MacPython/numpy-wheels/builds > >> >> > >> >> You mean, some days there appears to be no build? The build matrix > >> >> does show Cron-triggered jobs, the last of which was a few hours ago: > >> >> https://travis-ci.org/MacPython/numpy-wheels/builds/397008012 > >> >> > >> >> Cheers, > >> >> > >> >> Matthew > >> > > >> > > >> > The cron wheels are getting built and tested, but they aren't > uploading > >> > to > >> > rackspace. > >> > >> The cron wheels go to the "pre" container at > >> > >> https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a > 83.ssl.cf2.rackcdn.com > >> > > > > Ah, there they are ... except ... you cancelled the builds I was waiting > for > > :) I was building wheels so we could have folks test the DLL load > problem, > > which I'm pretty sure if fixed anyway, so I suppose waiting on the daily > > isn't a big a deal. > > Oh - sorry - I was rushing to get 1.14.5 wheels built. Can you > retrigger the builds? Do you want me to? > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sat Jun 30 10:02:52 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 30 Jun 2018 10:02:52 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Hi Hameer, It is. The point of the proposed feature was to handle array generation > mechanisms, that don't take an array as input in the standard NumPy API. > Giving them a reference handles both the dispatch and the decision about > which implementation to call. > Sorry, I had clearly misunderstood. It would indeed be nice for overrides to work on functions like `zeros` or `arange` as well, but it seems strange to change the signature just for that. As a possible alternative, should we perhaps generally check for overrides on `dtype`? All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Sat Jun 30 10:40:29 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sat, 30 Jun 2018 07:40:29 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Hi Marten, Sorry, I had clearly misunderstood. It would indeed be nice for overrides to work on functions like `zeros` or `arange` as well, but it seems strange to change the signature just for that. As a possible alternative, should we perhaps generally check for overrides on `dtype`? While this very clearly makes sense for something like astropy, it has a few drawbacks: - Other duck arrays such as Dask need more information than just the dtype. For example, Dask needs chunk sizes, XArray needs axis labels, and pydata/sparse needs to know the type of the reference array in order to make one of the same type. The information in a reference array is a strict superset of information in the dtype. 
- There?s a need for a separate protocol, which might be a lot harder to work with for both NumPy and library authors. - Some things, like numpy.random.RandomState, don?t accept a dtype argument. As for your concern about changing the signature, it?s easy enough with a decorator. We?ll need a separate decorator for array generation functions. Something like: def array_generation_function(func): @functools.wraps(func) def wrapped(*args, **kwargs, array_reference=np._NoValue): if array_reference is not np._NoValue: success, result = try_array_function_override(wrapped, [array_reference], args, kwargs) if success: return result return func(*args, **kwargs) return wrapped Hameer Abbasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From maifer at haverford.edu Sat Jun 30 12:13:58 2018 From: maifer at haverford.edu (Maxwell Aifer) Date: Sat, 30 Jun 2018 12:13:58 -0400 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: Thanks, that explains a lot! I didn't realize the reverse ordering actually originated with matlab's polyval, but that makes sense given the one-based indexing. I see why it is the way it is, but I still think it would make more sense for np.polyval() to use conventional indexing (c[0] * x^0 + c[1] * x^1 + c[2] * x^2). np.polyval() can be convenient when a polynomial object is just not needed, but if a single program uses both np.polyval() and np.polynomail.Polynomial, it seems bound to cause unnecessary confusion. Max On Fri, Jun 29, 2018 at 11:23 PM, Eric Wieser wrote: > Here's my take on this, but it may not be an accurate summary of the > history. > > `np.poly` is part of the original matlab-style API, built around > `poly1d` objects. This isn't a great design, because they represent: > > p(x) = c[0] * x^2 + c[1] * x^1 + c[2] * x^0 > > For this reason, among others, the `np.polynomial` module was created, > starting with a clean slate. The core of this is > `np.polynomial.Polynomial`. There, everything uses the convention > > p(x) = c[0] * x^0 + c[1] * x^1 + c[2] * x^2 > > It sounds like we might need clearer docs explaining the difference, and > pointing users to the more sensible `np.polynomial.Polynomial` > > Eric > > > > On Fri, 29 Jun 2018 at 20:10 Charles R Harris > wrote: > >> On Fri, Jun 29, 2018 at 8:21 PM, Maxwell Aifer >> wrote: >> >>> Hi, >>> I noticed some frustrating inconsistencies in the various ways to >>> evaluate polynomials using numpy. Numpy has three ways of evaluating >>> polynomials (that I know of) and each of them has a different syntax: >>> >>> - >>> >>> numpy.polynomial.polynomial.Polynomial >>> : >>> You define a polynomial by a list of coefficients *in order of >>> increasing degree*, and then use the class?s call() function. >>> - >>> >>> np.polyval >>> : >>> Evaluates a polynomial at a point. *First* argument is the >>> polynomial, or list of coefficients *in order of decreasing degree*, >>> and the *second* argument is the point to evaluate at. >>> - >>> >>> np.polynomial.polynomial.polyval >>> : >>> Also evaluates a polynomial at a point, but has more support for >>> vectorization. *First* argument is the point to evaluate at, and >>> *second* argument the list of coefficients *in order of increasing >>> degree*. >>> >>> Not only the order of arguments is changed between different methods, >>> but the order of the coefficients is reversed as well, leading to puzzling >>> bugs (in my experience). What could be the reason for this madness? 
As >>> polyval is a shameless ripoff of Matlab?s function of the same name >>> anyway, why >>> not just use matlab?s syntax (polyval([c0, c1, c2...], x)) across the >>> board? >>> ? >>> >>> >> The polynomial package, with its various basis, deals with series, and >> especially with the truncated series approximations that are used in >> numerical work. Series are universally written in increasing order of the >> degree. The Polynomial class is efficient in a single variable, while the >> numpy.polynomial.polynomial.polyval function is intended as a building >> block and can also deal with multivariate polynomials or multidimensional >> arrays of polynomials, or a mix. See the simple implementation of polyval3d >> for an example. If you are just dealing with a single variable, use >> Polynomial, which will also track scaling and offsets for numerical >> stability and is generally much superior to the simple polyval function >> from a numerical point of view. >> >> As to the ordering of the degrees, learning that the degree matches the >> index is pretty easy and is a more natural fit for the implementation code, >> especially as the number of variables increases. I note that Matlab has >> ones based indexing, so that was really not an option for them. >> >> Chuck >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sat Jun 30 12:52:19 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 30 Jun 2018 12:52:19 -0400 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Hi Hameer, I think the override on `dtype` would work - after all, the override is checked before anything is done, so one can just pass in `self` if one wishes (or some helper class that contains both `self` and any desired further information. But, as you note, it would not cover everything, and your `array_reference` idea definitely makes things more uniform. Indeed, it would allow one to implement things like `np.zeros_like` using `np.zero`, which seems quite nice. Still, I'm not sure whether this should be included in the present NEP or is best done separately after, with a few concrete examples of where it would be useful. All the best, Marten On Sat, Jun 30, 2018 at 10:40 AM, Hameer Abbasi wrote: > Hi Marten, > > Sorry, I had clearly misunderstood. It would indeed be nice for overrides > to work on functions like `zeros` or `arange` as well, but it seems strange > to change the signature just for that. As a possible alternative, should we > perhaps generally check for overrides on `dtype`? > > > While this very clearly makes sense for something like astropy, it has a > few drawbacks: > > - Other duck arrays such as Dask need more information than just the > dtype. For example, Dask needs chunk sizes, XArray needs axis labels, and > pydata/sparse needs to know the type of the reference array in order > to make one of the same type. The information in a reference array is a > strict superset of information in the dtype. 
> - There?s a need for a separate protocol, which might be a lot harder > to work with for both NumPy and library authors. > - Some things, like numpy.random.RandomState, don?t accept a dtype > argument. > > As for your concern about changing the signature, it?s easy enough with a > decorator. We?ll need a separate decorator for array generation functions. > Something like: > > def array_generation_function(func): > @functools.wraps(func) > def wrapped(*args, **kwargs, array_reference=np._NoValue): > if array_reference is not np._NoValue: > success, result = try_array_function_override(wrapped, [array_reference], args, kwargs) > > if success: > return result > > return func(*args, **kwargs) > > return wrapped > > Hameer Abbasi > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sat Jun 30 14:09:56 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 30 Jun 2018 11:09:56 -0700 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: > if a single program uses both np.polyval() and np.polynomail.Polynomial, it seems bound to cause unnecessary confusion. Yes, I would recommend definitely not doing that! > I still think it would make more sense for np.polyval() to use conventional indexing Unfortunately, it's too late for "making sense" to factor into the design. `polyval` is being used in the wild, so we're stuck with it behaving the way it does. At best, we can deprecate it and start telling people to move from `np.polyval` over to `np.polynomial.polynomial.polyval`. Perhaps we need to make this namespace less cumbersome in order for that to be a reasonable option. I also wonder if we want a more lightweight polynomial object without the extra domain and range information, which seem like they make `Polynomial` a more questionable drop-in replacement for `poly1d`. Eric On Sat, 30 Jun 2018 at 09:14 Maxwell Aifer wrote: > Thanks, that explains a lot! I didn't realize the reverse ordering > actually originated with matlab's polyval, but that makes sense given the > one-based indexing. I see why it is the way it is, but I still think it > would make more sense for np.polyval() to use conventional indexing (c[0] > * x^0 + c[1] * x^1 + c[2] * x^2). np.polyval() can be convenient when a > polynomial object is just not needed, but if a single program uses both > np.polyval() and np.polynomail.Polynomial, it seems bound to cause > unnecessary confusion. > > Max > > On Fri, Jun 29, 2018 at 11:23 PM, Eric Wieser > wrote: > >> Here's my take on this, but it may not be an accurate summary of the >> history. >> >> `np.poly` is part of the original matlab-style API, built around >> `poly1d` objects. This isn't a great design, because they represent: >> >> p(x) = c[0] * x^2 + c[1] * x^1 + c[2] * x^0 >> >> For this reason, among others, the `np.polynomial` module was created, >> starting with a clean slate. The core of this is >> `np.polynomial.Polynomial`. 
There, everything uses the convention >> >> p(x) = c[0] * x^0 + c[1] * x^1 + c[2] * x^2 >> >> It sounds like we might need clearer docs explaining the difference, and >> pointing users to the more sensible `np.polynomial.Polynomial` >> >> Eric >> >> >> >> On Fri, 29 Jun 2018 at 20:10 Charles R Harris >> wrote: >> >>> On Fri, Jun 29, 2018 at 8:21 PM, Maxwell Aifer >>> wrote: >>> >>>> Hi, >>>> I noticed some frustrating inconsistencies in the various ways to >>>> evaluate polynomials using numpy. Numpy has three ways of evaluating >>>> polynomials (that I know of) and each of them has a different syntax: >>>> >>>> - >>>> >>>> numpy.polynomial.polynomial.Polynomial >>>> : >>>> You define a polynomial by a list of coefficients *in order of >>>> increasing degree*, and then use the class?s call() function. >>>> - >>>> >>>> np.polyval >>>> : >>>> Evaluates a polynomial at a point. *First* argument is the >>>> polynomial, or list of coefficients *in order of decreasing degree*, >>>> and the *second* argument is the point to evaluate at. >>>> - >>>> >>>> np.polynomial.polynomial.polyval >>>> : >>>> Also evaluates a polynomial at a point, but has more support for >>>> vectorization. *First* argument is the point to evaluate at, and >>>> *second* argument the list of coefficients *in order of increasing >>>> degree*. >>>> >>>> Not only the order of arguments is changed between different methods, >>>> but the order of the coefficients is reversed as well, leading to puzzling >>>> bugs (in my experience). What could be the reason for this madness? As >>>> polyval is a shameless ripoff of Matlab?s function of the same name >>>> anyway, why >>>> not just use matlab?s syntax (polyval([c0, c1, c2...], x)) across the >>>> board? >>>> ? >>>> >>>> >>> The polynomial package, with its various basis, deals with series, and >>> especially with the truncated series approximations that are used in >>> numerical work. Series are universally written in increasing order of the >>> degree. The Polynomial class is efficient in a single variable, while the >>> numpy.polynomial.polynomial.polyval function is intended as a building >>> block and can also deal with multivariate polynomials or multidimensional >>> arrays of polynomials, or a mix. See the simple implementation of polyval3d >>> for an example. If you are just dealing with a single variable, use >>> Polynomial, which will also track scaling and offsets for numerical >>> stability and is generally much superior to the simple polyval function >>> from a numerical point of view. >>> >>> As to the ordering of the degrees, learning that the degree matches the >>> index is pretty easy and is a more natural fit for the implementation code, >>> especially as the number of variables increases. I note that Matlab has >>> ones based indexing, so that was really not an option for them. >>> >>> Chuck >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
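A compact side-by-side of the two coefficient conventions under discussion, using only the public NumPy API (the same cubic, 2x^3 + 3x^2 + 1, written both ways):

    import numpy as np
    from numpy.polynomial import Polynomial

    # Old matlab-style API: coefficients in decreasing order of degree.
    np.polyval([2, 3, 0, 1], 2)                         # 2*8 + 3*4 + 0 + 1 -> 29

    # numpy.polynomial API: coefficients in increasing order of degree,
    # and the argument order is swapped (point first, coefficients second).
    np.polynomial.polynomial.polyval(2, [1, 0, 3, 2])   # -> 29.0

    # The Polynomial class follows the increasing-order convention as well.
    Polynomial([1, 0, 3, 2])(2)                          # -> 29.0

Mixing the two styles in a single program is exactly where the reversed-order bugs described above tend to come from.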
URL: From charlesr.harris at gmail.com Sat Jun 30 14:30:18 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 30 Jun 2018 12:30:18 -0600 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: On Sat, Jun 30, 2018 at 12:09 PM, Eric Wieser wrote: > > if a single program uses both np.polyval() and > np.polynomail.Polynomial, it seems bound to cause unnecessary confusion. > > Yes, I would recommend definitely not doing that! > > > I still think it would make more sense for np.polyval() to use > conventional indexing > > Unfortunately, it's too late for "making sense" to factor into the design. > `polyval` is being used in the wild, so we're stuck with it behaving the > way it does. At best, we can deprecate it and start telling people to move > from `np.polyval` over to `np.polynomial.polynomial.polyval`. Perhaps we > need to make this namespace less cumbersome in order for that to be a > reasonable option. > > I also wonder if we want a more lightweight polynomial object without the > extra domain and range information, which seem like they make `Polynomial` > a more questionable drop-in replacement for `poly1d`. > The defaults for domain and window make it like a regular polynomial. For fitting, it does adjust the range, but the usual form can be recovered with `p.convert()` and will usually have more accurate coefficients due to using a better conditioned matrix during the fit. In [1]: from numpy.polynomial import Polynomial as P In [2]: p = P([1, 2, 3], domain=(0,2)) In [3]: p(0) Out[3]: 2.0 In [4]: p.convert() Out[4]: Polynomial([ 2., -4., 3.], domain=[-1., 1.], window=[-1., 1.]) In [5]: p.convert()(0) Out[5]: 2.0 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Sat Jun 30 14:30:21 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sat, 30 Jun 2018 11:30:21 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: Hi Marten, Still, I'm not sure whether this should be included in the present NEP or is best done separately after, with a few concrete examples of where it would be useful. There already are concrete examples from Dask and CuPy, and this is currently a blocker for them, which is part of the reason I?m pushing so hard for it. See #11074 for a context, and I think it was part of the reason that inspired Matt and Stephan to write this protocol in the first place. Best Regards, Hameer Abbasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sat Jun 30 15:13:11 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Sat, 30 Jun 2018 12:13:11 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On Sat, Jun 30, 2018 at 11:59 AM Hameer Abbasi wrote: > Hi Marten, > > Still, I'm not sure whether this should be included in the present NEP or > is best done separately after, with a few concrete examples of where it > would be useful. > > > There already are concrete examples from Dask and CuPy, and this is > currently a blocker for them, which is part of the reason I?m pushing so > hard for it. See #11074 for > a context, and I think it was part of the reason that inspired Matt and > Stephan to write this protocol in the first place. > Overloading np.ones_like() is definitely in scope already. 
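For concreteness, a rough sketch of what handling np.ones_like could look like on the duck-array side under the proposed protocol. The __array_function__ signature and the NotImplemented convention are taken from the NEP draft; the MyDuckArray class and its internals are made up purely for illustration:

    import numpy as np

    HANDLED_FUNCTIONS = {}

    class MyDuckArray:
        def __init__(self, shape, dtype=float):
            self.shape = shape
            self.dtype = np.dtype(dtype)

        def __array_function__(self, func, types, args, kwargs):
            # Only handle calls whose argument types we understand.
            if not all(issubclass(t, (MyDuckArray, np.ndarray)) for t in types):
                return NotImplemented
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    def _ones_like(a, dtype=None, **kwargs):
        # A real library would allocate and fill its own storage here;
        # this sketch just returns a new duck array with the same shape.
        return MyDuckArray(a.shape, dtype=dtype or a.dtype)

    HANDLED_FUNCTIONS[np.ones_like] = _ones_like

Once the dispatch machinery lands in NumPy itself, np.ones_like(my_duck_array) would then come back as a MyDuckArray rather than an ndarray.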
I?d love to see a generic way of doing random number generation, but I agree with Martin that I don?t see it fitting a naturally into this NEP. An invasive change to add an array_reference argument to a bunch of functions might indeed be worthy of its own NEP, but again I?m not convinced that?s actually the right approach. I?d rather add a few new functions like random_like, which is a small enough change that concensus on the list might be enough. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Sat Jun 30 15:08:17 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sat, 30 Jun 2018 21:08:17 +0200 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: I think restricting polynomials to time series is not a generic way and quite specific. Apart from the series and certain filter design actual usage of polynomials are always presented with decreasing order (control and signal processing included because they use powers of s and inverse powers of z if needed). So if that is the use case then probably it should go under a namespace of `TimeSeries` or at least require an option to present it in reverse. In my opinion polynomials are way more general than that domain and to everyone else it seems to me that "the intuitive way" is the decreasing powers. For the design > This isn't a great design, because they represent: > p(x) = c[0] * x^2 + c[1] * x^1 + c[2] * x^0 I don't see the problem actually. If I ask someone to write down the coefficients of a polynomial I don't think anyone would start from c[2]. On Sat, Jun 30, 2018 at 8:30 PM, Charles R Harris wrote: > > > On Sat, Jun 30, 2018 at 12:09 PM, Eric Wieser > wrote: > >> > if a single program uses both np.polyval() and >> np.polynomail.Polynomial, it seems bound to cause unnecessary confusion. >> >> Yes, I would recommend definitely not doing that! >> >> > I still think it would make more sense for np.polyval() to use >> conventional indexing >> >> Unfortunately, it's too late for "making sense" to factor into the >> design. `polyval` is being used in the wild, so we're stuck with it >> behaving the way it does. At best, we can deprecate it and start telling >> people to move from `np.polyval` over to `np.polynomial.polynomial.polyval`. >> Perhaps we need to make this namespace less cumbersome in order for that to >> be a reasonable option. >> >> I also wonder if we want a more lightweight polynomial object without the >> extra domain and range information, which seem like they make `Polynomial` >> a more questionable drop-in replacement for `poly1d`. >> > > The defaults for domain and window make it like a regular polynomial. For > fitting, it does adjust the range, but the usual form can be recovered with > `p.convert()` and will usually have more accurate coefficients due to using > a better conditioned matrix during the fit. > > In [1]: from numpy.polynomial import Polynomial as P > > In [2]: p = P([1, 2, 3], domain=(0,2)) > > In [3]: p(0) > Out[3]: 2.0 > > In [4]: p.convert() > Out[4]: Polynomial([ 2., -4., 3.], domain=[-1., 1.], window=[-1., 1.]) > > In [5]: p.convert()(0) > Out[5]: 2.0 > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sat Jun 30 16:56:14 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 30 Jun 2018 14:56:14 -0600 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: On Sat, Jun 30, 2018 at 1:08 PM, Ilhan Polat wrote: > I think restricting polynomials to time series is not a generic way and > quite specific. > I think more of complex analysis and it's use of series. > Apart from the series and certain filter design actual usage of > polynomials are always presented with decreasing order (control and signal > processing included because they use powers of s and inverse powers of z if > needed). So if that is the use case then probably it should go under a > namespace of `TimeSeries` or at least require an option to present it in > reverse. In my opinion polynomials are way more general than that domain > and to everyone else it seems to me that "the intuitive way" is the > decreasing powers. > > In approximation, say by Chebyshev polynomials, the coefficients will typically drop off sharply above a certain degree. This has two effects, first, the coefficients that one really cares about are of low degree and should come first, and second, one can truncate the coefficients easily with c[:n]. So in this usage ordering by increasing degree is natural. This is the series idea, fundamental to analysis. Algebraically, interest centers on the degree of the polynomial, which determines the number of zeros and general shape, consequently from the point of view of the algebraist, working with polynomials of finite predetermined degree, arranging the coefficients in order of decreasing degree makes sense and is traditional. That said, I am not actually sure where the high to low ordering of polynomials came from. It could even be like the Arabic numeral system, which when read properly from right to left, has its terms arranged from small to greater. It may even be that the polynomial convention derives that of the Arabic numerals. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sat Jun 30 17:30:03 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 30 Jun 2018 14:30:03 -0700 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: ?the intuitive way? is the decreasing powers. An argument against this is that accessing the ith power of x is spelt: - x.coeffs[i] for increasing powers - x.coeffs[-i-1] for decreasing powers The former is far more natural than the latter, and avoids a potential off-by-one error If I ask someone to write down the coefficients of a polynomial I don?t think anyone would start from c[2] You wouldn?t? I?d expect to see [image: f(x) = a_3x^3 + a_2x^2 + a_1x + a_0] rather than [image: f(x) = a_0x^3 + a_1x^2 + a_2x + a_3] Sure, I?d write it starting with the highest power, but I?d still number my coefficients to match the powers. Eric ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From maifer at haverford.edu Sat Jun 30 17:33:22 2018 From: maifer at haverford.edu (Maxwell Aifer) Date: Sat, 30 Jun 2018 17:33:22 -0400 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: Interesting, I wasn't aware that both conventions were widely used. Speaking of series with inverse powers (i.e. 
Laurent series), I wonder how useful it would be to create a class to represent expressions with integral powers from -m to n. These come up in my work sometimes, and I usually represent them with coefficient arrays ordered like this: c[0]*x^0 + ... + c[n]*x^n + c[n+1]x^-m + ... + c[n+m+1]*x^-1 Because then with negative indexing you have: c[-m]*x^-m + ... + c[n]*x^n Still, these objects can't be manipulated as nicely as polynomials because they aren't closed under integration and differentiation (you get log terms). Max On Sat, Jun 30, 2018 at 4:56 PM, Charles R Harris wrote: > > > On Sat, Jun 30, 2018 at 1:08 PM, Ilhan Polat wrote: > >> I think restricting polynomials to time series is not a generic way and >> quite specific. >> > > I think more of complex analysis and it's use of series. > > >> Apart from the series and certain filter design actual usage of >> polynomials are always presented with decreasing order (control and signal >> processing included because they use powers of s and inverse powers of z if >> needed). So if that is the use case then probably it should go under a >> namespace of `TimeSeries` or at least require an option to present it in >> reverse. In my opinion polynomials are way more general than that domain >> and to everyone else it seems to me that "the intuitive way" is the >> decreasing powers. >> >> > In approximation, say by Chebyshev polynomials, the coefficients will > typically drop off sharply above a certain degree. This has two effects, > first, the coefficients that one really cares about are of low degree and > should come first, and second, one can truncate the coefficients easily > with c[:n]. So in this usage ordering by increasing degree is natural. This > is the series idea, fundamental to analysis. > > Algebraically, interest centers on the degree of the polynomial, which > determines the number of zeros and general shape, consequently from the > point of view of the algebraist, working with polynomials of finite > predetermined degree, arranging the coefficients in order of decreasing > degree makes sense and is traditional. > > That said, I am not actually sure where the high to low ordering of > polynomials came from. It could even be like the Arabic numeral system, > which when read properly from right to left, has its terms arranged from > small to greater. It may even be that the polynomial convention derives > that of the Arabic numerals. > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sat Jun 30 17:41:28 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 30 Jun 2018 14:41:28 -0700 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: Since the one of the arguments for the decreasing order seems to just be textual representation - do we want to tweak the repr to something like Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0) (And add a constructor that calls the lambda with Polynomial(1)) Eric ? On Sat, 30 Jun 2018 at 14:30 Eric Wieser wrote: > ?the intuitive way? is the decreasing powers. 
> > An argument against this is that accessing the ith power of x is spelt: > > - x.coeffs[i] for increasing powers > - x.coeffs[-i-1] for decreasing powers > > The former is far more natural than the latter, and avoids a potential > off-by-one error > > If I ask someone to write down the coefficients of a polynomial I don?t > think anyone would start from c[2] > > You wouldn?t? I?d expect to see > > [image: f(x) = a_3x^3 + a_2x^2 + a_1x + a_0] > > rather than > > [image: f(x) = a_0x^3 + a_1x^2 + a_2x + a_3] > > Sure, I?d write it starting with the highest power, but I?d still number > my coefficients to match the powers. > > > Eric > ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maifer at haverford.edu Sat Jun 30 18:05:12 2018 From: maifer at haverford.edu (Maxwell Aifer) Date: Sat, 30 Jun 2018 18:05:12 -0400 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: Oh, clever... yeah I think that would be very cool. But shouldn't it call the constructor with Polynomial([0,1])? On Sat, Jun 30, 2018 at 5:41 PM, Eric Wieser wrote: > Since the one of the arguments for the decreasing order seems to just be > textual representation - do we want to tweak the repr to something like > > Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0) > > (And add a constructor that calls the lambda with Polynomial(1)) > > Eric > ? > > On Sat, 30 Jun 2018 at 14:30 Eric Wieser > wrote: > >> ?the intuitive way? is the decreasing powers. >> >> An argument against this is that accessing the ith power of x is spelt: >> >> - x.coeffs[i] for increasing powers >> - x.coeffs[-i-1] for decreasing powers >> >> The former is far more natural than the latter, and avoids a potential >> off-by-one error >> >> If I ask someone to write down the coefficients of a polynomial I don?t >> think anyone would start from c[2] >> >> You wouldn?t? I?d expect to see >> >> [image: f(x) = a_3x^3 + a_2x^2 + a_1x + a_0] >> >> rather than >> >> [image: f(x) = a_0x^3 + a_1x^2 + a_2x + a_3] >> >> Sure, I?d write it starting with the highest power, but I?d still number >> my coefficients to match the powers. >> >> >> Eric >> ? >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maifer at haverford.edu Sat Jun 30 18:06:46 2018 From: maifer at haverford.edu (Maxwell Aifer) Date: Sat, 30 Jun 2018 18:06:46 -0400 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: *shouldn't the constructor call the lambda with Polynomial([0,1[) On Sat, Jun 30, 2018 at 6:05 PM, Maxwell Aifer wrote: > Oh, clever... yeah I think that would be very cool. But shouldn't it call > the constructor with Polynomial([0,1])? > > On Sat, Jun 30, 2018 at 5:41 PM, Eric Wieser > wrote: > >> Since the one of the arguments for the decreasing order seems to just be >> textual representation - do we want to tweak the repr to something like >> >> Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0) >> >> (And add a constructor that calls the lambda with Polynomial(1)) >> >> Eric >> ? >> >> On Sat, 30 Jun 2018 at 14:30 Eric Wieser >> wrote: >> >>> ?the intuitive way? is the decreasing powers. 
>>> >>> An argument against this is that accessing the ith power of x is spelt: >>> >>> - x.coeffs[i] for increasing powers >>> - x.coeffs[-i-1] for decreasing powers >>> >>> The former is far more natural than the latter, and avoids a potential >>> off-by-one error >>> >>> If I ask someone to write down the coefficients of a polynomial I don?t >>> think anyone would start from c[2] >>> >>> You wouldn?t? I?d expect to see >>> >>> [image: f(x) = a_3x^3 + a_2x^2 + a_1x + a_0] >>> >>> rather than >>> >>> [image: f(x) = a_0x^3 + a_1x^2 + a_2x + a_3] >>> >>> Sure, I?d write it starting with the highest power, but I?d still number >>> my coefficients to match the powers. >>> >>> >>> Eric >>> ? >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sat Jun 30 18:08:40 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 30 Jun 2018 15:08:40 -0700 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: Good catch, it would do that On Sat, 30 Jun 2018 at 15:07 Maxwell Aifer wrote: > *shouldn't the constructor call the lambda with Polynomial([0,1[) > > On Sat, Jun 30, 2018 at 6:05 PM, Maxwell Aifer > wrote: > >> Oh, clever... yeah I think that would be very cool. But shouldn't it call >> the constructor with Polynomial([0,1])? >> >> On Sat, Jun 30, 2018 at 5:41 PM, Eric Wieser > > wrote: >> >>> Since the one of the arguments for the decreasing order seems to just be >>> textual representation - do we want to tweak the repr to something like >>> >>> Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0) >>> >>> (And add a constructor that calls the lambda with Polynomial(1)) >>> >>> Eric >>> ? >>> >>> On Sat, 30 Jun 2018 at 14:30 Eric Wieser >>> wrote: >>> >>>> ?the intuitive way? is the decreasing powers. >>>> >>>> An argument against this is that accessing the ith power of x is spelt: >>>> >>>> - x.coeffs[i] for increasing powers >>>> - x.coeffs[-i-1] for decreasing powers >>>> >>>> The former is far more natural than the latter, and avoids a potential >>>> off-by-one error >>>> >>>> If I ask someone to write down the coefficients of a polynomial I don?t >>>> think anyone would start from c[2] >>>> >>>> You wouldn?t? I?d expect to see >>>> >>>> [image: f(x) = a_3x^3 + a_2x^2 + a_1x + a_0] >>>> >>>> rather than >>>> >>>> [image: f(x) = a_0x^3 + a_1x^2 + a_2x + a_3] >>>> >>>> Sure, I?d write it starting with the highest power, but I?d still >>>> number my coefficients to match the powers. >>>> >>>> >>>> Eric >>>> ? >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
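A minimal sketch of the constructor idea being discussed here. The from_callable name is invented for illustration; everything else is existing Polynomial arithmetic, and it shows why the callable has to be evaluated at the identity polynomial Polynomial([0, 1]) rather than at Polynomial(1):

    from numpy.polynomial import Polynomial

    def from_callable(f):
        # Evaluate the callable at p(x) = x, i.e. Polynomial([0, 1]);
        # ordinary Polynomial arithmetic then assembles the coefficients
        # in increasing order of degree.
        return f(Polynomial([0, 1]))

    p = from_callable(lambda x: 2*x**3 + 3*x**2 + x)
    # p.coef -> array([0., 1., 3., 2.])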
URL: From charlesr.harris at gmail.com Sat Jun 30 18:47:10 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 30 Jun 2018 16:47:10 -0600 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: On Sat, Jun 30, 2018 at 4:42 PM, Charles R Harris wrote: > > > On Sat, Jun 30, 2018 at 3:41 PM, Eric Wieser > wrote: > >> Since the one of the arguments for the decreasing order seems to just be >> textual representation - do we want to tweak the repr to something like >> >> Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0) >> >> (And add a constructor that calls the lambda with Polynomial(1)) >> >> Eric >> > > IIRC there was a proposal for that. There is the possibility of adding > renderers for latex and html that could be used by Jupyter, and I think the > ordering was an option. > See https://github.com/numpy/numpy/issues/8893 for the proposal. BTW, if someone would like to work on this, go for it. Chuck > ? >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jun 30 18:42:45 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 30 Jun 2018 16:42:45 -0600 Subject: [Numpy-discussion] Polynomial evaluation inconsistencies In-Reply-To: References: Message-ID: On Sat, Jun 30, 2018 at 3:41 PM, Eric Wieser wrote: > Since the one of the arguments for the decreasing order seems to just be > textual representation - do we want to tweak the repr to something like > > Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0) > > (And add a constructor that calls the lambda with Polynomial(1)) > > Eric > IIRC there was a proposal for that. There is the possibility of adding renderers for latex and html that could be used by Jupyter, and I think the ordering was an option. Chuck > ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Jun 30 22:23:48 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 30 Jun 2018 19:23:48 -0700 Subject: [Numpy-discussion] Revised NEP-18, __array_function__ protocol In-Reply-To: References: Message-ID: On Sat, Jun 30, 2018 at 12:14 PM Stephan Hoyer wrote: > I?d love to see a generic way of doing random number generation, but I > agree with Martin that I don?t see it fitting a naturally into this NEP. An > invasive change to add an array_reference argument to a bunch of functions > might indeed be worthy of its own NEP, but again I?m not convinced that?s > actually the right approach. I?d rather add a few new functions like > random_like, which is a small enough change that concensus on the list > might be enough. > random_like() seems very weird to me. It doesn't seem like a function that anyone actually wants. It seems like what people actually want is to be able to draw random numbers from any distribution as a specified array-like type and shape, not just sample U(0, 1) with the shape of an existing array. The most workable way to do this is to modify RandomGenerator (i.e. the new RandomState design)[1] to accept the array-like type in the class constructor, and modify its internals to do the right thing. Because the intrusion on the API is so small, that doesn't require a NEP, just a PR (a long, complicated, and tedious PR, to be sure)[2]. There are a bunch of technical issues (if you want to avoid memory copies) because the Cython implementation requires direct memory access, but that's intrinsic to any solution to this problem, regardless of the API choices. 
random_like() would have the same issues. [1] https://github.com/bashtage/randomgen [2] Sorry, Kevin. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: