From m.h.vankerkwijk at gmail.com  Wed Nov 1 18:50:30 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 1 Nov 2017 18:50:30 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

From my experience with Quantity, routines that properly ducktype work
well; those that feel the need to accept lists and blatantly do
`asarray` do not - even if in many cases they would have worked if
they used `asanyarray`... But there are lots of nice surprises, with,
e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom
line, I think you should let this stop you from trying only if you
know something important does not work.

-- Marten
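[A minimal illustration of the `asarray`/`asanyarray` distinction Marten is pointing at - not from the thread; the toy subclass stands in for something like Quantity:]

```
import numpy as np

class MyArray(np.ndarray):
    """Toy subclass standing in for something like Quantity."""

a = np.arange(3).view(MyArray)

# asarray strips the subclass; asanyarray passes it through.
print(type(np.asarray(a)))     # <class 'numpy.ndarray'>
print(type(np.asanyarray(a)))  # <class '__main__.MyArray'>
```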
From nathan12343 at gmail.com  Wed Nov 1 18:55:22 2017
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Wed, 1 Nov 2017 17:55:22 -0500
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

I think the biggest issues could be resolved if __array_concatenate__
were finished. Unfortunately I don't feel like I can take that on right
now.

See Ryan May's talk at SciPy about using an ndarray subclass for units
and the issues he's run into:

https://www.youtube.com/watch?v=qCo9bkT9sow

On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> From my experience with Quantity, routines that properly ducktype work
> well; those that feel the need to accept lists and blatantly do
> `asarray` do not - even if in many cases they would have worked if
> they used `asanyarray`... But there are lots of nice surprises, with,
> e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom
> line, I think you should let this stop you from trying only if you
> know something important does not work.
> -- Marten

From josef.pktd at gmail.com  Thu Nov 2 08:46:01 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 08:46:01 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> I think the biggest issues could be resolved if __array_concatenate__ were
> finished. Unfortunately I don't feel like I can take that on right now.
>
> See Ryan May's talk at SciPy about using an ndarray subclass for units and
> the issues he's run into:
>
> https://www.youtube.com/watch?v=qCo9bkT9sow

Interesting talk, but I don't see how general library code should know
what units the output has.
For example, if units are some flows per unit of time and we average, sum
or integrate over time, then what are the new units? (e.g. pandas time
aggregation)
What are the units of covariance or correlation between two variables
with the same units, and what are they between variables with different
units?

How do you concatenate and operate on arrays with different units?

Interpolation or prediction would work using the existing units.

Partially related:
statsmodels uses a wrapper for pandas Series and DataFrames and tries to
preserve the index when possible, and makes up a new DataFrame or Series
if the existing index doesn't apply.
E.g. predicted values and residuals are in terms of the originally
provided index, and could also get the original units assigned. That
would also be possible with prediction confidence intervals. But for the
rest, see above.

Josef
From josef.pktd at gmail.com  Thu Nov 2 08:56:26 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 08:56:26 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 8:46 AM, <josef.pktd at gmail.com> wrote:

> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at SciPy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> For example, if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
> What are the units of covariance or correlation between two variables
> with the same units, and what are they between variables with different
> units?
>
> How do you concatenate and operate on arrays with different units?
>
> Interpolation or prediction would work using the existing units.
>
> Partially related:
> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
> preserve the index when possible, and makes up a new DataFrame or Series
> if the existing index doesn't apply.
> E.g. predicted values and residuals are in terms of the originally
> provided index, and could also get the original units assigned. That
> would also be possible with prediction confidence intervals. But for the
> rest, see above.

using pint

>>> x
<Quantity(..., 'meter')>
>>> x / x
<Quantity(..., 'dimensionless')>
>>> x / (1 + x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
    return self._add_sub(other, operator.add)
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
    raise DimensionalityError(self._units, 'dimensionless')
pint.errors.DimensionalityError: Cannot convert from 'meter' to
'dimensionless'

np.exp(x)
raises
pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
to 'dimensionless' (dimensionless)

Josef

> Josef
>
>> On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk
>> <m.h.vankerkwijk at gmail.com> wrote:
>>
>>> From my experience with Quantity, routines that properly ducktype work
>>> well; those that feel the need to accept lists and blatantly do
>>> `asarray` do not - even if in many cases they would have worked if
>>> they used `asanyarray`... But there are lots of nice surprises, with,
>>> e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom
>>> line, I think you should let this stop you from trying only if you
>>> know something important does not work.
>>> -- Marten

From m.h.vankerkwijk at gmail.com  Thu Nov 2 11:51:54 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 11:51:54 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Hi Josef,

astropy's Quantity is well developed and would give similar results to
pint; all those results make sense if one wants to have consistent
units. General library code will actually do the right thing as long
as it just uses normal mathematical operations with ufuncs - and as
long as it just duck types! - the unit code will then override and
properly propagate units to outputs, as can be seen in this example:
```
import astropy.units as u
np.fft.fftfreq(8, 1*u.min)
# <Quantity [ 0., 0.125, 0.25, 0.375, -0.5, -0.375, -0.25, -0.125] 1 / min>
np.fft.fftfreq(8, 1*u.min).var()
# <Quantity 0.08203125 1 / min2>
```

> For example, if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)

The units module will force you to take into account `dt`! This is in
fact one reason why it is so powerful. So, your example might go
something like:
```
flow = [1., 1.5, 1.5] * u.g / u.s
dt = [0.5, 0.5, 1.] * u.hr
np.sum(flow * dt)
# <Quantity 2.75 g h / s>
np.sum(flow * dt).to(u.kg)
# <Quantity 9.9 kg>
```

> How do you concatenate and operate on arrays with different units?

This is where Nathaniel's `__array_concatenate__` would come in. For
regular arrays it is fine to just concatenate, but for almost anything
else you need a different approach. For quantities, the most logical
one would be to first create an empty array of the right size with the
unit of, e.g., the first part to be concatenated, and then set
sections to the input quantities (where the setter does unit
conversion and will fail if that is not possible).

All the best,

Marten

p.s. A fun subject is what to do with logarithmic units, such as the
magnitudes in astronomy... We have a module for that as well:
http://docs.astropy.org/en/latest/units/logarithmic_units.html
From rmay31 at gmail.com  Thu Nov 2 12:23:44 2017
From: rmay31 at gmail.com (Ryan May)
Date: Thu, 2 Nov 2017 10:23:44 -0600
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 6:46 AM, <josef.pktd at gmail.com> wrote:

> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at SciPy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> For example, if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)

A general library doesn't have to do anything--just not do annoying things
like isinstance() checks and calling np.asarray() everywhere. Honestly one
of those is the core of most of the problems I run into. It's currently
more complicated when doing things in compiled land, but that's
implementation details, not any kind of fundamental incompatibility.

For basic mathematical operations, units have perfectly well defined
semantics that many of us encountered in an introductory physics or
chemistry class:
- Want to add or subtract two things? They need to have the same units; a
  units library can handle conversion provided they have the same
  dimensionality (e.g. length, time)
- Multiplication/Division: combine and cancel units (m/s * s -> m)

Everything else we do on a computer with data in some way boils down to:
add, subtract, multiply, divide.

Average keeps the same units -- it's just a sum and division by a
unit-less constant.
Integration (in 1-D) involves *two* variables, your data as well as the
time/space coordinates (or dx or dt); fundamentally it's a multiplication
by dx and a summation. The resulting units then are, e.g., data.units *
dx.units. This works just like it does in Physics 101, where you integrate
velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m).

> What are the units of covariance or correlation between two variables
> with the same units, and what are they between variables with different
> units?

Well, covariance is subtracting the mean from each variable and
multiplying the residuals; therefore the units for cov(x, y):

(x.units - x.units) * (y.units - y.units) -> x.units * y.units

Correlation takes covariance and divides by the product of the standard
deviations, so that's:

(x.units * y.units) / (x.units * y.units) -> dimensionless

Which is what I'd expect for a correlation.

> How do you concatenate and operate on arrays with different units?

If all arrays have compatible dimensionality (say meters, inches, miles),
you convert to one (say the first) and concatenate like normal. If they're
not compatible, you error out.

> Interpolation or prediction would work using the existing units.

I'm sure you wrote that thinking units didn't play a role, but the math
behind those operations works perfectly fine with units, with things
cancelling out properly to give the same units out as in.

Ryan

-- 
Ryan May
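[A compact sketch of the semantics Ryan lays out, using pint - not from the thread; the values are made up, and exact reprs vary by pint version:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()

# Add/subtract: same dimensionality required; conversion is automatic.
print(1 * ureg.m + 5 * ureg.cm)              # 1.05 meter

# Multiply/divide: units combine and cancel (m/s * s -> m).
print((3 * ureg.m / ureg.s) * (2 * ureg.s))  # 6.0 meter

# Covariance-style arithmetic: units multiply...
x = np.array([1., 2., 3.]) * ureg.m
y = np.array([2., 4., 8.]) * ureg.s
cov = ((x - x.mean()) * (y - y.mean())).mean()
print(cov.units)        # meter * second

# ...and correlation divides them back out to dimensionless.
corr = cov / (x.std() * y.std())
print(corr.units)       # dimensionless
```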
From josef.pktd at gmail.com  Thu Nov 2 12:43:43 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 12:43:43 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 11:51 AM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> Hi Josef,
>
> astropy's Quantity is well developed and would give similar results to
> pint; all those results make sense if one wants to have consistent
> units. General library code will actually do the right thing as long
> as it just uses normal mathematical operations with ufuncs - and as
> long as it just duck types! - the unit code will then override and
> properly propagate units to outputs, as can be seen in this example:
> ```
> import astropy.units as u
> np.fft.fftfreq(8, 1*u.min)
> # <Quantity [ 0., 0.125, 0.25, 0.375, -0.5, -0.375, -0.25, -0.125] 1 / min>
> np.fft.fftfreq(8, 1*u.min).var()
> # <Quantity 0.08203125 1 / min2>
> ```
>
>> For example, if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>
> The units module will force you to take into account `dt`! This is in
> fact one reason why it is so powerful. So, your example might go
> something like:
> ```
> flow = [1., 1.5, 1.5] * u.g / u.s
> dt = [0.5, 0.5, 1.] * u.hr
> np.sum(flow * dt)
> # <Quantity 2.75 g h / s>
> np.sum(flow * dt).to(u.kg)
> # <Quantity 9.9 kg>
> ```
>
>> How do you concatenate and operate on arrays with different units?
>
> This is where Nathaniel's `__array_concatenate__` would come in. For
> regular arrays it is fine to just concatenate, but for almost anything
> else you need a different approach. For quantities, the most logical
> one would be to first create an empty array of the right size with the
> unit of, e.g., the first part to be concatenated, and then set
> sections to the input quantities (where the setter does unit
> conversion and will fail if that is not possible).

For example, "will fail if that is not possible" rules out inhomogeneous
arrays (analogous to structured dtypes).

How do you get a Vandermonde matrix for something simple like a
polynomial fit?

x[:, None] ** np.arange(3)

> All the best,
>
> Marten
>
> p.s. A fun subject is what to do with logarithmic units, such as the
> magnitudes in astronomy... We have a module for that as well:
> http://docs.astropy.org/en/latest/units/logarithmic_units.html

Similarly, scipy.special has ufuncs - what units are those?

Most code that I know (i.e. scipy.stats and statsmodels) does not use only
"normal mathematical operations with ufuncs". I guess there are a lot of
"abnormal" mathematical operations where just simply propagating the units
will not work.

Aside: The problem is more general also for other data structures.
E.g. statsmodels for most parts uses only numpy ndarrays inside the
algorithms and computations because that provides well defined behavior
(e.g. pandas behaved too differently in many cases). I don't have much of
an idea yet about how to change the infrastructure to allow the use of
dask arrays, sparse matrices and similar, and possibly automatic
differentiation.

Josef
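[A sketch of the Vandermonde problem josef raises, with pint; this assumes pint's behavior of rejecting array exponents on dimensioned quantities - each column of x[:, None] ** np.arange(3) would need its own unit, which a homogeneous quantity array cannot represent:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()
x = np.array([1., 2., 3.]) * ureg.m

try:
    # Columns would be dimensionless, m, and m**2 -- no single unit fits,
    # so the units library refuses rather than silently mixing units.
    vander = x[:, None] ** np.arange(3)
except pint.DimensionalityError as err:
    print(err)
```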
From rmay31 at gmail.com  Thu Nov 2 12:46:41 2017
From: rmay31 at gmail.com (Ryan May)
Date: Thu, 2 Nov 2017 10:46:41 -0600
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 6:56 AM, <josef.pktd at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 8:46 AM, <josef.pktd at gmail.com> wrote:
>
>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>>
>>> I think the biggest issues could be resolved if __array_concatenate__
>>> were finished. Unfortunately I don't feel like I can take that on right now.
>>>
>>> See Ryan May's talk at SciPy about using an ndarray subclass for units
>>> and the issues he's run into:
>>>
>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> For example, if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>> What are the units of covariance or correlation between two variables
>> with the same units, and what are they between variables with different
>> units?
>>
>> How do you concatenate and operate on arrays with different units?
>>
>> Interpolation or prediction would work using the existing units.
>>
>> Partially related:
>> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
>> preserve the index when possible, and makes up a new DataFrame or Series
>> if the existing index doesn't apply.
>> E.g. predicted values and residuals are in terms of the originally
>> provided index, and could also get the original units assigned. That
>> would also be possible with prediction confidence intervals. But for the
>> rest, see above.
>
> using pint
>
> >>> x
> <Quantity(..., 'meter')>
> >>> x / x
> <Quantity(..., 'dimensionless')>
> >>> x / (1 + x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
>     return self._add_sub(other, operator.add)
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
>     raise DimensionalityError(self._units, 'dimensionless')
> pint.errors.DimensionalityError: Cannot convert from 'meter' to
> 'dimensionless'

I'm not sure why you have a problem with that result. You tried to take a
number in meters and add a dimensionless value to that--that's not a
defined operation. That's like saying: "I have a distance of 12 meters and
added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert,
but I can do that operation. 1 second? That makes no sense.

If you add units to the 1 then it's a defined operation:

>>> ureg = pint.UnitRegistry()
>>> x / (1 * ureg.meters + x)
<Quantity(..., 'dimensionless')>

> np.exp(x)
> raises
> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
> to 'dimensionless' (dimensionless)

Well, the Taylor series for exp (around a=0) is:

exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...

so for that to properly add up, x needs to be dimensionless. It should be
noted, though, that I've *never* seen a formula, theoretically derived or
empirically fit, require directly taking exp(x) where x is a physical
quantity with units. Instead, you have:

f = a * exp(kx)

Properly calculated values for a, k will have appropriate units attached
to them that allow the calculation to proceed without error.

Ryan

-- 
Ryan May
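[A sketch of Ryan's `f = a * exp(kx)` point with pint - illustrative values, not from the thread; when k carries 1/length, the exponent is dimensionless and np.exp is well defined:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()

x = np.linspace(0., 10., 3) * ureg.m
a = 2.0 * ureg.g     # the amplitude carries the output units
k = -0.5 / ureg.m    # the rate constant carries 1/length

f = a * np.exp(k * x)   # k*x is dimensionless, so exp is defined
print(f.units)          # gram
```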
From josef.pktd at gmail.com  Thu Nov 2 12:52:29 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 12:52:29 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 12:23 PM, Ryan May <rmay31 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 6:46 AM, <josef.pktd at gmail.com> wrote:
>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> For example, if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>
> A general library doesn't have to do anything--just not do annoying things
> like isinstance() checks and calling np.asarray() everywhere. Honestly one
> of those is the core of most of the problems I run into. It's currently
> more complicated when doing things in compiled land, but that's
> implementation details, not any kind of fundamental incompatibility.
>
> For basic mathematical operations, units have perfectly well defined
> semantics that many of us encountered in an introductory physics or
> chemistry class:
> - Want to add or subtract two things? They need to have the same units; a
>   units library can handle conversion provided they have the same
>   dimensionality (e.g. length, time)
> - Multiplication/Division: combine and cancel units (m/s * s -> m)
>
> Everything else we do on a computer with data in some way boils down to:
> add, subtract, multiply, divide.
>
> Average keeps the same units -- it's just a sum and division by a
> unit-less constant.
> Integration (in 1-D) involves *two* variables, your data as well as the
> time/space coordinates (or dx or dt); fundamentally it's a multiplication
> by dx and a summation. The resulting units then are, e.g., data.units *
> dx.units. This works just like it does in Physics 101, where you integrate
> velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m).
>
>> What are the units of covariance or correlation between two variables
>> with the same units, and what are they between variables with different
>> units?
>
> Well, covariance is subtracting the mean from each variable and
> multiplying the residuals; therefore the units for cov(x, y):
>
> (x.units - x.units) * (y.units - y.units) -> x.units * y.units
>
> Correlation takes covariance and divides by the product of the standard
> deviations, so that's:
>
> (x.units * y.units) / (x.units * y.units) -> dimensionless
>
> Which is what I'd expect for a correlation.
>
>> How do you concatenate and operate on arrays with different units?
>
> If all arrays have compatible dimensionality (say meters, inches, miles),
> you convert to one (say the first) and concatenate like normal. If they're
> not compatible, you error out.
>
>> Interpolation or prediction would work using the existing units.
>
> I'm sure you wrote that thinking units didn't play a role, but the math
> behind those operations works perfectly fine with units, with things
> cancelling out properly to give the same units out as in.

Some of it is in my reply to Marten.

Regression and polyfit require an X matrix with different units, and then
some linear algebra like solve, pinv or svd. So, while the predicted
values have well defined units, the computation involves some messier
operations, unless you want to forgo linear algebra in all intermediate
steps and reduce it to sums, divisions and inverses.

Josef

> Ryan
>
> -- 
> Ryan May

From josef.pktd at gmail.com  Thu Nov 2 13:01:03 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 13:01:03 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 12:46 PM, Ryan May <rmay31 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 6:56 AM, <josef.pktd at gmail.com> wrote:
>
>> using pint
>>
>> >>> x
>> <Quantity(..., 'meter')>
>> >>> x / x
>> <Quantity(..., 'dimensionless')>
>> >>> x / (1 + x)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
>>     return self._add_sub(other, operator.add)
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
>>     raise DimensionalityError(self._units, 'dimensionless')
>> pint.errors.DimensionalityError: Cannot convert from 'meter' to
>> 'dimensionless'
>
> I'm not sure why you have a problem with that result. You tried to take a
> number in meters and add a dimensionless value to that--that's not a
> defined operation. That's like saying: "I have a distance of 12 meters and
> added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert,
> but I can do that operation. 1 second? That makes no sense.
>
> If you add units to the 1 then it's a defined operation:
>
> >>> ureg = pint.UnitRegistry()
> >>> x / (1 * ureg.meters + x)
> <Quantity(..., 'dimensionless')>
>
>> np.exp(x)
>> raises
>> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
>> to 'dimensionless' (dimensionless)
>
> Well, the Taylor series for exp (around a=0) is:
>
> exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...
>
> so for that to properly add up, x needs to be dimensionless. It should be
> noted, though, that I've *never* seen a formula, theoretically derived or
> empirically fit, require directly taking exp(x) where x is a physical
> quantity with units. Instead, you have:
>
> f = a * exp(kx)
>
> Properly calculated values for a, k will have appropriate units attached
> to them that allow the calculation to proceed without error.

I was thinking of a simple logit model to predict whether it rains
tomorrow. The logit transformation for the probability is
exp(k x) / (1 + exp(k x)), where k is a parameter to search for in the
optimization, and x is a matrix with all predictors or explanatory
variables, which could all have different units.

So it sounds to me that if we drop asarray, then we just get exceptions
or possibly strange results, or we have to introduce a unit that matches
everything (like a joker card) for any constants that we are using.

Josef

> Ryan
>
> -- 
> Ryan May
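[One way josef's logit example can be made unit-consistent, sketched with pint and made-up predictors and coefficients: keep each predictor separate and give each coefficient the inverse units, so the linear predictor is dimensionless before the exp:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()

# Two predictors with different units (values are made up).
humidity = np.array([0.2, 0.5, 0.8]) * ureg.dimensionless
wind = np.array([1., 3., 5.]) * ureg.m / ureg.s

# Each coefficient carries the inverse units of its predictor,
# so every term of the linear predictor is dimensionless.
b0 = 0.1 * ureg.dimensionless
b1 = 2.0 * ureg.dimensionless
b2 = -0.3 * ureg.s / ureg.m

eta = b0 + b1 * humidity + b2 * wind
p = 1. / (1. + np.exp(-eta))   # logistic transform, defined for dimensionless eta
print(p)
```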
From m.h.vankerkwijk at gmail.com  Thu Nov 2 14:39:42 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 14:39:42 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Hi Josef,

Indeed, for some applications one would like to have different units
for different parts of an array. And that means that, at present, the
quantity implementations that we have are no good at storing, say, a
covariance matrix involving parameters with different units, where
thus each element of the covariance matrix has a different unit. I
fear at present it would have to be an object array instead; other
cases may be a bit easier to solve, by, e.g., allowing structured
arrays with similarly structured units. I do note that actually doing
it would clarify, e.g., what the axes in Vandermonde (spelling?)
matrices mean.

That said, there is truly an enormous benefit to checking units on
"regular" operations. Spacecraft have missed Mars because people
didn't do it properly...

All the best,

Marten

p.s. The scipy functions should indeed be included in the ufuncs
covered; there is a fairly long-standing issue for that in astropy...

From josef.pktd at gmail.com  Thu Nov 2 15:33:18 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 15:33:18 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 2:39 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> Hi Josef,
>
> Indeed, for some applications one would like to have different units
> for different parts of an array. And that means that, at present, the
> quantity implementations that we have are no good at storing, say, a
> covariance matrix involving parameters with different units, where
> thus each element of the covariance matrix has a different unit. I
> fear at present it would have to be an object array instead; other
> cases may be a bit easier to solve, by, e.g., allowing structured
> arrays with similarly structured units. I do note that actually doing
> it would clarify, e.g., what the axes in Vandermonde (spelling?)
> matrices mean.

(I have problems remembering the spelling of proper names)

np.vander and the various polyvander functions/methods

One point I wanted to make is that the units are overhead and irrelevant
in the computation. It's the outcome that might have units.

E.g. polyfit could use various underlying polynomials, e.g.
numpy.polynomial.chebyshev.chebvander(...), and various linear algebra
and projection versions, and the output would still have the same units.

Aside: I just found an interesting one:
http://docs.astropy.org/en/latest/api/astropy.stats.biweight.biweight_midcovariance.html
It is pairwise, but uses asanyarray; the statsmodels version uses asarray
(for robust scatter):
https://github.com/statsmodels/statsmodels/pull/3230/files#diff-8fd46d3044db86ae7992f5d817eec6c7R473
I guess I would have problems replacing asarray by asanyarray.

One last related one: What's the inverse of a covariance matrix? It's
just sums, multiplications and divisions (which I wouldn't remember), but
the computation is just np.linalg.inv or np.linalg.pinv, which is a
simple shortcut.

Josef

> That said, there is truly an enormous benefit to checking units on
> "regular" operations. Spacecraft have missed Mars because people
> didn't do it properly...

https://twitter.com/search?q=2%20unit%20tests.%200%20integration%20tests

> All the best,
>
> Marten
>
> p.s. The scipy functions should indeed be included in the ufuncs
> covered; there is a fairly long-standing issue for that in astropy...
From shoyer at gmail.com  Thu Nov 2 15:37:01 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 02 Nov 2017 19:37:01 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 9:45 AM <josef.pktd at gmail.com> wrote:

> Similarly, scipy.special has ufuncs - what units are those?
>
> Most code that I know (i.e. scipy.stats and statsmodels) does not use only
> "normal mathematical operations with ufuncs". I guess there are a lot of
> "abnormal" mathematical operations where just simply propagating the
> units will not work.
>
> Aside: The problem is more general also for other data structures.
> E.g. statsmodels for most parts uses only numpy ndarrays inside the
> algorithms and computations because that provides well defined behavior
> (e.g. pandas behaved too differently in many cases). I don't have much of
> an idea yet about how to change the infrastructure to allow the use of
> dask arrays, sparse matrices and similar, and possibly automatic
> differentiation.

This is the exact same reason why pandas and xarray do not support
wrapping arbitrary ndarray subclasses or duck array types. The operations
we use internally (on numpy.ndarray objects) may not be what you would
expect externally, and may even be implementation details not considered
part of the public API. For example, in xarray we use numpy.nanmean() or
bottleneck.nanmean() instead of numpy.mean().

For NumPy and xarray, I think we could (and should) define an interface
to support subclasses and duck types for generic operations for core
use-cases. My main concern with subclasses / duck-arrays is
undefined/untested behavior, especially where we might silently give the
wrong answer or trigger some undesired operation (e.g., loading a lazily
computed array into memory) rather than raising an informative error.
Leaking implementation details is another concern: we have already had
several cases in NumPy where a function only worked on a subclass if a
particular method was called internally, and broke when that was changed.

From nathan12343 at gmail.com  Thu Nov 2 15:40:26 2017
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Thu, 2 Nov 2017 14:40:26 -0500
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> This is the exact same reason why pandas and xarray do not support
> wrapping arbitrary ndarray subclasses or duck array types. The operations
> we use internally (on numpy.ndarray objects) may not be what you would
> expect externally, and may even be implementation details not considered
> part of the public API. For example, in xarray we use numpy.nanmean() or
> bottleneck.nanmean() instead of numpy.mean().
>
> For NumPy and xarray, I think we could (and should) define an interface
> to support subclasses and duck types for generic operations for core
> use-cases. My main concern with subclasses / duck-arrays is
> undefined/untested behavior, especially where we might silently give the
> wrong answer or trigger some undesired operation (e.g., loading a lazily
> computed array into memory) rather than raising an informative error.
> Leaking implementation details is another concern: we have already had
> several cases in NumPy where a function only worked on a subclass if a
> particular method was called internally, and broke when that was changed.

Would this issue be ameliorated given Nathaniel's proposal to try to move
away from subclasses and towards storing data in dtypes? Or would that
just mean that xarray would need to ban dtypes it doesn't know about?

From pierre.debuyl at kuleuven.be  Thu Nov 2 15:38:24 2017
From: pierre.debuyl at kuleuven.be (Pierre de Buyl)
Date: Thu, 2 Nov 2017 20:38:24 +0100
Subject: [Numpy-discussion] Python @ FOSDEM 2018
Message-ID: <20171102193824.GA24760@pi-x230>

Dear SciPythonists and NumPythonists,

FOSDEM is a free event for software developers to meet, share ideas and
collaborate. Every year, 6500+ developers of free and open source
software from all over the world gather at the event in Brussels.

For FOSDEM 2018, we will try the new concept of a virtual Python devroom:
there is no dedicated Python room but instead, we promote the presence of
Python in all devrooms. We hope to have at least one Python talk in every
devroom (yes, even in the Perl, Ada, Go and Rust devrooms ;-) ).

How can you help to highlight the Python community at Python-FOSDEM 2018?
Propose your talk in the closest related devroom:
https://fosdem.org/2018/news/2017-10-04-accepted-developer-rooms/

Not all devrooms are language-specific, and a number of topics come to
mind for data and science participants:

"Monitoring & Cloud devroom"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002631.html
"HPC, Big Data, and Data Science"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002615.html
"LLVM toolchain"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002624.html

Most calls for contributions end around the 24th of November. Send a copy
of your proposition to python-devroom AT lists.fosdem DOT org. We will
publish a dedicated schedule for Python on https://python-fosdem.org/ and
at our stand. A dinner will also be organized; stay tuned.

We are waiting for your talk proposals.

The Python-FOSDEM committee

From harrigan.matthew at gmail.com  Thu Nov 2 16:39:08 2017
From: harrigan.matthew at gmail.com (Matthew Harrigan)
Date: Thu, 2 Nov 2017 16:39:08 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Numpy already does support a specific unit, datetime64 and timedelta64,
through that very mechanism. It's also probably the most complicated
unit, since at least there is no such thing as leap meters. And it works
well and is very useful IMHO.

On Thu, Nov 2, 2017 at 3:40 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> This is the exact same reason why pandas and xarray do not support
>> wrapping arbitrary ndarray subclasses or duck array types. The operations
>> we use internally (on numpy.ndarray objects) may not be what you would
>> expect externally, and may even be implementation details not considered
>> part of the public API. For example, in xarray we use numpy.nanmean() or
>> bottleneck.nanmean() instead of numpy.mean().
>>
>> For NumPy and xarray, I think we could (and should) define an interface
>> to support subclasses and duck types for generic operations for core
>> use-cases. My main concern with subclasses / duck-arrays is
>> undefined/untested behavior, especially where we might silently give the
>> wrong answer or trigger some undesired operation (e.g., loading a lazily
>> computed array into memory) rather than raising an informative error.
>> Leaking implementation details is another concern: we have already had
>> several cases in NumPy where a function only worked on a subclass if a
>> particular method was called internally, and broke when that was changed.
>
> Would this issue be ameliorated given Nathaniel's proposal to try to move
> away from subclasses and towards storing data in dtypes? Or would that
> just mean that xarray would need to ban dtypes it doesn't know about?
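[Concretely, the built-in unit behavior Matthew is pointing at:]

```
import numpy as np

start = np.datetime64('2017-11-02T12:00')
delta = np.timedelta64(90, 'm')            # 90 minutes

print(start + delta)                       # 2017-11-02T13:30
print(delta / np.timedelta64(1, 'h'))      # 1.5 -- ratios come out dimensionless

# datetime + datetime is rejected, just as meter + second would be:
try:
    start + start
except TypeError as err:
    print(err)
```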
From m.h.vankerkwijk at gmail.com  Thu Nov 2 17:05:08 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 17:05:08 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

My 2¢ here is that all code should feel free to assume a certain type of
input, as long as it is documented properly, but there is no reason to
enforce that by, e.g., putting `asarray` everywhere. Then, for some
pieces, ducktypes and subclasses will just work like magic, and uses
you might never have foreseen become possible. For others, whoever
wants to use them has to do the work (and it is up to package maintainers
to decide whether or not to accept PRs that implement hooks, etc.).

I do see the argument that this way one becomes constrained in the
internal implementation, as a change may break an outward-looking
function, but while at times this may be inconvenient, in my
experience at others it may just make one realize an even better
implementation is possible. But then, I really like duck-typing...

-- Marten

From ben.v.root at gmail.com  Thu Nov 2 17:09:33 2017
From: ben.v.root at gmail.com (Benjamin Root)
Date: Thu, 2 Nov 2017 17:09:33 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Duck typing is great and all for classes that implement some or all of
the ndarray interface... but remember the main reason for asarray() and
asanyarray(): to automatically promote lists and tuples and other
"array-likes" to ndarrays. Ignoring the use-case of lists of lists is
problematic at best.

Ben Root

On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> My 2¢ here is that all code should feel free to assume a certain type of
> input, as long as it is documented properly, but there is no reason to
> enforce that by, e.g., putting `asarray` everywhere. Then, for some
> pieces, ducktypes and subclasses will just work like magic, and uses
> you might never have foreseen become possible. For others, whoever
> wants to use them has to do the work (and it is up to package maintainers
> to decide whether or not to accept PRs that implement hooks, etc.).
>
> I do see the argument that this way one becomes constrained in the
> internal implementation, as a change may break an outward-looking
> function, but while at times this may be inconvenient, in my
> experience at others it may just make one realize an even better
> implementation is possible. But then, I really like duck-typing...
>
> -- Marten

From m.h.vankerkwijk at gmail.com  Thu Nov 2 17:37:21 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 17:37:21 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root <ben.v.root at gmail.com> wrote:

> Duck typing is great and all for classes that implement some or all of
> the ndarray interface... but remember the main reason for asarray() and
> asanyarray(): to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.

How I wish numpy had never gone there! Convenience for what, exactly?
For the user not having to put `array()` around the list themselves? We
slow down everything for that? And even now we're trying to remove some
of the cases where both tuples and lists are allowed. Grrrrrr. Of
course, we are well and truly stuck with it - now it is one of the main
reasons to subclass rather than duck-type... Anyway, water under the
bridge...

-- Marten

From josef.pktd at gmail.com  Thu Nov 2 17:51:57 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 17:51:57 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root <ben.v.root at gmail.com> wrote:

> Duck typing is great and all for classes that implement some or all of
> the ndarray interface... but remember the main reason for asarray() and
> asanyarray(): to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.
>
> Ben Root
>
> On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk
> <m.h.vankerkwijk at gmail.com> wrote:
>
>> My 2¢ here is that all code should feel free to assume a certain type of
>> input, as long as it is documented properly, but there is no reason to
>> enforce that by, e.g., putting `asarray` everywhere. Then, for some
>> pieces, ducktypes and subclasses will just work like magic, and uses
>> you might never have foreseen become possible. For others, whoever
>> wants to use them has to do the work (and it is up to package maintainers
>> to decide whether or not to accept PRs that implement hooks, etc.)
>>
>> I do see the argument that this way one becomes constrained in the
>> internal implementation, as a change may break an outward-looking
>> function, but while at times this may be inconvenient, in my
>> experience at others it may just make one realize an even better
>> implementation is possible. But then, I really like duck-typing...

One problem in general is that there is no protocol about which operations
are implemented in a numpy-ndarray-equivalent way in those ducks, i.e.
whether they quack in a compatible way.

One small example: pandas' standard deviation, std, uses ddof=1 by default
and didn't have an option to override it with the ddof=0 that numpy uses.
So even though we could call a std method of the ducks, the t-test results
would be a bit different - and visibly different in small samples -
depending on the type of the data. A possible alternative would be to
compute std from scratch and forgo the available function or method.

I tried once, in the scipy.zscore function, to be agnostic about the type
and not use asarray. It's a simple operation, but it still required
special handling of numpy matrices, because they preserve the dimension in
reduce operations. After more than a few lines it is difficult to keep
track of which type is now being used.

Another subclass that is often broken in default code is masked arrays,
because asarray throws away the mask. But asanyarray wouldn't always work
either, because the mask needs code for handling the masked values. For
example, scipy.stats ended up with separate functions for masked arrays.

Josef

>> -- Marten
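[The ddof mismatch josef describes is easy to reproduce; current pandas does expose a ddof argument, but the defaults still differ:]

```
import numpy as np
import pandas as pd

data = [1., 2., 3., 4.]

print(np.std(data))            # 1.118... (ddof=0, population form)
print(pd.Series(data).std())   # 1.290... (ddof=1, sample form)
print(np.std(data, ddof=1))    # 1.290... -- only explicit ddof reconciles the two
```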
From shoyer at gmail.com  Thu Nov 2 18:21:06 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 02 Nov 2017 22:21:06 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> Would this issue be ameliorated given Nathaniel's proposal to try to move
> away from subclasses and towards storing data in dtypes? Or would that
> just mean that xarray would need to ban dtypes it doesn't know about?

Yes, I think custom dtypes would definitely help. Custom dtypes have a
well contained interface, so lots of operations (e.g., concatenate,
reshaping, indexing) are guaranteed to work in a dtype independent way.
If you try to do an unsupported operation for such a dtype (e.g.,
np.datetime64), you will generally get a good error message about an
invalid dtype.

In contrast, you can overload a subclass with totally arbitrary semantics
(e.g., np.matrix), and of course the same goes for duck types.

This makes a big difference for libraries like dask or xarray, which need
a standard interface to guarantee they do the right thing. I'm pretty
sure we can wrap a custom dtype ndarray with units, but there's no way
we're going to support np.matrix without significant work. It's hard to
know which is which without well defined interfaces.

From nathan12343 at gmail.com  Thu Nov 2 18:33:16 2017
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Thu, 2 Nov 2017 17:33:16 -0500
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 5:21 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> Yes, I think custom dtypes would definitely help. Custom dtypes have a
> well contained interface, so lots of operations (e.g., concatenate,
> reshaping, indexing) are guaranteed to work in a dtype independent way.
> If you try to do an unsupported operation for such a dtype (e.g.,
> np.datetime64), you will generally get a good error message about an
> invalid dtype.
>
> In contrast, you can overload a subclass with totally arbitrary semantics
> (e.g., np.matrix), and of course the same goes for duck types.
>
> This makes a big difference for libraries like dask or xarray, which need
> a standard interface to guarantee they do the right thing. I'm pretty
> sure we can wrap a custom dtype ndarray with units, but there's no way
> we're going to support np.matrix without significant work. It's hard to
> know which is which without well defined interfaces.

Ah, but what if the dtype modifies the interface? That might sound evil,
but it's something that's been proposed. For example, if I wanted to
replace yt's YTArray in a backward compatible way with a dtype and just
use plain ndarrays everywhere, the dtype would need to *at least* modify
ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute,
and several other things. Of course, if I don't care about backward
compatibility, I can just do all of these operations on the dtype object
itself.

However, I suspect whatever implementation of custom dtypes gets added to
numpy will have the property that it can act like an arbitrary ndarray
subclass; otherwise libraries like yt, Pint, metpy, and astropy won't be
able to switch to it.

-Nathan

From m.h.vankerkwijk at gmail.com  Thu Nov 2 18:39:30 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 18:39:30 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

I guess my argument boils down to it being better to state that a
function only accepts arrays, and happily let it break on, e.g.,
matrix, than to use `asarray` to make a matrix into an array even though
it really isn't.

I do like the dtype ideas, but think I'd agree they're likely to come
with their own problems. But just making new numerical types possible
is interesting.

-- Marten

From shoyer at gmail.com  Thu Nov 2 20:33:38 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 03 Nov 2017 00:33:38 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Maybe the best of both worlds would require explicit opt-in for classes
that shouldn't be coerced, e.g.,

xarray.register_data_type(MyArray)

or maybe better yet ;)

xarray.void_my_nonexistent_warranty_its_my_fault_if_my_buggy_duck_array_breaks_everything(MyArray)

On Thu, Nov 2, 2017 at 3:39 PM Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> I guess my argument boils down to it being better to state that a
> function only accepts arrays, and happily let it break on, e.g.,
> matrix, than to use `asarray` to make a matrix into an array even though
> it really isn't.
>
> I do like the dtype ideas, but think I'd agree they're likely to come
> with their own problems. But just making new numerical types possible
> is interesting.
>
> -- Marten

From shoyer at gmail.com  Thu Nov 2 20:35:36 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 03 Nov 2017 00:35:36 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 3:35 PM Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> Ah, but what if the dtype modifies the interface? That might sound evil,
> but it's something that's been proposed. For example, if I wanted to
> replace yt's YTArray in a backward compatible way with a dtype and just
> use plain ndarrays everywhere, the dtype would need to *at least* modify
> ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute,
> and several other things.

I suppose we'll need to sort this out. But adding new methods/properties
feels pretty safe to me, as long as existing ones are guaranteed to work
in the same way.

From m.h.vankerkwijk at gmail.com  Fri Nov 3 10:30:13 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Fri, 3 Nov 2017 10:30:13 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Yes, I like the idea of, effectively, creating an ABC for ndarray - with
which one can register.

-- Marten
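[A sketch of the registration idea using the standard-library ABC machinery; the `DuckArray` name is made up, and numpy had no such ABC at the time:]

```
import abc
import numpy as np

class DuckArray(abc.ABC):
    """Marker ABC: things that promise ndarray-compatible behavior."""

# Explicit opt-in, no inheritance needed:
DuckArray.register(np.ndarray)

class MyDuck:
    """Stand-in for a third-party duck array."""

DuckArray.register(MyDuck)

print(isinstance(np.ones(3), DuckArray))  # True
print(isinstance(MyDuck(), DuckArray))    # True
print(isinstance([1, 2, 3], DuckArray))   # False -- plain lists stay out
```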
From charlesr.harris at gmail.com  Fri Nov 3 22:56:38 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 3 Nov 2017 20:56:38 -0600
Subject: [Numpy-discussion] NumPy 1.14 branch.

Hi All,

I'd like to branch NumPy 1.14 soon. Before doing so, I'd like to make
sure at a minimum that:

1) Changes in array print formatting are done.
2) Proposed deprecations have been made.

If there are other things that folks see as essential, now is the time
to speak up.

Chuck

From bennyrowland at mac.com  Sat Nov 4 06:42:34 2017
From: bennyrowland at mac.com (Ben Rowland)
Date: Sat, 04 Nov 2017 10:42:34 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Message-ID: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com>

> On 2 Nov 2017, at 22:39, Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
>
> I guess my argument boils down to it being better to state that a
> function only accepts arrays, and happily let it break on, e.g.,
> matrix, than to use `asarray` to make a matrix into an array even though
> it really isn't.

I would support this attitude: the user can always call `asarray` when
passing their data into the function if necessary; then they know up
front what the consequences will be.

For my own ndarray subclass, I want it to behave exactly as a standard
ndarray, but in addition I add some metadata and some functions that act
on that, for example an affine transform and functions to convert between
coordinate systems. The current numpy system of overriding
__array_wrap__, __array_finalize__ and __new__ is great for allowing the
subclass and metadata to propagate through most basic operations. The
problem is that many functions using `asarray` strip out all of this
metadata and return a bare ndarray. My current solution is to implement
an `inherit` method on my subclass which converts an ndarray and copies
back all the metadata, which often looks like this:

spec_data = data.inherit(np.fft.fft(data))

To use composition instead of inheritance would require me to forward
every part of the ndarray API as is, which would be a great deal of work,
and which in nearly every case would only achieve the same results as
replacing `asarray` with `asanyarray` in various library functions. I
don't want to change the behaviour of the existing class, just to add
some data and methods, and I can't imagine I am alone in that.

Ben

> I do like the dtype ideas, but think I'd agree they're likely to come
> with their own problems. But just making new numerical types possible
> is interesting.
>
> -- Marten
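[A minimal version of the pattern Ben describes, following the numpy subclassing docs; the `inherit` helper is reconstructed here as a sketch of his description, not his actual code:]

```
import numpy as np

class MetaArray(np.ndarray):
    def __new__(cls, input_array, transform=None):
        obj = np.asarray(input_array).view(cls)
        obj.transform = transform        # e.g. an affine transform
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and ufunc results: carry the metadata along.
        if obj is None:
            return
        self.transform = getattr(obj, 'transform', None)

    def inherit(self, arr):
        # Re-attach metadata to a bare ndarray returned by e.g. np.fft.fft.
        return MetaArray(arr, transform=self.transform)

data = MetaArray(np.ones(8), transform='affine-goes-here')
spec_data = data.inherit(np.fft.fft(data))   # fft returns a bare ndarray
print(spec_data.transform)                   # 'affine-goes-here'
```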
I would support this attitude: the user can always call `asarray` when passing their data into the function if necessary; then they know upfront what the consequences will be. For my own ndarray subclass, I want it to behave exactly as a standard ndarray, but in addition I add some metadata and some functions that act on that, for example an affine transform and functions to convert between coordinate systems. The current numpy system of overriding __array_wrap__, __array_finalize__ and __new__ is great to allow the subclass and metadata to propagate through most basic operations. The problem is that many functions using `asarray` strip out all of this metadata and return a bare ndarray. My current solution is to implement an `inherit` method on my subclass which converts an ndarray and copies back all the metadata, which often looks like this: spec_data = data.inherit(np.fft.fft(data)) To use composition instead of inheritance would require me to forward every part of the ndarray API as is, which would be a great deal of work and in nearly every case would only achieve the same results as replacing `asarray` by `asanyarray` in various library functions. I don't want to change the behaviour of the existing class, just to add some data and methods, and I can't imagine I am alone in that. Ben > > I do like the dtype ideas, but think I'd agree they're likely to come > with their own problems. But just making new numerical types possible > is interesting. > > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From m.h.vankerkwijk at gmail.com Sat Nov 4 09:47:15 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 4 Nov 2017 09:47:15 -0400 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Hi Ben, You just summarized excellently why I'm on a quest to change `asarray` to `asanyarray` within numpy (or at least add a `subok` keyword for things like `broadcast_arrays`)! Obviously, this covers only ndarray subclasses, not duck types, though I guess in principle one could use the ABC registration mechanism mentioned above to let those types pass through. Returning to the original topic of the thread, with `__array_ufunc__` it now is even easier to keep track of your metadata for ufuncs, and it has become possible to massage input data before the ufunc is called (rather than just the output). All the best, Marten From charlesr.harris at gmail.com Sun Nov 5 13:25:37 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 5 Nov 2017 11:25:37 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support Message-ID: Hi All, Thought I'd toss this out there. I'm tending towards better sooner than later in dropping Python 2.7 support as we are starting to run up against places where we would like to use Python 3 features. That is particularly true on Windows where the 2.7 compiler is really old and lacks C99 compatibility. In any case, the timeline I've been playing with is to keep Python 2.7 support through 2018, which given our current pace, would be for NumPy 1.15 and 1.16. After that 1.16 would become a long term support release with backports of critical bug fixes up until the time that Python 2.7 support officially ends.
In that timeline, NumPy 1.17 would drop support for 2.7. That proposed schedule is subject to change pending developments and feedback. The main task I think is needed before dropping 2.7 is better handling of unicode strings and bytes. There is the #4208 PR that makes a start on that. If there are other things that folks think are essential, please mention them here. If nothing else, we can begin planning for the transition even if the schedule changes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Nov 6 04:56:18 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 6 Nov 2017 22:56:18 +1300 Subject: [Numpy-discussion] NumPy 1.14 branch. In-Reply-To: References: Message-ID: On Sat, Nov 4, 2017 at 3:56 PM, Charles R Harris wrote: > Hi All, > > I'd like to branch NumPy 1.14 soon. > Sounds good. Before doing so, I'd like to make sure at a minimum that > > 1) Changes in array print formatting are done. > 2) Proposed deprecations have been made. > > If there are other things that folks see as essential, now is the time to > speak up. > Are we good on the pytest status? I see https://github.com/numpy/numpy/pull/9386 is still open. Ralf > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Nov 6 05:10:33 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 6 Nov 2017 23:10:33 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris wrote: > Hi All, > > Thought I'd toss this out there. I'm tending towards better sooner than > later in dropping Python 2.7 support as we are starting to run up against > places where we would like to use Python 3 features. That is particularly > true on Windows where the 2.7 compiler is really old and lacks C99 > compatibility. > This is probably the most pressing reason to drop 2.7 support. We seem to be expending a lot of effort lately on this stuff. I was previously advocating being more conservative than the timeline you now propose, but this is the pain point that I think gets me over the line. In any case, the timeline I've been playing with is to keep Python 2.7 > support through 2018, which given our current pace, would be for NumPy 1.15 > and 1.16. After that 1.16 would become a long term support release with > backports of critical bug fixes up until the time that Python 2.7 support > officially ends. In that timeline, NumPy 1.17 would drop support for 2.7. > And 3.4 at the same time or even earlier. That proposed schedule is subject to change pending developments and > feedback. > +1 > The main task I think is needed before dropping 2.7 is better handling of > unicode strings and bytes. There is the #4208 > PR that makes a start on that. > Yep, at the very least we need one release that supports 2.7 *and* has fixed all the IO issues on 3.x Ralf If there are other things that folks think are essential, please mention > them here. If nothing else, we can begin planning for the transition even > if the schedule changes.
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Nov 6 10:56:11 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Nov 2017 08:56:11 -0700 Subject: [Numpy-discussion] NumPy 1.14 branch. In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 2:56 AM, Ralf Gommers wrote: > > > On Sat, Nov 4, 2017 at 3:56 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> I'd like to branch NumPy 1.14 soon. >> > > Sounds good. > > Before doing so, I'd like to make sure at a minimum that >> >> 1) Changes in array print formatting are done. >> 2) Proposed deprecations have been made. >> >> If there are other things that folks see as essential, now is the time to >> speak up. >> > > Are we good on the pytest status? I see https://github.com/numpy/ > numpy/pull/9386 is still open. > I'm pushing off finishing the pytest transition to 1.15. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Mon Nov 6 12:27:25 2017 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 6 Nov 2017 19:27:25 +0200 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 134, Issue 10 In-Reply-To: References: Message-ID: <6f32661d-83ee-c64f-51f8-b82597d8aa24@gmail.com> An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 6 17:18:24 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 6 Nov 2017 14:18:24 -0800 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > > You just summarized excellently why I'm on a quest to change `asarray` > to `asanyarray` within numpy +1 -- we should all be using asanyarray() most of the time. However, a couple notes: asarray() pre-dates asanyarray() by a LOT. asanyarray was added to better handle subclasses, but there is a lot of legacy code out there. And legacy coders -- I know that I still usually use asarray without thinking about it -- sorry! Obviously, this covers only ndarray > subclasses, not duck types, though I guess in principle one could use > the ABC registration mechanism mentioned above to let those types pass > through. > The trick there is that what does it mean to be duck-typed to an ndarray? For many applications it's critical that the C API be the same, so duck-typing doesn't really apply. And in other cases, it only needs to support a small portion of the numpy API. In essence, there are an almost infinite number of possible ABCs for an ndarray... For my part, I've been known to write custom "array_like" code -- it checks for the handful of methods I know I need to use, and I test it against the small handful of duck-typed arrays that I know I want my code to work with. Klunky, and maybe we could come up with a standard way to do it and include that in numpy, but I'm not sure that ABCs are the way to do it. -CHB -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 6 17:24:07 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 6 Nov 2017 14:24:07 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Sun, Nov 5, 2017 at 10:25 AM, Charles R Harris wrote: > the timeline I've been playing with is to keep Python 2.7 support through > 2018, which given our current pace, would be for NumPy 1.15 and 1.16. After > that 1.16 would become a long term support release with backports of > critical bug fixes > +1 I think py2.7 is going to be around for a long time yet -- which means we really do want to keep the long term support -- which may be quite some time. But that doesn't mean people insisting on not upgrading Python need to get the latest and greatest numpy. Also -- if py2.7 continues to see the use I expect well past when python.org officially drops it, I wouldn't be surprised if a Python2.7 Windows build based on a newer compiler would come along -- perhaps by Anaconda or conda-forge, or ??? If that happens, I suppose we could re-visit 2.7 support. Though it sure would be nice to clean up the dang Unicode stuff for good, too! In short, if it makes it easier for numpy to move forward, let's do it! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Mon Nov 6 17:28:04 2017 From: rmay31 at gmail.com (Ryan May) Date: Mon, 6 Nov 2017 15:28:04 -0700 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker wrote: > Klunky, and maybe we could come up with a standard way to do it and > include that in numpy, but I'm not sure that ABCs are the way to do it. > ABCs are *absolutely* the way to go about it. It's the only way baked into the Python language itself that allows you to register a class for purposes of `isinstance` without needing to subclass--i.e. duck-typing. What's needed, though, is not just a single ABC. Some thought and design needs to go into segmenting the ndarray API to declare certain behaviors, just like was done for collections: https://docs.python.org/3/library/collections.abc.html You don't just have a single ABC declaring a collection, but rather "I am a mapping" or "I am a mutable sequence". It's more of a pain for developers to properly specify things, but this is not a bad thing to actually give code some thought. Ryan -- Ryan May -------------- next part -------------- An HTML attachment was scrubbed... URL:
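As a sketch of what such segmented ABCs could look like in practice -- every class name below is made up purely for illustration, and numpy ships nothing like this today -- registration works exactly as it does for collections.abc:

import numpy as np
from abc import ABC, abstractmethod

class ShapedArray(ABC):
    """Hypothetical ABC for the attribute-access slice of the ndarray API."""
    @property
    @abstractmethod
    def shape(self): ...

    @property
    @abstractmethod
    def size(self): ...

class ReshapableArray(ShapedArray):
    """Hypothetical ABC adding the shape-manipulation slice of the API."""
    @abstractmethod
    def reshape(self, *shape): ...

# Types opt in by registration, without subclassing:
ShapedArray.register(np.ndarray)

class MyDuckArray:
    def __init__(self, data):
        self._data = np.asarray(data)
    @property
    def shape(self):
        return self._data.shape
    @property
    def size(self):
        return self._data.size

ShapedArray.register(MyDuckArray)

assert isinstance(np.zeros((2, 3)), ShapedArray)
assert isinstance(MyDuckArray([1, 2, 3]), ShapedArray)
# Registering for the smaller interface does not grant the bigger one:
assert not isinstance(MyDuckArray([1, 2, 3]), ReshapableArray)

A consumer like matplotlib could then test isinstance(x, ReshapableArray) up front instead of calling asarray and hoping for the best.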
From shoyer at gmail.com Mon Nov 6 19:28:17 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 07 Nov 2017 00:28:17 +0000 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Mon, Nov 6, 2017 at 2:29 PM Ryan May wrote: > On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker > wrote: > >> Klunky, and maybe we could come up with a standard way to do it and >> include that in numpy, but I'm not sure that ABCs are the way to do it. >> > > ABCs are *absolutely* the way to go about it. It's the only way baked into > the Python language itself that allows you to register a class for purposes > of `isinstance` without needing to subclass--i.e. duck-typing. > > What's needed, though, is not just a single ABC. Some thought and design > needs to go into segmenting the ndarray API to declare certain behaviors, > just like was done for collections: > > https://docs.python.org/3/library/collections.abc.html > > You don't just have a single ABC declaring a collection, but rather "I am > a mapping" or "I am a mutable sequence". It's more of a pain for developers > to properly specify things, but this is not a bad thing to actually give > code some thought. > I agree, it would be nice to nail down a hierarchy of duck-arrays, if possible. Although, there are quite a few options, so I don't know how doable this is. Any interest in opening up an issue on GitHub to discuss? -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Nov 6 20:37:49 2017 From: cournape at gmail.com (David Cournapeau) Date: Tue, 7 Nov 2017 10:37:49 +0900 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 7:24 AM, Chris Barker wrote: > On Sun, Nov 5, 2017 at 10:25 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > > >> the timeline I've been playing with is to keep Python 2.7 support >> through 2018, which given our current pace, would be for NumPy 1.15 and >> 1.16. After that 1.16 would become a long term support release with >> backports of critical bug fixes >> > > +1 > > I think py2.7 is going to be around for a long time yet -- which means we > really do want to keep the long term support -- which may be quite some > time. But that doesn't mean people insisting on not upgrading Python need > to get the latest and greatest numpy. > > Also -- if py2.7 continues to see the use I expect well past when > python.org officially drops it, I wouldn't be surprised if a Python2.7 > Windows build based on a newer compiler would come along -- perhaps by > Anaconda or conda-forge, or ??? > I suspect that this will indeed happen. I am aware of multiple companies following this path already (building python + numpy themselves with a newer MS compiler). David -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Nov 6 21:14:06 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Nov 2017 19:14:06 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 6:37 PM, David Cournapeau wrote: > > > On Tue, Nov 7, 2017 at 7:24 AM, Chris Barker > wrote: > >> On Sun, Nov 5, 2017 at 10:25 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >> >>> the timeline I've been playing with is to keep Python 2.7 support >>> through 2018, which given our current pace, would be for NumPy 1.15 and >>> 1.16.
After that 1.16 would become a long term support release with >>> backports of critical bug fixes >>> >> >> +1 >> >> I think py2.7 is going to be around for a long time yet -- which means we >> really do want to keep the long term support -- which may be quite some >> time. But that's doesn't mean people insisting on no upgrading PYthon need >> to get the latest and greatest numpy. >> >> Also -- if py2.7 continues to see the use I expect it will well past when >> pyton.org officially drops it, I wouldn't be surprised if a Python2.7 >> Windows build based on a newer compiler would come along -- perhaps by >> Anaconda or conda-forge, or ??? >> > > I suspect that this will indeed happen. I am aware of multiple companies > following this path already (building python + numpy themselves with a > newer MS compiler). > I think Anaconda is talking about distributing a compiler, but what that will be on windows is anyone's guess. When we drop 2.7, there is a lot of compatibility crud that it would be nice to get rid of, and if we do that then NumPy will no longer compile against 2.7. I suspect some companies have just been putting off the task of upgrading to Python 3, which should be pretty straight forward these days apart from system code that needs to do a lot of work with bytes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Tue Nov 7 09:53:28 2017 From: ben.v.root at gmail.com (Benjamin Root) Date: Tue, 7 Nov 2017 09:53:28 -0500 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Well, to get the ball rolling a bit, the key thing that matplotlib needs to know is if `shape`, `reshape`, 'size', broadcasting, and logical indexing is respected. So, I see three possible abc's here: one for attribute access (things like `shape` and `size`) and another for shape manipulations (broadcasting and reshape, and assignment to .shape). And then a third abc for indexing support, although, I am not sure how that could get implemented... Cheers! Ben Root On Mon, Nov 6, 2017 at 7:28 PM, Stephan Hoyer wrote: > On Mon, Nov 6, 2017 at 2:29 PM Ryan May wrote: > >> On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker >> wrote: >> >>> Klunky, and maybe we could come up with a standard way to do it and >>> include that in numpy, but I'm not sure that ABCs are the way to do it. >>> >> >> ABCs are *absolutely* the way to go about it. It's the only way baked >> into the Python language itself that allows you to register a class for >> purposes of `isinstance` without needing to subclass--i.e. duck-typing. >> >> What's needed, though, is not just a single ABC. Some thought and design >> needs to go into segmenting the ndarray API to declare certain behaviors, >> just like was done for collections: >> >> https://docs.python.org/3/library/collections.abc.html >> >> You don't just have a single ABC declaring a collection, but rather "I am >> a mapping" or "I am a mutable sequence". It's more of a pain for developers >> to properly specify things, but this is not a bad thing to actually give >> code some thought. >> > > I agree, it would be nice to nail down a hierarchy of duck-arrays, if > possible. Although, there are quite a few options, so I don't know how > doable this is. Any interest in opening up an issue on GitHub to discuss? 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Tue Nov 7 11:18:27 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 7 Nov 2017 11:18:27 -0500 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Hi Benjamin, For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC for astropy, which may be a useful starting point as it also provides a way to implement the methods ndarray uses to reshape and get elements: see https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863 All the best, Marten From chris.barker at noaa.gov Tue Nov 7 15:14:16 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 7 Nov 2017 12:14:16 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris wrote: > Also -- if py2.7 continues to see the use I expect well past when >>> python.org officially drops it, I wouldn't be surprised if a Python2.7 >>> Windows build based on a newer compiler would come along -- perhaps by >>> Anaconda or conda-forge, or ??? >>> >> >> I suspect that this will indeed happen. I am aware of multiple companies >> following this path already (building python + numpy themselves with a >> newer MS compiler). >> > > I think Anaconda is talking about distributing a compiler, but what that > will be on windows is anyone's guess. When we drop 2.7, there is a lot of > compatibility crud that it would be nice to get rid of, and if we do that > then NumPy will no longer compile against 2.7. I suspect some companies > have just been putting off the task of upgrading to Python 3, which should > be pretty straightforward these days apart from system code that needs to > do a lot of work with bytes. > I agree, and if there is a compelling reason to upgrade, folks WILL do it. But I've been amazed over the years at folks' desire to stick with what they have! And I'm guilty too, anything new I start with py3, but older larger codebases are still py2, I just can't find the energy to spend the week or so it would probably take to update everything... But in the original post, the Windows compiler issue was mentioned, so there seem to be two reasons to drop py2: A) wanting to use py3-only features. B) wanting to use newer C (C++?) compiler features. I suggest we be clear about which of these is driving the decisions, and explicit about the goals. That is, if (A) is critical, we don't even have to talk about (B). But we could choose to do (B) without doing (A) -- I suspect there will be a user base for that.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 7 15:20:49 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 7 Nov 2017 12:20:49 -0800 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer wrote: > >> What's needed, though, is not just a single ABC. Some thought and design >> needs to go into segmenting the ndarray API to declare certain behaviors, >> just like was done for collections: >> >> https://docs.python.org/3/library/collections.abc.html >> >> You don't just have a single ABC declaring a collection, but rather "I am >> a mapping" or "I am a mutable sequence". It's more of a pain for developers >> to properly specify things, but this is not a bad thing to actually give >> code some thought. >> > > I agree, it would be nice to nail down a hierarchy of duck-arrays, if > possible. Although, there are quite a few options, so I don't know how > doable this is. > Exactly -- there are an exponential number of options... > Well, to get the ball rolling a bit, the key thing that matplotlib needs > to know is if `shape`, `reshape`, `size`, broadcasting, and logical > indexing are respected. So, I see three possible abc's here: one for > attribute access (things like `shape` and `size`) and another for shape > manipulations (broadcasting and reshape, and assignment to .shape). I think we're going to get into a string of ABCs: ArrayLikeForMPL_ABC etc, etc..... > And then a third abc for indexing support, although, I am not sure how > that could get implemented... This is the really tricky one -- all an ABC really checks is the existence of methods -- making sure they behave the same way is up to the developer of the ducktype. Which is OK, but will require discipline. But indexing, specifically fancy indexing, is another matter -- I'm not sure if there is even a way with an ABC to check for what types of indexing are supported, but we'd still have the problem of whether the semantics are the same! For example, I work with netcdf variable objects, which are partly duck-typed as ndarrays, but I think n-dimensional fancy indexing works differently... how in the world do you detect that with an ABC??? For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC > for astropy, which may be a useful starting point as it also provides > a way to implement the methods ndarray uses to reshape and get > elements: see > https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863 Sounds like a good starting point for discussion. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Tue Nov 7 16:03:16 2017 From: rmay31 at gmail.com (Ryan May) Date: Tue, 7 Nov 2017 14:03:16 -0700 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Tue, Nov 7, 2017 at 1:20 PM, Chris Barker wrote: > On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer wrote: > >> >>> What's needed, though, is not just a single ABC. Some thought and design >>> needs to go into segmenting the ndarray API to declare certain behaviors, >>> just like was done for collections: >>> >>> https://docs.python.org/3/library/collections.abc.html >>> >>> You don't just have a single ABC declaring a collection, but rather "I >>> am a mapping" or "I am a mutable sequence".
It's more of a pain for >>> developers to properly specify things, but this is not a bad thing to >>> actually give code some thought. >>> >> >> I agree, it would be nice to nail down a hierarchy of duck-arrays, if >> possible. Although, there are quite a few options, so I don't know how >> doable this is. >> > > Exactly -- there are an exponential amount of options... > > >> Well, to get the ball rolling a bit, the key thing that matplotlib needs >> to know is if `shape`, `reshape`, 'size', broadcasting, and logical >> indexing is respected. So, I see three possible abc's here: one for >> attribute access (things like `shape` and `size`) and another for shape >> manipulations (broadcasting and reshape, and assignment to .shape). > > > I think we're going to get into an string of ABCs: > > ArrayLikeForMPL_ABC > > etc, etc..... > Only if you try to provide perfectly-sized options for every occasion--but that's not how we do things in (sane) software development. You provide a few options that optimize the common use cases, and you don't try to cover everything--let client code figure out the right combination from the primitives you provide. One can always just inherit/register *all* the ABCs if need be. The status quo is that we have 1 interface that covers everything from multiple dims and shape to math and broadcasting to the entire __array__ interface. Even breaking that up into the 3 "obvious" chunks would be a massive improvement. I just don't want to see this effort bog down into "this is so hard". Getting it perfect is hard; getting it useful is much easier. It's important to note that we can always break up/combine existing ABCs into other ones later. > And then a third abc for indexing support, although, I am not sure how >> that could get implemented... > > > This is the really tricky one -- all ABCs really check is the existence of > methods -- making sure they behave the same way is up to the developer of > the ducktype. > > which is K, but will require discipline. > > But indexing, specifically fancy indexing, is another matter -- I'm not > sure if there even a way with an ABC to check for what types of indexing > are support, but we'd still have the problem with whether the semantics are > the same! > > For example, I work with netcdf variable objects, which are partly > duck-typed as ndarrays, but I think n-dimensional fancy indexing works > differently... how in the world do you detect that with an ABC??? > Even documenting expected behavior as part of these ABCs would go a long way towards helping standardize behavior. Another idea would be to put together a conformance test suite as part of this effort, in lieu of some kind of run-time checking of behavior (which would be terrible). That would help developers of other "ducks" check that they're doing the right things. I'd imagine the existing NumPy test suite would largely cover this. Ryan -- Ryan May -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Nov 7 17:01:53 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 07 Nov 2017 22:01:53 +0000 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Tue, Nov 7, 2017 at 12:23 PM Chris Barker wrote: > > And then a third abc for indexing support, although, I am not sure how >> that could get implemented... 
> > > This is the really tricky one -- all ABCs really check is the existence of > methods -- making sure they behave the same way is up to the developer of > the ducktype. > > which is K, but will require discipline. > > But indexing, specifically fancy indexing, is another matter -- I'm not > sure if there even a way with an ABC to check for what types of indexing > are support, but we'd still have the problem with whether the semantics are > the same! > > For example, I work with netcdf variable objects, which are partly > duck-typed as ndarrays, but I think n-dimensional fancy indexing works > differently... how in the world do you detect that with an ABC??? > We recently worked out a hierarchy of indexing types for xarray. To a crude approximation, we have: - "Basic" indexing support for slices and integers. Nearly every array type satisfies this. - "Outer" or "orthogonal" indexing with slices, integers and 1D arrays. This is what netCDF4-Python and Fortran/MATLAB support. - "Vectorized" indexing with broadcasting and multi-dimensional indexers. NumPy supports a generalization of this, but I would not wish the edge cases involving mixed slices/arrays upon anyone. - "Logical" indexing by a boolean array with the same shape. - "Exactly like NumPy" for subclasses or wrappers around NumPy arrays. There's some ambiguities in this, but that's what specs are for. For most applications, we probably don't need most of these: ABCs for "Basic", "Logical" and "Exactly like NumPy" would go a long ways. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Nov 7 18:27:36 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Nov 2017 17:27:36 -0600 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Nov 6, 2017 4:19 PM, "Chris Barker" wrote: On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > > You just summarized excellently why I'm on a quest to change `asarray` > to `asanyarray` within numpy +1 -- we should all be using asanyarray() most of the time. The problem is that if you use 'asanyarray', then you're claiming that your code works correctly for: - regular ndarrays - np.matrix - np.ma masked arrays - and every third party subclass, regardless of their semantics, regardless of whether you've heard of them or not If subclasses followed the Liskov substitution principle, and had different internal implementations but the same public ("duck") API, then this would be fine. But in practice, numpy limitations mean that ndarrays subclasses have to have the same internal implementation, so the only reason to make an ndarray subclass is if you want to make something with a different public API. Basically the whole system is designed for subclasses to be incompatible. The end result is that if you use asanyarray, your code is definitely wrong, because there's no way you're actually doing the right thing for arbitrary ndarray subclasses. But if you don't use asanyarray, then yeah, that's also wrong, because it won't work on mostly-compatible subclasses like astropy's. Given this, different projects reasonably make different choices -- it's not just legacy code that uses asarray. In the long run we obviously need to come up with new options that don't have these tradeoffs (that's why we want to let units to to dtypes, implement methods like __array_ufunc__ to enable duck arrays, etc.) 
From njs at pobox.com Tue Nov 7 18:27:36 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Nov 2017 17:27:36 -0600 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Nov 6, 2017 4:19 PM, "Chris Barker" wrote: On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > > You just summarized excellently why I'm on a quest to change `asarray` > to `asanyarray` within numpy +1 -- we should all be using asanyarray() most of the time. The problem is that if you use 'asanyarray', then you're claiming that your code works correctly for: - regular ndarrays - np.matrix - np.ma masked arrays - and every third party subclass, regardless of their semantics, regardless of whether you've heard of them or not If subclasses followed the Liskov substitution principle, and had different internal implementations but the same public ("duck") API, then this would be fine. But in practice, numpy limitations mean that ndarray subclasses have to have the same internal implementation, so the only reason to make an ndarray subclass is if you want to make something with a different public API. Basically the whole system is designed for subclasses to be incompatible. The end result is that if you use asanyarray, your code is definitely wrong, because there's no way you're actually doing the right thing for arbitrary ndarray subclasses. But if you don't use asanyarray, then yeah, that's also wrong, because it won't work on mostly-compatible subclasses like astropy's. Given this, different projects reasonably make different choices -- it's not just legacy code that uses asarray. In the long run we obviously need to come up with new options that don't have these tradeoffs (that's why we want to move units to dtypes, implement methods like __array_ufunc__ to enable duck arrays, etc.). Let's try to be sympathetic to other projects that are doing their best :-). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Nov 7 18:40:31 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Nov 2017 17:40:31 -0600 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Nov 7, 2017 2:15 PM, "Chris Barker" wrote: On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris wrote: > Also -- if py2.7 continues to see the use I expect well past when >>> python.org officially drops it, I wouldn't be surprised if a Python2.7 >>> Windows build based on a newer compiler would come along -- perhaps by >>> Anaconda or conda-forge, or ??? >>> >> >> I suspect that this will indeed happen. I am aware of multiple companies >> following this path already (building python + numpy themselves with a >> newer MS compiler). >> > > I think Anaconda is talking about distributing a compiler, but what that > will be on windows is anyone's guess. When we drop 2.7, there is a lot of > compatibility crud that it would be nice to get rid of, and if we do that > then NumPy will no longer compile against 2.7. I suspect some companies > have just been putting off the task of upgrading to Python 3, which should > be pretty straightforward these days apart from system code that needs to > do a lot of work with bytes. > I agree, and if there is a compelling reason to upgrade, folks WILL do it. But I've been amazed over the years at folks' desire to stick with what they have! And I'm guilty too, anything new I start with py3, but older larger codebases are still py2, I just can't find the energy to spend the week or so it would probably take to update everything... But in the original post, the Windows compiler issue was mentioned, so there seem to be two reasons to drop py2: A) wanting to use py3-only features. B) wanting to use newer C (C++?) compiler features. I suggest we be clear about which of these is driving the decisions, and explicit about the goals. That is, if (A) is critical, we don't even have to talk about (B). But we could choose to do (B) without doing (A) -- I suspect there will be a user base for that.... The problem is it's hard to predict the future. Right now neither PyPI nor conda provide any way to distribute binaries for py27-but-with-a-newer-ABI, and maybe they never will; or maybe they will eventually, but not enough people use them to justify keeping py2 support given the other overheads; or... who knows, really.
In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Hi Nathaniel, You're right, I shouldn't be righteous. Though I do think the advantage of `asanyarray` inside numpy remains that it is easy for a user to add `asarray` to their input to a numpy function, and not easy for a happily compatible subclass to avoid an `asarray` inside a numpy function! I.e., coerce as little as you can get away with... All the best, Marten From matti.picus at gmail.com Wed Nov 8 11:41:03 2017 From: matti.picus at gmail.com (Matti Picus) Date: Wed, 8 Nov 2017 18:41:03 +0200 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand flags? Message-ID: I filed issue 9714 https://github.com/numpy/numpy/issues/9714 and wrote a mail in September trying to get some feedback on what to do with updateifcopy semantics and user-exposed nditer. It garnered no response, so I am trying again. For those who are unfamiliar with the issue see below for a short summary and issue 7054 for a lengthy discussion. Note that pull request 9639 which should be merged very soon changes the magical UPDATEIFCOPY into WRITEBACKIFCOPY, and hopefully will appear in NumPy 1.14.

As I mention in the issue, there is a magical update done in this snippet in the next-to-the-last line:

a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
i = np.nditer(a, [], [['readwrite', 'updateifcopy']],
              casting='same_kind', op_dtypes=[np.dtype('f4')])
# Check that UPDATEIFCOPY is activated
i.operands[0][2, 1, 1] = -12.5
assert a[2, 1, 1] != -12.5
i = None  # magic!!!
assert a[2, 1, 1] == -12.5

Not only is this magic very implicit, it relies on refcount semantics and thus does not work on PyPy. Possible solutions:

1. nditer is rarely used, just deprecate updateifcopy use on operands

2. make nditer into a context manager, so the code would become explicit

a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
with np.nditer(a, [], [['readwrite', 'updateifcopy']],
               casting='same_kind', op_dtypes=[np.dtype('f4')]) as i:
    # Check that WRITEBACKIFCOPY is activated
    i.operands[0][2, 1, 1] = -12.5
    assert a[2, 1, 1] != -12.5
assert a[2, 1, 1] == -12.5  # a is modified in i.__exit__

3. something else?

Any opinions? Does anyone use nditer in production code? Matti

------------------------- what are updateifcopy semantics? When a temporary copy or work buffer is required, NumPy can (ab)use the base attribute of an ndarray by

- creating a copy of the data from the base array
- marking the base array read-only

Then when the temporary buffer is "no longer needed"

- the data is copied back
- the original base array is marked read-write

The trigger for the "no longer needed" decision before pull request 9639 is in the dealloc function. That is not generally a place to do useful work, especially on PyPy which can call dealloc much later. Pull request 9639 adds an explicit PyArray_ResolveWritebackIfCopy API function, and recommends calling it explicitly before dealloc. The only place this change is visible to the python-level user is in nditer. C-API users will need to adapt their code to use the new API function, with a deprecation cycle that is backwardly compatible on CPython.
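To see how fragile the dealloc trigger is even on CPython, consider this small variation on the first snippet (the `alias` variable is only for illustration, and the behavior shown is the current pre-9639 semantics as described above):

import numpy as np

a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
i = np.nditer(a, [], [['readwrite', 'updateifcopy']],
              casting='same_kind', op_dtypes=[np.dtype('f4')])
i.operands[0][2, 1, 1] = -12.5

alias = i      # an easily-overlooked extra reference
i = None       # no magic this time: the iterator is still alive
assert a[2, 1, 1] != -12.5   # the writeback has not happened yet

alias = None   # only now is the iterator deallocated...
assert a[2, 1, 1] == -12.5   # ...and the data copied back (CPython only)

Whether `a` holds the new value thus depends on reference bookkeeping far away from the code that did the writing, which is exactly the implicitness at issue here.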
From p.j.a.cock at googlemail.com Wed Nov 8 11:50:38 2017 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 8 Nov 2017 16:50:38 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 11:40 PM, Nathaniel Smith wrote: > > > > Right now, the decision in front of us is what to tell people who ask about > numpy's py2 support plans, so that they can make their own plans. Given what > we know right now, I don't think we should promise to keep support past > 2018. If we get there and the situation's changed, and there's both desire > and means to extend support we can revisit that. But it's better to > under-promise and possibly over-deliver, instead of promising to support py2 > until after it becomes a millstone around our necks and then realizing we > haven't warned anyone and are stuck supporting it another year beyond > that... > > -n NumPy (and to a lesser extent SciPy) is in a tough position being at the bottom of many scientific Python programming stacks. Whenever you drop Python 2 support, it is going to upset someone. Is it too ambitious to pledge to drop support for Python 2.7 no later than 2020, coinciding with the Python development team's timeline for dropping support for Python 2.7? If that looks doable, NumPy could sign up to http://www.python3statement.org/ Regards, Peter From ilhanpolat at gmail.com Wed Nov 8 12:15:39 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Wed, 8 Nov 2017 18:15:39 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: I was about to send the same thing. I think this matter became a vim/emacs issue and Py2 supporters won't take any arguments anymore. But if Instagram can do it, it means that the legacy code argument is a matter of will, not a technicality. https://thenewstack.io/instagram-makes-smooth-move-python-3/ Also people are really going out of their way, such as Tauthon https://github.com/naftaliharris/tauthon, to stay with Python2. To be honest, I'm convinced that this is a sentimental debate after seeing this fork. On Wed, Nov 8, 2017 at 5:50 PM, Peter Cock wrote: > On Tue, Nov 7, 2017 at 11:40 PM, Nathaniel Smith wrote: > > > > > > > > Right now, the decision in front of us is what to tell people who ask > about > > numpy's py2 support plans, so that they can make their own plans. Given > what > > we know right now, I don't think we should promise to keep support past > > 2018. If we get there and the situation's changed, and there's both > desire > > and means to extend support we can revisit that. But it's better to > > under-promise and possibly over-deliver, instead of promising to support > py2 > > until after it becomes a millstone around our necks and then realizing we > > haven't warned anyone and are stuck supporting it another year beyond > > that... > > > > -n > > NumPy (and to a lesser extent SciPy) is in a tough position being at the > bottom of many scientific Python programming stacks. Whenever you > drop Python 2 support, it is going to upset someone. > > Is it too ambitious to pledge to drop support for Python 2.7 no later than > 2020, coinciding with the Python development team's timeline for dropping > support for Python 2.7? > > If that looks doable, NumPy could sign up to http://www.python3statement.
> org/ > > Regards, > > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Wed Nov 8 12:31:45 2017 From: matti.picus at gmail.com (Matti Picus) Date: Wed, 8 Nov 2017 19:31:45 +0200 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: References: Message-ID: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Nov 8 13:00:37 2017 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 08 Nov 2017 19:00:37 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: <1510164037.21998.10.camel@sipsolutions.net> On Wed, 2017-11-08 at 18:15 +0100, Ilhan Polat wrote: > I was about to send the same thing. I think this matter became a > vim/emacs issue and Py2 supporters won't take any arguments anymore. > But if Instagram can do it, it means that legacy code argument is a > matter of will but not a technicality. https://thenewstack.io/instagr > am-makes-smooth-move-python-3/ > > Also people are really going out of their ways such as Tauthon https: > //github.com/naftaliharris/tauthon to stay with Python2. To be > honest, I'm convinced that this is a sentimental debate after seeing > this fork. > > In my opinion it is fine for us to drop support for python 2 in master relatively soon (as proposed here). But I guess we will need to a "LTS" release which means some extra maintenance burden until 2020. I could hope those who really need it jumping in to carry some of that (and by 2020 my guess is if anyone still wants to support it longer, we won't stop you, but I doubt the current core devs, at least not me, would be very interested in it). So in my opinion, the current NumPy is excellent and very stable, anyone who needs fancy new stuff likely also wants other fancy new stuff so will soon have to use python 3 anyway.... Which means, if we think the extra burden of a "LTS" is lower then the current hassle, lets do it :). Also downstream seems only half a reason to me, since downstream normally supports much outdated versions anyway? - Sebastian > > > > > > On Wed, Nov 8, 2017 at 5:50 PM, Peter Cock > wrote: > > On Tue, Nov 7, 2017 at 11:40 PM, Nathaniel Smith > > wrote: > > > > > > > > > > > > Right now, the decision in front of us is what to tell people who > > ask about > > > numpy's py2 support plans, so that they can make their own plans. > > Given what > > > we know right now, I don't think we should promise to keep > > support past > > > 2018. If we get there and the situation's changed, and there's > > both desire > > > and means to extend support we can revisit that. But's better to > > > under-promise and possibly over-deliver, instead of promising to > > support py2 > > > until after it becomes a millstone around our necks and then > > realizing we > > > haven't warned anyone and are stuck supporting it another year > > beyond > > > that... > > > > > > -n > > > > NumPy (and to a lesser extent SciPy) is in a tough position being > > at the > > bottom of many scientific Python programming stacks. Whenever you > > drop Python 2 support is going to upset someone. 
> > > > Is it too ambitious to pledge to drop support for Python 2.7 no > > later than > > 2020, coinciding with the Python development team's timeline for > > dropping > > support for Python 2.7? > > > > If that looks doable, NumPy could sign up to http://www.python3stat > > ement.org/ > > > > Regards, > > > > Peter > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: This is a digitally signed message part URL: From jtaylor.debian at googlemail.com Wed Nov 8 14:08:37 2017 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 8 Nov 2017 20:08:37 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> On 06.11.2017 11:10, Ralf Gommers wrote: > > > On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris > > wrote: > > Hi All, > > Thought I'd toss this out there. I'm tending towards better sooner > than later in dropping Python 2.7 support as we are starting to run > up against places where we would like to use Python 3 features. That > is particularly true on Windows where the 2.7 compiler is really old > and lacks C99 compatibility. > > > This is probably the most pressing reason to drop 2.7 support. We seem > to be expending a lot of effort lately on this stuff. I was previously > advocating being more conservative than the timeline you now propose, > but this is the pain point that I think gets me over the line. Would dropping python2 support for windows earlier than the other platforms be a reasonable approach? I am not a big fan of dropping python2 support before 2020, but I have no issue with dropping python2 support on windows earlier, as it is our largest pain point. From njs at pobox.com Wed Nov 8 15:12:55 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 8 Nov 2017 14:12:55 -0600 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> References: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> Message-ID: At a higher level: The issue here is that we need to break the nditer API. This might affect you if you use np.nditer (in Python) or the NpyIter_* APIs (in C). The exact cases affected are somewhat hard to describe because nditer's flag processing is complicated [1], but basically it's cases where you are writing to one of the arrays being iterated over and then something else non-trivial happens. The problem is that the API currently uses NumPy's odd UPDATEIFCOPY feature. What it does is give you an "output" array which is not your actual output array, but instead some other temporary array which you can modify freely, and whose contents are later written back to your actual output array. When does this copy happen? Since this is an iterator, then most of the time we can do the writeback for iteration N when we start iteration N+1. However, this doesn't work for the final iteration. On the final iteration, currently the writeback happens when the temporary is garbage collected.
*Usually* this happens pretty promptly, but this is dependent on some internal details of how CPython's garbage collector works that are explicitly not part of the Python language spec, and on PyPy you silently and non-deterministically get incorrect results. Plus it's error-prone even on CPython -- if you accidentally have a dangling reference to one array, then suddenly another array will have the wrong contents. So we have two options:

- We could stop supporting this mode entirely. Unfortunately, it's hard to know if anyone is using this, since the conditions to trigger it are so complicated, and not necessarily very exotic (e.g. it can happen if you have a function that uses nditer to read one array and write to another, and then someone calls your function with two arrays whose memory overlaps).

- We could adjust the API so that there's some explicit operation to trigger the final writeback. At the Python level this would probably mean that we start supporting the use of nditer as a context manager, and eventually start raising an error if you're in one of the "unsafe" cases and not using the context manager form. At the C level we probably need some explicit "I'm done with this iterator now" call.

One question is which cases exactly should produce warnings/eventually errors. At the Python level, I guess the simplest rule would be that if you have any write/readwrite arrays in your iterator, then you have to use a 'with' block. At the C level, it's a little trickier, because it's hard to tell up-front whether someone has updated their code to call a final cleanup function, and it's hard to emit a warning/error on something that *doesn't* happen. (You could print a warning when the nditer object is GCed if the cleanup function wasn't called, but you can't raise an error there.) I guess the only reasonable option is to deprecate NPY_ITER_READWRITE and NPY_ITER_WRITEONLY, and make people switch to passing new flags that have the same semantics but also promise that the user has updated their code to call the new cleanup function. Does that work? Any objections? -n

[1] The affected cases are the ones that reach this line: https://github.com/numpy/numpy/blob/c276f326b29bcb7c851169d34f4767da0b4347af/numpy/core/src/multiarray/nditer_constr.c#L2926 So it's something like -- all of these things are true:
- you have a writable array (nditer flags "write" or "readwrite")
- one of these things is true:
  - you passed the "forcecopy" flag
  - all of these things are true:
    - you requested casting
    - you requested updateifcopy
    - there's a memory overlap between this array and another of the arrays being iterated over

On Wed, Nov 8, 2017 at 11:31 AM, Matti Picus wrote: > > Date: Wed, 8 Nov 2017 18:41:03 +0200 > From: Matti Picus > To: numpy-discussion at python.org > Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand > flags? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > I filed issue 9714 https://github.com/numpy/numpy/issues/9714 and wrote > a mail in September trying to get some feedback on what to do with > updateifcopy semantics and user-exposed nditer. > It garnered no response, so I am trying again. > For those who are unfamiliar with the issue see below for a short > summary and issue 7054 for a lengthy discussion. > Note that pull request 9639 which should be merged very soon changes the > magical UPDATEIFCOPY into WRITEBACKIFCOPY, and hopefully will appear in > NumPy 1.14.
> > As I mention in the issue, there is a magical update done in this > snippet in the next-to-the-last line:
>
> a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
> i = np.nditer(a, [], [['readwrite', 'updateifcopy']], casting='same_kind',
>               op_dtypes=[np.dtype('f4')])
> # Check that UPDATEIFCOPY is activated
> i.operands[0][2, 1, 1] = -12.5
> assert a[2, 1, 1] != -12.5
> i = None  # magic!!!
> assert a[2, 1, 1] == -12.5
>
> Not only is this magic very implicit, it relies on refcount semantics > and thus does not work on PyPy. > Possible solutions:
>
> 1. nditer is rarely used, just deprecate updateifcopy use on operands
>
> 2. make nditer into a context manager, so the code would become explicit
>
> a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
> with np.nditer(a, [], [['readwrite', 'updateifcopy']], casting='same_kind',
>                op_dtypes=[np.dtype('f4')]) as i:
>     # Check that WRITEBACKIFCOPY is activated
>     i.operands[0][2, 1, 1] = -12.5
>     assert a[2, 1, 1] != -12.5
> assert a[2, 1, 1] == -12.5  # a is modified in i.__exit__
>
> 3. something else?
>
> Any opinions? Does anyone use nditer in production code? > Matti
>
> ------------------------- > what are updateifcopy semantics? When a temporary copy or work buffer is > required, NumPy can (ab)use the base attribute of an ndarray by
>
> - creating a copy of the data from the base array
> - marking the base array read-only
>
> Then when the temporary buffer is "no longer needed"
>
> - the data is copied back
> - the original base array is marked read-write
>
> The trigger for the "no longer needed" decision before pull request 9639 > is in the dealloc function. > That is not generally a place to do useful work, especially on PyPy > which can call dealloc much later. > Pull request 9639 adds an explicit PyArray_ResolveWritebackIfCopy API > function, and recommends calling it explicitly before dealloc. > > The only place this change is visible to the python-level user is in > nditer. > C-API users will need to adapt their code to use the new API function, > with a deprecation cycle that is backwardly compatible on CPython. > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Nathaniel J.
Smith -- https://vorpus.org From chris.barker at noaa.gov Wed Nov 8 17:05:15 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 8 Nov 2017 14:05:15 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Wed, Nov 8, 2017 at 11:08 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > > Would dropping python2 support for windows earlier than the other > platforms a reasonable approach? > no. I'm not Windows fan myself, but it is a HUGE fraction of the userbase. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Wed Nov 8 17:13:39 2017 From: allanhaldane at gmail.com (Allan Haldane) Date: Wed, 8 Nov 2017 17:13:39 -0500 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: References: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> Message-ID: <2618f2cf-0c1f-44f7-40db-95972484c58b@gmail.com> On 11/08/2017 03:12 PM, Nathaniel Smith wrote: > - We could adjust the API so that there's some explicit operation to > trigger the final writeback. At the Python level this would probably > mean that we start supporting the use of nditer as a context manager, > and eventually start raising an error if you're in one of the "unsafe" > case and not using the context manager form. At the C level we > probably need some explicit "I'm done with this iterator now" call. > > One question is which cases exactly should produce warnings/eventually > errors. At the Python level, I guess the simplest rule would be that > if you have any write/readwrite arrays in your iterator, then you have > to use a 'with' block. At the C level, it's a little trickier, because > it's hard to tell up-front whether someone has updated their code to > call a final cleanup function, and it's hard to emit a warning/error > on something that *doesn't* happen. (You could print a warning when > the nditer object is GCed if the cleanup function wasn't called, but > you can't raise an error there.) I guess the only reasonable option is > to deprecate NPY_ITER_READWRITE and NP_ITER_WRITEONLY, and make people > switch to passing new flags that have the same semantics but also > promise that the user has updated their code to call the new cleanup > function. Seems reasonable. When people use the Nditer C-api, they (almost?) always call NpyIter_Dealloc when they're done. Maybe that's a place to put a warning for C-api users. I think you can emit a warning there since that function calls the GC, not the other way around. It looks like you've already discussed the possibilities of putting things in NpyIter_Dealloc though, and it could be tricky, but if we only need a warning maybe there's a way. 
https://github.com/numpy/numpy/pull/9269/files/6dc0c65e4b2ea67688d6b617da3a175cd603fc18#r127707149

Allan

From bryanv at anaconda.com Wed Nov 8 17:17:24 2017
From: bryanv at anaconda.com (Bryan Van de ven)
Date: Wed, 8 Nov 2017 16:17:24 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: Message-ID: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com>

> On Nov 8, 2017, at 10:50, Peter Cock wrote:
>
> NumPy (and to a lesser extent SciPy) is in a tough position being at the
> bottom of many scientific Python programming stacks. Whenever you
> drop Python 2 support is going to upset someone.

Existing versions of NumPy will still exist and continue to work with Python 2.7. If users want to stay with Python 2.7, that's fine, they will just have to rely on those older/LTS versions. I personally would be happy for projects at the bottom of stacks to take an activist stance and make decisions to actively encourage movement to Python 3.

> It is too ambitious to pledge to drop support for Python 2.7 no later than
> 2020, coinciding with the Python development team's timeline for dropping
> support for Python 2.7?

Developing NumPy is hard, as it is. Everything that can be done to simplify things for the current maintainers and help attract new contributors should be done. It is not reasonable to ask a few (largely volunteer) people to shoulder the burden and difficulties of supporting Python 2.7 for several additional *years* of their life.

I agree entirely with Nick Coghlan's comments from another discussion, and think they apply equally well in this instance:

"""
While it's entirely admirable that many upstream developers are generous enough to help their end users work around this inertia, in the long run doing so is detrimental for everyone concerned, as long term sustaining engineering for old releases is genuinely demotivating for upstream developers (it's a good job, but a lousy way to spend your free time) and for end users, working around institutional inertia this way reduces the pressure to actually get the situation addressed properly.
"""

Thanks,

Bryan

From msarahan at gmail.com Wed Nov 8 17:29:28 2017
From: msarahan at gmail.com (Michael Sarahan)
Date: Wed, 8 Nov 2017 16:29:28 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> References: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> Message-ID:

Anaconda's compilers are for Linux (gcc 7.2) and Mac (llvm/clang 4.0.1) right now. We would like to have clang target all platforms, but that's a lot of development effort. We are also exploring ways of keeping package ecosystems in line, so that building and managing a self-consistent set of python 2.7 packages with a new Visual Studio version or msys2 might be easier. No timeline to report on that, though. Breaking with the python.org ABI is pretty painful.

On Wed, Nov 8, 2017 at 4:17 PM, Bryan Van de ven wrote:
>
> > On Nov 8, 2017, at 10:50, Peter Cock wrote:
> >
> > NumPy (and to a lesser extent SciPy) is in a tough position being at the
> > bottom of many scientific Python programming stacks. Whenever you
> > drop Python 2 support is going to upset someone.
>
> Existing versions of NumPy will still exist and continue to work with
> Python 2.7. If users want to stay with Python 2.7, that's fine, they will
> just have to rely on those older/LTS versions.
I personally would be happy > for projects at the bottom of stacks to take an activist stance and make > decisions to actively encourage movement to Python 3. > > > It is too ambitious to pledge to drop support for Python 2.7 no later > than > > 2020, coinciding with the Python development team?s timeline for dropping > > support for Python 2.7? > > Developing NumPy is hard, as it is. Everything that can be done to > simplify things for the current maintainers and help attract new > contributors should be done. It is not reasonable to ask a few (largely > volunteer) people to shoulder the burden and difficulties of supporting > Python 2.7 for several additional *years* of their life. > > I agree entirely with Nick Coghlan's comments from another discussion, and > think they apply equally well in this instance: > > """ > While it's entirely admirable that many upstream developers are generous > enough to help their end users work around this inertia, in the long run > doing so is detrimental for everyone concerned, as long term sustaining > engineering for old releases is genuinely demotivating for upstream > developers (it's a good job, but a lousy way to spend your free time) and > for end users, working around institutional inertia this way reduces the > pressure to actually get the situation addressed properly. > """ > > Thanks, > > Bryan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Nov 8 17:50:06 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 8 Nov 2017 22:50:06 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: Hi, On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor wrote: > On 06.11.2017 11:10, Ralf Gommers wrote: >> >> >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris >> > wrote: >> >> Hi All, >> >> Thought I'd toss this out there. I'm tending towards better sooner >> than later in dropping Python 2.7 support as we are starting to run >> up against places where we would like to use Python 3 features. That >> is particularly true on Windows where the 2.7 compiler is really old >> and lacks C99 compatibility. >> >> >> This is probably the most pressing reason to drop 2.7 support. We seem >> to be expending a lot of effort lately on this stuff. I was previously >> advocating being more conservative than the timeline you now propose, >> but this is the pain point that I think gets me over the line. > > > Would dropping python2 support for windows earlier than the other > platforms a reasonable approach? > I am not a big fan of to dropping python2 support before 2020, but I > have no issue with dropping python2 support on windows earlier as it is > our largest pain point. I wonder about this too. I can imagine there are a reasonable number of people using older Linux distributions on which they cannot upgrade to a recent Python 3, but is that likely to be true for Windows? We'd have to make sure we could persuade pypi to give the older version for Windows, by default - I don't know if that is possible. 
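(For reference, the closest existing mechanism seems to be the `python_requires` metadata understood by pip 9+ -- a rough sketch with a hypothetical version number, and note that it is all-or-nothing per release rather than per-platform:)

    # setup.py -- illustrative sketch only
    from setuptools import setup

    setup(
        name='numpy',
        version='1.17.0',          # hypothetical first Python-3-only release
        python_requires='>=3.5',   # pip 9+ on Python 2.7 skips this release
        # ... the rest of the usual metadata ...
    )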
Cheers, Matthew From njs at pobox.com Wed Nov 8 18:15:59 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 8 Nov 2017 17:15:59 -0600 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Nov 8, 2017 16:51, "Matthew Brett" wrote: Hi, On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor wrote: > On 06.11.2017 11:10, Ralf Gommers wrote: >> >> >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris >> > wrote: >> >> Hi All, >> >> Thought I'd toss this out there. I'm tending towards better sooner >> than later in dropping Python 2.7 support as we are starting to run >> up against places where we would like to use Python 3 features. That >> is particularly true on Windows where the 2.7 compiler is really old >> and lacks C99 compatibility. >> >> >> This is probably the most pressing reason to drop 2.7 support. We seem >> to be expending a lot of effort lately on this stuff. I was previously >> advocating being more conservative than the timeline you now propose, >> but this is the pain point that I think gets me over the line. > > > Would dropping python2 support for windows earlier than the other > platforms a reasonable approach? > I am not a big fan of to dropping python2 support before 2020, but I > have no issue with dropping python2 support on windows earlier as it is > our largest pain point. I wonder about this too. I can imagine there are a reasonable number of people using older Linux distributions on which they cannot upgrade to a recent Python 3, My impression is that this is increasingly rare, actually. I believe RHEL is still shipping 2.6 by default, which we've already dropped support for, and if you want RH python then they provide supported 2.7 and 3.latest through exactly the same channels. Ubuntu 14.04 is end-of-life in April 2019, so pretty irrelevant if we're talking about 2019 for dropping support, and 16.04 ships with 3.5. Plus with docker, conda, PPAs, etc., getting a recent python is easier than its ever been. > but is that likely to be true for Windows? We'd have to make sure we could persuade pypi to give the older version for Windows, by default - I don't know if that is possible. Currently it's not ? if pip doesn't see a Windows wheel, it'll try downloading and building an sdist. There's a mechanism for sdists to declare what version of python they support but (thanks to the jupyter folks for implementing this), but that's all. The effect is that if we release a version that drops support for py2 entirely, then 'pip install' on py2 will continue to work and give the last supported version, but if we release a version that drops py2 on Windows but keeps it on other platforms then 'pip install' on py2 on Windows will just stop working entirely. This is possible to fix ? it's just software ? but I'm not volunteering... -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.j.a.cock at googlemail.com Wed Nov 8 19:34:18 2017 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 9 Nov 2017 00:34:18 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> References: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> Message-ID: On Wed, Nov 8, 2017 at 10:17 PM, Bryan Van de ven wrote: > >> On Nov 8, 2017, at 10:50, Peter Cock wrote: >> >> NumPy (and to a lesser extent SciPy) is in a tough position being at the >> bottom of many scientific Python programming stacks. Whenever you >> drop Python 2 support is going to upset someone. > > Existing versions of NumPy will still exist and continue to work with Python 2.7. If users want to say with Python 2.7, that's fine, they will just have to rely on those older/LTS versions. I personally would be happy for projects at the bottom of stacks to take an activist stance and make decisions to actively encourage movement to Python 3. > >> It is too ambitious to pledge to drop support for Python 2.7 no later than >> 2020, coinciding with the Python development team?s timeline for dropping >> support for Python 2.7? > > Developing NumPy is hard, as it is. Everything that can be done to simplify things for the current maintainers and help attract new contributors should be done. It is not reasonable to ask a few (largely volunteer) people to shoulder the burden and difficulties of supporting Python 2.7 for several additional *years* of their life. > > I agree entirely with Nick Coghlan's comments from another discussion, and think they apply equally well in this instance: > > """ > While it's entirely admirable that many upstream developers are generous enough to help their end users work around this inertia, in the long run doing so is detrimental for everyone concerned, as long term sustaining engineering for old releases is genuinely demotivating for upstream developers (it's a good job, but a lousy way to spend your free time) and for end users, working around institutional inertia this way reduces the pressure to actually get the situation addressed properly. > """ > > Thanks, > > Bryan I agree too - I was trying to phrase that email neutrally as I am not a direct NumPy contributor, but to be more explicit, as someone invested in this ecosystem: I'd fully support NumPy pledging to drop Python 2.7 support no later than 2020. I see signing up to http://www.python3statement.org/ as being about helping publicise this choice. (This is not to say dropping Python 2.7 support in NumPy couldn't happen much sooner than 2020 - the C99 compiler issues sounds like a strong pressure to do so.) Peter From ralf.gommers at gmail.com Thu Nov 9 02:57:02 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 9 Nov 2017 20:57:02 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Thu, Nov 9, 2017 at 12:15 PM, Nathaniel Smith wrote: > On Nov 8, 2017 16:51, "Matthew Brett" wrote: > > Hi, > > On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor > wrote: > > On 06.11.2017 11:10, Ralf Gommers wrote: > >> > >> > >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris > >> > wrote: > >> > >> Hi All, > >> > >> Thought I'd toss this out there. I'm tending towards better sooner > >> than later in dropping Python 2.7 support as we are starting to run > >> up against places where we would like to use Python 3 features. 
That > >> is particularly true on Windows where the 2.7 compiler is really old > >> and lacks C99 compatibility. > >> > >> > >> This is probably the most pressing reason to drop 2.7 support. We seem > >> to be expending a lot of effort lately on this stuff. I was previously > >> advocating being more conservative than the timeline you now propose, > >> but this is the pain point that I think gets me over the line. > > > > > > Would dropping python2 support for windows earlier than the other > > platforms a reasonable approach? > > I am not a big fan of to dropping python2 support before 2020, but I > > have no issue with dropping python2 support on windows earlier as it is > > our largest pain point. > > I wonder about this too. I can imagine there are a reasonable number > of people using older Linux distributions on which they cannot upgrade > to a recent Python 3, > > > My impression is that this is increasingly rare, actually. I believe RHEL > is still shipping 2.6 by default, which we've already dropped support for, > and if you want RH python then they provide supported 2.7 and 3.latest > through exactly the same channels. Ubuntu 14.04 is end-of-life in April > 2019, so pretty irrelevant if we're talking about 2019 for dropping > support, and 16.04 ships with 3.5. Plus with docker, conda, PPAs, etc., > getting a recent python is easier than its ever been. > > > but > > is that likely to be true for Windows? > > We'd have to make sure we could persuade pypi to give the older > version for Windows, by default - I don't know if that is possible. > > > Currently it's not ? if pip doesn't see a Windows wheel, it'll try > downloading and building an sdist. There's a mechanism for sdists to > declare what version of python they support but (thanks to the jupyter > folks for implementing this), but that's all. The effect is that if we > release a version that drops support for py2 entirely, then 'pip install' > on py2 will continue to work and give the last supported version, but if we > release a version that drops py2 on Windows but keeps it on other platforms > then 'pip install' on py2 on Windows will just stop working entirely. > > This is possible to fix ? it's just software ? but I'm not volunteering... > Given the release cycle of pip (slow) and the bandwidth required to implement this, I think that this is likely a showstopper for Windows-only-3.x-only. Another consideration is that choices made by numpy tend to propagate to the rest of the ecosystem, and support for Python versions that's OS-independent is nicer than Windows special-casing. And yet another is that when we do finally drop 2.7, I think we'd want to get the full benefits of doing so. That's new 3.x features (@ in particular), cleaning up lots of support code, etc. For those reasons I think we should balance the pain and benefits of 2.7 support and just pick a date to drop it completely, not just on Windows. Regarding http://www.python3statement.org/: I'd say that as long as there are people who want to spend their energy on the LTS release (contributors *and* enough maintainer power to review/merge/release), we should not actively prevent them from doing that. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From njs at pobox.com Thu Nov 9 03:52:09 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Nov 2017 02:52:09 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID:

On Nov 8, 2017 23:59, "Ralf Gommers" wrote:

Regarding http://www.python3statement.org/: I'd say that as long as there are people who want to spend their energy on the LTS release (contributors *and* enough maintainer power to review/merge/release), we should not actively prevent them from doing that.

Yeah, agreed. I don't feel like this is incompatible with the spirit of python3statement.org, though looking at the text I can see how it's not clear. My guess is they'd be happy to adjust the text, especially if it lets them add numpy :-). CC'ing Thomas and Matthias.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From solarjoe at posteo.org Thu Nov 9 04:30:57 2017
From: solarjoe at posteo.org (Joe)
Date: Thu, 09 Nov 2017 10:30:57 +0100
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de>

Hello,

I have a question and hope that you can help me.

The doc for vstack mentions that "this function continues to be supported for backward compatibility, but you should prefer np.concatenate or np.stack."

Using vstack was convenient because "the arrays must have the same shape along all but the first axis."

So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array without using e.g. atleast_2d on the (3,) array.

Is there a possibility to mimic that behavior with np.concatenate or np.stack?

Joe

From encukou at gmail.com Thu Nov 9 05:32:51 2017
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 9 Nov 2017 11:32:51 +0100
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <215c6bf0-a153-56c4-660f-a37d2e2cdf86@gmail.com>

On 11/09/2017 12:15 AM, Nathaniel Smith wrote:
> On Nov 8, 2017 16:51, "Matthew Brett" wrote:
>
>     Hi,
>
>     On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor wrote:
>     > On 06.11.2017 11:10, Ralf Gommers wrote:
>     >>
>     >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris wrote:
>     >>
>     >>     Hi All,
>     >>
>     >>     Thought I'd toss this out there. I'm tending towards better sooner
>     >>     than later in dropping Python 2.7 support as we are starting to run
>     >>     up against places where we would like to use Python 3 features. That
>     >>     is particularly true on Windows where the 2.7 compiler is really old
>     >>     and lacks C99 compatibility.
>     >>
>     >> This is probably the most pressing reason to drop 2.7 support. We seem
>     >> to be expending a lot of effort lately on this stuff. I was previously
>     >> advocating being more conservative than the timeline you now propose,
>     >> but this is the pain point that I think gets me over the line.
>     >
>     > Would dropping python2 support for windows earlier than the other
>     > platforms a reasonable approach?
>     > I am not a big fan of to dropping python2 support before 2020, but I
>     > have no issue with dropping python2 support on windows earlier as it is
>     > our largest pain point.
>
>     I wonder about this too.
>     I can imagine there are a reasonable number
>     of people using older Linux distributions on which they cannot upgrade
>     to a recent Python 3,
>
> My impression is that this is increasingly rare, actually. I believe
> RHEL is still shipping 2.6 by default,

RHEL 6 does have Python 2.6, but RHEL 6 is in its "security and critical fixes only" phase. I would not expect people with Python 2.6 on RHEL 6 to go and upgrade NumPy to the newest version. (But I admit I might be wrong, especially regarding CentOS.)

> which we've already dropped
> support for, and if you want RH python then they provide supported 2.7
> and 3.latest through exactly the same channels.

It might not always be the very latest, but yes, 3.6 is available through Software Collections.

Let me know if I can help! I work on Python packaging at Red Hat (though on this list I'm subscribed with my personal e-mail). And feel free to direct people who have trouble running Python 3 on RHEL/CentOS to me.

Also, if you haven't read Nick Coghlan's thoughts on these matters, I recommend doing that -- they're from 2015 but still relevant. (It's targeting projects run entirely by volunteers, which might not entirely apply to NumPy, but it still has some good ideas):
http://www.curiousefficiency.org/posts/2015/04/stop-supporting-python26.html

From allanhaldane at gmail.com Thu Nov 9 12:58:01 2017
From: allanhaldane at gmail.com (Allan Haldane)
Date: Thu, 9 Nov 2017 12:58:01 -0500
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> Message-ID: <946427f4-7dbc-8f06-adcc-58863c8118e4@gmail.com>

On 11/09/2017 04:30 AM, Joe wrote:
> Hello,
>
> I have a question and hope that you can help me.
>
> The doc for vstack mentions that "this function continues to be
> supported for backward compatibility, but you should prefer
> np.concatenate or np.stack."
>
> Using vstack was convenient because "the arrays must have the same shape
> along all but the first axis."
>
> So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array
> without using e.g. atleast_2d on the (3,) array.
>
> Is there a possibility to mimic that behavior with np.concatenate or
> np.stack?
>
> Joe

I might write this as either

    np.concatenate([a[newaxis,:], b])

(which switches a newaxis for an atleast_2d, and is also more explicit about where the axis is added), or, as

    np.block([[a],[b]])

Both act like vstack.

Allan

From njs at pobox.com Thu Nov 9 13:21:43 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Nov 2017 12:21:43 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID:

See Thomas's reply quoted below (it was rejected by the mailing list since he's not subscribed):

On Nov 9, 2017 01:24, "Thomas Kluyver" wrote:

On Thu, Nov 9, 2017, at 08:52 AM, Nathaniel Smith wrote:

On Nov 8, 2017 23:59, "Ralf Gommers" wrote:

Regarding http://www.python3statement.org/: I'd say that as long as there are people who want to spend their energy on the LTS release (contributors *and* enough maintainer power to review/merge/release), we should not actively prevent them from doing that.

Yeah, agreed.
I don't feel like this is incompatible with the spirit of python3statement.org, though looking at the text I can see how it's not clear. My guess is they'd be happy to adjust the text, especially if it lets them add numpy :-). CC'ing Thomas and Matthias.

Thanks Nathaniel. We have (IMO) left a degree of deliberate ambiguity around precisely what 'drop support' means, because it's not going to be the same for all projects. The nature of open source also means that there can be ambiguity over what 'support' entails and who is considered part of the project.

I would say that the idea of the statement is compatible with an LTS release series receiving critical bugfixes beyond 2020, while the main energy of the project is focused on Py3-only feature releases.

[If numpy-discussion doesn't allow non-member posts, feel free to pass this on or quote it in on-list messages]

Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bussonniermatthias at gmail.com Thu Nov 9 13:24:14 2017
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Thu, 9 Nov 2017 10:24:14 -0800
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID:

Hi all,

Apologies if this mail appears out of thread; I just subscribed to respond.

> Yeah, agreed. I don't feel like this is incompatible with the spirit of
> python3statement.org, though looking at the text I can see how it's not clear.
> My guess is they'd be happy to adjust the text, especially if it lets them add
> numpy :-). CC'ing Thomas and Matthias.

Happy to see NumPy at least having this conversation! I agree with Thomas, we're pretty loose on what dropping support means; one of the main reasons for the Python-3-Statement is communication to users and other projects, to convey that there is a strong intent that you have until 2020 to get ready (if not before).

The voice of NumPy carries a huge weight in the balance.

I quickly went through the thread and have a few responses:

> NumPy (and to a lesser extent SciPy) is in a tough position being at the
> bottom of many scientific Python programming stacks. Whenever you
> drop Python 2 support is going to upset someone.

And that is why you should decide to do it at some point, and tell the world -- the sooner you decide and advertise it (regardless of the effective "deadline"), the better.

The scientific Python ecosystem is in a catch-22 position: most of the ecosystem will not drop 2.7 because "numpy is still compatible with python 2.7", and numpy does not drop it because "many packages rely on numpy support for 2.7".

> We'd have to make sure we could persuade pypi to give the older
> version for Windows, by default - I don't know if that is possible.

I don't think there is, though if you tag a release with `requires_python>3.3`, then pip 9+ users on python 2.7 will not even realise there are new releases compatible only with 3.3+.

Technically you can make numpy a meta-package that requires numpy-27 on Windows only... but it has its own drawbacks.

> And yet another is that when we do finally drop 2.7, I think we'd want to
> get the full benefits of doing so. That's new 3.x features (@ in
> particular), cleaning up lots of support code, etc.
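(To make the "@" point above concrete -- a small example of the Python-3.5+-only matmul operator from PEP 465:)

    import numpy as np

    A = np.eye(3)
    B = np.arange(9.).reshape(3, 3)
    x = np.ones(3)

    y = np.dot(np.dot(A, B), x)   # spelling that works on Python 2
    z = A @ B @ x                 # Python 3.5+ only, much easier to read
    assert np.allclose(y, z)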
> Regarding http://www.python3statement.org/: I'd say that as long as there
> are people who want to spend their energy on the LTS release (contributors
> *and* enough maintainer power to review/merge/release), we should not
> actively prevent them from doing that.

These two are _in practice_ against each other; if you do major cleaning then most backports will have a hard time being auto-applied (just a warning). If you have a team that wants to do an LTS, I would suggest "cleaning" only when you are actually touching some code and the python-2 support code is in the way -- not cleaning "for the sake of cleaning", at least until the two code bases have diverged far enough.

We have a bot on Jupyter/Matplotlib that helps to backport PRs to older branches. I'm happy to open it to the numpy org if it helps.

> It is too ambitious to pledge to drop support for Python 2.7 no later than
> 2020, coinciding with the Python development team's timeline for dropping
> support for Python 2.7?

The hardest part is communication. And not just "We're dropping in 2020" but also "We still care about you, 2.7 users", and especially tell 2.7 users and old pip users how to correctly pin their dependency on numpy (to still get the LTS, if there is one).

One more thing: there is a lot of discussion about a "Volunteer LTS"; you may also want to consider a partnership with a company for an officially recommended commercial offer.

Thanks,
--
Matthias

On Thu, Nov 9, 2017 at 10:21 AM, Nathaniel Smith wrote:
> See Thomas's reply quoted below (it was rejected by the mailing list since
> he's not subscribed):
>
> On Nov 9, 2017 01:24, "Thomas Kluyver" wrote:
>
> On Thu, Nov 9, 2017, at 08:52 AM, Nathaniel Smith wrote:
>
> On Nov 8, 2017 23:59, "Ralf Gommers" wrote:
>
> Regarding http://www.python3statement.org/: I'd say that as long as there
> are people who want to spend their energy on the LTS release (contributors
> *and* enough maintainer power to review/merge/release), we should not
> actively prevent them from doing that.
>
> Yeah, agreed. I don't feel like this is incompatible with the spirit of
> python3statement.org, though looking at the text I can see how it's not
> clear. My guess is they'd be happy to adjust the text, especially if it lets
> them add numpy :-). CC'ing Thomas and Matthias.
>
> Thanks Nathaniel. We have (IMO) left a degree of deliberate ambiguity around
> precisely what 'drop support' means, because it's not going to be the same
> for all projects. The nature of open source also means that there can be
> ambiguity over what 'support' entails and who is considered part of the
> project.
>
> I would say that the idea of the statement is compatible with an LTS release
> series receiving critical bugfixes beyond 2020, while the main energy of the
> project is focused on Py3-only feature releases.
>
> [If numpy-discussion doesn't allow non-member posts, feel free to pass this
> on or quote it in on-list messages]
>
> Thomas
>

From robert.kern at gmail.com Thu Nov 9 14:20:26 2017
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 9 Nov 2017 11:20:26 -0800
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> Message-ID:

On Thu, Nov 9, 2017 at 1:30 AM, Joe wrote:
>
> Hello,
>
> I have a question and hope that you can help me.
> > The doc for vstack mentions that "this function continues to be supported for backward compatibility, but you should prefer np.concatenate or np.stack." > > Using vstack was convenient because "the arrays must have the same shape along all but the first axis." > > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array without using e.g. atleast_2d on the (3,) array. > > Is there a possibility to mimic that behavior with np.concatenate or np.stack? Quite frankly, I ignore the documentation as I think it's recommendation is wrong in these cases. Vive la vstack! -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Nov 9 14:35:43 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 9 Nov 2017 12:35:43 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID: On Thu, Nov 9, 2017 at 11:24 AM, Matthias Bussonnier < bussonniermatthias at gmail.com> wrote: > Hi all, > > Apologies if this mail appear out of thread I just subscribed to respond. > > > Yeah, agreed. I don't feel like this is incompatible with the spirit of > > python3statement.org, though looking at the text I can see how it's not > clear. > > My guess is they'd be happy to adjust the text, especially if it lets > them add > > numpy :-). CC'ing Thomas and Matthias. > > Happy to see NumPy at least having this conversation ! I agree with Thomas, > we're pretty loose on what dropping support means; one of the main reason > for > the Python-3-Statement is communication to users and other project; and > covey > that there is a strong intent that you have until 2020 to get ready (if not > before). > > The voice of NumPy have a huge weight in the balance. > > I quickly went through the thread and have a few responses: > > > NumPy (and to a lesser extent SciPy) is in a tough position being at the > > bottom of many scientific Python programming stacks. Whenever you > > drop Python 2 support is going to upset someone. > > And that is why you should decide of doing it at some point, and telling > it to > the world and the sooner you decide and advertise it (regardless of > effective > "deadline" the better. > > The Scientific Python is in a catch 22 position; Most of the ecosystem > will not > drop 2.7 because "numpy is still compatible python 2.7", and numpy does > not drop > it because "many packages rely on numpy support for 2.7". > > > > We'd have to make sure we could persuade pypi to give the older > > version for Windows, by default - I don't know if that is possible. > > I don't think there is, though if you tag a release with > `requires_python>3.3`, > then pip 9+ users on python 2.7 will not even realise there are new release > compatible only with 3.3+. > > Technically you can make numpy a meta-package that requires numpy-27 on > windows > only... but it has its own drawbacks. > > > And yet another is that when we do finally drop 2.7, I think we'd want to > > get the full benefits of doing so. That's new 3.x features (@ in > > particular), cleaning up lots of support code, etc. 
> > > Regarding http://www.python3statement.org/: I'd say that as long as > there > > are people who want to spend their energy on the LTS release > (contributors > > *and* enough maintainer power to review/merge/release), we should not > > actively prevent them from doing that. > > These two are _in practice_ against each other; if you do major cleaning > then > most backports will have a hard time being auto applied (just a warning). > If you > have a team that want to do a LTS I would suggest "cleaning" only when you > are > actually touching some code and the python-2 support code is in the way. > not > cleaning "for the sake of cleaning" at least until the 2 code base are far > enough. > > We have a bot on Jupyter/Matplotlib that help to backport PRs to older > branches. > I'm happy to open it to the numpy org if it helps. > > > It is too ambitious to pledge to drop support for Python 2.7 no later > than > > 2020, coinciding with the Python development team?s timeline for dropping > > support for Python 2.7? > > The hardest part is communication. And not just "We're dropping in 2020" > but > also "We still care about you, 2.7 users", ans especially tell 2.7 users > and old > pip users how to correctly pip their dependency on numpy (to still get > LTS if LTS > there is) > > One more thing, there is a lot of of discussion about a "Volonteer LTS", > you may > also want to consider a partnership with a company for an officially > recommended > commercial offer. > > One thing worth considering might be making the release that drops 2.7 NumPy 2.0 just so that there is a clear break point. Then if someone wants to continue the 1.x line of releases supporting 2.7 they can do so. ISTR that git now has some features that might aid in that. If the `requires` bit works well with pip then we can also share the same pip page (maybe). Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bryanv at anaconda.com Thu Nov 9 14:43:35 2017 From: bryanv at anaconda.com (Bryan Van de ven) Date: Thu, 9 Nov 2017 13:43:35 -0600 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID: <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> > On Nov 9, 2017, at 13:35, Charles R Harris wrote: > > One thing worth considering might be making the release that drops 2.7 NumPy 2.0 just so that there is a clear break point. Then if someone wants to continue the 1.x line of releases supporting 2.7 they can do so. ISTR that git now has some features that might aid in that. If the `requires` bit works well with pip then we can also share the same pip page (maybe). I personally think this is definitely advisable. FWIW Bokeh will definitely be bumping major numbers when dropping Python 2 support, or classic notebook support, etc. 
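(And for downstream projects, a major-version break gives a clean pin target -- a rough sketch, with hypothetical version numbers:)

    # requirements.txt for a project staying on Python 2.7:
    numpy>=1.16,<2.0    # hypothetical last Python-2-supporting series
    # requirements.txt for a project that follows NumPy to Python-3-only:
    numpy>=2.0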
Bryan

From m.h.vankerkwijk at gmail.com Thu Nov 9 15:53:36 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 9 Nov 2017 15:53:36 -0500
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID:

In astropy we had a similar discussion about version numbers, and decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the first that does not. If we're discussing jumping a major number, we could do the same for numpy. (Admittedly, it made a bit more sense with the numbering scheme astropy had adopted anyway.) -- Marten

From markbak at gmail.com Thu Nov 9 16:58:08 2017
From: markbak at gmail.com (Mark Bakker)
Date: Thu, 9 Nov 2017 22:58:08 +0100
Subject: [Numpy-discussion] np.vstack vs. np.stack
Message-ID:

> On 11/09/2017 04:30 AM, Joe wrote:
> > Hello,
> >
> > I have a question and hope that you can help me.
> >
> > The doc for vstack mentions that "this function continues to be
> > supported for backward compatibility, but you should prefer
> > np.concatenate or np.stack."
> >
> > Using vstack was convenient because "the arrays must have the same shape
> > along all but the first axis."
> >
> > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array
> > without using e.g. atleast_2d on the (3,) array.
> >
> > Is there a possibility to mimic that behavior with np.concatenate or
> > np.stack?
> >
> > Joe
>
Can anybody explain why vstack is going the way of the dodo?
Why are stack / concatenate better? What is 'bad' about vstack?

Thanks,

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wieser.eric+numpy at gmail.com Thu Nov 9 17:11:17 2017
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Thu, 09 Nov 2017 22:11:17 +0000
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: References: Message-ID:

I think the primary problems with it are:

- A poor definition of "vertical" in the world of stacked matrices - in
  np.linalg land, this means axis=-2, but in vstack land, it means axis=0.

- Mostly undocumented auto-2d behavior that doesn't make you think well
  enough about dimensions. Numpy deliberately distinguishes between "row
  vectors" (1, N) and vectors (N,), so it's a shame when APIs like vstack
  and np.matrix try to hide this distinction.

Eric

On Thu, 9 Nov 2017 at 13:59 Mark Bakker wrote:

> On 11/09/2017 04:30 AM, Joe wrote:
>> > Hello,
>> >
>> > I have a question and hope that you can help me.
>> >
>> > The doc for vstack mentions that "this function continues to be
>> > supported for backward compatibility, but you should prefer
>> > np.concatenate or np.stack."
>> >
>> > Using vstack was convenient because "the arrays must have the same shape
>> > along all but the first axis."
>> >
>> > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array
>> > without using e.g. atleast_2d on the (3,) array.
>> >
>> > Is there a possibility to mimic that behavior with np.concatenate or
>> > np.stack?
>> >
>> > Joe
>>
> Can anybody explain why vstack is going the way of the dodo?
> Why are stack / concatenate better? What is 'bad' about vstack?
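To make the difference concrete, a quick illustrative session (shapes as I understand the current behavior):

    import numpy as np

    a = np.ones(3)       # shape (3,)
    b = np.ones((2, 3))  # shape (2, 3)

    np.vstack([a, b]).shape             # (3, 3): `a` silently becomes (1, 3)
    np.concatenate([a[None], b]).shape  # (3, 3): the same promotion, explicit
    np.stack([a, a, a]).shape           # (3, 3): stack joins along a *new*
                                        # axis, so inputs must share a shape
    np.stack([b, b], axis=-2).shape     # (2, 2, 3): "vertical" in the
                                        # np.linalg sense, i.e. axis=-2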
> > Thanks, > > Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu Nov 9 17:37:27 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 09 Nov 2017 22:37:27 +0000 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: I'm pretty sure I wrote the offending line in the vstack() docs. The original motivation for stack() was that stacking behavior of hstack(), vstack() and dstack() was somewhat inconsistent, especially with regard to lower dimensional input. stack() is conceptually much simpler and more general. That said, if you know vstack() and find it useful, great, use it. It is not going away in NumPy. We don't remove functions just because there's a better alternative API, but rather use the docs to try to point new users in a better direction. On Thu, Nov 9, 2017 at 2:11 PM Eric Wieser wrote: > I think the primary problems with it are: > > - A poor definition of ?vertical? in the world of stacked matrices - > in np.linalg land, this means axis=-2, but in vstack land, it means > axis=0. > - Mostly undocumented auto-2d behavior that doesn?t make you think > well enough about dimensions. Numpy deliberately distinguishes between ?row > vectors? (1, N) and vectors (N,), so it?s a shame when APIs like vstack > and np.matrix try to hide this distinction. > > Eric > > On Thu, 9 Nov 2017 at 13:59 Mark Bakker wrote: > > On 11/09/2017 04:30 AM, Joe wrote: >>> > Hello, >>> > >>> > I have a question and hope that you can help me. >>> > >>> > The doc for vstack mentions that "this function continues to be >>> > supported for backward compatibility, but you should prefer >>> > np.concatenate or np.stack." >>> > >>> > Using vstack was convenient because "the arrays must have the same >>> shape >>> > along all but the first axis." >>> > >>> > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array >>> > without using e.g. atleast_2d on the (3,) array. >>> > >>> > Is there a possibility to mimic that behavior with np.concatenate or >>> > np.stack? >>> > >>> >> > Joe >>> >>> >> Can anybody explain why vstack is going the way of the dodo? >> Why are stack / concatenate better? What is 'bad' about vstack? >> >> Thanks, >> >> Mark >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > ? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Nov 9 17:39:45 2017 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Nov 2017 14:39:45 -0800 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: On Thu, Nov 9, 2017 at 1:58 PM, Mark Bakker wrote: > Can anybody explain why vstack is going the way of the dodo? > Why are stack / concatenate better? What is 'bad' about vstack? As far as I can tell, the discussion happened all on Github, not the mailing list. See here for references: https://github.com/numpy/numpy/pull/7253 -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From allanhaldane at gmail.com Thu Nov 9 17:49:34 2017 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 9 Nov 2017 17:49:34 -0500 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> On 11/09/2017 05:39 PM, Robert Kern wrote: > On Thu, Nov 9, 2017 at 1:58 PM, Mark Bakker wrote: > >> Can anybody explain why vstack is going the way of the dodo? >> Why are stack / concatenate better? What is 'bad' about vstack? > > As far as I can tell, the discussion happened all on Github, not the > mailing list. See here for references: > > https://github.com/numpy/numpy/pull/7253 > > -- > Robert Kern yes, and in particular this linked comment/PR: https://github.com/numpy/numpy/pull/5605#issuecomment-85180204 Maybe we should reword the vstack docstring so that it doesn't imply that vstack is going away. It should say something weaker like "the functions np.stack, np.concatenate, and np.block are often more general/useful/less confusing alternatives".. or better explain what the problem is. If we limit ourselves to 1d,2d and maybe 3d arrays the vstack behavior doesn't seem all that confusing to me. Allan From robert.kern at gmail.com Thu Nov 9 17:53:48 2017 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Nov 2017 14:53:48 -0800 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> References: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> Message-ID: On Thu, Nov 9, 2017 at 2:49 PM, Allan Haldane wrote: > > On 11/09/2017 05:39 PM, Robert Kern wrote: > > On Thu, Nov 9, 2017 at 1:58 PM, Mark Bakker wrote: > > > >> Can anybody explain why vstack is going the way of the dodo? > >> Why are stack / concatenate better? What is 'bad' about vstack? > > > > As far as I can tell, the discussion happened all on Github, not the > > mailing list. See here for references: > > > > https://github.com/numpy/numpy/pull/7253 > > > > -- > > Robert Kern > > yes, and in particular this linked comment/PR: > > https://github.com/numpy/numpy/pull/5605#issuecomment-85180204 > > Maybe we should reword the vstack docstring so that it doesn't imply > that vstack is going away. It should say something weaker > like "the functions np.stack, np.concatenate, and np.block are often > more general/useful/less confusing alternatives".. or better explain > what the problem is. > > If we limit ourselves to 1d,2d and maybe 3d arrays the vstack behavior > doesn't seem all that confusing to me. I concur. Highlighting that the functions are only being retained "for backward compatibility" does seem to imply to people that they are deprecated and cannot be relied upon to remain. We *do* break backwards compatibility from time to time. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From njs at pobox.com Thu Nov 9 20:52:18 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Nov 2017 17:52:18 -0800
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID:

Fortunately we can wait until we're a bit closer before we have to make any final decision on the version numbering :-)

Right now, though, it would be good to start communicating to users/downstreams about what our plans are, so they can make plans. Here's a first attempt at some text we can put in the documentation and point people to -- any thoughts, on either the plan or the wording?

---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ----

The Python core team plans to stop supporting Python 2 in 2020. The NumPy project has supported both Python 2 and Python 3 in parallel since 2010, and has found that supporting Python 2 is an increasing burden on our limited resources; thus, we plan to eventually drop Python 2 support as well. Now that we're entering the final years of community-supported Python 2, the NumPy project wants to clarify our plans, with the goal of helping our downstream ecosystem make plans and accomplish the transition with as little disruption as possible.

Our current plan is as follows:

Until **December 31, 2018**, all NumPy releases will fully support both Python 2 and Python 3.

Starting on **January 1, 2019**, any new feature releases will support only Python 3.

The last Python-2-supporting release will be designated as a long-term support (LTS) release, meaning that we will continue to merge bug-fixes and make bug-fix releases for a longer period than usual. Specifically, it will be supported by the community until **December 31, 2019**.

On **January 1, 2020** we will raise a toast to Python 2, and community support for the last Python-2-supporting release will come to an end. However, it will continue to be available on PyPI indefinitely, and if any commercial vendors wish to extend the LTS support past this point then we are open to letting them use the LTS branch in the official NumPy repository to coordinate that.

If you are a NumPy user who requires ongoing Python 2 support in 2020 or later, then please contact your vendor. If you are a vendor who wishes to continue to support NumPy on Python 2 in 2020+, please get in touch; ideally we'd like you to get involved in maintaining the LTS before it actually hits end-of-life, so we can make a clean handoff.

To minimize disruption, running 'pip install numpy' on Python 2 will continue to give the last working release in perpetuity; but after January 1, 2019 it may not contain the latest features, and after January 1, 2020 it may not contain the latest bug fixes.

For more information on the scientific Python ecosystem's transition to Python-3-only, see: http://www.python3statement.org/

For more information on porting your code to run on Python 3, see: https://docs.python.org/3/howto/pyporting.html

----

Thoughts?

-n

On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk wrote:
> In astropy we had a similar discussion about version numbers, and
> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the
> first that does not. If we're discussing jumping a major number, we
> could do the same for numpy.
(Admittedly, it made a bit more sense > with the numbering scheme astropy had adopted anyway.) -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Nathaniel J. Smith -- https://vorpus.org From bussonniermatthias at gmail.com Thu Nov 9 22:35:09 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Thu, 9 Nov 2017 19:35:09 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: 'pip install ... will' to 'pip install ... should' especially for 2.7 users it's rarer to have an up to date enough pip (9+) to respect the requires_python metadata. A mention to the py3statement would be appreciated :-) especially if you decide to sign it. You might want to also be a bit more positive on the python 2 burden (that's the stick) and add a phrase about "allowing you to implement features for python 3 users that are incompatible with having compatibility with python 2". The "supported by the community" should (IMHO) be made slightly clearer, as well as what bug fix you expect core dev to _do_ vs _accept_, but that can be a separate document somewhere else to refer to. Thanks ! -- M On Nov 9, 2017 17:52, "Nathaniel Smith" wrote: Fortunately we can wait until we're a bit closer before we have to make any final decision on the version numbering :-) Right now though it would be good to start communicating to users/downstreams about whatever our plans our though, so they can make plans. Here's a first attempt at some text we can put in the documentation and point people to -- any thoughts, on either the plan or the wording? ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- The Python core team plans to stop supporting Python 2 in 2020. The NumPy project has supported both Python 2 and Python 3 in parallel since 2010, and has found that supporting Python 2 is an increasing burden on our limited resources; thus, we plan to eventually drop Python 2 support as well. Now that we're entering the final years of community-supported Python 2, the NumPy project wants to clarify our plans, with the goal of to helping our downstream ecosystem make plans and accomplish the transition with as little disruption as possible. Our current plan is as follows: Until **December 31, 2018**, all NumPy releases will fully support both Python 2 and Python 3. Starting on **January 1, 2019**, any new feature releases will support only Python 3. The last Python-2-supporting release will be designated as a long-term support (LTS) release, meaning that we will continue to merge bug-fixes and make bug-fix releases for a longer period than usual. Specifically, it will be supported by the community until **December 31, 2019**. On **January 1, 2020** we will raise a toast to Python 2, and community support for the last Python-2-supporting release will come to an end. However, it will continue to be available on PyPI indefinitely, and if any commercial vendors wish to extend the LTS support past this point then we are open to letting them use the LTS branch in the official NumPy repository to coordinate that. If you are a NumPy user who requires ongoing Python 2 support in 2020 or later, then please contact your vendor. 
If you are a vendor who wishes to continue to support NumPy on Python 2 in 2020+, please get in touch; ideally we'd like you to get involved in maintaining the LTS before it actually hits end-of-life, so we can make a clean handoff. To minimize disruption, running 'pip install numpy' on Python 2 will continue to give the last working release in perpetuity; but after January 1, 2019 it may not contain the latest features, and after January 1, 2020 it may not contain the latest bug fixes. For more information on the scientific Python ecosystem's transition to Python-3-only, see: http://www.python3statement.org/ For more information on porting your code to run on Python 3, see: https://docs.python.org/3/howto/pyporting.html ---- Thoughts? -n On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk wrote: > In astropy we had a similar discussion about version numbers, and > decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the > first that does not. If we're discussing jumping a major number, we > could do the same for numpy. (Admittedly, it made a bit more sense > with the numbering scheme astropy had adopted anyway.) -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Nov 10 01:36:51 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 10 Nov 2017 07:36:51 +0100 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: <20171110063651.GD1452142@phare.normalesup.org> Another point in defence of vstack vs stack/concatenate: last time I looked, it was faster on smallish arrays. Ga?l From njs at pobox.com Fri Nov 10 05:25:19 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 10 Nov 2017 02:25:19 -0800 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: <2618f2cf-0c1f-44f7-40db-95972484c58b@gmail.com> References: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> <2618f2cf-0c1f-44f7-40db-95972484c58b@gmail.com> Message-ID: On Wed, Nov 8, 2017 at 2:13 PM, Allan Haldane wrote: > On 11/08/2017 03:12 PM, Nathaniel Smith wrote: >> - We could adjust the API so that there's some explicit operation to >> trigger the final writeback. At the Python level this would probably >> mean that we start supporting the use of nditer as a context manager, >> and eventually start raising an error if you're in one of the "unsafe" >> case and not using the context manager form. At the C level we >> probably need some explicit "I'm done with this iterator now" call. >> >> One question is which cases exactly should produce warnings/eventually >> errors. At the Python level, I guess the simplest rule would be that >> if you have any write/readwrite arrays in your iterator, then you have >> to use a 'with' block. At the C level, it's a little trickier, because >> it's hard to tell up-front whether someone has updated their code to >> call a final cleanup function, and it's hard to emit a warning/error >> on something that *doesn't* happen. 
(You could print a warning when >> the nditer object is GCed if the cleanup function wasn't called, but >> you can't raise an error there.) I guess the only reasonable option is >> to deprecate NPY_ITER_READWRITE and NP_ITER_WRITEONLY, and make people >> switch to passing new flags that have the same semantics but also >> promise that the user has updated their code to call the new cleanup >> function. > Seems reasonable. > > When people use the Nditer C-api, they (almost?) always call > NpyIter_Dealloc when they're done. Maybe that's a place to put a warning > for C-api users. I think you can emit a warning there since that > function calls the GC, not the other way around. > > It looks like you've already discussed the possibilities of putting > things in NpyIter_Dealloc though, and it could be tricky, but if we only > need a warning maybe there's a way. > https://github.com/numpy/numpy/pull/9269/files/6dc0c65e4b2ea67688d6b617da3a175cd603fc18#r127707149 Oh, hmm, yeah, on further examination there are some more options here. I had missed that for some reason NpyIter isn't actually a Python object, so actually it's never subject to GC and you always need to call NpyIter_Deallocate when you are finished with it. So that's a natural place to perform writebacks. We don't even need a warning. (Which is good, because warnings can be set to raise errors, and while the docs say that NpyIter_Deallocate can fail, in fact it never has been able to in the past and none of the code in numpy or the examples in the docs actually check the return value. Though I guess in theory writeback can also fail so I suppose we need to start returning NPY_FAIL in that case. But it should be vanishingly rare in practice, and it's not clear if anyone is even using this API outside of numpy.) And for the Python-level API, there is the option of performing the final writeback when the iterator is exhausted. The downside to this is that if someone only goes half-way through the iteration and then aborts (e.g. by raising an exception), then the last round of writeback won't happen. But maybe that's fine, or at least better than forcing the use of 'with' blocks everywhere? If we do this then I think we'd at least want to make sure that the writeback really never happens, as opposed to happening at some random later point when the Python iterator object is GCed. But I'd appreciate if anyone would express a preference between these :-) -n -- Nathaniel J. Smith -- https://vorpus.org From shoyer at gmail.com Fri Nov 10 12:44:23 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 10 Nov 2017 17:44:23 +0000 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> References: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> Message-ID: On Thu, Nov 9, 2017 at 2:49 PM Allan Haldane wrote: > Maybe we should reword the vstack docstring so that it doesn't imply > that vstack is going away. It should say something weaker > like "the functions np.stack, np.concatenate, and np.block are often > more general/useful/less confusing alternatives".. or better explain > what the problem is. > Yes, I would support this. -------------- next part -------------- An HTML attachment was scrubbed... 
From robbmcleod at gmail.com Fri Nov 10 17:03:01 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Fri, 10 Nov 2017 14:03:01 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Wed, Nov 8, 2017 at 2:50 PM, Matthew Brett wrote: > Hi, > > On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor > wrote: > > On 06.11.2017 11:10, Ralf Gommers wrote: > >> > >> > >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris > >> > wrote: > >> > >> Hi All, > >> > >> Thought I'd toss this out there. I'm tending towards better sooner > >> than later in dropping Python 2.7 support as we are starting to run > >> up against places where we would like to use Python 3 features. That > >> is particularly true on Windows where the 2.7 compiler is really old > >> and lacks C99 compatibility. > >> > >> > >> This is probably the most pressing reason to drop 2.7 support. We seem > >> to be expending a lot of effort lately on this stuff. I was previously > >> advocating being more conservative than the timeline you now propose, > >> but this is the pain point that I think gets me over the line. > > > > > > Would dropping python2 support for windows earlier than the other > > platforms be a reasonable approach? > > I am not a big fan of dropping python2 support before 2020, but I > > have no issue with dropping python2 support on windows earlier as it is > > our largest pain point. > > I wonder about this too. I can imagine there are a reasonable number > of people using older Linux distributions on which they cannot upgrade > to a recent Python 3, but is that likely to be true for Windows? > > We'd have to make sure we could persuade pypi to give the older > version for Windows, by default - I don't know if that is possible. > Pip repo names and actual module names don't have to be the same. One potential work-around would be to make a 'numpylts' repo on PyPi which is the 1.17 version with support for Python 2.7 and bug-fix releases as required. This will still cause regressions but it's a matter of modifying `requirements.txt` in downstream Python 2.7 packages and not much else. E.g. in `requirements.txt`: numpy; python_version>"3.0" numpylts; python_version<"3.0" In both cases you still call `import numpy` in the code. Robert -- Robert McLeod, Ph.D. robbmcleod at gmail.com robbmcleod at protonmail.com www.entropyreduction.al -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Nov 12 04:04:57 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 12 Nov 2017 22:04:57 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Fri, Nov 10, 2017 at 2:52 PM, Nathaniel Smith wrote: > Fortunately we can wait until we're a bit closer before we have to > make any final decision on the version numbering :-) > > Right now though it would be good to start communicating to > users/downstreams about whatever our plans are, so they can > make plans. Here's a first attempt at some text we can put in the > documentation and point people to -- any thoughts, on either the plan > or the wording? > > ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK?
OK ---- > > The Python core team plans to stop supporting Python 2 in 2020. The > NumPy project has supported both Python 2 and Python 3 in parallel > since 2010, and has found that supporting Python 2 is an increasing > burden on our limited resources; thus, we plan to eventually drop > Python 2 support as well. Now that we're entering the final years of > community-supported Python 2, the NumPy project wants to clarify our > plans, with the goal of to helping our downstream ecosystem make plans > and accomplish the transition with as little disruption as possible. > > Our current plan is as follows: > > Until **December 31, 2018**, all NumPy releases will fully support > both Python 2 and Python 3. > > Starting on **January 1, 2019**, any new feature releases will support > only Python 3. > > The last Python-2-supporting release will be designated as a long-term > support (LTS) release, meaning that we will continue to merge > bug-fixes and make bug-fix releases for a longer period than usual. > Specifically, it will be supported by the community until **December > 31, 2019**. > > On **January 1, 2020** we will raise a toast to Python 2, and > community support for the last Python-2-supporting release will come > to an end. However, it will continue to be available on PyPI > indefinitely, and if any commercial vendors wish to extend the LTS > support past this point then we are open to letting them use the LTS > branch in the official NumPy repository to coordinate that. > > If you are a NumPy user who requires ongoing Python 2 support in 2020 > or later, then please contact your vendor. If you are a vendor who > wishes to continue to support NumPy on Python 2 in 2020+, please get > in touch; ideally we'd like you to get involved in maintaining the LTS > before it actually hits end-of-life, so we can make a clean handoff. > > To minimize disruption, running 'pip install numpy' on Python 2 will > continue to give the last working release in perpetuity; but after > January 1, 2019 it may not contain the latest features, and after > January 1, 2020 it may not contain the latest bug fixes. > > For more information on the scientific Python ecosystem's transition > to Python-3-only, see: http://www.python3statement.org/ > > For more information on porting your code to run on Python 3, see: > https://docs.python.org/3/howto/pyporting.html > > ---- > > Thoughts? > Thanks for writing that up. Text works for me! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Nov 12 11:04:46 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 12 Nov 2017 09:04:46 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Thu, Nov 9, 2017 at 6:52 PM, Nathaniel Smith wrote: > Fortunately we can wait until we're a bit closer before we have to > make any final decision on the version numbering :-) > > Right now though it would be good to start communicating to > users/downstreams about whatever our plans our though, so they can > make plans. Here's a first attempt at some text we can put in the > documentation and point people to -- any thoughts, on either the plan > or the wording? > > ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? 
OK ---- > > The Python core team plans to stop supporting Python 2 in 2020. The > NumPy project has supported both Python 2 and Python 3 in parallel > since 2010, and has found that supporting Python 2 is an increasing > burden on our limited resources; thus, we plan to eventually drop > Python 2 support as well. Now that we're entering the final years of > community-supported Python 2, the NumPy project wants to clarify our > plans, with the goal of to helping our downstream ecosystem make plans > and accomplish the transition with as little disruption as possible. > > Our current plan is as follows: > > Until **December 31, 2018**, all NumPy releases will fully support > both Python 2 and Python 3. > > Starting on **January 1, 2019**, any new feature releases will support > only Python 3. > > The last Python-2-supporting release will be designated as a long-term > support (LTS) release, meaning that we will continue to merge > bug-fixes and make bug-fix releases for a longer period than usual. > Specifically, it will be supported by the community until **December > 31, 2019**. > > On **January 1, 2020** we will raise a toast to Python 2, and > community support for the last Python-2-supporting release will come > to an end. However, it will continue to be available on PyPI > indefinitely, and if any commercial vendors wish to extend the LTS > support past this point then we are open to letting them use the LTS > branch in the official NumPy repository to coordinate that. > > If you are a NumPy user who requires ongoing Python 2 support in 2020 > or later, then please contact your vendor. If you are a vendor who > wishes to continue to support NumPy on Python 2 in 2020+, please get > in touch; ideally we'd like you to get involved in maintaining the LTS > before it actually hits end-of-life, so we can make a clean handoff. > > To minimize disruption, running 'pip install numpy' on Python 2 will > continue to give the last working release in perpetuity; but after > January 1, 2019 it may not contain the latest features, and after > January 1, 2020 it may not contain the latest bug fixes. > > For more information on the scientific Python ecosystem's transition > to Python-3-only, see: http://www.python3statement.org/ > > For more information on porting your code to run on Python 3, see: > https://docs.python.org/3/howto/pyporting.html > > ---- > > Thoughts? > > -n > I've put up an NEP for the proposal. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sun Nov 12 14:13:00 2017 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 12 Nov 2017 21:13:00 +0200 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: References: Message-ID: <58bc1611-8b75-6d58-d07a-f160bd70716a@gmail.com> On 10/11/17 12:25, numpy-discussion-request at python.org wrote: > Date: Fri, 10 Nov 2017 02:25:19 -0800 > From: Nathaniel Smith > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] deprecate updateifcopy in nditer > operand, flags? > Message-ID: > > Content-Type: text/plain; charset="UTF-8" > > On Wed, Nov 8, 2017 at 2:13 PM, Allan Haldane wrote: >> On 11/08/2017 03:12 PM, Nathaniel Smith wrote: >>> - We could adjust the API so that there's some explicit operation to >>> trigger the final writeback. 
At the Python level this would probably >>> mean that we start supporting the use of nditer as a context manager, >>> and eventually start raising an error if you're in one of the "unsafe" >>> case and not using the context manager form. At the C level we >>> probably need some explicit "I'm done with this iterator now" call. >>> >>> One question is which cases exactly should produce warnings/eventually >>> errors. At the Python level, I guess the simplest rule would be that >>> if you have any write/readwrite arrays in your iterator, then you have >>> to use a 'with' block. At the C level, it's a little trickier, because >>> it's hard to tell up-front whether someone has updated their code to >>> call a final cleanup function, and it's hard to emit a warning/error >>> on something that*doesn't* happen. (You could print a warning when >>> the nditer object is GCed if the cleanup function wasn't called, but >>> you can't raise an error there.) I guess the only reasonable option is >>> to deprecate NPY_ITER_READWRITE and NP_ITER_WRITEONLY, and make people >>> switch to passing new flags that have the same semantics but also >>> promise that the user has updated their code to call the new cleanup >>> function. >> Seems reasonable. >> >> When people use the Nditer C-api, they (almost?) always call >> NpyIter_Dealloc when they're done. Maybe that's a place to put a warning >> for C-api users. I think you can emit a warning there since that >> function calls the GC, not the other way around. >> >> It looks like you've already discussed the possibilities of putting >> things in NpyIter_Dealloc though, and it could be tricky, but if we only >> need a warning maybe there's a way. >> https://github.com/numpy/numpy/pull/9269/files/6dc0c65e4b2ea67688d6b617da3a175cd603fc18#r127707149 > Oh, hmm, yeah, on further examination there are some more options here. > > I had missed that for some reason NpyIter isn't actually a Python > object, so actually it's never subject to GC and you always need to > call NpyIter_Deallocate when you are finished with it. So that's a > natural place to perform writebacks. We don't even need a warning. > (Which is good, because warnings can be set to raise errors, and while > the docs say that NpyIter_Deallocate can fail, in fact it never has > been able to in the past and none of the code in numpy or the examples > in the docs actually check the return value. Though I guess in theory > writeback can also fail so I suppose we need to start returning > NPY_FAIL in that case. But it should be vanishingly rare in practice, > and it's not clear if anyone is even using this API outside of numpy.) > > And for the Python-level API, there is the option of performing the > final writeback when the iterator is exhausted. The downside to this > is that if someone only goes half-way through the iteration and then > aborts (e.g. by raising an exception), then the last round of > writeback won't happen. But maybe that's fine, or at least better than > forcing the use of 'with' blocks everywhere? If we do this then I > think we'd at least want to make sure that the writeback really never > happens, as opposed to happening at some random later point when the > Python iterator object is GCed. But I'd appreciate if anyone would > express a preference between these:-) > > -n > > -- Nathaniel J. Smith -- https://vorpus.org We cannot assume that the call to NPyIter_Deallocate() can resolve writebackifcopy semantics. 
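For concreteness, the Python-level context-manager form being debated in this thread might look like the sketch below. Note that this is the proposal, not a current nditer feature, so the with-statement support and its exact semantics are assumptions:

    import numpy as np

    a = np.arange(6.).reshape(2, 3)

    # Buffered iteration with a cast is a case where the iterator hands
    # out temporaries and must "write back" into `a` afterwards.
    with np.nditer(a, flags=['buffered'], op_flags=['readwrite'],
                   op_dtypes=['float32'], casting='same_kind') as it:
        for x in it:
            x[...] = 2 * x
    # Leaving the block would resolve the writeback exactly once, even
    # if the loop above had been abandoned early by an exception.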
NPyIter_Copy() will return a new iterator (after Py_INCREF-ing the operands), so when either the original or the copy is deallocated the operand's writeback buffer may still be needed. So at the C level the user must resolve the writeback when the last copy of the iterator is deallocated. At the Python level we can force the use of a context manager and prohibit use of a suspicious (one with writebackifcopy semantics) nditer outside of a context manager. As for non-exhausted nditers, IMO using a context manager makes it very clear when the writeback resolution is meant to happen. Do we really want to support a use case where someone creates an iterator, uses it partially, then needs to think carefully about whether the operand changes will be resolved? Matti From toddrjen at gmail.com Sun Nov 12 16:12:15 2017 From: toddrjen at gmail.com (Todd) Date: Sun, 12 Nov 2017 16:12:15 -0500 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Nov 9, 2017 20:52, "Nathaniel Smith" wrote: Fortunately we can wait until we're a bit closer before we have to make any final decision on the version numbering :-) Right now though it would be good to start communicating to users/downstreams about whatever our plans are, so they can make plans. Here's a first attempt at some text we can put in the documentation and point people to -- any thoughts, on either the plan or the wording? ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- The Python core team plans to stop supporting Python 2 in 2020. The NumPy project has supported both Python 2 and Python 3 in parallel since 2010, and has found that supporting Python 2 is an increasing burden on our limited resources; thus, we plan to eventually drop Python 2 support as well. Now that we're entering the final years of community-supported Python 2, the NumPy project wants to clarify our plans, with the goal of helping our downstream ecosystem make plans and accomplish the transition with as little disruption as possible. Our current plan is as follows: Until **December 31, 2018**, all NumPy releases will fully support both Python 2 and Python 3. Starting on **January 1, 2019**, any new feature releases will support only Python 3. The last Python-2-supporting release will be designated as a long-term support (LTS) release, meaning that we will continue to merge bug-fixes and make bug-fix releases for a longer period than usual. Specifically, it will be supported by the community until **December 31, 2019**. On **January 1, 2020** we will raise a toast to Python 2, and community support for the last Python-2-supporting release will come to an end. However, it will continue to be available on PyPI indefinitely, and if any commercial vendors wish to extend the LTS support past this point then we are open to letting them use the LTS branch in the official NumPy repository to coordinate that. If you are a NumPy user who requires ongoing Python 2 support in 2020 or later, then please contact your vendor. If you are a vendor who wishes to continue to support NumPy on Python 2 in 2020+, please get in touch; ideally we'd like you to get involved in maintaining the LTS before it actually hits end-of-life, so we can make a clean handoff.
To minimize disruption, running 'pip install numpy' on Python 2 will continue to give the last working release in perpetuity; but after January 1, 2019 it may not contain the latest features, and after January 1, 2020 it may not contain the latest bug fixes. For more information on the scientific Python ecosystem's transition to Python-3-only, see: http://www.python3statement.org/ For more information on porting your code to run on Python 3, see: https://docs.python.org/3/howto/pyporting.html ---- Thoughts? -n On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk wrote: > In astropy we had a similar discussion about version numbers, and > decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the > first that does not. If we're discussing jumping a major number, we > could do the same for numpy. (Admittedly, it made a bit more sense > with the numbering scheme astropy had adopted anyway.) -- Marten > _______________________________________________ Might it make sense to do this in a synchronized manner with scipy? So both numpy and scipy drop support for python 2 on the first release after December 31 2018, and numpy's first python3-only release comes before (or simultaneously with) scipy's. Then scipy can set its minimum supported numpy version to be the first python3-only version. That allows scipy to have a clean, obvious point where scipy supports only the latest numpy. This will diverge later, but it seems to be a relatively safe place to bring them back into sync. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Nov 12 16:54:13 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 12 Nov 2017 13:54:13 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Nov 12, 2017 1:12 PM, "Todd" wrote: Might it make sense to do this in a synchronized manner with scipy? So both numpy and scipy drop support for python 2 on the first release after December 31 2018, and numpy's first python3-only release comes before (or simultaneously with) scipy's. Then scipy can set its minimum supported numpy version to be the first python3-only version. That allows scipy to have a clean, obvious point where scipy supports only the latest numpy. This will diverge later, but it seems to be a relatively safe place to bring them back into sync. That's really a question for the scipy devs on the scipy mailing list. There's substantial overlap between the numpy and scipy communities, but not everyone is on both lists and they're distinct projects that sometimes have unique issues to worry about. I'd like to see numpy's downstream projects become more aggressive about dropping support for old numpy versions in general, but there's no technical reason that scipy's first 3-only release couldn't continue to support one or more numpy 2+3 releases. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Mon Nov 13 03:11:47 2017 From: davidmenhur at gmail.com (Daπid) Date: Mon, 13 Nov 2017 09:11:47 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On 10 November 2017 at 23:03, Robert McLeod wrote: > E.g.
in `requirements.txt`: > > numpy; python_version>"3.0" > numpylts; python_version<"3.0" > > In both cases you still call `import numpy` in the code. > For this to be efficient, it should be done soon enough to allow downstream projects to adapt their requirements.txt. Release managers: how much more effort would it be to upload current numpy to both numpy and numpylts? /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 10:47:17 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 07:47:17 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: > For this to be efficient, it should be done soon enough to allow downstream projects to adapt their requirements.txt. > Release managers: how much more effort would it be to upload current numpy to both numpy and numpylts? I'm not quite sure I see the point. You would ask downstream to change `numpy` to `numpylts` instead of `numpy` to `numpy<2` ? Also I think then you have the risk of having for example pandas saying `numpy<2` and scipy saying `numpylts` and now the packages are incompatible ? -- M On Mon, Nov 13, 2017 at 12:11 AM, Daπid wrote: > On 10 November 2017 at 23:03, Robert McLeod wrote: >> >> E.g. in `requirements.txt`: >> >> numpy; python_version>"3.0" >> numpylts; python_version<"3.0" >> >> In both cases you still call `import numpy` in the code. > > > For this to be efficient, it should be done soon enough to allow downstream > projects to adapt their requirements.txt. > > Release managers: how much more effort would it be to upload current numpy > to both numpy and numpylts? > > /David. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Mon Nov 13 13:04:31 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 13 Nov 2017 10:04:31 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Fri, Nov 10, 2017 at 2:03 PM, Robert McLeod wrote: > Pip repo names and actual module names don't have to be the same. One > potential work-around would be to make a 'numpylts' repo on PyPi which is > the 1.17 version with support for Python 2.7 and bug-fix releases as > required. This will still cause regressions but it's a matter of modifying > `requirements.txt` in downstream Python 2.7 packages and not much else. > > E.g. in `requirements.txt`: > > numpy; python_version>"3.0" > numpylts; python_version<"3.0" > Can't we handle this with numpy versioning? IIUC, numpy (py3 only) and numpy (LTS) will not only support different platforms, but also be different versions. So if you have py2 or py2+3 code that uses numpy, it will have to specify a <= version number anyway. Also -- I think Nathaniel's point was that wheels have the python version baked in, so pip, when run from py2, should find the latest py2 compatible numpy automagically. And thanks for writing this up -- LGTM -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL:
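To spell out the version-based alternative Chris describes: a downstream project can use ordinary version pins with PEP 508 environment markers, rather than a second package name. A hypothetical downstream setup.py, where the 1.18 cutoff is a placeholder for whatever release actually ends up being the first without Python 2 support:

    from setuptools import setup

    setup(
        name="downstream-project",  # illustrative
        install_requires=[
            'numpy ; python_version >= "3"',      # any current numpy on Python 3
            'numpy<1.18 ; python_version < "3"',  # stay on the LTS line on Python 2
        ],
    )

Both branches install the same 'numpy' distribution, so `import numpy` keeps working unchanged and no separate numpylts name is needed.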
From robbmcleod at gmail.com Mon Nov 13 13:08:03 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Mon, 13 Nov 2017 10:08:03 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Mon, Nov 13, 2017 at 7:47 AM, Matthias Bussonnier < bussonniermatthias at gmail.com> wrote: > > For this to be efficient, it should be done soon enough to allow > downstream projects to adapt their requirements.txt. > > Release managers: how much more effort would it be to upload current > numpy to both numpy and numpylts? > > I'm not quite sure I see the point. You would ask downstream to change > `numpy` to `numpylts` instead of `numpy` to `numpy<2` ? > > Also I think then you have the risk of having for example pandas > saying `numpy<2` and scipy saying `numpylts` and now the packages are > incompatible ? The trouble is PyPi doesn't allow multiple branches. So if you upload NumPy 2.0 wheels, then you cannot turn around and upload 1.18.X bug-fix patches. At least, this is my understanding of PyPi. -- Robert McLeod, Ph.D. robbmcleod at gmail.com robbmcleod at protonmail.com www.entropyreduction.al -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 13:10:35 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 10:10:35 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: > The trouble is PyPi doesn't allow multiple branches. So if you upload NumPy 2.0 wheels, then you cannot turn around and upload 1.18.X bug-fix patches. At least, this is my understanding of PyPi. That's perfectly feasible. We've been maintaining a 6.x (Python 3 only) and a 5.x (Python 2+3) series of IPython for about a year now. -- M On Mon, Nov 13, 2017 at 10:08 AM, Robert McLeod wrote: > > > On Mon, Nov 13, 2017 at 7:47 AM, Matthias Bussonnier > wrote: >> >> > For this to be efficient, it should be done soon enough to allow >> > downstream projects to adapt their requirements.txt. >> > Release managers: how much more effort would it be to upload current >> > numpy to both numpy and numpylts? >> >> I'm not quite sure I see the point. You would ask downstream to change >> `numpy` to `numpylts` instead of `numpy` to `numpy<2` ? >> >> Also I think then you have the risk of having for example pandas >> saying `numpy<2` and scipy saying `numpylts` and now the packages are >> incompatible ? > > > The trouble is PyPi doesn't allow > multiple branches. So if you upload > NumPy 2.0 wheels, then you cannot turn around and upload 1.18.X bug-fix > patches. At least, this is my understanding of PyPi. > > > > -- > Robert McLeod, Ph.D.
> robbmcleod at gmail.com > robbmcleod at protonmail.com > www.entropyreduction.al > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From olivier.grisel at ensta.org Mon Nov 13 13:14:39 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 13 Nov 2017 19:14:39 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: If a wheel is not available for the client platform, pip will try to install the latest version of the source distribution (.tar.gz or .zip) which I think is the cause of the problem here. -- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 13:26:31 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 10:26:31 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: > If a wheel is not available for the client platform, pip will try to install the latest version of the source distribution (.tar.gz or .zip) which I think is the cause of the problem here. Unless the sdist is tagged with require_python and users have recent-enough pip. Which is what was referred to earlier as "Automagically". This behavior is "new" (Nov/Dec 2016). The upstream patches were written (in part) by the IPython/Jupyter team, for this exact purpose, to not install incompatible sdists. (Works great; I can share download graphs for IPython[0]) It _does_ require to have a version of pip which is not decades old though, and may not work if you use a pypi proxy which is not pep 503 compliant (which happens, we got bug reports, users then complained to IT who fixed it). -- M On Mon, Nov 13, 2017 at 10:14 AM, Olivier Grisel wrote: > If a wheel is not available for the client platform, pip will try to install > the latest version of the source distribution (.tar.gz or .zip) which I > think is the cause of the problem here. > > -- > Olivier > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From stefanv at berkeley.edu Mon Nov 13 13:31:08 2017 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Mon, 13 Nov 2017 10:31:08 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <1510597868.1331286.1171102248.044F3095@webmail.messagingengine.com> On Mon, Nov 13, 2017, at 10:26, Matthias Bussonnier wrote: > Unless the sdist is tagged with require_python and users have > recent-enough pip. Is this documented anywhere? I couldn't find it via Google, and suspect it may be widely useful in the next few months.
Stéfan From bussonniermatthias at gmail.com Mon Nov 13 13:42:29 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 10:42:29 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <1510597868.1331286.1171102248.044F3095@webmail.messagingengine.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510597868.1331286.1171102248.044F3095@webmail.messagingengine.com> Message-ID: On Mon, Nov 13, 2017 at 10:31 AM, Stefan van der Walt wrote: > > Is this documented anywhere? I couldn't find it via Google, and suspect > it may be widely useful in the next few months. Everything you need to know is on the Python3Statement practicality page: http://www.python3statement.org/practicalities/ (If it's not, or is unclear, complain to me or TK, yes we should make it more visible) M Pacer and I also gave a talk at Pycon https://www.youtube.com/watch?v=2DkfPzWWC2Q, slides https://carreau.github.io/pycon2017/#/ and Pybay https://www.youtube.com/watch?v=3i6n1RwqQCo, slides http://carreau.github.io/talks/2017-08-13-pybay/docs/index.html#/ Raw data for the graphs https://github.com/Carreau/talks/blob/master/2017-08-13-pybay/IPython-dls_2.ipynb -- M > > Stéfan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From mmanu.chaturvedi at kitware.com Mon Nov 13 13:53:32 2017 From: mmanu.chaturvedi at kitware.com (Mmanu Chaturvedi) Date: Mon, 13 Nov 2017 13:53:32 -0500 Subject: [Numpy-discussion] PyArray_GETITEM and PyArray_SETITEM Message-ID: Hello All, I need to make use of the limited numpy API access Pybind11 gives, in order to add a feature to it. It seems to give access to functions from numpy_api.py [1]. I need to use PyArray_GETITEM and PyArray_SETITEM in order to get and set array elements [2]; these functions / macros are not exposed via numpy_api.py, but are in `numpy/ndarraytypes.h`. We were wondering why PyArray_GETITEM and PyArray_SETITEM aren't exposed like the rest of the numpy API. Is it possible to replicate the behavior using the members exposed in numpy_api.py? Any help would be appreciated. Mmanu [1] https://github.com/numpy/numpy/blob/1368cbb696ae27b849eed67b4fd31c550a55dad5/numpy/core/code_generators/numpy_api.py [2] https://github.com/pybind/pybind11/pull/1152/files#diff-52f1945d779be1e60903590907bb9326R241 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Mon Nov 13 13:58:09 2017 From: tcaswell at gmail.com (Thomas Caswell) Date: Mon, 13 Nov 2017 18:58:09 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: I am very supportive of this plan. For Matplotlib the intention is to do a mpl2.2LTS early 2018 and a mpl3.0 (no major API breaks other than dropping py2 support) summer 2018 with the same meaning of LTS. I had also thought about bumping the minimum numpy version of Matplotlib to the first py3 only version when it is out. There is no technical reason, but it seems nicely symmetric. In general we all need to get better about dropping support for old versions of dependencies (I am throwing stones from inside my glass house). The prolonged support of py2 has warped our idea of how long old versions of things need to be supported and it imposes real costs up and down the stack.
Tom On Mon, Nov 13, 2017 at 1:26 PM Matthias Bussonnier < bussonniermatthias at gmail.com> wrote: > > If a wheel is not available for the client platform, pip will try to > install the latest version of the source distribution (.tar.gz or .zip) > which I think is the cause of the problem here. > > Unless the sdist is tagged with require_python and users have > recent-enough pip. Which is what was referred to earlier as > "Automagically". > This behavior is "new" (Nov/Dec 2016). The upstream patches were > written (in part) by the IPython/Jupyter team, for this exact purpose, > to not install incompatible sdists. > (Works great; I can share download graphs for IPython[0]) > > It _does_ require to have a version of pip which is not decades old > though, and may not work if you use a pypi proxy which is not pep 503 > compliant (which happens, we got bug reports, users then complained to > IT who fixed it). > -- > M > > On Mon, Nov 13, 2017 at 10:14 AM, Olivier Grisel > wrote: > > If a wheel is not available for the client platform, pip will try to > install > > the latest version of the source distribution (.tar.gz or .zip) which I > > think is the cause of the problem here. > > > > -- > > Olivier > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Mon Nov 13 15:01:43 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 13 Nov 2017 21:01:43 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <20171113200143.GB4178815@phare.normalesup.org> On Mon, Nov 13, 2017 at 10:26:31AM -0800, Matthias Bussonnier wrote: > This behavior is "new" (Nov/Dec 2016). [snip] > It _does_ require to have a version of pip which is not decades old Just to check that I am not misunderstanding: the version of pip should not be more than a year old; "decades old" is just French hyperbole? Do I understand right? Gaël From njs at pobox.com Mon Nov 13 16:33:43 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 13 Nov 2017 13:33:43 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <20171113200143.GB4178815@phare.normalesup.org> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <20171113200143.GB4178815@phare.normalesup.org> Message-ID: On Nov 13, 2017 12:03, "Gael Varoquaux" wrote: On Mon, Nov 13, 2017 at 10:26:31AM -0800, Matthias Bussonnier wrote: > This behavior is "new" (Nov/Dec 2016). [snip] > It _does_ require to have a version of pip which is not decades old Just to check that I am not misunderstanding: the version of pip should not be more than a year old; "decades old" is just French hyperbole? Do I understand right? Right, the requirement is pip 9, which is currently one year old and will be >2 years old by the time this matters for numpy. It does turn out that there's a bimodal distribution in the wild, where people tend to either use an up to date pip, or else use some truly ancient pip that some Linux LTS distro shipped 5 years ago.
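To make the mechanism concrete, there are two layers here: the python_requires metadata, which pip 9+ consults before selecting a release, and a loud guard in setup.py for everyone running an older pip. A rough sketch; all version numbers and message wording below are illustrative, not decided:

    import sys

    from setuptools import setup

    if sys.version_info[0] < 3:
        # Only reached by old pip (< 9) or direct setup.py runs; newer
        # pip reads python_requires below and never selects this sdist
        # on Python 2.
        raise RuntimeError(
            "This release of numpy supports Python 3 only. Upgrade pip "
            "(>= 9) so that the python_requires metadata is honored, or "
            "pin an older release explicitly, e.g. numpy<1.17.")

    setup(
        name="numpy",
        version="1.17.0",          # hypothetical first Python-3-only release
        python_requires=">=3.5",   # the metadata that pip 9+ enforces
        # ... the rest of the usual setup arguments ...
    )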
Numpy isn't the only project that will be forcing people to upgrade, though, so I think this will work itself out. Especially since in the broken case what happens is that users end up running our setup.py on an unsupported version of python, so we'll be able to detect that and print some loud and informative message. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 16:55:22 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 13:55:22 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <20171113200143.GB4178815@phare.normalesup.org> Message-ID: > Just to check that I am not misunderstanding: the version of pip should > not be more than a year old; "decades old" is just French hyperbole? Do I > understand right? Yes, sorry if you can't hear my french accent in writing, I can hear yours :-) There is also a "softer" requirement on setuptools, which needs to be recent enough to 1) understand requires_python on the machine that will _create_ the sdist/wheel, or 2) accept requires_python as a kwarg (even if it does nothing), for linux systems that will install from sdist. But by end of 2018 that will be a 3- or 4-year-old setuptools. > Right, the requirement is pip 9, which is currently one year old and will be >2 years old by the time this matters for numpy. > It does turn out that there's a bimodal distribution in the wild, where people tend to either use an up to date pip, or else use some truly ancient pip that some Linux LTS distro shipped 5 years ago. Numpy isn't the only project that > will be forcing people to upgrade, though, so I think this will work itself out. Especially since in the broken case what happens is that users end up running our setup.py on an unsupported version of python, so we'll be able to > detect that and print some loud and informative message. Correct, we did that for IPython, got a large spike of sdist downloads from Py2+old_pip when we released a Py3-only version, and the spike disappeared after a few days. We still had a handful of bug reports from people thinking the "You must upgrade pip" message was not relevant, and we realised people pinned ipython with IPython==5.0.0 instead of IPython<6. So the "Loud informative message" should also tell users how to pin numpy if they can't upgrade pip. -- Matthias On Mon, Nov 13, 2017 at 1:33 PM, Nathaniel Smith wrote: > On Nov 13, 2017 12:03, "Gael Varoquaux" > wrote: > > On Mon, Nov 13, 2017 at 10:26:31AM -0800, Matthias Bussonnier wrote: >> This behavior is "new" (Nov/Dec 2016). [snip] >> It _does_ require to have a version of pip which is not decades old > > Just to check that I am not misunderstanding: the version of pip should > not be more than a year old; "decades old" is just French hyperbole? Do I > understand right? > > > Right, the requirement is pip 9, which is currently one year old and will be >>2 years old by the time this matters for numpy. > > It does turn out that there's a bimodal distribution in the wild, where > people tend to either use an up to date pip, or else use some truly ancient > pip that some Linux LTS distro shipped 5 years ago. Numpy isn't the only > project that will be forcing people to upgrade, though, so I think this will
Especially since in the broken case what happens is that > users end up running our setup.py on an unsupported version of python, so > we'll be able to detect that and print some loud and informative message. > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Mon Nov 13 19:48:14 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 13 Nov 2017 16:48:14 -0800 Subject: [Numpy-discussion] numpy grant update In-Reply-To: References: Message-ID: On Thu, Oct 26, 2017 at 12:40 PM, Nathaniel Smith wrote: > On Wed, Oct 18, 2017 at 10:24 PM, Nathaniel Smith wrote: >> I'll also be giving a lunch talk at BIDS tomorrow to let folks locally >> know about what's going on, which I think will be recorded ? I'll send >> around a link after in case others are interested. > > Here's that link: https://www.youtube.com/watch?v=fowHwlpGb34 Still no update on that job ad (though we're learning interesting things about Berkeley's HR system!), but we did make a little scratch repo to start brainstorming. This is mostly for getting our own thoughts in order, but if anyone's curious then here it is: https://github.com/njsmith/numpy-grant-planning/ -n -- Nathaniel J. Smith -- https://vorpus.org From ralf.gommers at gmail.com Tue Nov 14 01:57:56 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 14 Nov 2017 19:57:56 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Tue, Nov 14, 2017 at 7:58 AM, Thomas Caswell wrote: > I am in very supportive of this plan. > > For Matplotlib the intention is to do a mpl2.2LTS early 2018 and a mpl3.0 > (no major API breaks other than dropping py2 support) summer 2018 with the > same meaning of LTS. > > I also had thought about bumping the minimum numpy version of Matplotlib > to the first py3 only version when it is out. There is no technical > reason, but it seems nicely symmetric. > > In general we all need to get better about dropping support for old > versions of dependencies (I am throwing stones from inside my glass > house). The prolonged support of py2 has warped our idea of how long old > versions of things need to be supported and it imposes real costs up and > down the stack. > My $2c: dropping support for all-but-the-latest numpy is not a great idea. There's no need to support numpy versions that are >3 years old, but supporting 2-4 versions back is something most projects have consistently done, and it has real value. Both in terms of not forcing users to upgrade multiple packages in lock-step, and for things like debugging (is it a new numpy or an mpl bug? --> check if the failure disappears with older numpy). Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Tue Nov 14 21:19:33 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Nov 2017 18:19:33 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: Apparently this is actually uncontroversial, the discussion's died down (see also the comments on Chuck's PR [1]), and anyone who wanted to object has had more than a week to do so, so... I guess we can say this is what's happening and start publicizing it to our users! A direct link to the rendered NEP in the repo is: https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst (I guess that at some point it will also show up on docs.scipy.org.) -n [1] https://github.com/numpy/numpy/pull/10006 On Thu, Nov 9, 2017 at 5:52 PM, Nathaniel Smith wrote: > Fortunately we can wait until we're a bit closer before we have to > make any final decision on the version numbering :-) > > Right now though it would be good to start communicating to > users/downstreams about whatever our plans our though, so they can > make plans. Here's a first attempt at some text we can put in the > documentation and point people to -- any thoughts, on either the plan > or the wording? > > ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- > > The Python core team plans to stop supporting Python 2 in 2020. The > NumPy project has supported both Python 2 and Python 3 in parallel > since 2010, and has found that supporting Python 2 is an increasing > burden on our limited resources; thus, we plan to eventually drop > Python 2 support as well. Now that we're entering the final years of > community-supported Python 2, the NumPy project wants to clarify our > plans, with the goal of to helping our downstream ecosystem make plans > and accomplish the transition with as little disruption as possible. > > Our current plan is as follows: > > Until **December 31, 2018**, all NumPy releases will fully support > both Python 2 and Python 3. > > Starting on **January 1, 2019**, any new feature releases will support > only Python 3. > > The last Python-2-supporting release will be designated as a long-term > support (LTS) release, meaning that we will continue to merge > bug-fixes and make bug-fix releases for a longer period than usual. > Specifically, it will be supported by the community until **December > 31, 2019**. > > On **January 1, 2020** we will raise a toast to Python 2, and > community support for the last Python-2-supporting release will come > to an end. However, it will continue to be available on PyPI > indefinitely, and if any commercial vendors wish to extend the LTS > support past this point then we are open to letting them use the LTS > branch in the official NumPy repository to coordinate that. > > If you are a NumPy user who requires ongoing Python 2 support in 2020 > or later, then please contact your vendor. If you are a vendor who > wishes to continue to support NumPy on Python 2 in 2020+, please get > in touch; ideally we'd like you to get involved in maintaining the LTS > before it actually hits end-of-life, so we can make a clean handoff. 
> > To minimize disruption, running 'pip install numpy' on Python 2 will > continue to give the last working release in perpetuity; but after > January 1, 2019 it may not contain the latest features, and after > January 1, 2020 it may not contain the latest bug fixes. > > For more information on the scientific Python ecosystem's transition > to Python-3-only, see: http://www.python3statement.org/ > > For more information on porting your code to run on Python 3, see: > https://docs.python.org/3/howto/pyporting.html > > ---- > > Thoughts? > > -n > > On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk > wrote: >> In astropy we had a similar discussion about version numbers, and >> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the >> first that does not. If we're discussing jumping a major number, we >> could do the same for numpy. (Admittedly, it made a bit more sense >> with the numbering scheme astropy had adopted anyway.) -- Marten >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > > -- > Nathaniel J. Smith -- https://vorpus.org -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Tue Nov 14 22:37:59 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Nov 2017 19:37:59 -0800 Subject: [Numpy-discussion] Upcoming revision of the BLAS standard Message-ID: Hi NumPy and SciPy developers, Apparently there is some work afoot to update the BLAS standard, with a working document here: https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit This seems like something where we might want to get involved in, so that the new standard works for us, and James Demmel (the first author on that proposal and a professor here at Berkeley) suggested they'd be interested to hear our thoughts. I'm not sure exactly what the process is here -- apparently there have been some workshops, and there was going to be a BoF today at Supercomputing, but I don't know what the schedule is or how they'll be making decisions. It's possible for anyone interested to click on that google doc above and make "suggestions", but it seems like maybe it would be useful for the NumPy/SciPy teams to come up with some sort of shared document on what we want? I'm really, really not the biggest linear algebra expert on these lists, so I'm hoping those with more experience will jump in, but to get started here are some initial ideas for things we might want to ask for: - Support for arbitrary strided memory layout - Replacing xerbla with proper error codes (already in that proposal) - There's some discussion about NaN handling where I think we might have opinions. (Am I remember right that currently we have to check for NaNs ourselves all the time because there are libraries that blow up if we don't, and we don't know which ones those are?) - Where the spec ends up giving implementors flexibility, some way to detect at compile time what options they chose. -n -- Nathaniel J. 
Smith -- https://vorpus.org From charlesr.harris at gmail.com Wed Nov 15 10:25:31 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Nov 2017 08:25:31 -0700 Subject: [Numpy-discussion] [SciPy-Dev] Upcoming revision of the BLAS standard In-Reply-To: References: Message-ID: On Tue, Nov 14, 2017 at 8:37 PM, Nathaniel Smith wrote: > Hi NumPy and SciPy developers, > > Apparently there is some work afoot to update the BLAS standard, with > a working document here: > > https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdB > DvtD5I14QHp9OE/edit > > This seems like something where we might want to get involved in, so > that the new standard works for us, and James Demmel (the first author > on that proposal and a professor here at Berkeley) suggested they'd be > interested to hear our thoughts. > > I'm not sure exactly what the process is here -- apparently there have > been some workshops, and there was going to be a BoF today at > Supercomputing, but I don't know what the schedule is or how they'll > be making decisions. It's possible for anyone interested to click on > that google doc above and make "suggestions", but it seems like maybe > it would be useful for the NumPy/SciPy teams to come up with some sort > of shared document on what we want? > > I'm really, really not the biggest linear algebra expert on these > lists, so I'm hoping those with more experience will jump in, but to > get started here are some initial ideas for things we might want to > ask for: > > - Support for arbitrary strided memory layout > - Replacing xerbla with proper error codes (already in that proposal) > - There's some discussion about NaN handling where I think we might > have opinions. (Am I remember right that currently we have to check > for NaNs ourselves all the time because there are libraries that blow > up if we don't, and we don't know which ones those are?) > - Where the spec ends up giving implementors flexibility, some way to > detect at compile time what options they chose. > Somewhat unrelated, but it would be nice to have 64 bit integers. That is already possible with compiler flags, but it would help if there was an easy way to tell what the compiled library was using. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Fri Nov 17 07:35:49 2017 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Nov 2017 12:35:49 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: Since Konrad Hinsen no longer follows the NumPy discussion list for lack of time, he has not posted here - but he has commented about this on Twitter and written up a good blog post: http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/ In a field where scientific code is expected to last and be developed on a timescale of decades, the change of pace with Python 2 and 3 is harder to handle. Regards, Peter On Wed, Nov 15, 2017 at 2:19 AM, Nathaniel Smith wrote: > Apparently this is actually uncontroversial, the discussion's died > down (see also the comments on Chuck's PR [1]), and anyone who wanted > to object has had more than a week to do so, so... I guess we can say > this is what's happening and start publicizing it to our users! 
> > A direct link to the rendered NEP in the repo is: > https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst > > (I guess that at some point it will also show up on docs.scipy.org.) > > -n > > [1] https://github.com/numpy/numpy/pull/10006 > > On Thu, Nov 9, 2017 at 5:52 PM, Nathaniel Smith wrote: >> Fortunately we can wait until we're a bit closer before we have to >> make any final decision on the version numbering :-) >> >> Right now though it would be good to start communicating to >> users/downstreams about whatever our plans our though, so they can >> make plans. Here's a first attempt at some text we can put in the >> documentation and point people to -- any thoughts, on either the plan >> or the wording? >> >> ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- >> >> The Python core team plans to stop supporting Python 2 in 2020. The >> NumPy project has supported both Python 2 and Python 3 in parallel >> since 2010, and has found that supporting Python 2 is an increasing >> burden on our limited resources; thus, we plan to eventually drop >> Python 2 support as well. Now that we're entering the final years of >> community-supported Python 2, the NumPy project wants to clarify our >> plans, with the goal of to helping our downstream ecosystem make plans >> and accomplish the transition with as little disruption as possible. >> >> Our current plan is as follows: >> >> Until **December 31, 2018**, all NumPy releases will fully support >> both Python 2 and Python 3. >> >> Starting on **January 1, 2019**, any new feature releases will support >> only Python 3. >> >> The last Python-2-supporting release will be designated as a long-term >> support (LTS) release, meaning that we will continue to merge >> bug-fixes and make bug-fix releases for a longer period than usual. >> Specifically, it will be supported by the community until **December >> 31, 2019**. >> >> On **January 1, 2020** we will raise a toast to Python 2, and >> community support for the last Python-2-supporting release will come >> to an end. However, it will continue to be available on PyPI >> indefinitely, and if any commercial vendors wish to extend the LTS >> support past this point then we are open to letting them use the LTS >> branch in the official NumPy repository to coordinate that. >> >> If you are a NumPy user who requires ongoing Python 2 support in 2020 >> or later, then please contact your vendor. If you are a vendor who >> wishes to continue to support NumPy on Python 2 in 2020+, please get >> in touch; ideally we'd like you to get involved in maintaining the LTS >> before it actually hits end-of-life, so we can make a clean handoff. >> >> To minimize disruption, running 'pip install numpy' on Python 2 will >> continue to give the last working release in perpetuity; but after >> January 1, 2019 it may not contain the latest features, and after >> January 1, 2020 it may not contain the latest bug fixes. >> >> For more information on the scientific Python ecosystem's transition >> to Python-3-only, see: http://www.python3statement.org/ >> >> For more information on porting your code to run on Python 3, see: >> https://docs.python.org/3/howto/pyporting.html >> >> ---- >> >> Thoughts? >> >> -n >> >> On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk >> wrote: >>> In astropy we had a similar discussion about version numbers, and >>> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the >>> first that does not. 
If we're discussing jumping a major number, we
>>> could do the same for numpy. (Admittedly, it made a bit more sense
>>> with the numbering scheme astropy had adopted anyway.) -- Marten
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From ilhanpolat at gmail.com Fri Nov 17 08:33:31 2017
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Fri, 17 Nov 2017 14:33:31 +0100
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com>
Message-ID:

I've actually engaged with him on Twitter too, but just to repeat one
part here: scarce academic resources for maintaining code is not an
argument. Of all places, it is academia that should have come up with,
or at least contributed greatly to, open source, instead of engaging in
a paper-writing frenzy. As many people have already written in blog
posts, tweets, etc., academia does not value software as a scientific
product, yet demands software continuously. As an ex-academician I can
safely ignore that argument: scientific code is expected to be
maintained properly. I understand the sentiment, but blocking progress
because of legacy code is a burden on posterity and a luxury for the
past.

On Fri, Nov 17, 2017 at 1:35 PM, Peter Cock wrote:

> Since Konrad Hinsen no longer follows the NumPy discussion list
> for lack of time, he has not posted here - but he has commented
> about this on Twitter and written up a good blog post:
>
> http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/
>
> In a field where scientific code is expected to last and be developed
> on a timescale of decades, the change of pace with Python 2 and 3
> is harder to handle.
>
> Regards,
>
> Peter
>
> On Wed, Nov 15, 2017 at 2:19 AM, Nathaniel Smith wrote:
> > Apparently this is actually uncontroversial, the discussion's died
> > down (see also the comments on Chuck's PR [1]), and anyone who wanted
> > to object has had more than a week to do so, so... I guess we can say
> > this is what's happening and start publicizing it to our users!
> >
> > A direct link to the rendered NEP in the repo is:
> > https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst
> >
> > (I guess that at some point it will also show up on docs.scipy.org.)
> >
> > -n
> >
> > [1] https://github.com/numpy/numpy/pull/10006
> >
> > On Thu, Nov 9, 2017 at 5:52 PM, Nathaniel Smith wrote:
> >> Fortunately we can wait until we're a bit closer before we have to
> >> make any final decision on the version numbering :-)
> >>
> >> Right now though it would be good to start communicating to
> >> users/downstreams about what our plans are, so they can
> >> make plans. Here's a first attempt at some text we can put in the
> >> documentation and point people to -- any thoughts, on either the plan
> >> or the wording?
> >>
> >> ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK?
OK ---- > >> > >> The Python core team plans to stop supporting Python 2 in 2020. The > >> NumPy project has supported both Python 2 and Python 3 in parallel > >> since 2010, and has found that supporting Python 2 is an increasing > >> burden on our limited resources; thus, we plan to eventually drop > >> Python 2 support as well. Now that we're entering the final years of > >> community-supported Python 2, the NumPy project wants to clarify our > >> plans, with the goal of to helping our downstream ecosystem make plans > >> and accomplish the transition with as little disruption as possible. > >> > >> Our current plan is as follows: > >> > >> Until **December 31, 2018**, all NumPy releases will fully support > >> both Python 2 and Python 3. > >> > >> Starting on **January 1, 2019**, any new feature releases will support > >> only Python 3. > >> > >> The last Python-2-supporting release will be designated as a long-term > >> support (LTS) release, meaning that we will continue to merge > >> bug-fixes and make bug-fix releases for a longer period than usual. > >> Specifically, it will be supported by the community until **December > >> 31, 2019**. > >> > >> On **January 1, 2020** we will raise a toast to Python 2, and > >> community support for the last Python-2-supporting release will come > >> to an end. However, it will continue to be available on PyPI > >> indefinitely, and if any commercial vendors wish to extend the LTS > >> support past this point then we are open to letting them use the LTS > >> branch in the official NumPy repository to coordinate that. > >> > >> If you are a NumPy user who requires ongoing Python 2 support in 2020 > >> or later, then please contact your vendor. If you are a vendor who > >> wishes to continue to support NumPy on Python 2 in 2020+, please get > >> in touch; ideally we'd like you to get involved in maintaining the LTS > >> before it actually hits end-of-life, so we can make a clean handoff. > >> > >> To minimize disruption, running 'pip install numpy' on Python 2 will > >> continue to give the last working release in perpetuity; but after > >> January 1, 2019 it may not contain the latest features, and after > >> January 1, 2020 it may not contain the latest bug fixes. > >> > >> For more information on the scientific Python ecosystem's transition > >> to Python-3-only, see: http://www.python3statement.org/ > >> > >> For more information on porting your code to run on Python 3, see: > >> https://docs.python.org/3/howto/pyporting.html > >> > >> ---- > >> > >> Thoughts? > >> > >> -n > >> > >> On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk > >> wrote: > >>> In astropy we had a similar discussion about version numbers, and > >>> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the > >>> first that does not. If we're discussing jumping a major number, we > >>> could do the same for numpy. (Admittedly, it made a bit more sense > >>> with the numbering scheme astropy had adopted anyway.) -- Marten > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at python.org > >>> https://mail.python.org/mailman/listinfo/numpy-discussion > >> > >> > >> > >> -- > >> Nathaniel J. Smith -- https://vorpus.org > > > > > > > > -- > > Nathaniel J. 
Smith -- https://vorpus.org
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Fri Nov 17 10:35:18 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 17 Nov 2017 08:35:18 -0700
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com>
Message-ID:

On Fri, Nov 17, 2017 at 5:35 AM, Peter Cock wrote:

> Since Konrad Hinsen no longer follows the NumPy discussion list
> for lack of time, he has not posted here - but he has commented
> about this on Twitter and written up a good blog post:
>
> http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/
>
> In a field where scientific code is expected to last and be developed
> on a timescale of decades, the change of pace with Python 2 and 3
> is harder to handle.
>
> Regards,
>
> Peter
>

Konrad has been making that argument for a long time, and I don't know
what the long term solution is. However, the use of Fortran as a
benchmark of stability is a bit misleading. Fortran was undergoing rapid
development in the years before Fortran 77, with Fortran II, DEC
Fortran, and Rational Fortran being somewhat incompatible variations on
the theme, and it was only with the standardization of the language that
stability could be assumed. And if you wrote in a language that didn't
survive the winnowing, Algol-68 for instance, you were in trouble.

But even apart from the languages, the hardware was changing, with
different floating point formats on different hardware, so that prior to
IEEE-754 the results of computations carried out on one machine were not
always the same as the results on another. Such differences still
persist, with dependencies on math library and compiler versions, choice
of rounding, and hardware, although much reduced in effect. The C
language is another example: the lack of a maintained, C99-compliant
compiler on Windows for Python 2.7 was one of the considerations in
dropping support for Python 2. And let us not overlook C++, which, IMHO,
has only reached its Fortran 77 equivalent with C++11.

I think the takeaway here is that we are still in the very early days of
scientific computing with Python; it has really only been coming on
strong for maybe five years. Those of us, including Konrad, who were
early adopters scouting the terrain were bound to end up with a few
arrows in our backs. Early adoption is always a gamble, with a tradeoff
between the risk of choosing a language that mutates or dies, versus the
payoff of using a language that blossoms and makes life easier. In my
mind, Python 3.5 is the rough equivalent of Fortran 77, or maybe Fortran
95, and I don't know when the Python scientific stack will truly settle,
but I expect it will be sometime in the next 5-10 years.
At that point, we may want to look at having a "reference" version of NumPy, but I think it is still too early to make such a guarantee, although we do try to avoid being too disruptive while still making progress. These considerations are probably cold comfort to folks like Konrad who have extensive code bases, some probably dating back to Numeric, that they need to maintain, but I do think things will get better. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Fri Nov 17 16:11:39 2017 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 17 Nov 2017 16:11:39 -0500 Subject: [Numpy-discussion] PyArray_GETITEM and PyArray_SETITEM In-Reply-To: References: Message-ID: <0d5407ff-d085-a5ae-65e9-ef3060fdd15e@gmail.com> On 11/13/2017 01:53 PM, Mmanu Chaturvedi wrote: > Hello All, > > I need to make use of the limited numpy API access Pybind11 gives, in order > to add a feature to it. It seems to give access to functions from > numpy_api.py [1]. I need to use PyArray_GETITEM and PyArray_SETITEM in > order to get and set array elements [2], these functions / macros are not > exposed via numpy_api.py, but are in `numpy/ndarraytypes.h`. > > We were wondering why aren't PyArray_GETITEM and PyArray_SETITEM exposed > like the rest of numpy API? Is it possible to replicate the behavior using > the members exposed in numpy_api.py ? Any help would be appreciated. > > Mmanu It looks like that was the plan. There are comments there saying they would become part of the API in "numpy 2.0" (which hasn't happened yet). In the meantime, maybe you can use PySequence_SetItem? I expect that there is only very minimal overhead in using that vs PyArray_SETITEM. Allan From chris.barker at noaa.gov Fri Nov 17 16:12:43 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 17 Nov 2017 13:12:43 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Fri, Nov 17, 2017 at 4:35 AM, Peter Cock wrote: > Since Konrad Hinsen no longer follows the NumPy discussion list > for lack of time, he has not posted here - but he has commented > about this on Twitter and written up a good blog post: > > http://blog.khinsen.net/posts/2017/11/16/a-plea-for- > stability-in-the-scipy-ecosystem/ > > In a field where scientific code is expected to last and be developed > on a timescale of decades, the change of pace with Python 2 and 3 > is harder to handle. > sure -- but I do not get what the problem is here! from his post: """ The disappearance of Python 2 will leave much scientific software orphaned, and many published results irreproducible. """ This is an issue we should all be concerned about, and, in fact, the scipy community has been particularly active in the reproducibility realm. BUT: that statement makes NO SENSE. dropping Python2 support in numpy (or any other package) means that newer versions of numpy will not run on py2 -- but if you want to reproduce results, you need to run the code WITH THE VERSION THAT WAS USED IN THE FIRST PLACE. So if someone publishes something based on code written in python2.7 and numpy 1.13, then it is not helpful for reproducibility at all for numpy 1.18 (or 2.*, or whatever we call it) to run on python2. So there is no issue here. 
Potential issues will arise post 2020, when maybe python2.7 (and numpy
1.13) will no longer run on an up to date OS. But the OS vendors do a
pretty good job of backward compatibility -- so we've got quite a few
years to go on that. And it will also be important that older versions
of packages are available -- but as long as we don't delete the
archives, that should be the case for a good long while.

So I'm not sure what the problem is here.

Not really relevant for reproducibility, but I have always been puzzled
that folks often desperately want to run the very latest numpy on an old
Python (2.6, 1.5, ....) -- if you can update your numpy, update your
darn Python too!

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wieser.eric+numpy at gmail.com Fri Nov 17 16:44:29 2017
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Fri, 17 Nov 2017 21:44:29 +0000
Subject: [Numpy-discussion] PyArray_GETITEM and PyArray_SETITEM
In-Reply-To: <0d5407ff-d085-a5ae-65e9-ef3060fdd15e@gmail.com>
References: <0d5407ff-d085-a5ae-65e9-ef3060fdd15e@gmail.com>
Message-ID:

It's worth noting that PyArray_GETITEM is the equivalent of
arr[...].item(), not arr[...]. If you want the behavior of the latter,
use PyArray_Scalar instead. Similarly, PyArray_SETITEM is only
guaranteed to be equivalent to arr[...] = x when isinstance(x,
np.generic) is false.

I don't think these belong in public API yet, because they don't expose
the interface that most people might expect. Their names are based
solely on the names of descr->f->getitem.

Eric

On Fri, 17 Nov 2017 at 13:12 Allan Haldane wrote:

> On 11/13/2017 01:53 PM, Mmanu Chaturvedi wrote:
> > Hello All,
> >
> > I need to make use of the limited numpy API access Pybind11 gives, in
> > order to add a feature to it. It seems to give access to functions
> > from numpy_api.py [1]. I need to use PyArray_GETITEM and
> > PyArray_SETITEM in order to get and set array elements [2], these
> > functions / macros are not exposed via numpy_api.py, but are in
> > `numpy/ndarraytypes.h`.
> >
> > We were wondering why aren't PyArray_GETITEM and PyArray_SETITEM
> > exposed like the rest of numpy API? Is it possible to replicate the
> > behavior using the members exposed in numpy_api.py ? Any help would
> > be appreciated.
> >
> > Mmanu
>
> It looks like that was the plan. There are comments there saying they
> would become part of the API in "numpy 2.0" (which hasn't happened yet).
>
> In the meantime, maybe you can use PySequence_SetItem? I expect that
> there is only very minimal overhead in using that vs PyArray_SETITEM.
>
> Allan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu Fri Nov 17 17:43:02 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Fri, 17 Nov 2017 14:43:02 -0800
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com>
Message-ID: <1510958582.456543.1176394760.02D87D4A@webmail.messagingengine.com>

On Fri, Nov 17, 2017, at 13:12, Chris Barker wrote:
> On Fri, Nov 17, 2017 at 4:35 AM, Peter Cock wrote:
>> Since Konrad Hinsen no longer follows the NumPy discussion list
>> for lack of time, he has not posted here - but he has commented
>> about this on Twitter and written up a good blog post:
>>
>> http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/

I don't agree with the general gist of Konrad's post. There are multiple
viewpoints on the issue, of course, such as that of developers that are
already invested in NumPy or SciPy's APIs, those that will rely on it in
the future, and those that are still undecided about whether to use
these tools.

For those heavily invested such as Konrad, API changes and a language
upgrade may seem like a particularly bad situation. Heck, none of us
enjoyed having to port all of our code to Python 3, but in reality the
changes required were much fewer than commonly imagined and are
documented. But in the same way you cause some pain by changing APIs,
*not* changing APIs carries a penalty too, more for the other groups I
mentioned. The ability to change APIs, albeit slowly, allows cleaner and
more intuitive future code, fewer surprises, and makes the environment
much more enjoyable to use.

We can do a better job of advertising NumPy's deprecation policy. A
quick Google search for "x deprecation policy" didn't manage to find it,
but did pick up:

- http://scikit-learn.org/stable/developers/contributing.html#deprecation
- http://scikit-image.org/docs/dev/contribute.html#deprecation-cycle
- https://docs.scipy.org/doc/scipy-1.0.0/reference/dev/deprecations.html

All the above packages, as well as NumPy, include a section on API
changes in their release notes. We may benefit from standardizing
deprecation conventions across the community, so that there is a very
clear expectation on how often to run your code to be able to see all
relevant warnings and fix them.

Best regards
Stéfan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com Sat Nov 25 03:14:56 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sat, 25 Nov 2017 08:14:56 +0000
Subject: [Numpy-discussion] Type annotations for NumPy
Message-ID:

There's been growing interest in supporting PEP-484 style type
annotations in NumPy: https://github.com/numpy/numpy/issues/7370

This would allow NumPy users to add type-annotations to their code that
uses NumPy, which they could check with mypy, pycharm or pytype. For
example:

def f(x: np.ndarray) -> np.ndarray:
    """Identity function on a NumPy array."""
    return x

Eventually, we could include data types and potentially array shapes as
part of the type. This gets quite a bit more complicated, and to do in a
really satisfying way would require new features in Python's typing
system.
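For instance, even the basic stubs from step 1 below would let a checker
catch mistakes like this (the commented-out spelling at the end is only
one possible syntax for the dtype extension sketched in step 2, not
something that works today):

import numpy as np

def f(x: np.ndarray) -> np.ndarray:
    """Identity function on a NumPy array."""
    return x

f(np.zeros(3))      # fine: the argument is an ndarray
f([0.0, 0.0, 0.0])  # with ndarray stubs, mypy would flag this call,
                    # since a plain list is not an np.ndarray

# Hypothetical future spelling, for illustration only:
# def f(x: np.ndarray[np.float64]) -> np.ndarray[np.float64]: ...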
To help guide discussion, I wrote a doc describing use-cases and needs for typing array shapes in more detail: https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY Nathaniel Smith and I recently met with group in San Francisco interested in this topic, including several mypy/typeshed developers (Jelle Zijlstra and Ethan Smith). We discussed and came up with a plan for moving forward: 1. Release basic type stubs for numpy.ndarray without dtypes or shapes, as separate "numpy_stubs" package on PyPI per PEP 561. This will let us iterate rapidly on (experimental) type annotations without coupling to NumPy's release cycle. 2. Add support for dtypes in ndarray type-annotations. This might be as simple as writing np.ndarray[np.float64], but will need a decision about appropriate syntax for shape typing to ensure that this is forwards compatible with typing shapes. Note: this will likely require minor changes to NumPy itself, e.g., to add __class_getitem__ per PEP 560. 3. Add support for shapes in ndarray type-annotations, and define a broader standard for typing array shapes. This will require collaboration with type-checker developers on the required typing features (for details, see my doc above). Eventually, this may entail writing a PEP. I'm writing to gauge support for this general plan, and specifically to get support for step 1. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sat Nov 25 10:21:33 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 25 Nov 2017 10:21:33 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Hi Stephan, A question of perhaps broader scope than what you were asking for, and more out of curiosity than anything else, but can one mix type annotations with others? E.g., in astropy, we have a decorator that looks for units in the annotations (not dissimilar from dtype, I guess). Could one mix annotations or does one have to stick with one purpose? All the best, Marten From charlesr.harris at gmail.com Sat Nov 25 11:53:38 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 25 Nov 2017 09:53:38 -0700 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Sat, Nov 25, 2017 at 1:14 AM, Stephan Hoyer wrote: > There's been growing interest in supporting PEP-484 style type annotations > in NumPy: https://github.com/numpy/numpy/issues/7370 > > This would allow NumPy users to add type-annotations to their code that > uses NumPy, which they could check with mypy, pycharm or pytype. For > example: > > def f(x: np.ndarray) -> np.ndarray: > """Identity function on a NumPy array.""" > return x > > Eventually, we could include data types and potentially array shapes as > part of the type. This gets quite a bit more complicated, and to do in a > really satisfying way would require new features in Python's typing system. > To help guide discussion, I wrote a doc describing use-cases and needs for > typing array shapes in more detail: https://docs.google.com/document/d/ > 1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY > > Nathaniel Smith and I recently met with group in San Francisco interested > in this topic, including several mypy/typeshed developers (Jelle Zijlstra > and Ethan Smith). We discussed and came up with a plan for moving forward: > 1. 
Release basic type stubs for numpy.ndarray without dtypes or shapes, as > separate "numpy_stubs" package on PyPI per PEP 561. This will let us > iterate rapidly on (experimental) type annotations without coupling to > NumPy's release cycle. > 2. Add support for dtypes in ndarray type-annotations. This might be as > simple as writing np.ndarray[np.float64], but will need a decision about > appropriate syntax for shape typing to ensure that this is forwards > compatible with typing shapes. Note: this will likely require minor changes > to NumPy itself, e.g., to add __class_getitem__ per PEP 560. > 3. Add support for shapes in ndarray type-annotations, and define a > broader standard for typing array shapes. This will require collaboration > with type-checker developers on the required typing features (for details, > see my doc above). Eventually, this may entail writing a PEP. > > Can you make a case for the usefulness numpy annotations? What benefits to you want to achieve and how will annotation aid in getting there. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Sat Nov 25 18:09:18 2017 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Sun, 26 Nov 2017 10:09:18 +1100 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> This is a complete outsider?s perspective but (a) it would be good if NumPy type annotations could include an ?array_like? type that allows lists, tuples, etc. (b) I?ve always thought (since PEP561) that it would be cool for type annotations to replace compiler type annotations for e.g. Cython and Numba. Is this in the realm of possibility for the future? Juan. On 26 Nov 2017, 3:54 AM +1100, Charles R Harris , wrote: > > > > On Sat, Nov 25, 2017 at 1:14 AM, Stephan Hoyer wrote: > > > There's been growing interest in supporting PEP-484 style type annotations in NumPy: https://github.com/numpy/numpy/issues/7370 > > > > > > This would allow NumPy users to add type-annotations to their code that uses NumPy, which they could check with mypy, pycharm or pytype. For example: > > > > > > def f(x: np.ndarray) -> np.ndarray: > > > ? ? """Identity function on a NumPy array.""" > > > ? ? return x > > > > > > Eventually, we could include data types and potentially array shapes as part of the type. This gets quite a bit more complicated, and to do in a really satisfying way would require new features in Python's typing system. To help guide discussion, I wrote a doc describing use-cases and needs for typing array shapes in more detail: https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY > > > > > > Nathaniel Smith and I recently met with group in San Francisco interested in this topic, including several mypy/typeshed developers (Jelle Zijlstra and Ethan Smith). We discussed and came up with a plan for moving forward: > > > 1. Release basic type stubs for numpy.ndarray without dtypes or shapes, as separate "numpy_stubs" package on PyPI per PEP 561. This will let us iterate rapidly on (experimental) type annotations without coupling to NumPy's release cycle. > > > 2. Add support for dtypes in ndarray type-annotations. This might be as simple as writing np.ndarray[np.float64], but will need a decision about appropriate syntax for shape typing to ensure that this is forwards compatible with typing shapes. Note: this will likely require minor changes to NumPy itself, e.g., to add __class_getitem__ per PEP 560. 
> > > 3. Add support for shapes in ndarray type-annotations, and define a broader standard for typing array shapes. This will require collaboration with type-checker developers on the required typing features (for details, see my doc above). Eventually, this may entail writing a PEP. > > > > > > > Can you make a case for the usefulness numpy annotations? What benefits to you want to achieve and how will annotation aid in getting there. > > > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sat Nov 25 18:12:16 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Sat, 25 Nov 2017 23:12:16 +0000 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Sat, Nov 25, 2017 at 7:21 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > A question of perhaps broader scope than what you were asking for, and > more out of curiosity than anything else, but can one mix type > annotations with others? E.g., in astropy, we have a decorator that > looks for units in the annotations (not dissimilar from dtype, I > guess). Could one mix annotations or does one have to stick with one > purpose? > Hi Marten, I took a look at Astropy's units decorator: http://docs.astropy.org/en/stable/api/astropy.units.quantity_input.html Annotations for return values that "coerce" units would be hard to make compatible with typing, because type annotations are used to check programs, not change runtime semantics. But in principle, I think you could even make a physical units library that relies entirely on static type checking for correctness, imposing almost no run-time overhead at all. There are several examples for Haskell: https://wiki.haskell.org/Physical_units I don't see any obvious way to support to mixing of annotations for typing and runtime effects in the same function, though doing so in the same program might be possible. My guess is that the preferred way to do this would be to use decorators for runtime changes to arguments, and keep annotations for typing. The Python community seems to be standardizing on using annotations for typing: https://www.python.org/dev/peps/pep-0563/#non-typing-usage-of-annotations Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrocklin at gmail.com Sat Nov 25 18:31:13 2017 From: mrocklin at gmail.com (Matthew Rocklin) Date: Sat, 25 Nov 2017 18:31:13 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Can you make a case for the usefulness numpy annotations? What benefits to you want to achieve and how will annotation aid in getting there. 1. Error checking on large codebases with systems like MyPy 2. Hinting and error checking at code-writing time with systems like Jedi "Hey, this function expects a 2-d square array but you just passed in a 3d array with irregular sizes" 3. 
Supporting systems like the Cython compiler with type information, allowing them to speedup pure-python code without switching to the Cython language On Sat, Nov 25, 2017 at 6:12 PM, Stephan Hoyer wrote: > On Sat, Nov 25, 2017 at 7:21 AM Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> A question of perhaps broader scope than what you were asking for, and >> more out of curiosity than anything else, but can one mix type >> annotations with others? E.g., in astropy, we have a decorator that >> looks for units in the annotations (not dissimilar from dtype, I >> guess). Could one mix annotations or does one have to stick with one >> purpose? >> > > Hi Marten, > > I took a look at Astropy's units decorator: > http://docs.astropy.org/en/stable/api/astropy.units.quantity_input.html > > Annotations for return values that "coerce" units would be hard to make > compatible with typing, because type annotations are used to check > programs, not change runtime semantics. But in principle, I think you could > even make a physical units library that relies entirely on static type > checking for correctness, imposing almost no run-time overhead at all. > There are several examples for Haskell: > https://wiki.haskell.org/Physical_units > > I don't see any obvious way to support to mixing of annotations for typing > and runtime effects in the same function, though doing so in the same > program might be possible. My guess is that the preferred way to do this > would be to use decorators for runtime changes to arguments, and keep > annotations for typing. The Python community seems to be standardizing on > using annotations for typing: > https://www.python.org/dev/peps/pep-0563/#non-typing-usage-of-annotations > > Cheers, > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrocklin at gmail.com Sat Nov 25 18:33:49 2017 From: mrocklin at gmail.com (Matthew Rocklin) Date: Sat, 25 Nov 2017 18:33:49 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Thoughts on basing this on a more generic Array type rather than the np.ndarray? I can imagine other nd-array libraries (XArray, Tensorflow, Dask.array) wanting to reuse this work. For dask.array in particular we would want to copy this entirely, but we probably can't specify that dask.arrays are np.ndarrays. It would be nice to ensure that the container type was swappable. On Sat, Nov 25, 2017 at 6:31 PM, Matthew Rocklin wrote: > Can you make a case for the usefulness numpy annotations? What benefits to > you want to achieve and how will annotation aid in getting there. > > > 1. Error checking on large codebases with systems like MyPy > 2. Hinting and error checking at code-writing time with systems like > Jedi "Hey, this function expects a 2-d square array but you just passed in > a 3d array with irregular sizes" > 3. 
Supporting systems like the Cython compiler with type information, > allowing them to speedup pure-python code without switching to the Cython > language > > > > On Sat, Nov 25, 2017 at 6:12 PM, Stephan Hoyer wrote: > >> On Sat, Nov 25, 2017 at 7:21 AM Marten van Kerkwijk < >> m.h.vankerkwijk at gmail.com> wrote: >> >>> A question of perhaps broader scope than what you were asking for, and >>> more out of curiosity than anything else, but can one mix type >>> annotations with others? E.g., in astropy, we have a decorator that >>> looks for units in the annotations (not dissimilar from dtype, I >>> guess). Could one mix annotations or does one have to stick with one >>> purpose? >>> >> >> Hi Marten, >> >> I took a look at Astropy's units decorator: >> http://docs.astropy.org/en/stable/api/astropy.units.quantity_input.html >> >> Annotations for return values that "coerce" units would be hard to make >> compatible with typing, because type annotations are used to check >> programs, not change runtime semantics. But in principle, I think you could >> even make a physical units library that relies entirely on static type >> checking for correctness, imposing almost no run-time overhead at all. >> There are several examples for Haskell: >> https://wiki.haskell.org/Physical_units >> >> I don't see any obvious way to support to mixing of annotations for >> typing and runtime effects in the same function, though doing so in the >> same program might be possible. My guess is that the preferred way to do >> this would be to use decorators for runtime changes to arguments, and keep >> annotations for typing. The Python community seems to be standardizing on >> using annotations for typing: >> https://www.python.org/dev/peps/pep-0563/#non-typing-usage-of-annotations >> >> Cheers, >> Stephan >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Nov 25 20:24:55 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 25 Nov 2017 17:24:55 -0800 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> Message-ID: On Sat, Nov 25, 2017 at 3:09 PM, Juan Nunez-Iglesias wrote: > This is a complete outsider?s perspective but > > (a) it would be good if NumPy type annotations could include an ?array_like? > type that allows lists, tuples, etc. I'm sure this will exist. > (b) I?ve always thought (since PEP561) that it would be cool for type > annotations to replace compiler type annotations for e.g. Cython and Numba. > Is this in the realm of possibility for the future? It turns out that the PEP 484 type system is *mostly* not useful for this. They're really designed for checking consistency across a large code-base, not for enabling compiler speedups. For example, if you annotate something as an int, that means "this object is a subclass of int". This is enough to let mypy catch your mistake if you accidentally pass in a float instead, but it's not enough to tell you anything at all about the object's behavior -- you could make a wacky int subclass that acts like a string or something. Probably there are some benefits that compilers can get from PEP 484 annotations, but you should think of them as largely an orthogonal thing. -n -- Nathaniel J. 
Smith -- https://vorpus.org

From jni.soma at gmail.com Sat Nov 25 20:31:07 2017
From: jni.soma at gmail.com (Juan Nunez-Iglesias)
Date: Sun, 26 Nov 2017 12:31:07 +1100
Subject: [Numpy-discussion] Type annotations for NumPy
In-Reply-To: References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark>
Message-ID: <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark>

On 26 Nov 2017, 12:27 PM +1100, Nathaniel Smith , wrote:
> It turns out that the PEP 484 type system is *mostly* not useful for
> this. They're really designed for checking consistency across a large
> code-base, not for enabling compiler speedups. For example, if you
> annotate something as an int, that means "this object is a subclass of
> int". This is enough to let mypy catch your mistake if you
> accidentally pass in a float instead, but it's not enough to tell you
> anything at all about the object's behavior -- you could make a wacky
> int subclass that acts like a string or something.

But doesn't Cython do all kinds of type conversions and implied
equivalences that could be applied here? e.g. I'm going to annotate this
as int, which might mean whatever in mypy, but if I pass this .py file
to a newfangled Cython 0.35 compiler, the compiler will understand this
to mean "actually, really, this is an int"?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kirillbalunov at gmail.com Sun Nov 26 06:00:53 2017
From: kirillbalunov at gmail.com (Kirill Balunov)
Date: Sun, 26 Nov 2017 14:00:53 +0300
Subject: [Numpy-discussion] Type annotations for NumPy
In-Reply-To: <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark>
References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark>
Message-ID:

Hi!

2017-11-26 4:31 GMT+03:00 Juan Nunez-Iglesias :
>
> On 26 Nov 2017, 12:27 PM +1100, Nathaniel Smith , wrote:
>
> It turns out that the PEP 484 type system is *mostly* not useful for
> this. They're really designed for checking consistency across a large
> code-base, not for enabling compiler speedups. For example, if you
> annotate something as an int, that means "this object is a subclass of
> int". This is enough to let mypy catch your mistake if you
> accidentally pass in a float instead, but it's not enough to tell you
> anything at all about the object's behavior -- you could make a wacky
> int subclass that acts like a string or something.
>

I have subscribed to many lists, although I am not an active participant
in them. Nevertheless, the topic of using type annotations in these
projects has been discussed several times on all Cython-like channels
(and it has become much more acute nowadays). "Misconceptions" arise
both for ordinary users and developers, but I have never seen anyone
write clearly why the application of type annotations in Cython (and
similar projects) is impossible or unreasonable. Maybe someone close to
the topic has the time and energy to sum up and write a brief summary of
how to perceive them and why they should be viewed as "orthogonal"?

Maybe I'm looking too superficially at this topic. But both Mypy and
Cython perform type checking. From the Cython point of view I do not see
any pitfalls; type checking and type conversions are what Cython is
doing right now during compilation (and looks at types as strictly as
necessary).
>From Mypy's point of view, it's possible that it can delegate all this stuff, using a certain option, on a project's related type checker (which can be much stricter in its assumptions) With kind regards, -gdg -------------- next part -------------- An HTML attachment was scrubbed... URL: From jelle.zijlstra at gmail.com Sun Nov 26 10:04:16 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Sun, 26 Nov 2017 10:04:16 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark> Message-ID: 2017-11-26 6:00 GMT-05:00 Kirill Balunov : > Hi! > > 2017-11-26 4:31 GMT+03:00 Juan Nunez-Iglesias : > >> >> On 26 Nov 2017, 12:27 PM +1100, Nathaniel Smith , wrote: >> >> It turns out that the PEP 484 type system is *mostly* not useful for >> this. They're really designed for checking consistency across a large >> code-base, not for enabling compiler speedups. For example, if you >> annotate something as an int, that means "this object is a subclass of >> int". This is enough to let mypy catch your mistake if you >> accidentally pass in a float instead, but it's not enough to tell you >> anything at all about the object's behavior -- you could make a wacky >> int subclass that acts like a string or something. >> >> > I have subscribed to many lists, although I am not an active participant > in them. Nevertheless this topic of using the type annotation in their > projects was discussed several times on all Cython-like channels (and it > becomes much more acute now days). "Misconceptions" arise both for ordinary > users and developers, but I have never seen anyone to write clearly why the > application of type annotation in Cython (and similar projects) is > impossible or not reasonable. Maybe someone close to the topic has the time > and energy to sum up and write a brief summary of how to perceive them and > why they should be viewed "orthogonal"? > > Maybe I'm looking too superficially at this topic. But both Mypy and > Cython perform type checking. From the Cython point of view I do not see > any pitfalls, type checking and type conversions are what Cython is doing > right now during compilation (and looks at types as strictly as necessary). > From Mypy's point of view, it's possible that it can delegate all this > stuff, using a certain option, on a project's related type checker (which > can be much stricter in its assumptions) > The main (perceived) difficulty is that the type systems are different. If Cython has a list-typed argument, it wants exactly a list so it can use specialized code for lists, but to mypy it means "list or a subclass of list", which is not as easily optimized because the subclass may do things differently from the base class. Similarly, to Cython an int means a C int, and in Python it may mean an arbitrary-precision integer. However, Cython managed to overcome the problem and actually added support for type annotations recently; see https://github.com/cython/cython/issues/1672 and https://github.com/cython/cython/issues/1850. I haven't used the support myself and there are probably still details to be worked out, but in principle it should be possible to use both Cython and mypy on a codebase. 
> > With kind regards, -gdg > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Nov 26 13:58:55 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 26 Nov 2017 18:58:55 +0000 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Sat, Nov 25, 2017 at 3:34 PM Matthew Rocklin wrote: > Thoughts on basing this on a more generic Array type rather than the > np.ndarray? I can imagine other nd-array libraries (XArray, Tensorflow, > Dask.array) wanting to reuse this work. For dask.array in particular we > would want to copy this entirely, but we probably can't specify that > dask.arrays are np.ndarrays. It would be nice to ensure that the container > type was swappable. > Yes, absolutely. I do briefly mention this in my longer doc (see the "Syntax" section). This is also one of my personal goals for this project. This will be most relevant when we start working on typing support for array shapes and broadcasting: details like data types can be more library specific, and can probably be expressed with the existing generics system in the typing module. After we do some experimentation to figure out appropriate syntax and semantics for array shape typing, I would like to standardize the rules for typing multi-dimensional arrays in Python. This will probably entail writing a PEP, so we can add appropriate base classes in the typing module. I view this as the natural complement to existing standard library features that make it easier to interchange between multiple multi-dimensional array libraries, such as memory views and the buffer protocol. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Tue Nov 28 12:09:31 2017 From: rmcgibbo at gmail.com (Robert T. McGibbon) Date: Tue, 28 Nov 2017 12:09:31 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: I'm strongly in support of this proposal. Type annotations have really helped me write more correct code. I started working on numpy type stubs a few months ago. I needed a mypy plugin to support shape-aware functions. Those whole thing is pretty tricky. Still very WIP, but I'll clean them up a little bit and opensource it shortly. -Robert On Sun, Nov 26, 2017 at 1:58 PM, Stephan Hoyer wrote: > On Sat, Nov 25, 2017 at 3:34 PM Matthew Rocklin > wrote: > >> Thoughts on basing this on a more generic Array type rather than the >> np.ndarray? I can imagine other nd-array libraries (XArray, Tensorflow, >> Dask.array) wanting to reuse this work. For dask.array in particular we >> would want to copy this entirely, but we probably can't specify that >> dask.arrays are np.ndarrays. It would be nice to ensure that the container >> type was swappable. >> > > Yes, absolutely. I do briefly mention this in my longer doc (see the > "Syntax" section). This is also one of my personal goals for this project. > > This will be most relevant when we start working on typing support for > array shapes and broadcasting: details like data types can be more library > specific, and can probably be expressed with the existing generics system > in the typing module. 
> > After we do some experimentation to figure out appropriate syntax and > semantics for array shape typing, I would like to standardize the rules for > typing multi-dimensional arrays in Python. This will probably entail > writing a PEP, so we can add appropriate base classes in the typing module. > I view this as the natural complement to existing standard library features > that make it easier to interchange between multiple multi-dimensional array > libraries, such as memory views and the buffer protocol. > >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -- -Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Nov 28 14:04:07 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 28 Nov 2017 19:04:07 +0000 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Tue, Nov 28, 2017 at 5:11 PM Robert T. McGibbon wrote: > I'm strongly in support of this proposal. Type annotations have really > helped me write more correct code. > > I started working on numpy type stubs a few months ago. I needed a mypy > plugin to support shape-aware functions. Those whole thing is pretty > tricky. Still very WIP, but I'll clean them up a little bit and opensource > it shortly. > Great to hear -- I'd love to see what this looks like, or hear any lessons you learned from the experience! Actual experience using and writing such a type checker gives you a valuable perspective to share, as opposed to my speculation. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Tue Nov 28 17:35:55 2017 From: rmcgibbo at gmail.com (Robert T. McGibbon) Date: Tue, 28 Nov 2017 17:35:55 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Here's the code: https://github.com/rmcgibbo/numpy-mypy. It's not 100% working yet, but it can do simple stuff, like inferring the shape of arrays created from np.zeros(literal_tuple), and fixing out the shape of the result of an indexing operation (i.e. https://github.com/rmcgibbo/numpy-mypy/blob/master/tests/test_indexing.py). To implement it, I have the beginnings of the stubs that you'd expect, borrowed from https://github.com/machinalis/mypy-data and then revised. Then, on top of that, I wrote some special type-level functions that are implemented inside of a mypy plugin. So, for example, the stub's signature for np.sum is def sum(a: ndarray[_S, _D], axis: AxesType=None, dtype: DtypeType=None, out: ndarray=None, keepdims: bool=False) -> ndarray[_InferDtypeWithDefault[_S], _InferNdimsReduction[_D]]: ... When the stub is applied, the resut's dtype is determined application of the _InferDtypeWithDefault type function, which defaults, as expected, to the dtype of the input array but checks of that was overridden dtype=None kwarg as well. And the _InferNdimsReduction type function has to check the axis and keepdims arguments as well. It's by no means ready for real users, but I hope this is a useful place to build from. Any feedback or contributions would be appreciated. -Robert On Tue, Nov 28, 2017 at 2:04 PM, Stephan Hoyer wrote: > On Tue, Nov 28, 2017 at 5:11 PM Robert T. McGibbon > wrote: > >> I'm strongly in support of this proposal. Type annotations have really >> helped me write more correct code. 
>> >> I started working on numpy type stubs a few months ago. I needed a mypy >> plugin to support shape-aware functions. Those whole thing is pretty >> tricky. Still very WIP, but I'll clean them up a little bit and opensource >> it shortly. >> > > Great to hear -- I'd love to see what this looks like, or hear any lessons > you learned from the experience! > > Actual experience using and writing such a type checker gives you a > valuable perspective to share, as opposed to my speculation. > > Cheers, > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -- -Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 28 19:02:12 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 28 Nov 2017 16:02:12 -0800 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: <2546525129507284495@unknownmsgid> On Nov 25, 2017, at 3:35 PM, Matthew Rocklin wrote: Thoughts on basing this on a more generic Array type rather than the np.ndarray? This would actually be more consistent with the current python typing approach. I can imagine other nd-array libraries (XArray, Tensorflow, Dask.array) wanting to reuse this work. It may be tough to come up with the right ABC though? see another recent thread on this list. -CHB -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 28 18:59:14 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 28 Nov 2017 15:59:14 -0800 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> Message-ID: <-4087092425119403979@unknownmsgid> (a) it would be good if NumPy type annotations could include an ?array_like? type that allows lists, tuples, etc. I think that would be a sequence ? already supported by the Typing system. (b) I?ve always thought (since PEP561) that it would be cool for type annotations to replace compiler type annotations for e.g. Cython and Numba. Is this in the realm of possibility for the future? Well, this was brought up early in the Typing discussion, and it was made clear that these kinds of truly static types, as needed by Cython, was a non-goal of the project. That being said, perhaps it could be made to work with a bunch of additional type objects. And we should lol lol to Cython for ideas about how to type numpy arrays. One note: in addition to shape (rank) and types, there is contiguous and C or F order. That may want to be considered. -CHB -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhuoql at yahoo.com Wed Nov 29 09:56:28 2017 From: zhuoql at yahoo.com (ZHUO QL (KDr2)) Date: Wed, 29 Nov 2017 14:56:28 +0000 (UTC) Subject: [Numpy-discussion] Is there a way that indexing a matrix of data with a matrix of indices? References: <360382279.3966234.1511967388050.ref@mail.yahoo.com> Message-ID: <360382279.3966234.1511967388050@mail.yahoo.com> Hi, all suppose: - D, is the data matrix, its shape is? M x N- I, is the indices matrix, its shape is M x K,? K<=N Is there a efficient way to get a Matrix R with the same shape of I so that R[x,y] = D[x, I[x,y]] ? A nested for-loop or list-comprehension is too slow for me.?? Thanks. 
From sebastian at sipsolutions.net  Wed Nov 29 12:31:54 2017
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 29 Nov 2017 18:31:54 +0100
Subject: [Numpy-discussion] Is there a way that indexing a matrix of data
 with a matrix of indices?
In-Reply-To: <360382279.3966234.1511967388050@mail.yahoo.com>
References: <360382279.3966234.1511967388050.ref@mail.yahoo.com>
 <360382279.3966234.1511967388050@mail.yahoo.com>
Message-ID: <1511976714.11811.1.camel@sipsolutions.net>

On Wed, 2017-11-29 at 14:56 +0000, ZHUO QL (KDr2) wrote:
> Hi, all
>
> Suppose:
>
> - D is the data matrix; its shape is M x N
> - I is the indices matrix; its shape is M x K, K <= N
>
> Is there an efficient way to get a matrix R with the same shape as I
> so that R[x,y] = D[x, I[x,y]]?
>
> A nested for-loop or list-comprehension is too slow for me.
>

Advanced indexing can do any odd thing you might want to do. I would not
suggest using the matrix class, though; always use the array class if you
are doing this kind of thing.

This should do the trick; I will refer to the documentation for how it
works, except to note that it is basically:

    R[x,y] = D[I1[x, y], I2[x, y]]

    R = D[np.arange(I.shape[0])[:, np.newaxis], I]

- Sebastian

> Thanks.
>
> ----
> ZHUO QL (KDr2) http://kdr2.com
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: This is a digitally signed message part
URL:

From ehermes at chem.wisc.edu  Wed Nov 29 12:25:48 2017
From: ehermes at chem.wisc.edu (Eric Hermes)
Date: Wed, 29 Nov 2017 17:25:48 +0000
Subject: [Numpy-discussion] Is there a way that indexing a matrix of data
 with a matrix of indices?
In-Reply-To:
References:
Message-ID: <1511976347.4041.9.camel@wisc.edu>

On Wed, 2017-11-29 at 12:00 -0500, numpy-discussion-request at python.org
wrote:
> Date: Wed, 29 Nov 2017 14:56:28 +0000 (UTC)
> From: "ZHUO QL (KDr2)"
> To: Discussion of Numerical Python
> Subject: [Numpy-discussion] Is there a way that indexing a matrix of
>  data with a matrix of indices?
> Message-ID: <360382279.3966234.1511967388050 at mail.yahoo.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi, all
>
> Suppose:
>
> - D is the data matrix; its shape is M x N
> - I is the indices matrix; its shape is M x K, K <= N
>
> Is there an efficient way to get a matrix R with the same shape as I
> so that R[x,y] = D[x, I[x,y]]?
>
> A nested for-loop or list-comprehension is too slow for me.
> Thanks.

I don't know if this will be substantially faster, but you can try the
following:

    I += np.array(range(M))[:, np.newaxis] * N
    R = D.ravel()[I.ravel()].reshape((M, K))

Eric

> ----
> ZHUO QL (KDr2) http://kdr2.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: discussion/attachments/20171129/baeaddc0/attachment-0001.html>
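Both replies compute the same thing. A quick sanity check, with made-up
sizes, that they agree with the explicit loop (Eric's variant is written
on a separate index array here so the original I is not modified in
place):

    import numpy as np

    M, N, K = 4, 5, 3
    D = np.random.randint(0, 100, size=(M, N))
    I = np.random.randint(0, N, size=(M, K))

    # reference: the explicit loop
    R0 = np.array([[D[x, I[x, y]] for y in range(K)] for x in range(M)])

    # advanced indexing: broadcast row numbers against columns of indices
    R1 = D[np.arange(M)[:, np.newaxis], I]

    # flat indexing: turn (row, col) pairs into indices into D.ravel()
    J = I + np.arange(M)[:, np.newaxis] * N
    R2 = D.ravel()[J.ravel()].reshape((M, K))

    assert (R0 == R1).all() and (R0 == R2).all()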
From zhuoql at yahoo.com  Thu Nov 30 00:56:57 2017
From: zhuoql at yahoo.com (ZHUO QL (KDr2))
Date: Thu, 30 Nov 2017 05:56:57 +0000 (UTC)
Subject: [Numpy-discussion] Is there a way that indexing a matrix of data
 with a matrix of indices?
In-Reply-To: <1511976347.4041.9.camel@wisc.edu>
References: <1511976347.4041.9.camel@wisc.edu>
Message-ID: <1491064594.4481888.1512021417324@mail.yahoo.com>

Thank you all, all of these methods work well :)

----
ZHUO QL (KDr2) http://kdr2.com

On Thursday, November 30, 2017, 2:26:16 AM GMT+8, Eric Hermes wrote:

On Wed, 2017-11-29 at 12:00 -0500, numpy-discussion-request at python.org
wrote:
> Date: Wed, 29 Nov 2017 14:56:28 +0000 (UTC)
> From: "ZHUO QL (KDr2)"
> To: Discussion of Numerical Python
> Subject: [Numpy-discussion] Is there a way that indexing a matrix of
>  data with a matrix of indices?
> Message-ID: <360382279.3966234.1511967388050 at mail.yahoo.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi, all
>
> Suppose:
>
> - D is the data matrix; its shape is M x N
> - I is the indices matrix; its shape is M x K, K <= N
>
> Is there an efficient way to get a matrix R with the same shape as I
> so that R[x,y] = D[x, I[x,y]]?
>
> A nested for-loop or list-comprehension is too slow for me.
> Thanks.

I don't know if this will be substantially faster, but you can try the
following:

    I += np.array(range(M))[:, np.newaxis] * N
    R = D.ravel()[I.ravel()].reshape((M, K))

Eric

> ----
> ZHUO QL (KDr2) http://kdr2.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: discussion/attachments/20171129/baeaddc0/attachment-0001.html>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From m.h.vankerkwijk at gmail.com  Thu Nov 30 09:23:38 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 09:23:38 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
Message-ID:

Hi All,

I wondered if the move to python3-only starting with numpy 1.17 would
be a good reason to act on what we all seem to agree: that the matrix
class was a bad idea, with its overriding of multiplication and lack
of support for stacks of matrices.
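To make that concrete, a minimal illustration of the semantics at issue
(expected outputs, as of NumPy at the time, noted in the comments):

    import numpy as np

    m = np.matrix([[1, 2], [3, 4]])
    a = np.array([[1, 2], [3, 4]])

    print(m * m)       # matrix product:  [[ 7 10] [15 22]]
    print(a * a)       # elementwise:     [[ 1  4] [ 9 16]]

    # matrix objects insist on staying 2-D:
    print(m[0].shape)  # (1, 2) -- a row matrix, not a 1-D vector
    print(a[0].shape)  # (2,)

    # and a 3-D "stack of matrices" is simply not representable:
    stack = np.arange(8).reshape(2, 2, 2)  # fine as an ndarray
    # np.matrix(stack) raises ValueError, since a matrix must be 2-D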
For 1.17, minimum python supposedly
is >=3.5, so we will be guaranteed to have the matrix multiply
operator @ available, and hence there is arguably even less of a case
for keeping the matrix class; removing it would allow taking out quite
a bit of accumulated special-casing (the immediate reasons for writing
this were gh-10123 and 10132).

What do people think? If we do go in this direction, we might want to
add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
python3; for python2 matrix would never disappear).

All the best,

Marten

From toddrjen at gmail.com  Thu Nov 30 11:23:44 2017
From: toddrjen at gmail.com (Todd)
Date: Thu, 30 Nov 2017 11:23:44 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Nov 30, 2017 09:24, "Marten van Kerkwijk" wrote:

Hi All,

I wondered if the move to python3-only starting with numpy 1.17 would
be a good reason to act on what we all seem to agree: that the matrix
class was a bad idea, with its overriding of multiplication and lack
of support for stacks of matrices. For 1.17, minimum python supposedly
is >=3.5, so we will be guaranteed to have the matrix multiply
operator @ available, and hence there is arguably even less of a case
for keeping the matrix class; removing it would allow taking out quite
a bit of accumulated special-casing (the immediate reasons for writing
this were gh-10123 and 10132).

What do people think? If we do go in this direction, we might want to
add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
python3; for python2 matrix would never disappear).

All the best,

Marten

I still think moving it out into its own package would be better, making
it clear that anyone who cares about the class should step up, because
numpy developers will not do any additional work on it. Similar to how
weave was handled with scipy.

So, simultaneously with the deprecation, you release a package with the
matrix class. Then people have until the deprecation period is over to
port (which should just be a matter of changing the imports).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.laumann at gmail.com  Thu Nov 30 11:20:09 2017
From: chris.laumann at gmail.com (Chris Laumann)
Date: Thu, 30 Nov 2017 11:20:09 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>

+1 (not that my lurking vote should necessarily carry much weight). Rip it
out asap.

The existence of the matrix class has been literally the single biggest
source of confusion and subtle bugs in my and my students' codes for
years.

Best, Chris

> On Nov 30, 2017, at 9:23 AM, Marten van Kerkwijk wrote:
>
> Hi All,
>
> I wondered if the move to python3-only starting with numpy 1.17 would
> be a good reason to act on what we all seem to agree: that the matrix
> class was a bad idea, with its overriding of multiplication and lack
> of support for stacks of matrices. For 1.17, minimum python supposedly
> is >=3.5, so we will be guaranteed to have the matrix multiply
> operator @ available, and hence there is arguably even less of a case
> for keeping the matrix class; removing it would allow taking out quite
> a bit of accumulated special-casing (the immediate reasons for writing
> this were gh-10123 and 10132).
>
> What do people think? If we do go in this direction, we might want to
> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
> python3; for python2 matrix would never disappear).
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From bryanv at anaconda.com  Thu Nov 30 11:33:41 2017
From: bryanv at anaconda.com (Bryan Van de ven)
Date: Thu, 30 Nov 2017 10:33:41 -0600
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
References: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
Message-ID: <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>

This is exactly what we did with the bokeh.charts deprecation. Moving to a
separate project was both a huge relief for the developers and a great way
to focus and clarify expectations for users.

Bryan

> On Nov 30, 2017, at 10:20, Chris Laumann wrote:
>
> +1 (not that my lurking vote should necessarily carry much weight). Rip
> it out asap.
> The existence of the matrix class has been literally the single biggest
> source of confusion and subtle bugs in my and my students' codes for
> years.
>
> Best, Chris
>
>> On Nov 30, 2017, at 9:23 AM, Marten van Kerkwijk wrote:
>>
>> Hi All,
>>
>> I wondered if the move to python3-only starting with numpy 1.17 would
>> be a good reason to act on what we all seem to agree: that the matrix
>> class was a bad idea, with its overriding of multiplication and lack
>> of support for stacks of matrices. For 1.17, minimum python supposedly
>> is >=3.5, so we will be guaranteed to have the matrix multiply
>> operator @ available, and hence there is arguably even less of a case
>> for keeping the matrix class; removing it would allow taking out quite
>> a bit of accumulated special-casing (the immediate reasons for writing
>> this were gh-10123 and 10132).
>>
>> What do people think? If we do go in this direction, we might want to
>> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
>> python3; for python2 matrix would never disappear).
>>
>> All the best,
>>
>> Marten
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From m.h.vankerkwijk at gmail.com  Thu Nov 30 12:00:11 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 12:00:11 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>
References: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
 <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>
Message-ID:

Moving to a subpackage may indeed make more sense, though it might not
help as much with getting rid of the hacks inside other parts of numpy
to keep matrix working. In that respect it seems a bit different at
least from weave.

Then again, independently of whether we remove or release a separate
package, it is probably best to start by moving all tests involving
matrix to matrixlib/tests, so we can at least get a sense of what
hacks are actually present.

-- Marten

From ilhanpolat at gmail.com  Thu Nov 30 12:14:13 2017
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Thu, 30 Nov 2017 18:14:13 +0100
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
 <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>
Message-ID:

This would be a really good way to remove the apparent confusion.
Moreover, I think cleanly explaining why using "np.matrix" is not a good
idea *before* announcing the news would encourage people to accept this
decision along the way. That would greatly reduce the sporadic "the devs
are deprecating stuff as they see fit without asking us" sentiment.

On Thu, Nov 30, 2017 at 6:00 PM, Marten van Kerkwijk
< m.h.vankerkwijk at gmail.com> wrote:

> Moving to a subpackage may indeed make more sense, though it might not
> help as much with getting rid of the hacks inside other parts of numpy
> to keep matrix working. In that respect it seems a bit different at
> least from weave.
> Then again, independently of whether we remove or release a separate
> package, it is probably best to start by moving all tests involving
> matrix to matrixlib/tests, so we can at least get a sense of what
> hacks are actually present.
>
> -- Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Thu Nov 30 13:13:58 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 30 Nov 2017 13:13:58 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
< m.h.vankerkwijk at gmail.com> wrote:

> Hi All,
>
> I wondered if the move to python3-only starting with numpy 1.17 would
> be a good reason to act on what we all seem to agree: that the matrix
> class was a bad idea, with its overriding of multiplication and lack
> of support for stacks of matrices.

I don't think the matrix class was a bad idea at the time.

numpy was the underdog, I came from GAUSS and Matlab and numpy
arrays were just weird, especially losing a dimension all the time
and the required heavy use of np.newaxis.
I guess nowadays kids don't learn `matrix` languages first anymore.

recarrays are another half-hearted feature in numpy that is mostly
obsolete with pandas and pandas-like DataFrames in other packages.

(I don't mind the changes, but the deprecation cycle is often short,
especially for users like me that update numpy only about every 3 main
versions.)

Josef

> For 1.17, minimum python supposedly
> is >=3.5, so we will be guaranteed to have the matrix multiply
> operator @ available, and hence there is arguably even less of a case
> for keeping the matrix class; removing it would allow taking out quite
> a bit of accumulated special-casing (the immediate reasons for writing
> this were gh-10123 and 10132).
>
> What do people think? If we do go in this direction, we might want to
> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
> python3; for python2 matrix would never disappear).
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mrocklin at gmail.com  Thu Nov 30 13:17:40 2017
From: mrocklin at gmail.com (Matthew Rocklin)
Date: Thu, 30 Nov 2017 13:17:40 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

How would the community handle the scipy.sparse matrix subclasses? These
are still in common use.

Somewhat related: https://github.com/scipy/scipy/issues/8162

On Thu, Nov 30, 2017 at 1:13 PM, wrote:

> On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
> < m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi All,
>>
>> I wondered if the move to python3-only starting with numpy 1.17 would
>> be a good reason to act on what we all seem to agree: that the matrix
>> class was a bad idea, with its overriding of multiplication and lack
>> of support for stacks of matrices.
>
> I don't think the matrix class was a bad idea at the time.
> numpy was the underdog, I came from GAUSS and Matlab and numpy
> arrays were just weird, especially losing a dimension all the time
> and the required heavy use of np.newaxis.
> I guess nowadays kids don't learn `matrix` languages first anymore.
>
> recarrays are another half-hearted feature in numpy that is mostly
> obsolete with pandas and pandas-like DataFrames in other packages.
>
> (I don't mind the changes, but the deprecation cycle is often short,
> especially for users like me that update numpy only about every 3 main
> versions.)
>
> Josef
>
>> For 1.17, minimum python supposedly
>> is >=3.5, so we will be guaranteed to have the matrix multiply
>> operator @ available, and hence there is arguably even less of a case
>> for keeping the matrix class; removing it would allow taking out quite
>> a bit of accumulated special-casing (the immediate reasons for writing
>> this were gh-10123 and 10132).
>>
>> What do people think? If we do go in this direction, we might want to
>> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
>> python3; for python2 matrix would never disappear).
>>
>> All the best,
>>
>> Marten
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com  Thu Nov 30 13:43:38 2017
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 1 Dec 2017 07:43:38 +1300
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Fri, Dec 1, 2017 at 7:17 AM, Matthew Rocklin wrote:

> How would the community handle the scipy.sparse matrix subclasses? These
> are still in common use.
>

They're not going anywhere for quite a while (until the sparse ndarrays
materialize at least). Hence np.matrix needs to be moved, not deleted. We
discussed this earlier this year:
https://mail.python.org/pipermail/numpy-discussion/2017-January/076332.html

> Somewhat related: https://github.com/scipy/scipy/issues/8162
>
> On Thu, Nov 30, 2017 at 1:13 PM, wrote:
>
>> On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
>> < m.h.vankerkwijk at gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I wondered if the move to python3-only starting with numpy 1.17 would
>>> be a good reason to act on what we all seem to agree: that the matrix
>>> class was a bad idea, with its overriding of multiplication and lack
>>> of support for stacks of matrices.
>>

I'd suggest any release in the next couple of years is fine, but the one
where we drop Python 2 support is probably the worst choice. That's one of
the few things the core Python devs got 100% right with the Python 3 move:
advocating that in the 2->3 transition packages would not make any API
changes, in order to make porting the least painful.

Ralf

>> I don't think the matrix class was a bad idea at the time.
>>
>> numpy was the underdog, I came from GAUSS and Matlab and numpy
>> arrays were just weird, especially losing a dimension all the time
>> and the required heavy use of np.newaxis.
>> I guess nowadays kids don't learn `matrix` languages first anymore.
>> recarrays are another half-hearted feature in numpy that is mostly
>> obsolete with pandas and pandas-like DataFrames in other packages.
>>
>> (I don't mind the changes, but the deprecation cycle is often short,
>> especially for users like me that update numpy only about every 3 main
>> versions.)
>>
>> Josef
>>
>>> For 1.17, minimum python supposedly
>>> is >=3.5, so we will be guaranteed to have the matrix multiply
>>> operator @ available, and hence there is arguably even less of a case
>>> for keeping the matrix class; removing it would allow taking out quite
>>> a bit of accumulated special-casing (the immediate reasons for writing
>>> this were gh-10123 and 10132).
>>>
>>> What do people think? If we do go in this direction, we might want to
>>> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
>>> python3; for python2 matrix would never disappear).
>>>
>>> All the best,
>>>
>>> Marten
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Thu Nov 30 14:39:48 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 30 Nov 2017 12:39:48 -0700
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 30, 2017 at 11:43 AM, Ralf Gommers wrote:

> On Fri, Dec 1, 2017 at 7:17 AM, Matthew Rocklin wrote:
>
>> How would the community handle the scipy.sparse matrix subclasses? These
>> are still in common use.
>>
>
> They're not going anywhere for quite a while (until the sparse ndarrays
> materialize at least). Hence np.matrix needs to be moved, not deleted. We
> discussed this earlier this year:
> https://mail.python.org/pipermail/numpy-discussion/2017-January/076332.html
>
>> Somewhat related: https://github.com/scipy/scipy/issues/8162
>>
>> On Thu, Nov 30, 2017 at 1:13 PM, wrote:
>>
>>> On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
>>> < m.h.vankerkwijk at gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I wondered if the move to python3-only starting with numpy 1.17 would
>>>> be a good reason to act on what we all seem to agree: that the matrix
>>>> class was a bad idea, with its overriding of multiplication and lack
>>>> of support for stacks of matrices.
>>>
> I'd suggest any release in the next couple of years is fine, but the one
> where we drop Python 2 support is probably the worst choice. That's one
> of the few things the core Python devs got 100% right with the Python 3
> move: advocating that in the 2->3 transition packages would not make any
> API changes, in order to make porting the least painful.
>
> Ralf

Agree, we don't want to pile in too many changes at once. I think the big
sticking point is the sparse matrices in SciPy; even issuing a
DeprecationWarning could be problematic as long as there are sparse
matrices. May I suggest that we put together an NEP for the NumPy side of
things?
Ralf, does SciPy have a mechanism for proposing such changes?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu  Thu Nov 30 14:51:28 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 11:51:28 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 10:13, josef.pktd at gmail.com wrote:

> recarrays are another half-hearted feature in numpy that is mostly
> obsolete with pandas and pandas-like DataFrames in other packages.

I'm fully on board with factoring out np.matrix into a subpackage. But I
would not touch structured arrays; they are quite useful, and sometimes
perform surprisingly well compared to the other solutions around.

Stéfan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu  Thu Nov 30 14:54:23 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 11:54:23 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <1512071663.1426893.1189825680.2E4FA80D@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 11:39, Charles R Harris wrote:

> Agree, we don't want to pile in too many changes at once. I think the
> big sticking point is the sparse matrices in SciPy; even issuing a
> DeprecationWarning could be problematic as long as there are sparse
> matrices.

Could you explain what you mean by SciPy sparse matrices being a big
sticking point?

Stéfan

From m.h.vankerkwijk at gmail.com  Thu Nov 30 14:58:13 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 14:58:13 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

Hi Ralf,

Sorry not to have recalled the previous thread. Your point about not
doing things in the python 2->3 move makes sense; what is handy for me is
no reason to give users an incentive not to move to python3 because their
matrix-dependent code breaks.

It does sound like, given the use of sparse, a separate package - or
perhaps (temporary) inclusion in scipy - would be the way to go. In turn,
collecting the matrix tests and work-arounds together in `matrixlib`
would be the right first step. And, even better, to collect thoughts in a
NEP.

Now if only I had not written this while procrastinating on other
things...

All the best,

Marten

From m.h.vankerkwijk at gmail.com  Thu Nov 30 15:02:08 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 15:02:08 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
Message-ID:

On Thu, Nov 30, 2017 at 2:51 PM, Stefan van der Walt wrote:

> On Thu, Nov 30, 2017, at 10:13, josef.pktd at gmail.com wrote:
>
> recarrays are another half-hearted feature in numpy that is mostly
> obsolete with pandas and pandas-like DataFrames in other packages.
>
> I'm fully on board with factoring out np.matrix into a subpackage. But I
> would not touch structured arrays; they are quite useful, and sometimes
> perform surprisingly well compared to the other solutions around.
I think Josef specifically meant `recarrays`, which give access to
elements of a structured array via attribute access. I'd tend to agree
with him that those turned out not to be such a great idea. But (I think)
nobody is arguing we should get rid of arrays with structured dtypes - I
use them regularly myself too.

-- Marten

From stefanv at berkeley.edu  Thu Nov 30 17:00:51 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 14:00:51 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
Message-ID: <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 12:02, Marten van Kerkwijk wrote:

> I think Josef specifically meant `recarrays`, which give access to
> elements of a structured array via attribute access. I'd tend to agree
> with him that those turned out not to be such a great idea. But (I
> think) nobody is arguing we should get rid of arrays with structured
> dtypes - I use them regularly myself too.

Ah, okay, that makes sense!

Which reminds me: while these are quite useful, they're not always
particularly pleasant to use. A good first improvement would be to allow
columnar printing, and a few utility functions to give you some of the
basic functionality of pandas (calculating descriptive statistics like
mean, dropping NaN rows, some equivalent of groupby). All these are only
a few lines of Python, but can be annoying to figure out. If this sounds
appealing, I'd be willing to put together a small NEP.

Stéfan

From efiring at hawaii.edu  Thu Nov 30 17:08:11 2017
From: efiring at hawaii.edu (Eric Firing)
Date: Thu, 30 Nov 2017 12:08:11 -1000
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
 <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
Message-ID:

On 2017/11/30 12:00 PM, Stefan van der Walt wrote:

> I think Josef specifically meant `recarrays`, which give access to
> elements of a structured array via attribute access. I'd tend to agree
> with him that those turned out not to be such a great idea. But (I

I have found recarrays to be useful, providing an alternative view that
can be convenient. What is the problem with them?

Eric

From m.h.vankerkwijk at gmail.com  Thu Nov 30 17:11:42 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 17:11:42 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
 <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
Message-ID:

Unlike for matrix, it is not so much a problem as an unclear use case -
the main thing they bring to structured dtype arrays is access by
attribute, which is slower than just getting the field by its key.

Anyway, I don't think anybody is suggesting removing them - they're not a
problem in the way matrix is, with its shape-mangling, etc.

-- Marten

From rainwoodman at gmail.com  Thu Nov 30 18:01:36 2017
From: rainwoodman at gmail.com (Feng Yu)
Date: Thu, 30 Nov 2017 15:01:36 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
 <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
Message-ID:

An NEP on utility functions for structured arrays definitely sounds
appealing to me.

On Thu, Nov 30, 2017 at 2:00 PM, Stefan van der Walt wrote:

> On Thu, Nov 30, 2017, at 12:02, Marten van Kerkwijk wrote:
>> I think Josef specifically meant `recarrays`, which give access to
>> elements of a structured array via attribute access. I'd tend to agree
>> with him that those turned out not to be such a great idea. But (I
>> think) nobody is arguing we should get rid of arrays with structured
>> dtypes - I use them regularly myself too.
>
> Ah, okay, that makes sense!
>
> Which reminds me: while these are quite useful, they're not always
> particularly pleasant to use. A good first improvement would be to
> allow columnar printing, and a few utility functions to give you some of
> the basic functionality of pandas (calculating descriptive statistics
> like mean, dropping NaN rows, some equivalent of groupby). All these
> are only a few lines of Python, but can be annoying to figure out. If
> this sounds appealing, I'd be willing to put together a small NEP.
>
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From njs at pobox.com  Thu Nov 30 19:15:59 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 30 Nov 2017 16:15:59 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 30, 2017 at 11:39 AM, Charles R Harris wrote:
>
> On Thu, Nov 30, 2017 at 11:43 AM, Ralf Gommers wrote:
>> I'd suggest any release in the next couple of years is fine, but the one
>> where we drop Python 2 support is probably the worst choice. That's one
>> of the few things the core Python devs got 100% right with the Python 3
>> move: advocating that in the 2->3 transition packages would not make
>> any API changes, in order to make porting the least painful.
>
> Agree, we don't want to pile in too many changes at once. I think the big
> sticking point is the sparse matrices in SciPy; even issuing a
> DeprecationWarning could be problematic as long as there are sparse
> matrices. May I suggest that we put together an NEP for the NumPy side
> of things? Ralf, does SciPy have a mechanism for proposing such changes?

Agreed here as well... while I want to get rid of np.matrix as much as
anyone, doing that anytime soon would be *really* disruptive.

- There are tons of little scripts out there written by people who didn't
know better; we do want them to learn not to use np.matrix, but breaking
all their scripts is a painful way to do that.

- There are major projects like scikit-learn that simply have no
alternative to using np.matrix, because of scipy.sparse.

So I think the way forward is something like:

- Now or whenever someone gets together a PR: issue a
PendingDeprecationWarning in np.matrix.__init__ (unless it kills
performance for scikit-learn and friends), and put a big warning box at
the top of the docs. The idea here is to not actually break anyone's
code, but start to get out the message that we definitely don't think
anyone should use this if they have any alternative.
- After there's an alternative to scipy.sparse: ramp up the warnings,
possibly all the way to FutureWarning, so that existing scripts don't
break but they do get noisy warnings.

- Eventually, if we think it will reduce maintenance costs: split it into
a subpackage.

I expect that one way or another we'll be maintaining matrix for quite
some time, and I agree with whoever said that most of the burden seems to
be in keeping the rest of numpy working sensibly with it, so I don't
think moving it into a subpackage is itself going to make a big
difference either way. To me the logic is more like, if/when we decide to
actually break everyone's code by making `np.matrix` raise
AttributeError, then we should probably provide some package they can
import to get their code limping along again, and if we're going to do
that anyway then probably we should split it out first and shake out any
bugs before we make `np.matrix` start raising errors. But it's going to
be quite some time until we reach the "break everyone's code" stage,
given just how much code is out there using matrix, so there's no point
in making detailed plans right now.

-n

--
Nathaniel J. Smith -- https://vorpus.org

From stefanv at berkeley.edu  Thu Nov 30 20:10:42 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 17:10:42 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <1512090642.419196.1190126384.5FCB7BEE@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 16:15, Nathaniel Smith wrote:

> PendingDeprecationWarning in np.matrix.__init__ (unless it kills
> performance for scikit-learn and friends), and put a big warning box
> at the top of the docs. The idea here is to not actually break
> anyone's code, but start to get out the message that we definitely
> don't think anyone should use this if they have any alternative.
>
> - After there's an alternative to scipy.sparse: ramp up the warnings,
> possibly all the way to FutureWarning so that existing scripts don't
> break but they do get noisy warnings
>
> - Eventually, if we think it will reduce maintenance costs: split it
> into a subpackage

Can't we make `np.matrix` into a new package right now, and have NumPy
depend on it internally? At that point, start warning users that they
should also be using the external package, and eventually just remove the
shim in NumPy.

Stéfan

From m.h.vankerkwijk at gmail.com  Thu Nov 30 21:02:36 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 21:02:36 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512090642.419196.1190126384.5FCB7BEE@webmail.messagingengine.com>
References: <1512090642.419196.1190126384.5FCB7BEE@webmail.messagingengine.com>
Message-ID:

Hi Nathaniel,

Thanks for the concrete suggestion: see
https://github.com/numpy/numpy/pull/10142

I think this is useful independent of exactly how the eventual move to a
new package would work; the next step might be to collect all matrix
tests in the `matrixlib` sub-module.

All the best,

Marten
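For reference, a minimal sketch of the first step Nathaniel describes --
emitting a PendingDeprecationWarning on construction. The helper name is
made up for illustration; the real change would live in np.matrix itself
(under numpy/matrixlib):

    import warnings

    def make_matrix(data):
        # stand-in for what np.matrix construction might do
        warnings.warn(
            "the matrix subclass is pending deprecation; use a regular "
            "2-D ndarray and the @ operator instead",
            PendingDeprecationWarning, stacklevel=2)
        return data

    # PendingDeprecationWarning is silenced by default, so existing
    # scripts stay quiet; developers opt in to see (or trap) it:
    warnings.simplefilter("error", PendingDeprecationWarning)
    try:
        make_matrix([[1, 2], [3, 4]])
    except PendingDeprecationWarning as err:
        print("caught:", err)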