From m.h.vankerkwijk at gmail.com  Wed Nov 1 18:50:30 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 1 Nov 2017 18:50:30 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

From my experience with Quantity, routines that properly ducktype work
well; those that feel the need to accept lists and blatantly do
`asarray` do not - even if in many cases they would have worked if
they used `asanyarray`... But there are lots of nice surprises, with,
e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom
line, I think you should let this stop you from trying only if you
know something important does not work.

-- Marten
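[A minimal illustration of the `asarray`/`asanyarray` distinction Marten is pointing at - not from the thread; the toy subclass stands in for something like Quantity:]

```
import numpy as np

class MyArray(np.ndarray):
    """Toy subclass standing in for something like Quantity."""

a = np.arange(3).view(MyArray)

# asarray strips the subclass; asanyarray passes it through.
print(type(np.asarray(a)))     # <class 'numpy.ndarray'>
print(type(np.asanyarray(a)))  # <class '__main__.MyArray'>
```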
From nathan12343 at gmail.com  Wed Nov 1 18:55:22 2017
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Wed, 1 Nov 2017 17:55:22 -0500
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

I think the biggest issues could be resolved if __array_concatenate__
were finished. Unfortunately I don't feel like I can take that on right
now.

See Ryan May's talk at SciPy about using an ndarray subclass for units
and the issues he's run into:

https://www.youtube.com/watch?v=qCo9bkT9sow

On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> From my experience with Quantity, routines that properly ducktype work
> well; those that feel the need to accept lists and blatantly do
> `asarray` do not - even if in many cases they would have worked if
> they used `asanyarray`... But there are lots of nice surprises, with,
> e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom
> line, I think you should let this stop you from trying only if you
> know something important does not work.
> -- Marten

From josef.pktd at gmail.com  Thu Nov 2 08:46:01 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 08:46:01 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> I think the biggest issues could be resolved if __array_concatenate__ were
> finished. Unfortunately I don't feel like I can take that on right now.
>
> See Ryan May's talk at SciPy about using an ndarray subclass for units and
> the issues he's run into:
>
> https://www.youtube.com/watch?v=qCo9bkT9sow

Interesting talk, but I don't see how general library code should know
what units the output has.
For example, if units are some flows per unit of time and we average, sum
or integrate over time, then what are the new units? (e.g. pandas time
aggregation)
What are the units of covariance or correlation between two variables
with the same units, and what are they between variables with different
units?

How do you concatenate and operate on arrays with different units?

Interpolation or prediction would work using the existing units.

Partially related:
statsmodels uses a wrapper for pandas Series and DataFrames and tries to
preserve the index when possible, and makes up a new DataFrame or Series
if the existing index doesn't apply.
E.g. predicted values and residuals are in terms of the originally
provided index, and could also get the original units assigned. That
would also be possible with prediction confidence intervals. But for the
rest, see above.

Josef
From josef.pktd at gmail.com  Thu Nov 2 08:56:26 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 08:56:26 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 8:46 AM, <josef.pktd at gmail.com> wrote:

> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at SciPy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> For example, if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
> What are the units of covariance or correlation between two variables
> with the same units, and what are they between variables with different
> units?
>
> How do you concatenate and operate on arrays with different units?
>
> Interpolation or prediction would work using the existing units.
>
> Partially related:
> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
> preserve the index when possible, and makes up a new DataFrame or Series
> if the existing index doesn't apply.
> E.g. predicted values and residuals are in terms of the originally
> provided index, and could also get the original units assigned. That
> would also be possible with prediction confidence intervals. But for the
> rest, see above.

using pint

>>> x
<Quantity(..., 'meter')>
>>> x / x
<Quantity(..., 'dimensionless')>
>>> x / (1 + x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
    return self._add_sub(other, operator.add)
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
    raise DimensionalityError(self._units, 'dimensionless')
pint.errors.DimensionalityError: Cannot convert from 'meter' to
'dimensionless'

np.exp(x)
raises
pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
to 'dimensionless' (dimensionless)

Josef

> Josef
>
>> On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk
>> <m.h.vankerkwijk at gmail.com> wrote:
>>
>>> From my experience with Quantity, routines that properly ducktype work
>>> well; those that feel the need to accept lists and blatantly do
>>> `asarray` do not - even if in many cases they would have worked if
>>> they used `asanyarray`... But there are lots of nice surprises, with,
>>> e.g., `np.fft.fftfreq` just working as one would hope. Anyway, bottom
>>> line, I think you should let this stop you from trying only if you
>>> know something important does not work.
>>> -- Marten

From m.h.vankerkwijk at gmail.com  Thu Nov 2 11:51:54 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 11:51:54 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Hi Josef,

astropy's Quantity is well developed and would give similar results to
pint; all those results make sense if one wants to have consistent
units. General library code will actually do the right thing as long
as it just uses normal mathematical operations with ufuncs - and as
long as it just duck types! - the unit code will then override and
properly propagate units to outputs, as can be seen in this example:
```
import astropy.units as u
np.fft.fftfreq(8, 1*u.min)
# <Quantity [ 0., 0.125, 0.25, 0.375, -0.5, -0.375, -0.25, -0.125] 1 / min>
np.fft.fftfreq(8, 1*u.min).var()
# <Quantity 0.08203125 1 / min2>
```

> For example, if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)

The units module will force you to take into account `dt`! This is in
fact one reason why it is so powerful. So, your example might go
something like:
```
flow = [1., 1.5, 1.5] * u.g / u.s
dt = [0.5, 0.5, 1.] * u.hr
np.sum(flow * dt)
# <Quantity 2.75 g h / s>
np.sum(flow * dt).to(u.kg)
# <Quantity 9.9 kg>
```

> How do you concatenate and operate on arrays with different units?

This is where Nathaniel's `__array_concatenate__` would come in. For
regular arrays it is fine to just concatenate, but for almost anything
else you need a different approach. For quantities, the most logical
one would be to first create an empty array of the right size with the
unit of, e.g., the first part to be concatenated, and then set
sections to the input quantities (where the setter does unit
conversion and will fail if that is not possible).

All the best,

Marten

p.s. A fun subject is what to do with logarithmic units, such as the
magnitudes in astronomy... We have a module for that as well:
http://docs.astropy.org/en/latest/units/logarithmic_units.html
From rmay31 at gmail.com  Thu Nov 2 12:23:44 2017
From: rmay31 at gmail.com (Ryan May)
Date: Thu, 2 Nov 2017 10:23:44 -0600
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 6:46 AM, <josef.pktd at gmail.com> wrote:

> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at SciPy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> For example, if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)

A general library doesn't have to do anything--just not do annoying things
like isinstance() checks and calling np.asarray() everywhere. Honestly one
of those is the core of most of the problems I run into. It's currently
more complicated when doing things in compiled land, but that's
implementation details, not any kind of fundamental incompatibility.

For basic mathematical operations, units have perfectly well defined
semantics that many of us encountered in an introductory physics or
chemistry class:
- Want to add or subtract two things? They need to have the same units; a
  units library can handle conversion provided they have the same
  dimensionality (e.g. length, time)
- Multiplication/Division: combine and cancel units (m/s * s -> m)

Everything else we do on a computer with data in some way boils down to:
add, subtract, multiply, divide.

Average keeps the same units -- it's just a sum and division by a
unit-less constant.
Integration (in 1-D) involves *two* variables, your data as well as the
time/space coordinates (or dx or dt); fundamentally it's a multiplication
by dx and a summation. The resulting units then are, e.g., data.units *
dx.units. This works just like it does in Physics 101, where you integrate
velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m).

> What are the units of covariance or correlation between two variables
> with the same units, and what are they between variables with different
> units?

Well, covariance is subtracting the mean from each variable and
multiplying the residuals; therefore the units for cov(x, y):

(x.units - x.units) * (y.units - y.units) -> x.units * y.units

Correlation takes covariance and divides by the product of the standard
deviations, so that's:

(x.units * y.units) / (x.units * y.units) -> dimensionless

Which is what I'd expect for a correlation.

> How do you concatenate and operate on arrays with different units?

If all arrays have compatible dimensionality (say meters, inches, miles),
you convert to one (say the first) and concatenate like normal. If they're
not compatible, you error out.

> Interpolation or prediction would work using the existing units.

I'm sure you wrote that thinking units didn't play a role, but the math
behind those operations works perfectly fine with units, with things
cancelling out properly to give the same units out as in.

Ryan

-- 
Ryan May
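[A compact sketch of the semantics Ryan lays out, using pint - not from the thread; the values are made up, and exact reprs vary by pint version:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()

# Add/subtract: same dimensionality required; conversion is automatic.
print(1 * ureg.m + 5 * ureg.cm)              # 1.05 meter

# Multiply/divide: units combine and cancel (m/s * s -> m).
print((3 * ureg.m / ureg.s) * (2 * ureg.s))  # 6.0 meter

# Covariance-style arithmetic: units multiply...
x = np.array([1., 2., 3.]) * ureg.m
y = np.array([2., 4., 8.]) * ureg.s
cov = ((x - x.mean()) * (y - y.mean())).mean()
print(cov.units)        # meter * second

# ...and correlation divides them back out to dimensionless.
corr = cov / (x.std() * y.std())
print(corr.units)       # dimensionless
```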
From josef.pktd at gmail.com  Thu Nov 2 12:43:43 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 12:43:43 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 11:51 AM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> Hi Josef,
>
> astropy's Quantity is well developed and would give similar results to
> pint; all those results make sense if one wants to have consistent
> units. General library code will actually do the right thing as long
> as it just uses normal mathematical operations with ufuncs - and as
> long as it just duck types! - the unit code will then override and
> properly propagate units to outputs, as can be seen in this example:
> ```
> import astropy.units as u
> np.fft.fftfreq(8, 1*u.min)
> # <Quantity [ 0., 0.125, 0.25, 0.375, -0.5, -0.375, -0.25, -0.125] 1 / min>
> np.fft.fftfreq(8, 1*u.min).var()
> # <Quantity 0.08203125 1 / min2>
> ```
>
>> For example, if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>
> The units module will force you to take into account `dt`! This is in
> fact one reason why it is so powerful. So, your example might go
> something like:
> ```
> flow = [1., 1.5, 1.5] * u.g / u.s
> dt = [0.5, 0.5, 1.] * u.hr
> np.sum(flow * dt)
> # <Quantity 2.75 g h / s>
> np.sum(flow * dt).to(u.kg)
> # <Quantity 9.9 kg>
> ```
>
>> How do you concatenate and operate on arrays with different units?
>
> This is where Nathaniel's `__array_concatenate__` would come in. For
> regular arrays it is fine to just concatenate, but for almost anything
> else you need a different approach. For quantities, the most logical
> one would be to first create an empty array of the right size with the
> unit of, e.g., the first part to be concatenated, and then set
> sections to the input quantities (where the setter does unit
> conversion and will fail if that is not possible).

For example, "will fail if that is not possible" rules out inhomogeneous
arrays (analogous to structured dtypes).

How do you get a Vandermonde matrix for something simple like a
polynomial fit?

x[:, None] ** np.arange(3)

> All the best,
>
> Marten
>
> p.s. A fun subject is what to do with logarithmic units, such as the
> magnitudes in astronomy... We have a module for that as well:
> http://docs.astropy.org/en/latest/units/logarithmic_units.html

Similarly, scipy.special has ufuncs - what units are those?

Most code that I know (i.e. scipy.stats and statsmodels) does not use only
"normal mathematical operations with ufuncs". I guess there are a lot of
"abnormal" mathematical operations where just simply propagating the units
will not work.

Aside: The problem is more general also for other data structures.
E.g. statsmodels for most parts uses only numpy ndarrays inside the
algorithms and computations because that provides well defined behavior
(e.g. pandas behaved too differently in many cases). I don't have much of
an idea yet about how to change the infrastructure to allow the use of
dask arrays, sparse matrices and similar, and possibly automatic
differentiation.

Josef
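[A sketch of the Vandermonde problem josef raises, with pint; this assumes pint's behavior of rejecting array exponents on dimensioned quantities - each column of x[:, None] ** np.arange(3) would need its own unit, which a homogeneous quantity array cannot represent:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()
x = np.array([1., 2., 3.]) * ureg.m

try:
    # Columns would be dimensionless, m, and m**2 -- no single unit fits,
    # so the units library refuses rather than silently mixing units.
    vander = x[:, None] ** np.arange(3)
except pint.DimensionalityError as err:
    print(err)
```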
From rmay31 at gmail.com  Thu Nov 2 12:46:41 2017
From: rmay31 at gmail.com (Ryan May)
Date: Thu, 2 Nov 2017 10:46:41 -0600
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 6:56 AM, <josef.pktd at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 8:46 AM, <josef.pktd at gmail.com> wrote:
>
>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>>
>>> I think the biggest issues could be resolved if __array_concatenate__
>>> were finished. Unfortunately I don't feel like I can take that on right now.
>>>
>>> See Ryan May's talk at SciPy about using an ndarray subclass for units
>>> and the issues he's run into:
>>>
>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> For example, if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>> What are the units of covariance or correlation between two variables
>> with the same units, and what are they between variables with different
>> units?
>>
>> How do you concatenate and operate on arrays with different units?
>>
>> Interpolation or prediction would work using the existing units.
>>
>> Partially related:
>> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
>> preserve the index when possible, and makes up a new DataFrame or Series
>> if the existing index doesn't apply.
>> E.g. predicted values and residuals are in terms of the originally
>> provided index, and could also get the original units assigned. That
>> would also be possible with prediction confidence intervals. But for the
>> rest, see above.
>
> using pint
>
> >>> x
> <Quantity(..., 'meter')>
> >>> x / x
> <Quantity(..., 'dimensionless')>
> >>> x / (1 + x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
>     return self._add_sub(other, operator.add)
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
>     raise DimensionalityError(self._units, 'dimensionless')
> pint.errors.DimensionalityError: Cannot convert from 'meter' to
> 'dimensionless'

I'm not sure why you have a problem with that result. You tried to take a
number in meters and add a dimensionless value to that--that's not a
defined operation. That's like saying: "I have a distance of 12 meters and
added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert,
but I can do that operation. 1 second? That makes no sense.

If you add units to the 1 then it's a defined operation:

>>> ureg = pint.UnitRegistry()
>>> x / (1 * ureg.meters + x)
<Quantity(..., 'dimensionless')>

> np.exp(x)
> raises
> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
> to 'dimensionless' (dimensionless)

Well, the Taylor series for exp (around a=0) is:

exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...

so for that to properly add up, x needs to be dimensionless. It should be
noted, though, that I've *never* seen a formula, theoretically derived or
empirically fit, require directly taking exp(x) where x is a physical
quantity with units. Instead, you have:

f = a * exp(kx)

Properly calculated values for a, k will have appropriate units attached
to them that allow the calculation to proceed without error.

Ryan

-- 
Ryan May
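[A sketch of Ryan's `f = a * exp(kx)` point with pint - illustrative values, not from the thread; when k carries 1/length, the exponent is dimensionless and np.exp is well defined:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()

x = np.linspace(0., 10., 3) * ureg.m
a = 2.0 * ureg.g     # the amplitude carries the output units
k = -0.5 / ureg.m    # the rate constant carries 1/length

f = a * np.exp(k * x)   # k*x is dimensionless, so exp is defined
print(f.units)          # gram
```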
From josef.pktd at gmail.com  Thu Nov 2 12:52:29 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 12:52:29 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 12:23 PM, Ryan May <rmay31 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 6:46 AM, <josef.pktd at gmail.com> wrote:
>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> For example, if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>
> A general library doesn't have to do anything--just not do annoying things
> like isinstance() checks and calling np.asarray() everywhere. Honestly one
> of those is the core of most of the problems I run into. It's currently
> more complicated when doing things in compiled land, but that's
> implementation details, not any kind of fundamental incompatibility.
>
> For basic mathematical operations, units have perfectly well defined
> semantics that many of us encountered in an introductory physics or
> chemistry class:
> - Want to add or subtract two things? They need to have the same units; a
>   units library can handle conversion provided they have the same
>   dimensionality (e.g. length, time)
> - Multiplication/Division: combine and cancel units (m/s * s -> m)
>
> Everything else we do on a computer with data in some way boils down to:
> add, subtract, multiply, divide.
>
> Average keeps the same units -- it's just a sum and division by a
> unit-less constant.
> Integration (in 1-D) involves *two* variables, your data as well as the
> time/space coordinates (or dx or dt); fundamentally it's a multiplication
> by dx and a summation. The resulting units then are, e.g., data.units *
> dx.units. This works just like it does in Physics 101, where you integrate
> velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m).
>
>> What are the units of covariance or correlation between two variables
>> with the same units, and what are they between variables with different
>> units?
>
> Well, covariance is subtracting the mean from each variable and
> multiplying the residuals; therefore the units for cov(x, y):
>
> (x.units - x.units) * (y.units - y.units) -> x.units * y.units
>
> Correlation takes covariance and divides by the product of the standard
> deviations, so that's:
>
> (x.units * y.units) / (x.units * y.units) -> dimensionless
>
> Which is what I'd expect for a correlation.
>
>> How do you concatenate and operate on arrays with different units?
>
> If all arrays have compatible dimensionality (say meters, inches, miles),
> you convert to one (say the first) and concatenate like normal. If they're
> not compatible, you error out.
>
>> Interpolation or prediction would work using the existing units.
>
> I'm sure you wrote that thinking units didn't play a role, but the math
> behind those operations works perfectly fine with units, with things
> cancelling out properly to give the same units out as in.

Some of it is in my reply to Marten.

Regression and polyfit require an X matrix with different units, and then
some linear algebra like solve, pinv or svd. So, while the predicted
values have well defined units, the computation involves some messier
operations, unless you want to forgo linear algebra in all intermediate
steps and reduce it to sums, divisions and inverses.

Josef

> Ryan
>
> -- 
> Ryan May

From josef.pktd at gmail.com  Thu Nov 2 13:01:03 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 13:01:03 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 12:46 PM, Ryan May <rmay31 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 6:56 AM, <josef.pktd at gmail.com> wrote:
>
>> using pint
>>
>> >>> x
>> <Quantity(..., 'meter')>
>> >>> x / x
>> <Quantity(..., 'dimensionless')>
>> >>> x / (1 + x)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 669, in __add__
>>     return self._add_sub(other, operator.add)
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line 580, in _add_sub
>>     raise DimensionalityError(self._units, 'dimensionless')
>> pint.errors.DimensionalityError: Cannot convert from 'meter' to
>> 'dimensionless'
>
> I'm not sure why you have a problem with that result. You tried to take a
> number in meters and add a dimensionless value to that--that's not a
> defined operation. That's like saying: "I have a distance of 12 meters and
> added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert,
> but I can do that operation. 1 second? That makes no sense.
>
> If you add units to the 1 then it's a defined operation:
>
> >>> ureg = pint.UnitRegistry()
> >>> x / (1 * ureg.meters + x)
> <Quantity(..., 'dimensionless')>
>
>> np.exp(x)
>> raises
>> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
>> to 'dimensionless' (dimensionless)
>
> Well, the Taylor series for exp (around a=0) is:
>
> exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...
>
> so for that to properly add up, x needs to be dimensionless. It should be
> noted, though, that I've *never* seen a formula, theoretically derived or
> empirically fit, require directly taking exp(x) where x is a physical
> quantity with units. Instead, you have:
>
> f = a * exp(kx)
>
> Properly calculated values for a, k will have appropriate units attached
> to them that allow the calculation to proceed without error.

I was thinking of a simple logit model to predict whether it rains
tomorrow. The logit transformation for the probability is
exp(k x) / (1 + exp(k x)), where k is a parameter to search for in the
optimization, and x is a matrix with all predictors or explanatory
variables, which could all have different units.

So it sounds to me that if we drop asarray, then we just get exceptions
or possibly strange results, or we have to introduce a unit that matches
everything (like a joker card) for any constants that we are using.

Josef

> Ryan
>
> -- 
> Ryan May
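[One way josef's logit example can be made unit-consistent, sketched with pint and made-up predictors and coefficients: keep each predictor separate and give each coefficient the inverse units, so the linear predictor is dimensionless before the exp:]

```
import numpy as np
import pint

ureg = pint.UnitRegistry()

# Two predictors with different units (values are made up).
humidity = np.array([0.2, 0.5, 0.8]) * ureg.dimensionless
wind = np.array([1., 3., 5.]) * ureg.m / ureg.s

# Each coefficient carries the inverse units of its predictor,
# so every term of the linear predictor is dimensionless.
b0 = 0.1 * ureg.dimensionless
b1 = 2.0 * ureg.dimensionless
b2 = -0.3 * ureg.s / ureg.m

eta = b0 + b1 * humidity + b2 * wind
p = 1. / (1. + np.exp(-eta))   # logistic transform, defined for dimensionless eta
print(p)
```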
From m.h.vankerkwijk at gmail.com  Thu Nov 2 14:39:42 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 14:39:42 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Hi Josef,

Indeed, for some applications one would like to have different units
for different parts of an array. And that means that, at present, the
quantity implementations that we have are no good at storing, say, a
covariance matrix involving parameters with different units, where
thus each element of the covariance matrix has a different unit. I
fear at present it would have to be an object array instead; other
cases may be a bit easier to solve, by, e.g., allowing structured
arrays with similarly structured units. I do note that actually doing
it would clarify, e.g., what the axes in Vandermonde (spelling?)
matrices mean.

That said, there is truly an enormous benefit to checking units on
"regular" operations. Spacecraft have missed Mars because people
didn't do it properly...

All the best,

Marten

p.s. The scipy functions should indeed be included in the ufuncs
covered; there is a fairly long-standing issue for that in astropy...

From josef.pktd at gmail.com  Thu Nov 2 15:33:18 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 15:33:18 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 2:39 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> Hi Josef,
>
> Indeed, for some applications one would like to have different units
> for different parts of an array. And that means that, at present, the
> quantity implementations that we have are no good at storing, say, a
> covariance matrix involving parameters with different units, where
> thus each element of the covariance matrix has a different unit. I
> fear at present it would have to be an object array instead; other
> cases may be a bit easier to solve, by, e.g., allowing structured
> arrays with similarly structured units. I do note that actually doing
> it would clarify, e.g., what the axes in Vandermonde (spelling?)
> matrices mean.

(I have problems remembering the spelling of proper names)

np.vander and the various polyvander functions/methods

One point I wanted to make is that the units are overhead and irrelevant
in the computation. It's the outcome that might have units.

E.g. polyfit could use various underlying polynomials, e.g.
numpy.polynomial.chebyshev.chebvander(...), and various linear algebra
and projection versions, and the output would still have the same units.

Aside: I just found an interesting one:
http://docs.astropy.org/en/latest/api/astropy.stats.biweight.biweight_midcovariance.html
It is pairwise, but uses asanyarray; the statsmodels version uses asarray
(for robust scatter):
https://github.com/statsmodels/statsmodels/pull/3230/files#diff-8fd46d3044db86ae7992f5d817eec6c7R473
I guess I would have problems replacing asarray by asanyarray.

One last related one: What's the inverse of a covariance matrix? It's
just sums, multiplications and divisions (which I wouldn't remember), but
the computation is just np.linalg.inv or np.linalg.pinv, which is a
simple shortcut.

Josef

> That said, there is truly an enormous benefit to checking units on
> "regular" operations. Spacecraft have missed Mars because people
> didn't do it properly...

https://twitter.com/search?q=2%20unit%20tests.%200%20integration%20tests

> All the best,
>
> Marten
>
> p.s. The scipy functions should indeed be included in the ufuncs
> covered; there is a fairly long-standing issue for that in astropy...
From shoyer at gmail.com  Thu Nov 2 15:37:01 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 02 Nov 2017 19:37:01 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 9:45 AM <josef.pktd at gmail.com> wrote:

> Similarly, scipy.special has ufuncs - what units are those?
>
> Most code that I know (i.e. scipy.stats and statsmodels) does not use only
> "normal mathematical operations with ufuncs". I guess there are a lot of
> "abnormal" mathematical operations where just simply propagating the
> units will not work.
>
> Aside: The problem is more general also for other data structures.
> E.g. statsmodels for most parts uses only numpy ndarrays inside the
> algorithms and computations because that provides well defined behavior
> (e.g. pandas behaved too differently in many cases). I don't have much of
> an idea yet about how to change the infrastructure to allow the use of
> dask arrays, sparse matrices and similar, and possibly automatic
> differentiation.

This is the exact same reason why pandas and xarray do not support
wrapping arbitrary ndarray subclasses or duck array types. The operations
we use internally (on numpy.ndarray objects) may not be what you would
expect externally, and may even be implementation details not considered
part of the public API. For example, in xarray we use numpy.nanmean() or
bottleneck.nanmean() instead of numpy.mean().

For NumPy and xarray, I think we could (and should) define an interface
to support subclasses and duck types for generic operations for core
use-cases. My main concern with subclasses / duck-arrays is
undefined/untested behavior, especially where we might silently give the
wrong answer or trigger some undesired operation (e.g., loading a lazily
computed array into memory) rather than raising an informative error.
Leaking implementation details is another concern: we have already had
several cases in NumPy where a function only worked on a subclass if a
particular method was called internally, and broke when that was changed.

From nathan12343 at gmail.com  Thu Nov 2 15:40:26 2017
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Thu, 2 Nov 2017 14:40:26 -0500
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> This is the exact same reason why pandas and xarray do not support
> wrapping arbitrary ndarray subclasses or duck array types. The operations
> we use internally (on numpy.ndarray objects) may not be what you would
> expect externally, and may even be implementation details not considered
> part of the public API. For example, in xarray we use numpy.nanmean() or
> bottleneck.nanmean() instead of numpy.mean().
>
> For NumPy and xarray, I think we could (and should) define an interface
> to support subclasses and duck types for generic operations for core
> use-cases. My main concern with subclasses / duck-arrays is
> undefined/untested behavior, especially where we might silently give the
> wrong answer or trigger some undesired operation (e.g., loading a lazily
> computed array into memory) rather than raising an informative error.
> Leaking implementation details is another concern: we have already had
> several cases in NumPy where a function only worked on a subclass if a
> particular method was called internally, and broke when that was changed.

Would this issue be ameliorated given Nathaniel's proposal to try to move
away from subclasses and towards storing data in dtypes? Or would that
just mean that xarray would need to ban dtypes it doesn't know about?

From pierre.debuyl at kuleuven.be  Thu Nov 2 15:38:24 2017
From: pierre.debuyl at kuleuven.be (Pierre de Buyl)
Date: Thu, 2 Nov 2017 20:38:24 +0100
Subject: [Numpy-discussion] Python @ FOSDEM 2018
Message-ID: <20171102193824.GA24760@pi-x230>

Dear SciPythonists and NumPythonists,

FOSDEM is a free event for software developers to meet, share ideas and
collaborate. Every year, 6500+ developers of free and open source
software from all over the world gather at the event in Brussels.

For FOSDEM 2018, we will try the new concept of a virtual Python devroom:
there is no dedicated Python room but instead, we promote the presence of
Python in all devrooms. We hope to have at least one Python talk in every
devroom (yes, even in the Perl, Ada, Go and Rust devrooms ;-) ).

How can you help to highlight the Python community at Python-FOSDEM 2018?
Propose your talk in the closest related devroom:
https://fosdem.org/2018/news/2017-10-04-accepted-developer-rooms/

Not all devrooms are language-specific, and a number of topics come to
mind for data and science participants:

"Monitoring & Cloud devroom"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002631.html
"HPC, Big Data, and Data Science"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002615.html
"LLVM toolchain"
https://lists.fosdem.org/pipermail/fosdem/2017-October/002624.html

Most calls for contributions end around the 24th of November. Send a copy
of your proposition to python-devroom AT lists.fosdem DOT org. We will
publish a dedicated schedule for Python on https://python-fosdem.org/ and
at our stand. A dinner will also be organized; stay tuned.

We are waiting for your talk proposals.

The Python-FOSDEM committee

From harrigan.matthew at gmail.com  Thu Nov 2 16:39:08 2017
From: harrigan.matthew at gmail.com (Matthew Harrigan)
Date: Thu, 2 Nov 2017 16:39:08 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Numpy already does support a specific unit, datetime64 and timedelta64,
through that very mechanism. It's also probably the most complicated
unit, since at least there is no such thing as leap meters. And it works
well and is very useful IMHO.

On Thu, Nov 2, 2017 at 3:40 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> This is the exact same reason why pandas and xarray do not support
>> wrapping arbitrary ndarray subclasses or duck array types. The operations
>> we use internally (on numpy.ndarray objects) may not be what you would
>> expect externally, and may even be implementation details not considered
>> part of the public API. For example, in xarray we use numpy.nanmean() or
>> bottleneck.nanmean() instead of numpy.mean().
>>
>> For NumPy and xarray, I think we could (and should) define an interface
>> to support subclasses and duck types for generic operations for core
>> use-cases. My main concern with subclasses / duck-arrays is
>> undefined/untested behavior, especially where we might silently give the
>> wrong answer or trigger some undesired operation (e.g., loading a lazily
>> computed array into memory) rather than raising an informative error.
>> Leaking implementation details is another concern: we have already had
>> several cases in NumPy where a function only worked on a subclass if a
>> particular method was called internally, and broke when that was changed.
>
> Would this issue be ameliorated given Nathaniel's proposal to try to move
> away from subclasses and towards storing data in dtypes? Or would that
> just mean that xarray would need to ban dtypes it doesn't know about?
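[Concretely, the built-in unit behavior Matthew is pointing at:]

```
import numpy as np

start = np.datetime64('2017-11-02T12:00')
delta = np.timedelta64(90, 'm')            # 90 minutes

print(start + delta)                       # 2017-11-02T13:30
print(delta / np.timedelta64(1, 'h'))      # 1.5 -- ratios come out dimensionless

# datetime + datetime is rejected, just as meter + second would be:
try:
    start + start
except TypeError as err:
    print(err)
```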
From m.h.vankerkwijk at gmail.com  Thu Nov 2 17:05:08 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 17:05:08 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

My 2¢ here is that all code should feel free to assume a certain type of
input, as long as it is documented properly, but there is no reason to
enforce that by, e.g., putting `asarray` everywhere. Then, for some
pieces, ducktypes and subclasses will just work like magic, and uses
you might never have foreseen become possible. For others, whoever
wants to use them has to do the work (and it is up to package maintainers
to decide whether or not to accept PRs that implement hooks, etc.).

I do see the argument that this way one becomes constrained in the
internal implementation, as a change may break an outward-looking
function, but while at times this may be inconvenient, in my
experience at others it may just make one realize an even better
implementation is possible. But then, I really like duck-typing...

-- Marten

From ben.v.root at gmail.com  Thu Nov 2 17:09:33 2017
From: ben.v.root at gmail.com (Benjamin Root)
Date: Thu, 2 Nov 2017 17:09:33 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Duck typing is great and all for classes that implement some or all of
the ndarray interface... but remember the main reason for asarray() and
asanyarray(): to automatically promote lists and tuples and other
"array-likes" to ndarrays. Ignoring the use-case of lists of lists is
problematic at best.

Ben Root

On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> My 2¢ here is that all code should feel free to assume a certain type of
> input, as long as it is documented properly, but there is no reason to
> enforce that by, e.g., putting `asarray` everywhere. Then, for some
> pieces, ducktypes and subclasses will just work like magic, and uses
> you might never have foreseen become possible. For others, whoever
> wants to use them has to do the work (and it is up to package maintainers
> to decide whether or not to accept PRs that implement hooks, etc.).
>
> I do see the argument that this way one becomes constrained in the
> internal implementation, as a change may break an outward-looking
> function, but while at times this may be inconvenient, in my
> experience at others it may just make one realize an even better
> implementation is possible. But then, I really like duck-typing...
>
> -- Marten

From m.h.vankerkwijk at gmail.com  Thu Nov 2 17:37:21 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 17:37:21 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root <ben.v.root at gmail.com> wrote:

> Duck typing is great and all for classes that implement some or all of
> the ndarray interface... but remember the main reason for asarray() and
> asanyarray(): to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.

How I wish numpy had never gone there! Convenience for what, exactly?
For the user not having to put `array()` around the list themselves? We
slow down everything for that? And even now we're trying to remove some
of the cases where both tuples and lists are allowed. Grrrrrr. Of
course, we are well and truly stuck with it - now it is one of the main
reasons to subclass rather than duck-type... Anyway, water under the
bridge...

-- Marten

From josef.pktd at gmail.com  Thu Nov 2 17:51:57 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Nov 2017 17:51:57 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root <ben.v.root at gmail.com> wrote:

> Duck typing is great and all for classes that implement some or all of
> the ndarray interface... but remember the main reason for asarray() and
> asanyarray(): to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.
>
> Ben Root
>
> On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk
> <m.h.vankerkwijk at gmail.com> wrote:
>
>> My 2¢ here is that all code should feel free to assume a certain type of
>> input, as long as it is documented properly, but there is no reason to
>> enforce that by, e.g., putting `asarray` everywhere. Then, for some
>> pieces, ducktypes and subclasses will just work like magic, and uses
>> you might never have foreseen become possible. For others, whoever
>> wants to use them has to do the work (and it is up to package maintainers
>> to decide whether or not to accept PRs that implement hooks, etc.)
>>
>> I do see the argument that this way one becomes constrained in the
>> internal implementation, as a change may break an outward-looking
>> function, but while at times this may be inconvenient, in my
>> experience at others it may just make one realize an even better
>> implementation is possible. But then, I really like duck-typing...

One problem in general is that there is no protocol about which operations
are implemented in a numpy-ndarray-equivalent way in those ducks, i.e.
whether they quack in a compatible way.

One small example: pandas' standard deviation, std, uses ddof=1 by default
and didn't have an option to override it with the ddof=0 that numpy uses.
So even though we could call a std method of the ducks, the t-test results
would be a bit different - and visibly different in small samples -
depending on the type of the data. A possible alternative would be to
compute std from scratch and forgo the available function or method.

I tried once, in the scipy.zscore function, to be agnostic about the type
and not use asarray. It's a simple operation, but it still required
special handling of numpy matrices, because they preserve the dimension in
reduce operations. After more than a few lines it is difficult to keep
track of which type is now being used.

Another subclass that is often broken in default code is masked arrays,
because asarray throws away the mask. But asanyarray wouldn't always work
either, because the mask needs code for handling the masked values. For
example, scipy.stats ended up with separate functions for masked arrays.

Josef

>> -- Marten
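[The ddof mismatch josef describes is easy to reproduce; current pandas does expose a ddof argument, but the defaults still differ:]

```
import numpy as np
import pandas as pd

data = [1., 2., 3., 4.]

print(np.std(data))            # 1.118... (ddof=0, population form)
print(pd.Series(data).std())   # 1.290... (ddof=1, sample form)
print(np.std(data, ddof=1))    # 1.290... -- only explicit ddof reconciles the two
```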
From shoyer at gmail.com  Thu Nov 2 18:21:06 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 02 Nov 2017 22:21:06 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> Would this issue be ameliorated given Nathaniel's proposal to try to move
> away from subclasses and towards storing data in dtypes? Or would that
> just mean that xarray would need to ban dtypes it doesn't know about?

Yes, I think custom dtypes would definitely help. Custom dtypes have a
well contained interface, so lots of operations (e.g., concatenate,
reshaping, indexing) are guaranteed to work in a dtype independent way.
If you try to do an unsupported operation for such a dtype (e.g.,
np.datetime64), you will generally get a good error message about an
invalid dtype.

In contrast, you can overload a subclass with totally arbitrary semantics
(e.g., np.matrix), and of course the same goes for duck types.

This makes a big difference for libraries like dask or xarray, which need
a standard interface to guarantee they do the right thing. I'm pretty
sure we can wrap a custom dtype ndarray with units, but there's no way
we're going to support np.matrix without significant work. It's hard to
know which is which without well defined interfaces.

From nathan12343 at gmail.com  Thu Nov 2 18:33:16 2017
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Thu, 2 Nov 2017 17:33:16 -0500
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 5:21 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> Yes, I think custom dtypes would definitely help. Custom dtypes have a
> well contained interface, so lots of operations (e.g., concatenate,
> reshaping, indexing) are guaranteed to work in a dtype independent way.
> If you try to do an unsupported operation for such a dtype (e.g.,
> np.datetime64), you will generally get a good error message about an
> invalid dtype.
>
> In contrast, you can overload a subclass with totally arbitrary semantics
> (e.g., np.matrix), and of course the same goes for duck types.
>
> This makes a big difference for libraries like dask or xarray, which need
> a standard interface to guarantee they do the right thing. I'm pretty
> sure we can wrap a custom dtype ndarray with units, but there's no way
> we're going to support np.matrix without significant work. It's hard to
> know which is which without well defined interfaces.

Ah, but what if the dtype modifies the interface? That might sound evil,
but it's something that's been proposed. For example, if I wanted to
replace yt's YTArray in a backward compatible way with a dtype and just
use plain ndarrays everywhere, the dtype would need to *at least* modify
ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute,
and several other things. Of course, if I don't care about backward
compatibility, I can just do all of these operations on the dtype object
itself.

However, I suspect whatever implementation of custom dtypes gets added to
numpy will have the property that it can act like an arbitrary ndarray
subclass; otherwise libraries like yt, Pint, metpy, and astropy won't be
able to switch to it.

-Nathan

From m.h.vankerkwijk at gmail.com  Thu Nov 2 18:39:30 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 2 Nov 2017 18:39:30 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

I guess my argument boils down to it being better to state that a
function only accepts arrays, and happily let it break on, e.g.,
matrix, than to use `asarray` to make a matrix into an array even though
it really isn't.

I do like the dtype ideas, but think I'd agree they're likely to come
with their own problems. But just making new numerical types possible
is interesting.

-- Marten

From shoyer at gmail.com  Thu Nov 2 20:33:38 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 03 Nov 2017 00:33:38 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Maybe the best of both worlds would require explicit opt-in for classes
that shouldn't be coerced, e.g.,

xarray.register_data_type(MyArray)

or maybe better yet ;)

xarray.void_my_nonexistent_warranty_its_my_fault_if_my_buggy_duck_array_breaks_everything(MyArray)

On Thu, Nov 2, 2017 at 3:39 PM Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> I guess my argument boils down to it being better to state that a
> function only accepts arrays, and happily let it break on, e.g.,
> matrix, than to use `asarray` to make a matrix into an array even though
> it really isn't.
>
> I do like the dtype ideas, but think I'd agree they're likely to come
> with their own problems. But just making new numerical types possible
> is interesting.
>
> -- Marten

From shoyer at gmail.com  Thu Nov 2 20:35:36 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 03 Nov 2017 00:35:36 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

On Thu, Nov 2, 2017 at 3:35 PM Nathan Goldbaum <nathan12343 at gmail.com> wrote:

> Ah, but what if the dtype modifies the interface? That might sound evil,
> but it's something that's been proposed. For example, if I wanted to
> replace yt's YTArray in a backward compatible way with a dtype and just
> use plain ndarrays everywhere, the dtype would need to *at least* modify
> ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute,
> and several other things.

I suppose we'll need to sort this out. But adding new methods/properties
feels pretty safe to me, as long as existing ones are guaranteed to work
in the same way.

From m.h.vankerkwijk at gmail.com  Fri Nov 3 10:30:13 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Fri, 3 Nov 2017 10:30:13 -0400
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

Yes, I like the idea of, effectively, creating an ABC for ndarray - with
which one can register.

-- Marten
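[A sketch of the registration idea using the standard-library ABC machinery; the `DuckArray` name is made up, and numpy had no such ABC at the time:]

```
import abc
import numpy as np

class DuckArray(abc.ABC):
    """Marker ABC: things that promise ndarray-compatible behavior."""

# Explicit opt-in, no inheritance needed:
DuckArray.register(np.ndarray)

class MyDuck:
    """Stand-in for a third-party duck array."""

DuckArray.register(MyDuck)

print(isinstance(np.ones(3), DuckArray))  # True
print(isinstance(MyDuck(), DuckArray))    # True
print(isinstance([1, 2, 3], DuckArray))   # False -- plain lists stay out
```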
From charlesr.harris at gmail.com  Fri Nov 3 22:56:38 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 3 Nov 2017 20:56:38 -0600
Subject: [Numpy-discussion] NumPy 1.14 branch.

Hi All,

I'd like to branch NumPy 1.14 soon. Before doing so, I'd like to make
sure at a minimum that:

1) Changes in array print formatting are done.
2) Proposed deprecations have been made.

If there are other things that folks see as essential, now is the time
to speak up.

Chuck

From bennyrowland at mac.com  Sat Nov 4 06:42:34 2017
From: bennyrowland at mac.com (Ben Rowland)
Date: Sat, 04 Nov 2017 10:42:34 +0000
Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Message-ID: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com>

> On 2 Nov 2017, at 22:39, Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
>
> I guess my argument boils down to it being better to state that a
> function only accepts arrays, and happily let it break on, e.g.,
> matrix, than to use `asarray` to make a matrix into an array even though
> it really isn't.

I would support this attitude: the user can always call `asarray` when
passing their data into the function if necessary; then they know up
front what the consequences will be.

For my own ndarray subclass, I want it to behave exactly as a standard
ndarray, but in addition I add some metadata and some functions that act
on that, for example an affine transform and functions to convert between
coordinate systems. The current numpy system of overriding
__array_wrap__, __array_finalize__ and __new__ is great for allowing the
subclass and metadata to propagate through most basic operations. The
problem is that many functions using `asarray` strip out all of this
metadata and return a bare ndarray. My current solution is to implement
an `inherit` method on my subclass which converts an ndarray and copies
back all the metadata, which often looks like this:

spec_data = data.inherit(np.fft.fft(data))

To use composition instead of inheritance would require me to forward
every part of the ndarray API as is, which would be a great deal of work,
and which in nearly every case would only achieve the same results as
replacing `asarray` with `asanyarray` in various library functions. I
don't want to change the behaviour of the existing class, just to add
some data and methods, and I can't imagine I am alone in that.

Ben

> I do like the dtype ideas, but think I'd agree they're likely to come
> with their own problems. But just making new numerical types possible
> is interesting.
>
> -- Marten
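[A minimal version of the pattern Ben describes, following the numpy subclassing docs; the `inherit` helper is reconstructed here as a sketch of his description, not his actual code:]

```
import numpy as np

class MetaArray(np.ndarray):
    def __new__(cls, input_array, transform=None):
        obj = np.asarray(input_array).view(cls)
        obj.transform = transform        # e.g. an affine transform
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and ufunc results: carry the metadata along.
        if obj is None:
            return
        self.transform = getattr(obj, 'transform', None)

    def inherit(self, arr):
        # Re-attach metadata to a bare ndarray returned by e.g. np.fft.fft.
        return MetaArray(arr, transform=self.transform)

data = MetaArray(np.ones(8), transform='affine-goes-here')
spec_data = data.inherit(np.fft.fft(data))   # fft returns a bare ndarray
print(spec_data.transform)                   # 'affine-goes-here'
```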
I would support this attitude: the user can always call `asarray` when passing their data into the function if necessary; then they know upfront what the consequences will be. For my own ndarray subclass, I want it to behave exactly as a standard ndarray, but in addition I add some metadata and some functions that act on that, for example an affine transform and functions to convert between coordinate systems. The current numpy system of overriding __array_wrap__, __array_finalize__ and __new__ is great to allow the subclass and metadata to propagate through most basic operations. The problem is that many functions using `asarray` strip out all of this metadata and return a bare ndarray. My current solution is to implement an `inherit` method on my subclass which converts an ndarray and copies back all the metadata, which often looks like this: spec_data = data.inherit(np.fft.fft(data)) To use composition instead of inheritance would require me to forward every part of the ndarray API as is, which would be a great deal of work and in nearly every case would only achieve the same results as replacing `asarray` by `asanyarray` in various library functions. I don't want to change the behaviour of the existing class, just to add some data and methods, and I can't imagine I am alone in that. Ben > > I do like the dtype ideas, but think I'd agree they're likely to come > with their own problems. But just making new numerical types possible > is interesting. > > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From m.h.vankerkwijk at gmail.com Sat Nov 4 09:47:15 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 4 Nov 2017 09:47:15 -0400 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Hi Ben, You just summarized excellently why I'm on a quest to change `asarray` to `asanyarray` within numpy (or at least add a `subok` keyword for things like `broadcast_arrays`)! Obviously, this covers only ndarray subclasses, not duck types, though I guess in principle one could use the ABC registration mechanism mentioned above to let those types pass through. Returning to the original topic of the thread, with `__array_ufunc__` it now is even easier to keep track of your metadata for ufuncs, and it has become possible to massage input data before the ufunc is called (rather than just the output). All the best, Marten From charlesr.harris at gmail.com Sun Nov 5 13:25:37 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 5 Nov 2017 11:25:37 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support Message-ID: Hi All, Thought I'd toss this out there. I'm tending towards better sooner than later in dropping Python 2.7 support as we are starting to run up against places where we would like to use Python 3 features. That is particularly true on Windows where the 2.7 compiler is really old and lacks C99 compatibility. In any case, the timeline I've been playing with is to keep Python 2.7 support through 2018, which given our current pace, would be for NumPy 1.15 and 1.16. After that 1.16 would become a long term support release with backports of critical bug fixes up until the time that Python 2.7 support officially ends.
In that timeline, NumPy 1.17 would drop support for 2.7. That proposed schedule is subject to change pending developments and feedback. The main task I think is needed before dropping 2.7 is better handling of unicode strings and bytes. There is the #4208 PR that makes a start on that. If there are other things that folks think are essential, please mention them here. If nothing else, we can begin planning for the transition even if the schedule changes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Nov 6 04:56:18 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 6 Nov 2017 22:56:18 +1300 Subject: [Numpy-discussion] NumPy 1.14 branch. In-Reply-To: References: Message-ID: On Sat, Nov 4, 2017 at 3:56 PM, Charles R Harris wrote: > Hi All, > > I'd like to branch NumPy 1.14 soon. > Sounds good. Before doing so, I'd like to make sure at a minimum that > > 1) Changes in array print formatting are done. > 2) Proposed deprecations have been made. > > If there are other things that folks see as essential, now is the time to > speak up. > Are we good on the pytest status? I see https://github.com/numpy/numpy/pull/9386 is still open. Ralf > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Nov 6 05:10:33 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 6 Nov 2017 23:10:33 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris wrote: > Hi All, > > Thought I'd toss this out there. I'm tending towards better sooner than > later in dropping Python 2.7 support as we are starting to run up against > places where we would like to use Python 3 features. That is particularly > true on Windows where the 2.7 compiler is really old and lacks C99 > compatibility. > This is probably the most pressing reason to drop 2.7 support. We seem to be expending a lot of effort lately on this stuff. I was previously advocating being more conservative than the timeline you now propose, but this is the pain point that I think gets me over the line. In any case, the timeline I've been playing with is to keep Python 2.7 > support through 2018, which given our current pace, would be for NumPy 1.15 > and 1.16. After that 1.16 would become a long term support release with > backports of critical bug fixes up until the time that Python 2.7 support > officially ends. In that timeline, NumPy 1.17 would drop support for 2.7. > And 3.4 at the same time or even earlier. That proposed schedule is subject to change pending developments and > feedback. > +1 > The main task I think is needed before dropping 2.7 is better handling of > unicode strings and bytes. There is the #4208 > PR that makes a start on that. > Yep, at the very least we need one release that supports 2.7 *and* has fixed all the IO issues on 3.x Ralf If there are other things that folks think are essential, please mention > them here. If nothing else, we can begin planning for the transition even > if the schedule changes.
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Nov 6 10:56:11 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Nov 2017 08:56:11 -0700 Subject: [Numpy-discussion] NumPy 1.14 branch. In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 2:56 AM, Ralf Gommers wrote: > > > On Sat, Nov 4, 2017 at 3:56 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> I'd like to branch NumPy 1.14 soon. >> > > Sounds good. > > Before doing so, I'd like to make sure at a minimum that >> >> 1) Changes in array print formatting are done. >> 2) Proposed deprecations have been made. >> >> If there are other things that folks see as essential, now is the time to >> speak up. >> > > Are we good on the pytest status? I see https://github.com/numpy/ > numpy/pull/9386 is still open. > I'm pushing off finishing the pytest transition to 1.15. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Mon Nov 6 12:27:25 2017 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 6 Nov 2017 19:27:25 +0200 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 134, Issue 10 In-Reply-To: References: Message-ID: <6f32661d-83ee-c64f-51f8-b82597d8aa24@gmail.com> An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 6 17:18:24 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 6 Nov 2017 14:18:24 -0800 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > > You just summarized excellently why I'm on a quest to change `asarray` > to `asanyarray` within numpy +1 -- we should all be using asanyarray() most of the time. However, a couple notes: asarray() pre-dates asanyarray() by a LOT. asanyarray was added to better handle subclasses, but there is a lot of legacy code out there. And legacy coders -- I know that I still usually use asarray without thinking about it -- sorry! Obviously, this covers only ndarray > subclasses, not duck types, though I guess in principle one could use > the ABC registration mechanism mentioned above to let those types pass > through. > The trick there is that what does it mean to be duck-typed to an ndarray? For many applications it's critical that the C API be the same, so duck-typing doesn't really apply. And in other cases, it only needs to support a small portion of the numpy API. In essence, there are an almost infinite number of possible ABCs for an ndarray... For my part, I've been known to write custom "array_like" code -- it checks for the handful of methods I know I need to use, and I test it against the small handful of duck-typed arrays that I know I want my code to work with. Klunky, and maybe we could come up with a standard way to do it and include that in numpy, but I'm not sure that ABCs are the way to do it. -CHB -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 6 17:24:07 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 6 Nov 2017 14:24:07 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Sun, Nov 5, 2017 at 10:25 AM, Charles R Harris wrote: > the timeline I've been playing with is to keep Python 2.7 support through > 2018, which given our current pace, would be for NumPy 1.15 and 1.16. After > that 1.16 would become a long term support release with backports of > critical bug fixes > +1 I think py2.7 is going to be around for a long time yet -- which means we really do want to keep the long term support -- which may be quite some time. But that doesn't mean people insisting on not upgrading Python need to get the latest and greatest numpy. Also -- if py2.7 continues to see the use I expect well past when python.org officially drops it, I wouldn't be surprised if a Python2.7 Windows build based on a newer compiler would come along -- perhaps by Anaconda or conda-forge, or ??? If that happens, I suppose we could re-visit 2.7 support. Though it sure would be nice to clean up the dang Unicode stuff for good, too! In short, if it makes it easier for numpy to move forward, let's do it! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Mon Nov 6 17:28:04 2017 From: rmay31 at gmail.com (Ryan May) Date: Mon, 6 Nov 2017 15:28:04 -0700 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker wrote: > Klunky, and maybe we could come up with a standard way to do it and > include that in numpy, but I'm not sure that ABCs are the way to do it. > ABCs are *absolutely* the way to go about it. It's the only way baked into the Python language itself that allows you to register a class for purposes of `isinstance` without needing to subclass--i.e. duck-typing. What's needed, though, is not just a single ABC. Some thought and design needs to go into segmenting the ndarray API to declare certain behaviors, just like was done for collections: https://docs.python.org/3/library/collections.abc.html You don't just have a single ABC declaring a collection, but rather "I am a mapping" or "I am a mutable sequence". It's more of a pain for developers to properly specify things, but this is not a bad thing to actually give code some thought. Ryan -- Ryan May -------------- next part -------------- An HTML attachment was scrubbed... URL:
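As a sketch of what such segmented ABCs could look like in practice -- every class name below is made up purely for illustration, and numpy ships nothing like this today -- registration works exactly as it does for collections.abc:

import numpy as np
from abc import ABC, abstractmethod

class ShapedArray(ABC):
    """Hypothetical ABC for the attribute-access slice of the ndarray API."""
    @property
    @abstractmethod
    def shape(self): ...

    @property
    @abstractmethod
    def size(self): ...

class ReshapableArray(ShapedArray):
    """Hypothetical ABC adding the shape-manipulation slice of the API."""
    @abstractmethod
    def reshape(self, *shape): ...

# Types opt in by registration, without subclassing:
ShapedArray.register(np.ndarray)

class MyDuckArray:
    def __init__(self, data):
        self._data = np.asarray(data)
    @property
    def shape(self):
        return self._data.shape
    @property
    def size(self):
        return self._data.size

ShapedArray.register(MyDuckArray)

assert isinstance(np.zeros((2, 3)), ShapedArray)
assert isinstance(MyDuckArray([1, 2, 3]), ShapedArray)
# Registering for the smaller interface does not grant the bigger one:
assert not isinstance(MyDuckArray([1, 2, 3]), ReshapableArray)

A consumer like matplotlib could then test isinstance(x, ReshapableArray) up front instead of calling asarray and hoping for the best.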
From shoyer at gmail.com Mon Nov 6 19:28:17 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 07 Nov 2017 00:28:17 +0000 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Mon, Nov 6, 2017 at 2:29 PM Ryan May wrote: > On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker > wrote: > >> Klunky, and maybe we could come up with a standard way to do it and >> include that in numpy, but I'm not sure that ABCs are the way to do it. >> > > ABCs are *absolutely* the way to go about it. It's the only way baked into > the Python language itself that allows you to register a class for purposes > of `isinstance` without needing to subclass--i.e. duck-typing. > > What's needed, though, is not just a single ABC. Some thought and design > needs to go into segmenting the ndarray API to declare certain behaviors, > just like was done for collections: > > https://docs.python.org/3/library/collections.abc.html > > You don't just have a single ABC declaring a collection, but rather "I am > a mapping" or "I am a mutable sequence". It's more of a pain for developers > to properly specify things, but this is not a bad thing to actually give > code some thought. > I agree, it would be nice to nail down a hierarchy of duck-arrays, if possible. Although, there are quite a few options, so I don't know how doable this is. Any interest in opening up an issue on GitHub to discuss? -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Nov 6 20:37:49 2017 From: cournape at gmail.com (David Cournapeau) Date: Tue, 7 Nov 2017 10:37:49 +0900 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 7:24 AM, Chris Barker wrote: > On Sun, Nov 5, 2017 at 10:25 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > > >> the timeline I've been playing with is to keep Python 2.7 support >> through 2018, which given our current pace, would be for NumPy 1.15 and >> 1.16. After that 1.16 would become a long term support release with >> backports of critical bug fixes >> > > +1 > > I think py2.7 is going to be around for a long time yet -- which means we > really do want to keep the long term support -- which may be quite some > time. But that doesn't mean people insisting on not upgrading Python need > to get the latest and greatest numpy. > > Also -- if py2.7 continues to see the use I expect well past when > python.org officially drops it, I wouldn't be surprised if a Python2.7 > Windows build based on a newer compiler would come along -- perhaps by > Anaconda or conda-forge, or ??? > I suspect that this will indeed happen. I am aware of multiple companies following this path already (building python + numpy themselves with a newer MS compiler). David -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Nov 6 21:14:06 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Nov 2017 19:14:06 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 6:37 PM, David Cournapeau wrote: > > > On Tue, Nov 7, 2017 at 7:24 AM, Chris Barker > wrote: > >> On Sun, Nov 5, 2017 at 10:25 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >> >>> the timeline I've been playing with is to keep Python 2.7 support >>> through 2018, which given our current pace, would be for NumPy 1.15 and >>> 1.16.
After that 1.16 would become a long term support release with >>> backports of critical bug fixes >>> >> >> +1 >> >> I think py2.7 is going to be around for a long time yet -- which means we >> really do want to keep the long term support -- which may be quite some >> time. But that's doesn't mean people insisting on no upgrading PYthon need >> to get the latest and greatest numpy. >> >> Also -- if py2.7 continues to see the use I expect it will well past when >> pyton.org officially drops it, I wouldn't be surprised if a Python2.7 >> Windows build based on a newer compiler would come along -- perhaps by >> Anaconda or conda-forge, or ??? >> > > I suspect that this will indeed happen. I am aware of multiple companies > following this path already (building python + numpy themselves with a > newer MS compiler). > I think Anaconda is talking about distributing a compiler, but what that will be on windows is anyone's guess. When we drop 2.7, there is a lot of compatibility crud that it would be nice to get rid of, and if we do that then NumPy will no longer compile against 2.7. I suspect some companies have just been putting off the task of upgrading to Python 3, which should be pretty straight forward these days apart from system code that needs to do a lot of work with bytes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Tue Nov 7 09:53:28 2017 From: ben.v.root at gmail.com (Benjamin Root) Date: Tue, 7 Nov 2017 09:53:28 -0500 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Well, to get the ball rolling a bit, the key thing that matplotlib needs to know is if `shape`, `reshape`, 'size', broadcasting, and logical indexing is respected. So, I see three possible abc's here: one for attribute access (things like `shape` and `size`) and another for shape manipulations (broadcasting and reshape, and assignment to .shape). And then a third abc for indexing support, although, I am not sure how that could get implemented... Cheers! Ben Root On Mon, Nov 6, 2017 at 7:28 PM, Stephan Hoyer wrote: > On Mon, Nov 6, 2017 at 2:29 PM Ryan May wrote: > >> On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker >> wrote: >> >>> Klunky, and maybe we could come up with a standard way to do it and >>> include that in numpy, but I'm not sure that ABCs are the way to do it. >>> >> >> ABCs are *absolutely* the way to go about it. It's the only way baked >> into the Python language itself that allows you to register a class for >> purposes of `isinstance` without needing to subclass--i.e. duck-typing. >> >> What's needed, though, is not just a single ABC. Some thought and design >> needs to go into segmenting the ndarray API to declare certain behaviors, >> just like was done for collections: >> >> https://docs.python.org/3/library/collections.abc.html >> >> You don't just have a single ABC declaring a collection, but rather "I am >> a mapping" or "I am a mutable sequence". It's more of a pain for developers >> to properly specify things, but this is not a bad thing to actually give >> code some thought. >> > > I agree, it would be nice to nail down a hierarchy of duck-arrays, if > possible. Although, there are quite a few options, so I don't know how > doable this is. Any interest in opening up an issue on GitHub to discuss? 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Tue Nov 7 11:18:27 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 7 Nov 2017 11:18:27 -0500 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Hi Benjamin, For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC for astropy, which may be a useful starting point as it also provides a way to implement the methods ndarray uses to reshape and get elements: see https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863 All the best, Marten From chris.barker at noaa.gov Tue Nov 7 15:14:16 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 7 Nov 2017 12:14:16 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris wrote: > Also -- if py2.7 continues to see the use I expect well past when >>> python.org officially drops it, I wouldn't be surprised if a Python2.7 >>> Windows build based on a newer compiler would come along -- perhaps by >>> Anaconda or conda-forge, or ??? >>> >> >> I suspect that this will indeed happen. I am aware of multiple companies >> following this path already (building python + numpy themselves with a >> newer MS compiler). >> > > I think Anaconda is talking about distributing a compiler, but what that > will be on windows is anyone's guess. When we drop 2.7, there is a lot of > compatibility crud that it would be nice to get rid of, and if we do that > then NumPy will no longer compile against 2.7. I suspect some companies > have just been putting off the task of upgrading to Python 3, which should > be pretty straightforward these days apart from system code that needs to > do a lot of work with bytes. > I agree, and if there is a compelling reason to upgrade, folks WILL do it. But I've been amazed over the years at folks' desire to stick with what they have! And I'm guilty too, anything new I start with py3, but older larger codebases are still py2, I just can't find the energy to spend the week or so it would probably take to update everything... But in the original post, the Windows compiler issue was mentioned, so there seem to be two reasons to drop py2: A) wanting to use py3-only features. B) wanting to use newer C (C++?) compiler features. I suggest we be clear about which of these is driving the decisions, and explicit about the goals. That is, if (A) is critical, we don't even have to talk about (B). But we could choose to do (B) without doing (A) -- I suspect there will be a user base for that.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 7 15:20:49 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 7 Nov 2017 12:20:49 -0800 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer wrote: > >> What's needed, though, is not just a single ABC. Some thought and design >> needs to go into segmenting the ndarray API to declare certain behaviors, >> just like was done for collections: >> >> https://docs.python.org/3/library/collections.abc.html >> >> You don't just have a single ABC declaring a collection, but rather "I am >> a mapping" or "I am a mutable sequence". It's more of a pain for developers >> to properly specify things, but this is not a bad thing to actually give >> code some thought. >> > > I agree, it would be nice to nail down a hierarchy of duck-arrays, if > possible. Although, there are quite a few options, so I don't know how > doable this is. > Exactly -- there are an exponential number of options... > Well, to get the ball rolling a bit, the key thing that matplotlib needs > to know is if `shape`, `reshape`, `size`, broadcasting, and logical > indexing are respected. So, I see three possible abc's here: one for > attribute access (things like `shape` and `size`) and another for shape > manipulations (broadcasting and reshape, and assignment to .shape). I think we're going to get into a string of ABCs: ArrayLikeForMPL_ABC etc, etc..... > And then a third abc for indexing support, although, I am not sure how > that could get implemented... This is the really tricky one -- all an ABC really checks is the existence of methods -- making sure they behave the same way is up to the developer of the ducktype. Which is OK, but will require discipline. But indexing, specifically fancy indexing, is another matter -- I'm not sure if there is even a way with an ABC to check for what types of indexing are supported, but we'd still have the problem of whether the semantics are the same! For example, I work with netcdf variable objects, which are partly duck-typed as ndarrays, but I think n-dimensional fancy indexing works differently... how in the world do you detect that with an ABC??? For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC > for astropy, which may be a useful starting point as it also provides > a way to implement the methods ndarray uses to reshape and get > elements: see > https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863 Sounds like a good starting point for discussion. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Tue Nov 7 16:03:16 2017 From: rmay31 at gmail.com (Ryan May) Date: Tue, 7 Nov 2017 14:03:16 -0700 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Tue, Nov 7, 2017 at 1:20 PM, Chris Barker wrote: > On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer wrote: > >> >>> What's needed, though, is not just a single ABC. Some thought and design >>> needs to go into segmenting the ndarray API to declare certain behaviors, >>> just like was done for collections: >>> >>> https://docs.python.org/3/library/collections.abc.html >>> >>> You don't just have a single ABC declaring a collection, but rather "I >>> am a mapping" or "I am a mutable sequence".
It's more of a pain for >>> developers to properly specify things, but this is not a bad thing to >>> actually give code some thought. >>> >> >> I agree, it would be nice to nail down a hierarchy of duck-arrays, if >> possible. Although, there are quite a few options, so I don't know how >> doable this is. >> > > Exactly -- there are an exponential amount of options... > > >> Well, to get the ball rolling a bit, the key thing that matplotlib needs >> to know is if `shape`, `reshape`, 'size', broadcasting, and logical >> indexing is respected. So, I see three possible abc's here: one for >> attribute access (things like `shape` and `size`) and another for shape >> manipulations (broadcasting and reshape, and assignment to .shape). > > > I think we're going to get into an string of ABCs: > > ArrayLikeForMPL_ABC > > etc, etc..... > Only if you try to provide perfectly-sized options for every occasion--but that's not how we do things in (sane) software development. You provide a few options that optimize the common use cases, and you don't try to cover everything--let client code figure out the right combination from the primitives you provide. One can always just inherit/register *all* the ABCs if need be. The status quo is that we have 1 interface that covers everything from multiple dims and shape to math and broadcasting to the entire __array__ interface. Even breaking that up into the 3 "obvious" chunks would be a massive improvement. I just don't want to see this effort bog down into "this is so hard". Getting it perfect is hard; getting it useful is much easier. It's important to note that we can always break up/combine existing ABCs into other ones later. > And then a third abc for indexing support, although, I am not sure how >> that could get implemented... > > > This is the really tricky one -- all ABCs really check is the existence of > methods -- making sure they behave the same way is up to the developer of > the ducktype. > > which is K, but will require discipline. > > But indexing, specifically fancy indexing, is another matter -- I'm not > sure if there even a way with an ABC to check for what types of indexing > are support, but we'd still have the problem with whether the semantics are > the same! > > For example, I work with netcdf variable objects, which are partly > duck-typed as ndarrays, but I think n-dimensional fancy indexing works > differently... how in the world do you detect that with an ABC??? > Even documenting expected behavior as part of these ABCs would go a long way towards helping standardize behavior. Another idea would be to put together a conformance test suite as part of this effort, in lieu of some kind of run-time checking of behavior (which would be terrible). That would help developers of other "ducks" check that they're doing the right things. I'd imagine the existing NumPy test suite would largely cover this. Ryan -- Ryan May -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Nov 7 17:01:53 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 07 Nov 2017 22:01:53 +0000 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Tue, Nov 7, 2017 at 12:23 PM Chris Barker wrote: > > And then a third abc for indexing support, although, I am not sure how >> that could get implemented... 
> > > This is the really tricky one -- all ABCs really check is the existence of > methods -- making sure they behave the same way is up to the developer of > the ducktype. > > which is K, but will require discipline. > > But indexing, specifically fancy indexing, is another matter -- I'm not > sure if there even a way with an ABC to check for what types of indexing > are support, but we'd still have the problem with whether the semantics are > the same! > > For example, I work with netcdf variable objects, which are partly > duck-typed as ndarrays, but I think n-dimensional fancy indexing works > differently... how in the world do you detect that with an ABC??? > We recently worked out a hierarchy of indexing types for xarray. To a crude approximation, we have: - "Basic" indexing support for slices and integers. Nearly every array type satisfies this. - "Outer" or "orthogonal" indexing with slices, integers and 1D arrays. This is what netCDF4-Python and Fortran/MATLAB support. - "Vectorized" indexing with broadcasting and multi-dimensional indexers. NumPy supports a generalization of this, but I would not wish the edge cases involving mixed slices/arrays upon anyone. - "Logical" indexing by a boolean array with the same shape. - "Exactly like NumPy" for subclasses or wrappers around NumPy arrays. There's some ambiguities in this, but that's what specs are for. For most applications, we probably don't need most of these: ABCs for "Basic", "Logical" and "Exactly like NumPy" would go a long ways. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Nov 7 18:27:36 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Nov 2017 17:27:36 -0600 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Nov 6, 2017 4:19 PM, "Chris Barker" wrote: On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > > You just summarized excellently why I'm on a quest to change `asarray` > to `asanyarray` within numpy +1 -- we should all be using asanyarray() most of the time. The problem is that if you use 'asanyarray', then you're claiming that your code works correctly for: - regular ndarrays - np.matrix - np.ma masked arrays - and every third party subclass, regardless of their semantics, regardless of whether you've heard of them or not If subclasses followed the Liskov substitution principle, and had different internal implementations but the same public ("duck") API, then this would be fine. But in practice, numpy limitations mean that ndarrays subclasses have to have the same internal implementation, so the only reason to make an ndarray subclass is if you want to make something with a different public API. Basically the whole system is designed for subclasses to be incompatible. The end result is that if you use asanyarray, your code is definitely wrong, because there's no way you're actually doing the right thing for arbitrary ndarray subclasses. But if you don't use asanyarray, then yeah, that's also wrong, because it won't work on mostly-compatible subclasses like astropy's. Given this, different projects reasonably make different choices -- it's not just legacy code that uses asarray. In the long run we obviously need to come up with new options that don't have these tradeoffs (that's why we want to let units to to dtypes, implement methods like __array_ufunc__ to enable duck arrays, etc.) 
From njs at pobox.com Tue Nov 7 18:27:36 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Nov 2017 17:27:36 -0600 Subject: [Numpy-discussion] is __array_ufunc__ ready for prime-time? In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: On Nov 6, 2017 4:19 PM, "Chris Barker" wrote: On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > > You just summarized excellently why I'm on a quest to change `asarray` > to `asanyarray` within numpy +1 -- we should all be using asanyarray() most of the time. The problem is that if you use 'asanyarray', then you're claiming that your code works correctly for: - regular ndarrays - np.matrix - np.ma masked arrays - and every third party subclass, regardless of their semantics, regardless of whether you've heard of them or not If subclasses followed the Liskov substitution principle, and had different internal implementations but the same public ("duck") API, then this would be fine. But in practice, numpy limitations mean that ndarray subclasses have to have the same internal implementation, so the only reason to make an ndarray subclass is if you want to make something with a different public API. Basically the whole system is designed for subclasses to be incompatible. The end result is that if you use asanyarray, your code is definitely wrong, because there's no way you're actually doing the right thing for arbitrary ndarray subclasses. But if you don't use asanyarray, then yeah, that's also wrong, because it won't work on mostly-compatible subclasses like astropy's. Given this, different projects reasonably make different choices -- it's not just legacy code that uses asarray. In the long run we obviously need to come up with new options that don't have these tradeoffs (that's why we want to move units to dtypes, implement methods like __array_ufunc__ to enable duck arrays, etc.). Let's try to be sympathetic to other projects that are doing their best :-). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Nov 7 18:40:31 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Nov 2017 17:40:31 -0600 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Nov 7, 2017 2:15 PM, "Chris Barker" wrote: On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris wrote: > Also -- if py2.7 continues to see the use I expect well past when >>> python.org officially drops it, I wouldn't be surprised if a Python2.7 >>> Windows build based on a newer compiler would come along -- perhaps by >>> Anaconda or conda-forge, or ??? >>> >> >> I suspect that this will indeed happen. I am aware of multiple companies >> following this path already (building python + numpy themselves with a >> newer MS compiler). >> > > I think Anaconda is talking about distributing a compiler, but what that > will be on windows is anyone's guess. When we drop 2.7, there is a lot of > compatibility crud that it would be nice to get rid of, and if we do that > then NumPy will no longer compile against 2.7. I suspect some companies > have just been putting off the task of upgrading to Python 3, which should > be pretty straightforward these days apart from system code that needs to > do a lot of work with bytes. > I agree, and if there is a compelling reason to upgrade, folks WILL do it. But I've been amazed over the years at folks' desire to stick with what they have! And I'm guilty too, anything new I start with py3, but older larger codebases are still py2, I just can't find the energy to spend the week or so it would probably take to update everything... But in the original post, the Windows compiler issue was mentioned, so there seem to be two reasons to drop py2: A) wanting to use py3-only features. B) wanting to use newer C (C++?) compiler features. I suggest we be clear about which of these is driving the decisions, and explicit about the goals. That is, if (A) is critical, we don't even have to talk about (B). But we could choose to do (B) without doing (A) -- I suspect there will be a user base for that.... The problem is it's hard to predict the future. Right now neither PyPI nor conda provide any way to distribute binaries for py27-but-with-a-newer-ABI, and maybe they never will; or maybe they will eventually, but not enough people use them to justify keeping py2 support given the other overheads; or... who knows, really.
In-Reply-To: References: <753062D5-582B-4F7B-A003-E24C88F70A2C@mac.com> Message-ID: Hi Nathaniel, You're right, I shouldn't be righteous. Though I do think the advantage of `asanyarray` inside numpy remains that it is easy for a user to add `asarray` to their input to a numpy function, and not easy for a happily compatible subclass to avoid an `asarray` inside a numpy function! I.e., coerce as little as you can get away with... All the best, Marten From matti.picus at gmail.com Wed Nov 8 11:41:03 2017 From: matti.picus at gmail.com (Matti Picus) Date: Wed, 8 Nov 2017 18:41:03 +0200 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand flags? Message-ID: I filed issue 9714 https://github.com/numpy/numpy/issues/9714 and wrote a mail in September trying to get some feedback on what to do with updateifcopy semantics and user-exposed nditer. It garnered no response, so I am trying again. For those who are unfamiliar with the issue see below for a short summary and issue 7054 for a lengthy discussion. Note that pull request 9639 which should be merged very soon changes the magical UPDATEIFCOPY into WRITEBACKIFCOPY, and hopefully will appear in NumPy 1.14.

As I mention in the issue, there is a magical update done in this snippet in the next-to-the-last line:

a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
i = np.nditer(a, [], [['readwrite', 'updateifcopy']],
              casting='same_kind', op_dtypes=[np.dtype('f4')])
# Check that UPDATEIFCOPY is activated
i.operands[0][2, 1, 1] = -12.5
assert a[2, 1, 1] != -12.5
i = None  # magic!!!
assert a[2, 1, 1] == -12.5

Not only is this magic very implicit, it relies on refcount semantics and thus does not work on PyPy. Possible solutions:

1. nditer is rarely used, just deprecate updateifcopy use on operands

2. make nditer into a context manager, so the code would become explicit

a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
with np.nditer(a, [], [['readwrite', 'updateifcopy']],
               casting='same_kind', op_dtypes=[np.dtype('f4')]) as i:
    # Check that WRITEBACKIFCOPY is activated
    i.operands[0][2, 1, 1] = -12.5
    assert a[2, 1, 1] != -12.5
assert a[2, 1, 1] == -12.5  # a is modified in i.__exit__

3. something else?

Any opinions? Does anyone use nditer in production code? Matti

------------------------- what are updateifcopy semantics? When a temporary copy or work buffer is required, NumPy can (ab)use the base attribute of an ndarray by

- creating a copy of the data from the base array
- marking the base array read-only

Then when the temporary buffer is "no longer needed"

- the data is copied back
- the original base array is marked read-write

The trigger for the "no longer needed" decision before pull request 9639 is in the dealloc function. That is not generally a place to do useful work, especially on PyPy which can call dealloc much later. Pull request 9639 adds an explicit PyArray_ResolveWritebackIfCopy API function, and recommends calling it explicitly before dealloc. The only place this change is visible to the python-level user is in nditer. C-API users will need to adapt their code to use the new API function, with a deprecation cycle that is backwardly compatible on CPython.
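To see how fragile the dealloc trigger is even on CPython, consider this small variation on the first snippet (the `alias` variable is only for illustration, and the behavior shown is the current pre-9639 semantics as described above):

import numpy as np

a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
i = np.nditer(a, [], [['readwrite', 'updateifcopy']],
              casting='same_kind', op_dtypes=[np.dtype('f4')])
i.operands[0][2, 1, 1] = -12.5

alias = i      # an easily-overlooked extra reference
i = None       # no magic this time: the iterator is still alive
assert a[2, 1, 1] != -12.5   # the writeback has not happened yet

alias = None   # only now is the iterator deallocated...
assert a[2, 1, 1] == -12.5   # ...and the data copied back (CPython only)

Whether `a` holds the new value thus depends on reference bookkeeping far away from the code that did the writing, which is exactly the implicitness at issue here.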
From p.j.a.cock at googlemail.com Wed Nov 8 11:50:38 2017 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 8 Nov 2017 16:50:38 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 11:40 PM, Nathaniel Smith wrote: > > > > Right now, the decision in front of us is what to tell people who ask about > numpy's py2 support plans, so that they can make their own plans. Given what > we know right now, I don't think we should promise to keep support past > 2018. If we get there and the situation's changed, and there's both desire > and means to extend support we can revisit that. But it's better to > under-promise and possibly over-deliver, instead of promising to support py2 > until after it becomes a millstone around our necks and then realizing we > haven't warned anyone and are stuck supporting it another year beyond > that... > > -n NumPy (and to a lesser extent SciPy) is in a tough position being at the bottom of many scientific Python programming stacks. Whenever you drop Python 2 support, it is going to upset someone. Is it too ambitious to pledge to drop support for Python 2.7 no later than 2020, coinciding with the Python development team's timeline for dropping support for Python 2.7? If that looks doable, NumPy could sign up to http://www.python3statement.org/ Regards, Peter From ilhanpolat at gmail.com Wed Nov 8 12:15:39 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Wed, 8 Nov 2017 18:15:39 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: I was about to send the same thing. I think this matter became a vim/emacs issue and Py2 supporters won't take any arguments anymore. But if Instagram can do it, it means that the legacy code argument is a matter of will, not a technicality. https://thenewstack.io/instagram-makes-smooth-move-python-3/ Also people are really going out of their way, such as Tauthon https://github.com/naftaliharris/tauthon, to stay with Python2. To be honest, I'm convinced that this is a sentimental debate after seeing this fork. On Wed, Nov 8, 2017 at 5:50 PM, Peter Cock wrote: > On Tue, Nov 7, 2017 at 11:40 PM, Nathaniel Smith wrote: > > > > > > > > Right now, the decision in front of us is what to tell people who ask > about > > numpy's py2 support plans, so that they can make their own plans. Given > what > > we know right now, I don't think we should promise to keep support past > > 2018. If we get there and the situation's changed, and there's both > desire > > and means to extend support we can revisit that. But it's better to > > under-promise and possibly over-deliver, instead of promising to support > py2 > > until after it becomes a millstone around our necks and then realizing we > > haven't warned anyone and are stuck supporting it another year beyond > > that... > > > > -n > > NumPy (and to a lesser extent SciPy) is in a tough position being at the > bottom of many scientific Python programming stacks. Whenever you > drop Python 2 support, it is going to upset someone. > > Is it too ambitious to pledge to drop support for Python 2.7 no later than > 2020, coinciding with the Python development team's timeline for dropping > support for Python 2.7? > > If that looks doable, NumPy could sign up to http://www.python3statement.
> org/ > > Regards, > > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Wed Nov 8 12:31:45 2017 From: matti.picus at gmail.com (Matti Picus) Date: Wed, 8 Nov 2017 19:31:45 +0200 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: References: Message-ID: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Nov 8 13:00:37 2017 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 08 Nov 2017 19:00:37 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: <1510164037.21998.10.camel@sipsolutions.net> On Wed, 2017-11-08 at 18:15 +0100, Ilhan Polat wrote: > I was about to send the same thing. I think this matter became a > vim/emacs issue and Py2 supporters won't take any arguments anymore. > But if Instagram can do it, it means that legacy code argument is a > matter of will but not a technicality. https://thenewstack.io/instagr > am-makes-smooth-move-python-3/ > > Also people are really going out of their ways such as Tauthon https: > //github.com/naftaliharris/tauthon to stay with Python2. To be > honest, I'm convinced that this is a sentimental debate after seeing > this fork. > > In my opinion it is fine for us to drop support for python 2 in master relatively soon (as proposed here). But I guess we will need to a "LTS" release which means some extra maintenance burden until 2020. I could hope those who really need it jumping in to carry some of that (and by 2020 my guess is if anyone still wants to support it longer, we won't stop you, but I doubt the current core devs, at least not me, would be very interested in it). So in my opinion, the current NumPy is excellent and very stable, anyone who needs fancy new stuff likely also wants other fancy new stuff so will soon have to use python 3 anyway.... Which means, if we think the extra burden of a "LTS" is lower then the current hassle, lets do it :). Also downstream seems only half a reason to me, since downstream normally supports much outdated versions anyway? - Sebastian > > > > > > On Wed, Nov 8, 2017 at 5:50 PM, Peter Cock > wrote: > > On Tue, Nov 7, 2017 at 11:40 PM, Nathaniel Smith > > wrote: > > > > > > > > > > > > Right now, the decision in front of us is what to tell people who > > ask about > > > numpy's py2 support plans, so that they can make their own plans. > > Given what > > > we know right now, I don't think we should promise to keep > > support past > > > 2018. If we get there and the situation's changed, and there's > > both desire > > > and means to extend support we can revisit that. But's better to > > > under-promise and possibly over-deliver, instead of promising to > > support py2 > > > until after it becomes a millstone around our necks and then > > realizing we > > > haven't warned anyone and are stuck supporting it another year > > beyond > > > that... > > > > > > -n > > > > NumPy (and to a lesser extent SciPy) is in a tough position being > > at the > > bottom of many scientific Python programming stacks. Whenever you > > drop Python 2 support is going to upset someone. 
> > > > Is it too ambitious to pledge to drop support for Python 2.7 no > > later than > > 2020, coinciding with the Python development team's timeline for > > dropping > > support for Python 2.7? > > > > If that looks doable, NumPy could sign up to http://www.python3stat > > ement.org/ > > > > Regards, > > > > Peter > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: This is a digitally signed message part URL: From jtaylor.debian at googlemail.com Wed Nov 8 14:08:37 2017 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 8 Nov 2017 20:08:37 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: Message-ID: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> On 06.11.2017 11:10, Ralf Gommers wrote: > > > On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris > > wrote: > > Hi All, > > Thought I'd toss this out there. I'm tending towards better sooner > than later in dropping Python 2.7 support as we are starting to run > up against places where we would like to use Python 3 features. That > is particularly true on Windows where the 2.7 compiler is really old > and lacks C99 compatibility. > > > This is probably the most pressing reason to drop 2.7 support. We seem > to be expending a lot of effort lately on this stuff. I was previously > advocating being more conservative than the timeline you now propose, > but this is the pain point that I think gets me over the line. Would dropping python2 support for windows earlier than the other platforms be a reasonable approach? I am not a big fan of dropping python2 support before 2020, but I have no issue with dropping python2 support on windows earlier, as it is our largest pain point. From njs at pobox.com Wed Nov 8 15:12:55 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 8 Nov 2017 14:12:55 -0600 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> References: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> Message-ID: At a higher level: The issue here is that we need to break the nditer API. This might affect you if you use np.nditer (in Python) or the NpyIter_* APIs (in C). The exact cases affected are somewhat hard to describe because nditer's flag processing is complicated [1], but basically it's cases where you are writing to one of the arrays being iterated over and then something else non-trivial happens. The problem is that the API currently uses NumPy's odd UPDATEIFCOPY feature. What it does is give you an "output" array which is not your actual output array, but instead some other temporary array which you can modify freely, and whose contents are later written back to your actual output array. When does this copy happen? Since this is an iterator, then most of the time we can do the writeback for iteration N when we start iteration N+1. However, this doesn't work for the final iteration. On the final iteration, currently the writeback happens when the temporary is garbage collected.
*Usually* this happens pretty promptly, but this is dependent on some internal details of how CPython's garbage collector works that are explicitly not part of the Python language spec, and on PyPy you silently and non-deterministically get incorrect results. Plus it's error-prone even on CPython -- if you accidentally have a dangling reference to one array, then suddenly another array will have the wrong contents. So we have two options:

- We could stop supporting this mode entirely. Unfortunately, it's hard to know if anyone is using this, since the conditions to trigger it are so complicated, and not necessarily very exotic (e.g. it can happen if you have a function that uses nditer to read one array and write to another, and then someone calls your function with two arrays whose memory overlaps).

- We could adjust the API so that there's some explicit operation to trigger the final writeback. At the Python level this would probably mean that we start supporting the use of nditer as a context manager, and eventually start raising an error if you're in one of the "unsafe" cases and not using the context manager form. At the C level we probably need some explicit "I'm done with this iterator now" call.

One question is which cases exactly should produce warnings/eventually errors. At the Python level, I guess the simplest rule would be that if you have any write/readwrite arrays in your iterator, then you have to use a 'with' block. At the C level, it's a little trickier, because it's hard to tell up-front whether someone has updated their code to call a final cleanup function, and it's hard to emit a warning/error on something that *doesn't* happen. (You could print a warning when the nditer object is GCed if the cleanup function wasn't called, but you can't raise an error there.) I guess the only reasonable option is to deprecate NPY_ITER_READWRITE and NPY_ITER_WRITEONLY, and make people switch to passing new flags that have the same semantics but also promise that the user has updated their code to call the new cleanup function. Does that work? Any objections? -n

[1] The affected cases are the ones that reach this line: https://github.com/numpy/numpy/blob/c276f326b29bcb7c851169d34f4767da0b4347af/numpy/core/src/multiarray/nditer_constr.c#L2926 So it's something like -- all of these things are true:
- you have a writable array (nditer flags "write" or "readwrite")
- one of these things is true:
  - you passed the "forcecopy" flag
  - all of these things are true:
    - you requested casting
    - you requested updateifcopy
    - there's a memory overlap between this array and another of the arrays being iterated over

On Wed, Nov 8, 2017 at 11:31 AM, Matti Picus wrote: > > Date: Wed, 8 Nov 2017 18:41:03 +0200 > From: Matti Picus > To: numpy-discussion at python.org > Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand > flags? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > I filed issue 9714 https://github.com/numpy/numpy/issues/9714 and wrote > a mail in September trying to get some feedback on what to do with > updateifcopy semantics and user-exposed nditer. > It garnered no response, so I am trying again. > For those who are unfamiliar with the issue see below for a short > summary and issue 7054 for a lengthy discussion. > Note that pull request 9639 which should be merged very soon changes the > magical UPDATEIFCOPY into WRITEBACKIFCOPY, and hopefully will appear in > NumPy 1.14.
> > As I mention in the issue, there is a magical update done in this > snippet in the next-to-the-last line:
>
> a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
> i = np.nditer(a, [], [['readwrite', 'updateifcopy']], casting='same_kind',
>               op_dtypes=[np.dtype('f4')])
> # Check that UPDATEIFCOPY is activated
> i.operands[0][2, 1, 1] = -12.5
> assert a[2, 1, 1] != -12.5
> i = None  # magic!!!
> assert a[2, 1, 1] == -12.5
>
> Not only is this magic very implicit, it relies on refcount semantics > and thus does not work on PyPy. > Possible solutions:
>
> 1. nditer is rarely used, just deprecate updateifcopy use on operands
>
> 2. make nditer into a context manager, so the code would become explicit
>
> a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
> with np.nditer(a, [], [['readwrite', 'updateifcopy']], casting='same_kind',
>                op_dtypes=[np.dtype('f4')]) as i:
>     # Check that WRITEBACKIFCOPY is activated
>     i.operands[0][2, 1, 1] = -12.5
>     assert a[2, 1, 1] != -12.5
> assert a[2, 1, 1] == -12.5  # a is modified in i.__exit__
>
> 3. something else?
>
> Any opinions? Does anyone use nditer in production code? > Matti
>
> ------------------------- > what are updateifcopy semantics? When a temporary copy or work buffer is > required, NumPy can (ab)use the base attribute of an ndarray by
>
> - creating a copy of the data from the base array
> - marking the base array read-only
>
> Then when the temporary buffer is "no longer needed"
>
> - the data is copied back
> - the original base array is marked read-write
>
> The trigger for the "no longer needed" decision before pull request 9639 > is in the dealloc function. > That is not generally a place to do useful work, especially on PyPy > which can call dealloc much later. > Pull request 9639 adds an explicit PyArray_ResolveWritebackIfCopy API > function, and recommends calling it explicitly before dealloc. > > The only place this change is visible to the python-level user is in > nditer. > C-API users will need to adapt their code to use the new API function, > with a deprecation cycle that is backwardly compatible on CPython. > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Nathaniel J.
Smith -- https://vorpus.org From chris.barker at noaa.gov Wed Nov 8 17:05:15 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 8 Nov 2017 14:05:15 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Wed, Nov 8, 2017 at 11:08 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > > Would dropping python2 support for windows earlier than the other > platforms a reasonable approach? > no. I'm not Windows fan myself, but it is a HUGE fraction of the userbase. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Wed Nov 8 17:13:39 2017 From: allanhaldane at gmail.com (Allan Haldane) Date: Wed, 8 Nov 2017 17:13:39 -0500 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: References: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> Message-ID: <2618f2cf-0c1f-44f7-40db-95972484c58b@gmail.com> On 11/08/2017 03:12 PM, Nathaniel Smith wrote: > - We could adjust the API so that there's some explicit operation to > trigger the final writeback. At the Python level this would probably > mean that we start supporting the use of nditer as a context manager, > and eventually start raising an error if you're in one of the "unsafe" > case and not using the context manager form. At the C level we > probably need some explicit "I'm done with this iterator now" call. > > One question is which cases exactly should produce warnings/eventually > errors. At the Python level, I guess the simplest rule would be that > if you have any write/readwrite arrays in your iterator, then you have > to use a 'with' block. At the C level, it's a little trickier, because > it's hard to tell up-front whether someone has updated their code to > call a final cleanup function, and it's hard to emit a warning/error > on something that *doesn't* happen. (You could print a warning when > the nditer object is GCed if the cleanup function wasn't called, but > you can't raise an error there.) I guess the only reasonable option is > to deprecate NPY_ITER_READWRITE and NP_ITER_WRITEONLY, and make people > switch to passing new flags that have the same semantics but also > promise that the user has updated their code to call the new cleanup > function. Seems reasonable. When people use the Nditer C-api, they (almost?) always call NpyIter_Dealloc when they're done. Maybe that's a place to put a warning for C-api users. I think you can emit a warning there since that function calls the GC, not the other way around. It looks like you've already discussed the possibilities of putting things in NpyIter_Dealloc though, and it could be tricky, but if we only need a warning maybe there's a way. 
https://github.com/numpy/numpy/pull/9269/files/6dc0c65e4b2ea67688d6b617da3a175cd603fc18#r127707149

Allan

From bryanv at anaconda.com Wed Nov 8 17:17:24 2017
From: bryanv at anaconda.com (Bryan Van de ven)
Date: Wed, 8 Nov 2017 16:17:24 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: Message-ID: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com>

> On Nov 8, 2017, at 10:50, Peter Cock wrote:
>
> NumPy (and to a lesser extent SciPy) is in a tough position being at the
> bottom of many scientific Python programming stacks. Whenever you
> drop Python 2 support is going to upset someone.

Existing versions of NumPy will still exist and continue to work with Python 2.7. If users want to stay with Python 2.7, that's fine, they will just have to rely on those older/LTS versions. I personally would be happy for projects at the bottom of stacks to take an activist stance and make decisions to actively encourage movement to Python 3.

> It is too ambitious to pledge to drop support for Python 2.7 no later than
> 2020, coinciding with the Python development team's timeline for dropping
> support for Python 2.7?

Developing NumPy is hard, as it is. Everything that can be done to simplify things for the current maintainers and help attract new contributors should be done. It is not reasonable to ask a few (largely volunteer) people to shoulder the burden and difficulties of supporting Python 2.7 for several additional *years* of their life.

I agree entirely with Nick Coghlan's comments from another discussion, and think they apply equally well in this instance:

"""
While it's entirely admirable that many upstream developers are generous enough to help their end users work around this inertia, in the long run doing so is detrimental for everyone concerned, as long term sustaining engineering for old releases is genuinely demotivating for upstream developers (it's a good job, but a lousy way to spend your free time) and for end users, working around institutional inertia this way reduces the pressure to actually get the situation addressed properly.
"""

Thanks,

Bryan

From msarahan at gmail.com Wed Nov 8 17:29:28 2017
From: msarahan at gmail.com (Michael Sarahan)
Date: Wed, 8 Nov 2017 16:29:28 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> References: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> Message-ID:

Anaconda's compilers are for Linux (gcc 7.2) and Mac (llvm/clang 4.0.1) right now. We would like to have clang target all platforms, but that's a lot of development effort. We are also exploring ways of keeping package ecosystems in line, so that building and managing a self-consistent set of python 2.7 packages with a new Visual Studio version or msys2 might be easier. No timeline to report on that, though. Breaking with the python.org ABI is pretty painful.

On Wed, Nov 8, 2017 at 4:17 PM, Bryan Van de ven wrote:
>
> > On Nov 8, 2017, at 10:50, Peter Cock wrote:
> >
> > NumPy (and to a lesser extent SciPy) is in a tough position being at the
> > bottom of many scientific Python programming stacks. Whenever you
> > drop Python 2 support is going to upset someone.
>
> Existing versions of NumPy will still exist and continue to work with
> Python 2.7. If users want to stay with Python 2.7, that's fine, they will
> just have to rely on those older/LTS versions.
I personally would be happy > for projects at the bottom of stacks to take an activist stance and make > decisions to actively encourage movement to Python 3. > > > It is too ambitious to pledge to drop support for Python 2.7 no later > than > > 2020, coinciding with the Python development team?s timeline for dropping > > support for Python 2.7? > > Developing NumPy is hard, as it is. Everything that can be done to > simplify things for the current maintainers and help attract new > contributors should be done. It is not reasonable to ask a few (largely > volunteer) people to shoulder the burden and difficulties of supporting > Python 2.7 for several additional *years* of their life. > > I agree entirely with Nick Coghlan's comments from another discussion, and > think they apply equally well in this instance: > > """ > While it's entirely admirable that many upstream developers are generous > enough to help their end users work around this inertia, in the long run > doing so is detrimental for everyone concerned, as long term sustaining > engineering for old releases is genuinely demotivating for upstream > developers (it's a good job, but a lousy way to spend your free time) and > for end users, working around institutional inertia this way reduces the > pressure to actually get the situation addressed properly. > """ > > Thanks, > > Bryan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Nov 8 17:50:06 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 8 Nov 2017 22:50:06 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: Hi, On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor wrote: > On 06.11.2017 11:10, Ralf Gommers wrote: >> >> >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris >> > wrote: >> >> Hi All, >> >> Thought I'd toss this out there. I'm tending towards better sooner >> than later in dropping Python 2.7 support as we are starting to run >> up against places where we would like to use Python 3 features. That >> is particularly true on Windows where the 2.7 compiler is really old >> and lacks C99 compatibility. >> >> >> This is probably the most pressing reason to drop 2.7 support. We seem >> to be expending a lot of effort lately on this stuff. I was previously >> advocating being more conservative than the timeline you now propose, >> but this is the pain point that I think gets me over the line. > > > Would dropping python2 support for windows earlier than the other > platforms a reasonable approach? > I am not a big fan of to dropping python2 support before 2020, but I > have no issue with dropping python2 support on windows earlier as it is > our largest pain point. I wonder about this too. I can imagine there are a reasonable number of people using older Linux distributions on which they cannot upgrade to a recent Python 3, but is that likely to be true for Windows? We'd have to make sure we could persuade pypi to give the older version for Windows, by default - I don't know if that is possible. 
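(For reference, the closest existing mechanism seems to be the `python_requires` metadata understood by pip 9+ -- a rough sketch with a hypothetical version number, and note that it is all-or-nothing per release rather than per-platform:)

    # setup.py -- illustrative sketch only
    from setuptools import setup

    setup(
        name='numpy',
        version='1.17.0',          # hypothetical first Python-3-only release
        python_requires='>=3.5',   # pip 9+ on Python 2.7 skips this release
        # ... the rest of the usual metadata ...
    )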
Cheers, Matthew From njs at pobox.com Wed Nov 8 18:15:59 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 8 Nov 2017 17:15:59 -0600 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Nov 8, 2017 16:51, "Matthew Brett" wrote: Hi, On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor wrote: > On 06.11.2017 11:10, Ralf Gommers wrote: >> >> >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris >> > wrote: >> >> Hi All, >> >> Thought I'd toss this out there. I'm tending towards better sooner >> than later in dropping Python 2.7 support as we are starting to run >> up against places where we would like to use Python 3 features. That >> is particularly true on Windows where the 2.7 compiler is really old >> and lacks C99 compatibility. >> >> >> This is probably the most pressing reason to drop 2.7 support. We seem >> to be expending a lot of effort lately on this stuff. I was previously >> advocating being more conservative than the timeline you now propose, >> but this is the pain point that I think gets me over the line. > > > Would dropping python2 support for windows earlier than the other > platforms a reasonable approach? > I am not a big fan of to dropping python2 support before 2020, but I > have no issue with dropping python2 support on windows earlier as it is > our largest pain point. I wonder about this too. I can imagine there are a reasonable number of people using older Linux distributions on which they cannot upgrade to a recent Python 3, My impression is that this is increasingly rare, actually. I believe RHEL is still shipping 2.6 by default, which we've already dropped support for, and if you want RH python then they provide supported 2.7 and 3.latest through exactly the same channels. Ubuntu 14.04 is end-of-life in April 2019, so pretty irrelevant if we're talking about 2019 for dropping support, and 16.04 ships with 3.5. Plus with docker, conda, PPAs, etc., getting a recent python is easier than its ever been. > but is that likely to be true for Windows? We'd have to make sure we could persuade pypi to give the older version for Windows, by default - I don't know if that is possible. Currently it's not ? if pip doesn't see a Windows wheel, it'll try downloading and building an sdist. There's a mechanism for sdists to declare what version of python they support but (thanks to the jupyter folks for implementing this), but that's all. The effect is that if we release a version that drops support for py2 entirely, then 'pip install' on py2 will continue to work and give the last supported version, but if we release a version that drops py2 on Windows but keeps it on other platforms then 'pip install' on py2 on Windows will just stop working entirely. This is possible to fix ? it's just software ? but I'm not volunteering... -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.j.a.cock at googlemail.com Wed Nov 8 19:34:18 2017 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 9 Nov 2017 00:34:18 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> References: <9707FFBA-776D-41A7-9CB1-07FE3960BD12@anaconda.com> Message-ID: On Wed, Nov 8, 2017 at 10:17 PM, Bryan Van de ven wrote: > >> On Nov 8, 2017, at 10:50, Peter Cock wrote: >> >> NumPy (and to a lesser extent SciPy) is in a tough position being at the >> bottom of many scientific Python programming stacks. Whenever you >> drop Python 2 support is going to upset someone. > > Existing versions of NumPy will still exist and continue to work with Python 2.7. If users want to say with Python 2.7, that's fine, they will just have to rely on those older/LTS versions. I personally would be happy for projects at the bottom of stacks to take an activist stance and make decisions to actively encourage movement to Python 3. > >> It is too ambitious to pledge to drop support for Python 2.7 no later than >> 2020, coinciding with the Python development team?s timeline for dropping >> support for Python 2.7? > > Developing NumPy is hard, as it is. Everything that can be done to simplify things for the current maintainers and help attract new contributors should be done. It is not reasonable to ask a few (largely volunteer) people to shoulder the burden and difficulties of supporting Python 2.7 for several additional *years* of their life. > > I agree entirely with Nick Coghlan's comments from another discussion, and think they apply equally well in this instance: > > """ > While it's entirely admirable that many upstream developers are generous enough to help their end users work around this inertia, in the long run doing so is detrimental for everyone concerned, as long term sustaining engineering for old releases is genuinely demotivating for upstream developers (it's a good job, but a lousy way to spend your free time) and for end users, working around institutional inertia this way reduces the pressure to actually get the situation addressed properly. > """ > > Thanks, > > Bryan I agree too - I was trying to phrase that email neutrally as I am not a direct NumPy contributor, but to be more explicit, as someone invested in this ecosystem: I'd fully support NumPy pledging to drop Python 2.7 support no later than 2020. I see signing up to http://www.python3statement.org/ as being about helping publicise this choice. (This is not to say dropping Python 2.7 support in NumPy couldn't happen much sooner than 2020 - the C99 compiler issues sounds like a strong pressure to do so.) Peter From ralf.gommers at gmail.com Thu Nov 9 02:57:02 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 9 Nov 2017 20:57:02 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Thu, Nov 9, 2017 at 12:15 PM, Nathaniel Smith wrote: > On Nov 8, 2017 16:51, "Matthew Brett" wrote: > > Hi, > > On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor > wrote: > > On 06.11.2017 11:10, Ralf Gommers wrote: > >> > >> > >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris > >> > wrote: > >> > >> Hi All, > >> > >> Thought I'd toss this out there. I'm tending towards better sooner > >> than later in dropping Python 2.7 support as we are starting to run > >> up against places where we would like to use Python 3 features. 
That > >> is particularly true on Windows where the 2.7 compiler is really old > >> and lacks C99 compatibility. > >> > >> > >> This is probably the most pressing reason to drop 2.7 support. We seem > >> to be expending a lot of effort lately on this stuff. I was previously > >> advocating being more conservative than the timeline you now propose, > >> but this is the pain point that I think gets me over the line. > > > > > > Would dropping python2 support for windows earlier than the other > > platforms a reasonable approach? > > I am not a big fan of to dropping python2 support before 2020, but I > > have no issue with dropping python2 support on windows earlier as it is > > our largest pain point. > > I wonder about this too. I can imagine there are a reasonable number > of people using older Linux distributions on which they cannot upgrade > to a recent Python 3, > > > My impression is that this is increasingly rare, actually. I believe RHEL > is still shipping 2.6 by default, which we've already dropped support for, > and if you want RH python then they provide supported 2.7 and 3.latest > through exactly the same channels. Ubuntu 14.04 is end-of-life in April > 2019, so pretty irrelevant if we're talking about 2019 for dropping > support, and 16.04 ships with 3.5. Plus with docker, conda, PPAs, etc., > getting a recent python is easier than its ever been. > > > but > > is that likely to be true for Windows? > > We'd have to make sure we could persuade pypi to give the older > version for Windows, by default - I don't know if that is possible. > > > Currently it's not ? if pip doesn't see a Windows wheel, it'll try > downloading and building an sdist. There's a mechanism for sdists to > declare what version of python they support but (thanks to the jupyter > folks for implementing this), but that's all. The effect is that if we > release a version that drops support for py2 entirely, then 'pip install' > on py2 will continue to work and give the last supported version, but if we > release a version that drops py2 on Windows but keeps it on other platforms > then 'pip install' on py2 on Windows will just stop working entirely. > > This is possible to fix ? it's just software ? but I'm not volunteering... > Given the release cycle of pip (slow) and the bandwidth required to implement this, I think that this is likely a showstopper for Windows-only-3.x-only. Another consideration is that choices made by numpy tend to propagate to the rest of the ecosystem, and support for Python versions that's OS-independent is nicer than Windows special-casing. And yet another is that when we do finally drop 2.7, I think we'd want to get the full benefits of doing so. That's new 3.x features (@ in particular), cleaning up lots of support code, etc. For those reasons I think we should balance the pain and benefits of 2.7 support and just pick a date to drop it completely, not just on Windows. Regarding http://www.python3statement.org/: I'd say that as long as there are people who want to spend their energy on the LTS release (contributors *and* enough maintainer power to review/merge/release), we should not actively prevent them from doing that. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From njs at pobox.com Thu Nov 9 03:52:09 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Nov 2017 02:52:09 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID:

On Nov 8, 2017 23:59, "Ralf Gommers" wrote:

Regarding http://www.python3statement.org/: I'd say that as long as there are people who want to spend their energy on the LTS release (contributors *and* enough maintainer power to review/merge/release), we should not actively prevent them from doing that.

Yeah, agreed. I don't feel like this is incompatible with the spirit of python3statement.org, though looking at the text I can see how it's not clear. My guess is they'd be happy to adjust the text, especially if it lets them add numpy :-). CC'ing Thomas and Matthias.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From solarjoe at posteo.org Thu Nov 9 04:30:57 2017
From: solarjoe at posteo.org (Joe)
Date: Thu, 09 Nov 2017 10:30:57 +0100
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de>

Hello,

I have a question and hope that you can help me.

The doc for vstack mentions that "this function continues to be supported for backward compatibility, but you should prefer np.concatenate or np.stack."

Using vstack was convenient because "the arrays must have the same shape along all but the first axis."

So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array without using e.g. atleast_2d on the (3,) array.

Is there a possibility to mimic that behavior with np.concatenate or np.stack?

Joe

From encukou at gmail.com Thu Nov 9 05:32:51 2017
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 9 Nov 2017 11:32:51 +0100
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <215c6bf0-a153-56c4-660f-a37d2e2cdf86@gmail.com>

On 11/09/2017 12:15 AM, Nathaniel Smith wrote:
> On Nov 8, 2017 16:51, "Matthew Brett" wrote:
>
>     Hi,
>
>     On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor wrote:
>     > On 06.11.2017 11:10, Ralf Gommers wrote:
>     >>
>     >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris wrote:
>     >>
>     >>     Hi All,
>     >>
>     >>     Thought I'd toss this out there. I'm tending towards better sooner
>     >>     than later in dropping Python 2.7 support as we are starting to run
>     >>     up against places where we would like to use Python 3 features. That
>     >>     is particularly true on Windows where the 2.7 compiler is really old
>     >>     and lacks C99 compatibility.
>     >>
>     >> This is probably the most pressing reason to drop 2.7 support. We seem
>     >> to be expending a lot of effort lately on this stuff. I was previously
>     >> advocating being more conservative than the timeline you now propose,
>     >> but this is the pain point that I think gets me over the line.
>     >
>     > Would dropping python2 support for windows earlier than the other
>     > platforms a reasonable approach?
>     > I am not a big fan of to dropping python2 support before 2020, but I
>     > have no issue with dropping python2 support on windows earlier as it is
>     > our largest pain point.
>
>     I wonder about this too.
>     I can imagine there are a reasonable number
>     of people using older Linux distributions on which they cannot upgrade
>     to a recent Python 3,
>
> My impression is that this is increasingly rare, actually. I believe
> RHEL is still shipping 2.6 by default,

RHEL 6 does have Python 2.6, but RHEL 6 is in its "security and critical fixes only" phase. I would not expect people with Python 2.6 on RHEL 6 to go and upgrade NumPy to the newest version. (But I admit I might be wrong, especially regarding CentOS.)

> which we've already dropped
> support for, and if you want RH python then they provide supported 2.7
> and 3.latest through exactly the same channels.

It might not always be the very latest, but yes, 3.6 is available through Software Collections.

Let me know if I can help! I work on Python packaging at Red Hat (though on this list I'm subscribed with my personal e-mail). And feel free to direct people who have trouble running Python 3 on RHEL/CentOS to me.

Also, if you haven't read Nick Coghlan's thoughts on these matters, I recommend doing that -- they're from 2015 but still relevant. (It's targeting projects run entirely by volunteers, which might not entirely apply to NumPy, but it still has some good ideas):
http://www.curiousefficiency.org/posts/2015/04/stop-supporting-python26.html

From allanhaldane at gmail.com Thu Nov 9 12:58:01 2017
From: allanhaldane at gmail.com (Allan Haldane)
Date: Thu, 9 Nov 2017 12:58:01 -0500
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> Message-ID: <946427f4-7dbc-8f06-adcc-58863c8118e4@gmail.com>

On 11/09/2017 04:30 AM, Joe wrote:
> Hello,
>
> I have a question and hope that you can help me.
>
> The doc for vstack mentions that "this function continues to be
> supported for backward compatibility, but you should prefer
> np.concatenate or np.stack."
>
> Using vstack was convenient because "the arrays must have the same shape
> along all but the first axis."
>
> So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array
> without using e.g. atleast_2d on the (3,) array.
>
> Is there a possibility to mimic that behavior with np.concatenate or
> np.stack?
>
> Joe

I might write this as either

    np.concatenate([a[newaxis,:], b])

(which switches a newaxis for an atleast_2d, and is also more explicit about where the axis is added), or, as

    np.block([[a],[b]])

Both act like vstack.

Allan

From njs at pobox.com Thu Nov 9 13:21:43 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Nov 2017 12:21:43 -0600
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID:

See Thomas's reply quoted below (it was rejected by the mailing list since he's not subscribed):

On Nov 9, 2017 01:24, "Thomas Kluyver" wrote:

On Thu, Nov 9, 2017, at 08:52 AM, Nathaniel Smith wrote:

On Nov 8, 2017 23:59, "Ralf Gommers" wrote:

Regarding http://www.python3statement.org/: I'd say that as long as there are people who want to spend their energy on the LTS release (contributors *and* enough maintainer power to review/merge/release), we should not actively prevent them from doing that.

Yeah, agreed.
I don't feel like this is incompatible with the spirit of python3statement.org, though looking at the text I can see how it's not clear. My guess is they'd be happy to adjust the text, especially if it lets them add numpy :-). CC'ing Thomas and Matthias.

Thanks Nathaniel. We have (IMO) left a degree of deliberate ambiguity around precisely what 'drop support' means, because it's not going to be the same for all projects. The nature of open source also means that there can be ambiguity over what 'support' entails and who is considered part of the project.

I would say that the idea of the statement is compatible with an LTS release series receiving critical bugfixes beyond 2020, while the main energy of the project is focused on Py3-only feature releases.

[If numpy-discussion doesn't allow non-member posts, feel free to pass this on or quote it in on-list messages]

Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bussonniermatthias at gmail.com Thu Nov 9 13:24:14 2017
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Thu, 9 Nov 2017 10:24:14 -0800
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID:

Hi all,

Apologies if this mail appears out of thread; I just subscribed to respond.

> Yeah, agreed. I don't feel like this is incompatible with the spirit of
> python3statement.org, though looking at the text I can see how it's not clear.
> My guess is they'd be happy to adjust the text, especially if it lets them add
> numpy :-). CC'ing Thomas and Matthias.

Happy to see NumPy at least having this conversation! I agree with Thomas, we're pretty loose on what dropping support means; one of the main reasons for the Python-3-Statement is communication to users and other projects, to convey that there is a strong intent that you have until 2020 to get ready (if not before).

The voice of NumPy carries a huge weight in the balance.

I quickly went through the thread and have a few responses:

> NumPy (and to a lesser extent SciPy) is in a tough position being at the
> bottom of many scientific Python programming stacks. Whenever you
> drop Python 2 support is going to upset someone.

And that is why you should decide to do it at some point, and tell the world -- the sooner you decide and advertise it (regardless of the effective "deadline"), the better.

The scientific Python ecosystem is in a catch-22 position: most of the ecosystem will not drop 2.7 because "numpy is still compatible with python 2.7", and numpy does not drop it because "many packages rely on numpy support for 2.7".

> We'd have to make sure we could persuade pypi to give the older
> version for Windows, by default - I don't know if that is possible.

I don't think there is, though if you tag a release with `requires_python>3.3`, then pip 9+ users on python 2.7 will not even realise there are new releases compatible only with 3.3+.

Technically you can make numpy a meta-package that requires numpy-27 on Windows only... but it has its own drawbacks.

> And yet another is that when we do finally drop 2.7, I think we'd want to
> get the full benefits of doing so. That's new 3.x features (@ in
> particular), cleaning up lots of support code, etc.
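(To make the "@" point above concrete -- a small example of the Python-3.5+-only matmul operator from PEP 465:)

    import numpy as np

    A = np.eye(3)
    B = np.arange(9.).reshape(3, 3)
    x = np.ones(3)

    y = np.dot(np.dot(A, B), x)   # spelling that works on Python 2
    z = A @ B @ x                 # Python 3.5+ only, much easier to read
    assert np.allclose(y, z)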
> Regarding http://www.python3statement.org/: I'd say that as long as there
> are people who want to spend their energy on the LTS release (contributors
> *and* enough maintainer power to review/merge/release), we should not
> actively prevent them from doing that.

These two are _in practice_ against each other; if you do major cleaning then most backports will have a hard time being auto-applied (just a warning). If you have a team that wants to do an LTS, I would suggest "cleaning" only when you are actually touching some code and the python-2 support code is in the way -- not cleaning "for the sake of cleaning", at least until the two code bases have diverged far enough.

We have a bot on Jupyter/Matplotlib that helps to backport PRs to older branches. I'm happy to open it to the numpy org if it helps.

> It is too ambitious to pledge to drop support for Python 2.7 no later than
> 2020, coinciding with the Python development team's timeline for dropping
> support for Python 2.7?

The hardest part is communication. And not just "We're dropping in 2020" but also "We still care about you, 2.7 users", and especially tell 2.7 users and old pip users how to correctly pin their dependency on numpy (to still get the LTS, if there is one).

One more thing: there is a lot of discussion about a "Volunteer LTS"; you may also want to consider a partnership with a company for an officially recommended commercial offer.

Thanks,
--
Matthias

On Thu, Nov 9, 2017 at 10:21 AM, Nathaniel Smith wrote:
> See Thomas's reply quoted below (it was rejected by the mailing list since
> he's not subscribed):
>
> On Nov 9, 2017 01:24, "Thomas Kluyver" wrote:
>
> On Thu, Nov 9, 2017, at 08:52 AM, Nathaniel Smith wrote:
>
> On Nov 8, 2017 23:59, "Ralf Gommers" wrote:
>
> Regarding http://www.python3statement.org/: I'd say that as long as there
> are people who want to spend their energy on the LTS release (contributors
> *and* enough maintainer power to review/merge/release), we should not
> actively prevent them from doing that.
>
> Yeah, agreed. I don't feel like this is incompatible with the spirit of
> python3statement.org, though looking at the text I can see how it's not
> clear. My guess is they'd be happy to adjust the text, especially if it lets
> them add numpy :-). CC'ing Thomas and Matthias.
>
> Thanks Nathaniel. We have (IMO) left a degree of deliberate ambiguity around
> precisely what 'drop support' means, because it's not going to be the same
> for all projects. The nature of open source also means that there can be
> ambiguity over what 'support' entails and who is considered part of the
> project.
>
> I would say that the idea of the statement is compatible with an LTS release
> series receiving critical bugfixes beyond 2020, while the main energy of the
> project is focused on Py3-only feature releases.
>
> [If numpy-discussion doesn't allow non-member posts, feel free to pass this
> on or quote it in on-list messages]
>
> Thomas
>

From robert.kern at gmail.com Thu Nov 9 14:20:26 2017
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 9 Nov 2017 11:20:26 -0800
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <8e27f1f4cead1cf24fe71b4d6085db6b@posteo.de> Message-ID:

On Thu, Nov 9, 2017 at 1:30 AM, Joe wrote:
>
> Hello,
>
> I have a question and hope that you can help me.
> > The doc for vstack mentions that "this function continues to be supported for backward compatibility, but you should prefer np.concatenate or np.stack." > > Using vstack was convenient because "the arrays must have the same shape along all but the first axis." > > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array without using e.g. atleast_2d on the (3,) array. > > Is there a possibility to mimic that behavior with np.concatenate or np.stack? Quite frankly, I ignore the documentation as I think it's recommendation is wrong in these cases. Vive la vstack! -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Nov 9 14:35:43 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 9 Nov 2017 12:35:43 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID: On Thu, Nov 9, 2017 at 11:24 AM, Matthias Bussonnier < bussonniermatthias at gmail.com> wrote: > Hi all, > > Apologies if this mail appear out of thread I just subscribed to respond. > > > Yeah, agreed. I don't feel like this is incompatible with the spirit of > > python3statement.org, though looking at the text I can see how it's not > clear. > > My guess is they'd be happy to adjust the text, especially if it lets > them add > > numpy :-). CC'ing Thomas and Matthias. > > Happy to see NumPy at least having this conversation ! I agree with Thomas, > we're pretty loose on what dropping support means; one of the main reason > for > the Python-3-Statement is communication to users and other project; and > covey > that there is a strong intent that you have until 2020 to get ready (if not > before). > > The voice of NumPy have a huge weight in the balance. > > I quickly went through the thread and have a few responses: > > > NumPy (and to a lesser extent SciPy) is in a tough position being at the > > bottom of many scientific Python programming stacks. Whenever you > > drop Python 2 support is going to upset someone. > > And that is why you should decide of doing it at some point, and telling > it to > the world and the sooner you decide and advertise it (regardless of > effective > "deadline" the better. > > The Scientific Python is in a catch 22 position; Most of the ecosystem > will not > drop 2.7 because "numpy is still compatible python 2.7", and numpy does > not drop > it because "many packages rely on numpy support for 2.7". > > > > We'd have to make sure we could persuade pypi to give the older > > version for Windows, by default - I don't know if that is possible. > > I don't think there is, though if you tag a release with > `requires_python>3.3`, > then pip 9+ users on python 2.7 will not even realise there are new release > compatible only with 3.3+. > > Technically you can make numpy a meta-package that requires numpy-27 on > windows > only... but it has its own drawbacks. > > > And yet another is that when we do finally drop 2.7, I think we'd want to > > get the full benefits of doing so. That's new 3.x features (@ in > > particular), cleaning up lots of support code, etc. 
> > > Regarding http://www.python3statement.org/: I'd say that as long as > there > > are people who want to spend their energy on the LTS release > (contributors > > *and* enough maintainer power to review/merge/release), we should not > > actively prevent them from doing that. > > These two are _in practice_ against each other; if you do major cleaning > then > most backports will have a hard time being auto applied (just a warning). > If you > have a team that want to do a LTS I would suggest "cleaning" only when you > are > actually touching some code and the python-2 support code is in the way. > not > cleaning "for the sake of cleaning" at least until the 2 code base are far > enough. > > We have a bot on Jupyter/Matplotlib that help to backport PRs to older > branches. > I'm happy to open it to the numpy org if it helps. > > > It is too ambitious to pledge to drop support for Python 2.7 no later > than > > 2020, coinciding with the Python development team?s timeline for dropping > > support for Python 2.7? > > The hardest part is communication. And not just "We're dropping in 2020" > but > also "We still care about you, 2.7 users", ans especially tell 2.7 users > and old > pip users how to correctly pip their dependency on numpy (to still get > LTS if LTS > there is) > > One more thing, there is a lot of of discussion about a "Volonteer LTS", > you may > also want to consider a partnership with a company for an officially > recommended > commercial offer. > > One thing worth considering might be making the release that drops 2.7 NumPy 2.0 just so that there is a clear break point. Then if someone wants to continue the 1.x line of releases supporting 2.7 they can do so. ISTR that git now has some features that might aid in that. If the `requires` bit works well with pip then we can also share the same pip page (maybe). Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bryanv at anaconda.com Thu Nov 9 14:43:35 2017 From: bryanv at anaconda.com (Bryan Van de ven) Date: Thu, 9 Nov 2017 13:43:35 -0600 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> Message-ID: <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> > On Nov 9, 2017, at 13:35, Charles R Harris wrote: > > One thing worth considering might be making the release that drops 2.7 NumPy 2.0 just so that there is a clear break point. Then if someone wants to continue the 1.x line of releases supporting 2.7 they can do so. ISTR that git now has some features that might aid in that. If the `requires` bit works well with pip then we can also share the same pip page (maybe). I personally think this is definitely advisable. FWIW Bokeh will definitely be bumping major numbers when dropping Python 2 support, or classic notebook support, etc. 
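(And for downstream projects, a major-version break gives a clean pin target -- a rough sketch, with hypothetical version numbers:)

    # requirements.txt for a project staying on Python 2.7:
    numpy>=1.16,<2.0    # hypothetical last Python-2-supporting series
    # requirements.txt for a project that follows NumPy to Python-3-only:
    numpy>=2.0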
Bryan

From m.h.vankerkwijk at gmail.com Thu Nov 9 15:53:36 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 9 Nov 2017 15:53:36 -0500
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID:

In astropy we had a similar discussion about version numbers, and decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the first that does not. If we're discussing jumping a major number, we could do the same for numpy. (Admittedly, it made a bit more sense with the numbering scheme astropy had adopted anyway.) -- Marten

From markbak at gmail.com Thu Nov 9 16:58:08 2017
From: markbak at gmail.com (Mark Bakker)
Date: Thu, 9 Nov 2017 22:58:08 +0100
Subject: [Numpy-discussion] np.vstack vs. np.stack
Message-ID:

> On 11/09/2017 04:30 AM, Joe wrote:
> > Hello,
> >
> > I have a question and hope that you can help me.
> >
> > The doc for vstack mentions that "this function continues to be
> > supported for backward compatibility, but you should prefer
> > np.concatenate or np.stack."
> >
> > Using vstack was convenient because "the arrays must have the same shape
> > along all but the first axis."
> >
> > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array
> > without using e.g. atleast_2d on the (3,) array.
> >
> > Is there a possibility to mimic that behavior with np.concatenate or
> > np.stack?
> >
> > Joe
>
Can anybody explain why vstack is going the way of the dodo?
Why are stack / concatenate better? What is 'bad' about vstack?

Thanks,

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wieser.eric+numpy at gmail.com Thu Nov 9 17:11:17 2017
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Thu, 09 Nov 2017 22:11:17 +0000
Subject: [Numpy-discussion] np.vstack vs. np.stack
In-Reply-To: References: Message-ID:

I think the primary problems with it are:

- A poor definition of "vertical" in the world of stacked matrices - in
  np.linalg land, this means axis=-2, but in vstack land, it means axis=0.

- Mostly undocumented auto-2d behavior that doesn't make you think well
  enough about dimensions. Numpy deliberately distinguishes between "row
  vectors" (1, N) and vectors (N,), so it's a shame when APIs like vstack
  and np.matrix try to hide this distinction.

Eric

On Thu, 9 Nov 2017 at 13:59 Mark Bakker wrote:

> On 11/09/2017 04:30 AM, Joe wrote:
>> > Hello,
>> >
>> > I have a question and hope that you can help me.
>> >
>> > The doc for vstack mentions that "this function continues to be
>> > supported for backward compatibility, but you should prefer
>> > np.concatenate or np.stack."
>> >
>> > Using vstack was convenient because "the arrays must have the same shape
>> > along all but the first axis."
>> >
>> > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array
>> > without using e.g. atleast_2d on the (3,) array.
>> >
>> > Is there a possibility to mimic that behavior with np.concatenate or
>> > np.stack?
>> >
>> > Joe
>>
> Can anybody explain why vstack is going the way of the dodo?
> Why are stack / concatenate better? What is 'bad' about vstack?
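To make the difference concrete, a quick illustrative session (shapes as I understand the current behavior):

    import numpy as np

    a = np.ones(3)       # shape (3,)
    b = np.ones((2, 3))  # shape (2, 3)

    np.vstack([a, b]).shape             # (3, 3): `a` silently becomes (1, 3)
    np.concatenate([a[None], b]).shape  # (3, 3): the same promotion, explicit
    np.stack([a, a, a]).shape           # (3, 3): stack joins along a *new*
                                        # axis, so inputs must share a shape
    np.stack([b, b], axis=-2).shape     # (2, 2, 3): "vertical" in the
                                        # np.linalg sense, i.e. axis=-2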
> > Thanks, > > Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu Nov 9 17:37:27 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 09 Nov 2017 22:37:27 +0000 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: I'm pretty sure I wrote the offending line in the vstack() docs. The original motivation for stack() was that stacking behavior of hstack(), vstack() and dstack() was somewhat inconsistent, especially with regard to lower dimensional input. stack() is conceptually much simpler and more general. That said, if you know vstack() and find it useful, great, use it. It is not going away in NumPy. We don't remove functions just because there's a better alternative API, but rather use the docs to try to point new users in a better direction. On Thu, Nov 9, 2017 at 2:11 PM Eric Wieser wrote: > I think the primary problems with it are: > > - A poor definition of ?vertical? in the world of stacked matrices - > in np.linalg land, this means axis=-2, but in vstack land, it means > axis=0. > - Mostly undocumented auto-2d behavior that doesn?t make you think > well enough about dimensions. Numpy deliberately distinguishes between ?row > vectors? (1, N) and vectors (N,), so it?s a shame when APIs like vstack > and np.matrix try to hide this distinction. > > Eric > > On Thu, 9 Nov 2017 at 13:59 Mark Bakker wrote: > > On 11/09/2017 04:30 AM, Joe wrote: >>> > Hello, >>> > >>> > I have a question and hope that you can help me. >>> > >>> > The doc for vstack mentions that "this function continues to be >>> > supported for backward compatibility, but you should prefer >>> > np.concatenate or np.stack." >>> > >>> > Using vstack was convenient because "the arrays must have the same >>> shape >>> > along all but the first axis." >>> > >>> > So it was possible to stack an array (3,) and (2, 3) to a (3, 3) array >>> > without using e.g. atleast_2d on the (3,) array. >>> > >>> > Is there a possibility to mimic that behavior with np.concatenate or >>> > np.stack? >>> > >>> >> > Joe >>> >>> >> Can anybody explain why vstack is going the way of the dodo? >> Why are stack / concatenate better? What is 'bad' about vstack? >> >> Thanks, >> >> Mark >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > ? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Nov 9 17:39:45 2017 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Nov 2017 14:39:45 -0800 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: On Thu, Nov 9, 2017 at 1:58 PM, Mark Bakker wrote: > Can anybody explain why vstack is going the way of the dodo? > Why are stack / concatenate better? What is 'bad' about vstack? As far as I can tell, the discussion happened all on Github, not the mailing list. See here for references: https://github.com/numpy/numpy/pull/7253 -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From allanhaldane at gmail.com Thu Nov 9 17:49:34 2017 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 9 Nov 2017 17:49:34 -0500 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> On 11/09/2017 05:39 PM, Robert Kern wrote: > On Thu, Nov 9, 2017 at 1:58 PM, Mark Bakker wrote: > >> Can anybody explain why vstack is going the way of the dodo? >> Why are stack / concatenate better? What is 'bad' about vstack? > > As far as I can tell, the discussion happened all on Github, not the > mailing list. See here for references: > > https://github.com/numpy/numpy/pull/7253 > > -- > Robert Kern yes, and in particular this linked comment/PR: https://github.com/numpy/numpy/pull/5605#issuecomment-85180204 Maybe we should reword the vstack docstring so that it doesn't imply that vstack is going away. It should say something weaker like "the functions np.stack, np.concatenate, and np.block are often more general/useful/less confusing alternatives".. or better explain what the problem is. If we limit ourselves to 1d,2d and maybe 3d arrays the vstack behavior doesn't seem all that confusing to me. Allan From robert.kern at gmail.com Thu Nov 9 17:53:48 2017 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Nov 2017 14:53:48 -0800 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> References: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> Message-ID: On Thu, Nov 9, 2017 at 2:49 PM, Allan Haldane wrote: > > On 11/09/2017 05:39 PM, Robert Kern wrote: > > On Thu, Nov 9, 2017 at 1:58 PM, Mark Bakker wrote: > > > >> Can anybody explain why vstack is going the way of the dodo? > >> Why are stack / concatenate better? What is 'bad' about vstack? > > > > As far as I can tell, the discussion happened all on Github, not the > > mailing list. See here for references: > > > > https://github.com/numpy/numpy/pull/7253 > > > > -- > > Robert Kern > > yes, and in particular this linked comment/PR: > > https://github.com/numpy/numpy/pull/5605#issuecomment-85180204 > > Maybe we should reword the vstack docstring so that it doesn't imply > that vstack is going away. It should say something weaker > like "the functions np.stack, np.concatenate, and np.block are often > more general/useful/less confusing alternatives".. or better explain > what the problem is. > > If we limit ourselves to 1d,2d and maybe 3d arrays the vstack behavior > doesn't seem all that confusing to me. I concur. Highlighting that the functions are only being retained "for backward compatibility" does seem to imply to people that they are deprecated and cannot be relied upon to remain. We *do* break backwards compatibility from time to time. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From njs at pobox.com Thu Nov 9 20:52:18 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Nov 2017 17:52:18 -0800
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID:

Fortunately we can wait until we're a bit closer before we have to make any final decision on the version numbering :-)

Right now, though, it would be good to start communicating to users/downstreams about what our plans are, so they can make plans. Here's a first attempt at some text we can put in the documentation and point people to -- any thoughts, on either the plan or the wording?

---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ----

The Python core team plans to stop supporting Python 2 in 2020. The NumPy project has supported both Python 2 and Python 3 in parallel since 2010, and has found that supporting Python 2 is an increasing burden on our limited resources; thus, we plan to eventually drop Python 2 support as well. Now that we're entering the final years of community-supported Python 2, the NumPy project wants to clarify our plans, with the goal of helping our downstream ecosystem make plans and accomplish the transition with as little disruption as possible.

Our current plan is as follows:

Until **December 31, 2018**, all NumPy releases will fully support both Python 2 and Python 3.

Starting on **January 1, 2019**, any new feature releases will support only Python 3.

The last Python-2-supporting release will be designated as a long-term support (LTS) release, meaning that we will continue to merge bug-fixes and make bug-fix releases for a longer period than usual. Specifically, it will be supported by the community until **December 31, 2019**.

On **January 1, 2020** we will raise a toast to Python 2, and community support for the last Python-2-supporting release will come to an end. However, it will continue to be available on PyPI indefinitely, and if any commercial vendors wish to extend the LTS support past this point then we are open to letting them use the LTS branch in the official NumPy repository to coordinate that.

If you are a NumPy user who requires ongoing Python 2 support in 2020 or later, then please contact your vendor. If you are a vendor who wishes to continue to support NumPy on Python 2 in 2020+, please get in touch; ideally we'd like you to get involved in maintaining the LTS before it actually hits end-of-life, so we can make a clean handoff.

To minimize disruption, running 'pip install numpy' on Python 2 will continue to give the last working release in perpetuity; but after January 1, 2019 it may not contain the latest features, and after January 1, 2020 it may not contain the latest bug fixes.

For more information on the scientific Python ecosystem's transition to Python-3-only, see: http://www.python3statement.org/

For more information on porting your code to run on Python 3, see: https://docs.python.org/3/howto/pyporting.html

----

Thoughts?

-n

On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk wrote:
> In astropy we had a similar discussion about version numbers, and
> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the
> first that does not. If we're discussing jumping a major number, we
> could do the same for numpy.
(Admittedly, it made a bit more sense > with the numbering scheme astropy had adopted anyway.) -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Nathaniel J. Smith -- https://vorpus.org From bussonniermatthias at gmail.com Thu Nov 9 22:35:09 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Thu, 9 Nov 2017 19:35:09 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: 'pip install ... will' to 'pip install ... should' especially for 2.7 users it's rarer to have an up to date enough pip (9+) to respect the requires_python metadata. A mention to the py3statement would be appreciated :-) especially if you decide to sign it. You might want to also be a bit more positive on the python 2 burden (that's the stick) and add a phrase about "allowing you to implement features for python 3 users that are incompatible with having compatibility with python 2". The "supported by the community" should (IMHO) be made slightly clearer, as well as what bug fix you expect core dev to _do_ vs _accept_, but that can be a separate document somewhere else to refer to. Thanks ! -- M On Nov 9, 2017 17:52, "Nathaniel Smith" wrote: Fortunately we can wait until we're a bit closer before we have to make any final decision on the version numbering :-) Right now though it would be good to start communicating to users/downstreams about whatever our plans our though, so they can make plans. Here's a first attempt at some text we can put in the documentation and point people to -- any thoughts, on either the plan or the wording? ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- The Python core team plans to stop supporting Python 2 in 2020. The NumPy project has supported both Python 2 and Python 3 in parallel since 2010, and has found that supporting Python 2 is an increasing burden on our limited resources; thus, we plan to eventually drop Python 2 support as well. Now that we're entering the final years of community-supported Python 2, the NumPy project wants to clarify our plans, with the goal of to helping our downstream ecosystem make plans and accomplish the transition with as little disruption as possible. Our current plan is as follows: Until **December 31, 2018**, all NumPy releases will fully support both Python 2 and Python 3. Starting on **January 1, 2019**, any new feature releases will support only Python 3. The last Python-2-supporting release will be designated as a long-term support (LTS) release, meaning that we will continue to merge bug-fixes and make bug-fix releases for a longer period than usual. Specifically, it will be supported by the community until **December 31, 2019**. On **January 1, 2020** we will raise a toast to Python 2, and community support for the last Python-2-supporting release will come to an end. However, it will continue to be available on PyPI indefinitely, and if any commercial vendors wish to extend the LTS support past this point then we are open to letting them use the LTS branch in the official NumPy repository to coordinate that. If you are a NumPy user who requires ongoing Python 2 support in 2020 or later, then please contact your vendor. 
If you are a vendor who wishes to continue to support NumPy on Python 2 in 2020+, please get in touch; ideally we'd like you to get involved in maintaining the LTS before it actually hits end-of-life, so we can make a clean handoff. To minimize disruption, running 'pip install numpy' on Python 2 will continue to give the last working release in perpetuity; but after January 1, 2019 it may not contain the latest features, and after January 1, 2020 it may not contain the latest bug fixes. For more information on the scientific Python ecosystem's transition to Python-3-only, see: http://www.python3statement.org/ For more information on porting your code to run on Python 3, see: https://docs.python.org/3/howto/pyporting.html ---- Thoughts? -n On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk wrote: > In astropy we had a similar discussion about version numbers, and > decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the > first that does not. If we're discussing jumping a major number, we > could do the same for numpy. (Admittedly, it made a bit more sense > with the numbering scheme astropy had adopted anyway.) -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Nov 10 01:36:51 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 10 Nov 2017 07:36:51 +0100 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: References: Message-ID: <20171110063651.GD1452142@phare.normalesup.org> Another point in defence of vstack vs stack/concatenate: last time I looked, it was faster on smallish arrays. Ga?l From njs at pobox.com Fri Nov 10 05:25:19 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 10 Nov 2017 02:25:19 -0800 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: <2618f2cf-0c1f-44f7-40db-95972484c58b@gmail.com> References: <71d9e646-92e3-87f4-e0f9-6d43f845c529@gmail.com> <2618f2cf-0c1f-44f7-40db-95972484c58b@gmail.com> Message-ID: On Wed, Nov 8, 2017 at 2:13 PM, Allan Haldane wrote: > On 11/08/2017 03:12 PM, Nathaniel Smith wrote: >> - We could adjust the API so that there's some explicit operation to >> trigger the final writeback. At the Python level this would probably >> mean that we start supporting the use of nditer as a context manager, >> and eventually start raising an error if you're in one of the "unsafe" >> case and not using the context manager form. At the C level we >> probably need some explicit "I'm done with this iterator now" call. >> >> One question is which cases exactly should produce warnings/eventually >> errors. At the Python level, I guess the simplest rule would be that >> if you have any write/readwrite arrays in your iterator, then you have >> to use a 'with' block. At the C level, it's a little trickier, because >> it's hard to tell up-front whether someone has updated their code to >> call a final cleanup function, and it's hard to emit a warning/error >> on something that *doesn't* happen. 
(You could print a warning when >> the nditer object is GCed if the cleanup function wasn't called, but >> you can't raise an error there.) I guess the only reasonable option is >> to deprecate NPY_ITER_READWRITE and NP_ITER_WRITEONLY, and make people >> switch to passing new flags that have the same semantics but also >> promise that the user has updated their code to call the new cleanup >> function. > Seems reasonable. > > When people use the Nditer C-api, they (almost?) always call > NpyIter_Dealloc when they're done. Maybe that's a place to put a warning > for C-api users. I think you can emit a warning there since that > function calls the GC, not the other way around. > > It looks like you've already discussed the possibilities of putting > things in NpyIter_Dealloc though, and it could be tricky, but if we only > need a warning maybe there's a way. > https://github.com/numpy/numpy/pull/9269/files/6dc0c65e4b2ea67688d6b617da3a175cd603fc18#r127707149 Oh, hmm, yeah, on further examination there are some more options here. I had missed that for some reason NpyIter isn't actually a Python object, so actually it's never subject to GC and you always need to call NpyIter_Deallocate when you are finished with it. So that's a natural place to perform writebacks. We don't even need a warning. (Which is good, because warnings can be set to raise errors, and while the docs say that NpyIter_Deallocate can fail, in fact it never has been able to in the past and none of the code in numpy or the examples in the docs actually check the return value. Though I guess in theory writeback can also fail so I suppose we need to start returning NPY_FAIL in that case. But it should be vanishingly rare in practice, and it's not clear if anyone is even using this API outside of numpy.) And for the Python-level API, there is the option of performing the final writeback when the iterator is exhausted. The downside to this is that if someone only goes half-way through the iteration and then aborts (e.g. by raising an exception), then the last round of writeback won't happen. But maybe that's fine, or at least better than forcing the use of 'with' blocks everywhere? If we do this then I think we'd at least want to make sure that the writeback really never happens, as opposed to happening at some random later point when the Python iterator object is GCed. But I'd appreciate if anyone would express a preference between these :-) -n -- Nathaniel J. Smith -- https://vorpus.org From shoyer at gmail.com Fri Nov 10 12:44:23 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 10 Nov 2017 17:44:23 +0000 Subject: [Numpy-discussion] np.vstack vs. np.stack In-Reply-To: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> References: <86264a5a-b78e-4cb4-44c3-720f3325c0e0@gmail.com> Message-ID: On Thu, Nov 9, 2017 at 2:49 PM Allan Haldane wrote: > Maybe we should reword the vstack docstring so that it doesn't imply > that vstack is going away. It should say something weaker > like "the functions np.stack, np.concatenate, and np.block are often > more general/useful/less confusing alternatives".. or better explain > what the problem is. > Yes, I would support this. -------------- next part -------------- An HTML attachment was scrubbed... 
From robbmcleod at gmail.com Fri Nov 10 17:03:01 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Fri, 10 Nov 2017 14:03:01 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Wed, Nov 8, 2017 at 2:50 PM, Matthew Brett wrote: > Hi, > > On Wed, Nov 8, 2017 at 7:08 PM, Julian Taylor > wrote: > > On 06.11.2017 11:10, Ralf Gommers wrote: > >> > >> > >> On Mon, Nov 6, 2017 at 7:25 AM, Charles R Harris > >> > wrote: > >> > >> Hi All, > >> > >> Thought I'd toss this out there. I'm tending towards better sooner > >> than later in dropping Python 2.7 support as we are starting to run > >> up against places where we would like to use Python 3 features. That > >> is particularly true on Windows where the 2.7 compiler is really old > >> and lacks C99 compatibility. > >> > >> > >> This is probably the most pressing reason to drop 2.7 support. We seem > >> to be expending a lot of effort lately on this stuff. I was previously > >> advocating being more conservative than the timeline you now propose, > >> but this is the pain point that I think gets me over the line. > > > > > > Would dropping python2 support for windows earlier than the other > > platforms be a reasonable approach? > > I am not a big fan of dropping python2 support before 2020, but I > > have no issue with dropping python2 support on windows earlier as it is > > our largest pain point. > > I wonder about this too. I can imagine there are a reasonable number > of people using older Linux distributions on which they cannot upgrade > to a recent Python 3, but is that likely to be true for Windows? > > We'd have to make sure we could persuade pypi to give the older > version for Windows, by default - I don't know if that is possible. > Pip repo names and actual module names don't have to be the same. One potential work-around would be to make a 'numpylts' repo on PyPi which is the 1.17 version with support for Python 2.7 and bug-fix releases as required. This will still cause regressions but it's a matter of modifying `requirements.txt` in downstream Python 2.7 packages and not much else. E.g. in `requirements.txt`: numpy; python_version>"3.0" numpylts; python_version<"3.0" In both cases you still call `import numpy` in the code. Robert -- Robert McLeod, Ph.D. robbmcleod at gmail.com robbmcleod at protonmail.com www.entropyreduction.al -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Nov 12 04:04:57 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 12 Nov 2017 22:04:57 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Fri, Nov 10, 2017 at 2:52 PM, Nathaniel Smith wrote: > Fortunately we can wait until we're a bit closer before we have to > make any final decision on the version numbering :-) > > Right now though it would be good to start communicating to > users/downstreams about whatever our plans are, so they can > make plans. Here's a first attempt at some text we can put in the > documentation and point people to -- any thoughts, on either the plan > or the wording? > > ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK?
OK ---- > > The Python core team plans to stop supporting Python 2 in 2020. The > NumPy project has supported both Python 2 and Python 3 in parallel > since 2010, and has found that supporting Python 2 is an increasing > burden on our limited resources; thus, we plan to eventually drop > Python 2 support as well. Now that we're entering the final years of > community-supported Python 2, the NumPy project wants to clarify our > plans, with the goal of to helping our downstream ecosystem make plans > and accomplish the transition with as little disruption as possible. > > Our current plan is as follows: > > Until **December 31, 2018**, all NumPy releases will fully support > both Python 2 and Python 3. > > Starting on **January 1, 2019**, any new feature releases will support > only Python 3. > > The last Python-2-supporting release will be designated as a long-term > support (LTS) release, meaning that we will continue to merge > bug-fixes and make bug-fix releases for a longer period than usual. > Specifically, it will be supported by the community until **December > 31, 2019**. > > On **January 1, 2020** we will raise a toast to Python 2, and > community support for the last Python-2-supporting release will come > to an end. However, it will continue to be available on PyPI > indefinitely, and if any commercial vendors wish to extend the LTS > support past this point then we are open to letting them use the LTS > branch in the official NumPy repository to coordinate that. > > If you are a NumPy user who requires ongoing Python 2 support in 2020 > or later, then please contact your vendor. If you are a vendor who > wishes to continue to support NumPy on Python 2 in 2020+, please get > in touch; ideally we'd like you to get involved in maintaining the LTS > before it actually hits end-of-life, so we can make a clean handoff. > > To minimize disruption, running 'pip install numpy' on Python 2 will > continue to give the last working release in perpetuity; but after > January 1, 2019 it may not contain the latest features, and after > January 1, 2020 it may not contain the latest bug fixes. > > For more information on the scientific Python ecosystem's transition > to Python-3-only, see: http://www.python3statement.org/ > > For more information on porting your code to run on Python 3, see: > https://docs.python.org/3/howto/pyporting.html > > ---- > > Thoughts? > Thanks for writing that up. Text works for me! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Nov 12 11:04:46 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 12 Nov 2017 09:04:46 -0700 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Thu, Nov 9, 2017 at 6:52 PM, Nathaniel Smith wrote: > Fortunately we can wait until we're a bit closer before we have to > make any final decision on the version numbering :-) > > Right now though it would be good to start communicating to > users/downstreams about whatever our plans our though, so they can > make plans. Here's a first attempt at some text we can put in the > documentation and point people to -- any thoughts, on either the plan > or the wording? > > ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? 
OK ---- > > The Python core team plans to stop supporting Python 2 in 2020. The > NumPy project has supported both Python 2 and Python 3 in parallel > since 2010, and has found that supporting Python 2 is an increasing > burden on our limited resources; thus, we plan to eventually drop > Python 2 support as well. Now that we're entering the final years of > community-supported Python 2, the NumPy project wants to clarify our > plans, with the goal of to helping our downstream ecosystem make plans > and accomplish the transition with as little disruption as possible. > > Our current plan is as follows: > > Until **December 31, 2018**, all NumPy releases will fully support > both Python 2 and Python 3. > > Starting on **January 1, 2019**, any new feature releases will support > only Python 3. > > The last Python-2-supporting release will be designated as a long-term > support (LTS) release, meaning that we will continue to merge > bug-fixes and make bug-fix releases for a longer period than usual. > Specifically, it will be supported by the community until **December > 31, 2019**. > > On **January 1, 2020** we will raise a toast to Python 2, and > community support for the last Python-2-supporting release will come > to an end. However, it will continue to be available on PyPI > indefinitely, and if any commercial vendors wish to extend the LTS > support past this point then we are open to letting them use the LTS > branch in the official NumPy repository to coordinate that. > > If you are a NumPy user who requires ongoing Python 2 support in 2020 > or later, then please contact your vendor. If you are a vendor who > wishes to continue to support NumPy on Python 2 in 2020+, please get > in touch; ideally we'd like you to get involved in maintaining the LTS > before it actually hits end-of-life, so we can make a clean handoff. > > To minimize disruption, running 'pip install numpy' on Python 2 will > continue to give the last working release in perpetuity; but after > January 1, 2019 it may not contain the latest features, and after > January 1, 2020 it may not contain the latest bug fixes. > > For more information on the scientific Python ecosystem's transition > to Python-3-only, see: http://www.python3statement.org/ > > For more information on porting your code to run on Python 3, see: > https://docs.python.org/3/howto/pyporting.html > > ---- > > Thoughts? > > -n > I've put up an NEP for the proposal. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sun Nov 12 14:13:00 2017 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 12 Nov 2017 21:13:00 +0200 Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand, flags? In-Reply-To: References: Message-ID: <58bc1611-8b75-6d58-d07a-f160bd70716a@gmail.com> On 10/11/17 12:25, numpy-discussion-request at python.org wrote: > Date: Fri, 10 Nov 2017 02:25:19 -0800 > From: Nathaniel Smith > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] deprecate updateifcopy in nditer > operand, flags? > Message-ID: > > Content-Type: text/plain; charset="UTF-8" > > On Wed, Nov 8, 2017 at 2:13 PM, Allan Haldane wrote: >> On 11/08/2017 03:12 PM, Nathaniel Smith wrote: >>> - We could adjust the API so that there's some explicit operation to >>> trigger the final writeback. 
At the Python level this would probably >>> mean that we start supporting the use of nditer as a context manager, >>> and eventually start raising an error if you're in one of the "unsafe" >>> case and not using the context manager form. At the C level we >>> probably need some explicit "I'm done with this iterator now" call. >>> >>> One question is which cases exactly should produce warnings/eventually >>> errors. At the Python level, I guess the simplest rule would be that >>> if you have any write/readwrite arrays in your iterator, then you have >>> to use a 'with' block. At the C level, it's a little trickier, because >>> it's hard to tell up-front whether someone has updated their code to >>> call a final cleanup function, and it's hard to emit a warning/error >>> on something that*doesn't* happen. (You could print a warning when >>> the nditer object is GCed if the cleanup function wasn't called, but >>> you can't raise an error there.) I guess the only reasonable option is >>> to deprecate NPY_ITER_READWRITE and NP_ITER_WRITEONLY, and make people >>> switch to passing new flags that have the same semantics but also >>> promise that the user has updated their code to call the new cleanup >>> function. >> Seems reasonable. >> >> When people use the Nditer C-api, they (almost?) always call >> NpyIter_Dealloc when they're done. Maybe that's a place to put a warning >> for C-api users. I think you can emit a warning there since that >> function calls the GC, not the other way around. >> >> It looks like you've already discussed the possibilities of putting >> things in NpyIter_Dealloc though, and it could be tricky, but if we only >> need a warning maybe there's a way. >> https://github.com/numpy/numpy/pull/9269/files/6dc0c65e4b2ea67688d6b617da3a175cd603fc18#r127707149 > Oh, hmm, yeah, on further examination there are some more options here. > > I had missed that for some reason NpyIter isn't actually a Python > object, so actually it's never subject to GC and you always need to > call NpyIter_Deallocate when you are finished with it. So that's a > natural place to perform writebacks. We don't even need a warning. > (Which is good, because warnings can be set to raise errors, and while > the docs say that NpyIter_Deallocate can fail, in fact it never has > been able to in the past and none of the code in numpy or the examples > in the docs actually check the return value. Though I guess in theory > writeback can also fail so I suppose we need to start returning > NPY_FAIL in that case. But it should be vanishingly rare in practice, > and it's not clear if anyone is even using this API outside of numpy.) > > And for the Python-level API, there is the option of performing the > final writeback when the iterator is exhausted. The downside to this > is that if someone only goes half-way through the iteration and then > aborts (e.g. by raising an exception), then the last round of > writeback won't happen. But maybe that's fine, or at least better than > forcing the use of 'with' blocks everywhere? If we do this then I > think we'd at least want to make sure that the writeback really never > happens, as opposed to happening at some random later point when the > Python iterator object is GCed. But I'd appreciate if anyone would > express a preference between these:-) > > -n > > -- Nathaniel J. Smith -- https://vorpus.org We cannot assume that the call to NPyIter_Deallocate() can resolve writebackifcopy semantics. 
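For concreteness, the Python-level context-manager form being debated in this thread might look like the sketch below. Note that this is the proposal, not a current nditer feature, so the with-statement support and its exact semantics are assumptions:

    import numpy as np

    a = np.arange(6.).reshape(2, 3)

    # Buffered iteration with a cast is a case where the iterator hands
    # out temporaries and must "write back" into `a` afterwards.
    with np.nditer(a, flags=['buffered'], op_flags=['readwrite'],
                   op_dtypes=['float32'], casting='same_kind') as it:
        for x in it:
            x[...] = 2 * x
    # Leaving the block would resolve the writeback exactly once, even
    # if the loop above had been abandoned early by an exception.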
NPyIter_Copy() will return a new iterator (after Py_INCREF-ing the operands), so when either the original or the copy is deallocated the operand's writeback buffer may still be needed. So at the C level the user must resolve the writeback when the last copy of the iterator is deallocated. At the Python level we can force the use of a context manager and prohibit use of a suspicious (one with writebackifcopy semantics) nditer outside of a context manager. As for non-exhausted nditers, IMO using a context manager makes it very clear when the writeback resolution is meant to happen. Do we really want to support a use case where someone creates an iterator, uses it partially, then needs to think carefully about whether the operand changes will be resolved? Matti From toddrjen at gmail.com Sun Nov 12 16:12:15 2017 From: toddrjen at gmail.com (Todd) Date: Sun, 12 Nov 2017 16:12:15 -0500 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Nov 9, 2017 20:52, "Nathaniel Smith" wrote: Fortunately we can wait until we're a bit closer before we have to make any final decision on the version numbering :-) Right now though it would be good to start communicating to users/downstreams about whatever our plans are, so they can make plans. Here's a first attempt at some text we can put in the documentation and point people to -- any thoughts, on either the plan or the wording? ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- The Python core team plans to stop supporting Python 2 in 2020. The NumPy project has supported both Python 2 and Python 3 in parallel since 2010, and has found that supporting Python 2 is an increasing burden on our limited resources; thus, we plan to eventually drop Python 2 support as well. Now that we're entering the final years of community-supported Python 2, the NumPy project wants to clarify our plans, with the goal of helping our downstream ecosystem make plans and accomplish the transition with as little disruption as possible. Our current plan is as follows: Until **December 31, 2018**, all NumPy releases will fully support both Python 2 and Python 3. Starting on **January 1, 2019**, any new feature releases will support only Python 3. The last Python-2-supporting release will be designated as a long-term support (LTS) release, meaning that we will continue to merge bug-fixes and make bug-fix releases for a longer period than usual. Specifically, it will be supported by the community until **December 31, 2019**. On **January 1, 2020** we will raise a toast to Python 2, and community support for the last Python-2-supporting release will come to an end. However, it will continue to be available on PyPI indefinitely, and if any commercial vendors wish to extend the LTS support past this point then we are open to letting them use the LTS branch in the official NumPy repository to coordinate that. If you are a NumPy user who requires ongoing Python 2 support in 2020 or later, then please contact your vendor. If you are a vendor who wishes to continue to support NumPy on Python 2 in 2020+, please get in touch; ideally we'd like you to get involved in maintaining the LTS before it actually hits end-of-life, so we can make a clean handoff.
To minimize disruption, running 'pip install numpy' on Python 2 will continue to give the last working release in perpetuity; but after January 1, 2019 it may not contain the latest features, and after January 1, 2020 it may not contain the latest bug fixes. For more information on the scientific Python ecosystem's transition to Python-3-only, see: http://www.python3statement.org/ For more information on porting your code to run on Python 3, see: https://docs.python.org/3/howto/pyporting.html ---- Thoughts? -n On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk wrote: > In astropy we had a similar discussion about version numbers, and > decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the > first that does not. If we're discussing jumping a major number, we > could do the same for numpy. (Admittedly, it made a bit more sense > with the numbering scheme astropy had adopted anyway.) -- Marten > _______________________________________________ Might it make sense to do this in a synchronized manner with scipy? So both numpy and scipy drop support for python 2 on the first release after December 31 2018, and numpy's first python3-only release comes before (or simultaneously with) scipy's. Then scipy can set its minimum supported numpy version to be the first python3-only version. That allows scipy to have a clean, obvious point where scipy supports only the latest numpy. This will diverge later, but it seems to be a relatively safe place to bring them back into sync. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Nov 12 16:54:13 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 12 Nov 2017 13:54:13 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Nov 12, 2017 1:12 PM, "Todd" wrote: Might it make sense to do this in a synchronized manner with scipy? So both numpy and scipy drop support for python 2 on the first release after December 31 2018, and numpy's first python3-only release comes before (or simultaneously with) scipy's. Then scipy can set its minimum supported numpy version to be the first python3-only version. That allows scipy to have a clean, obvious point where scipy supports only the latest numpy. This will diverge later, but it seems to be a relatively safe place to bring them back into sync. That's really a question for the scipy devs on the scipy mailing list. There's substantial overlap between the numpy and scipy communities, but not everyone is on both lists and they're distinct projects that sometimes have unique issues to worry about. I'd like to see numpy's downstream projects become more aggressive about dropping support for old numpy versions in general, but there's no technical reason that scipy's first 3-only release couldn't continue to support one or more numpy 2+3 releases. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Mon Nov 13 03:11:47 2017 From: davidmenhur at gmail.com (Daπid) Date: Mon, 13 Nov 2017 09:11:47 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On 10 November 2017 at 23:03, Robert McLeod wrote: > E.g.
in `requirements.txt`: > > numpy; python_version>"3.0" > numpylts; python_version<"3.0" > > In both cases you still call `import numpy` in the code. > For this to be efficient, it should be done soon enough to allow downstream projects to adapt their requirements.txt. Release managers: how much more effort would it be to upload current numpy to both numpy and numpylts? /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 10:47:17 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 07:47:17 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: > For this to be efficient, it should be done soon enough to allow downstream projects to adapt their requirements.txt. > Release managers: how much more effort would it be to upload current numpy to both numpy and numpylts? I'm not quite sure I see the point. You would ask downstream to change `numpy` to `numpylts` instead of `numpy` to `numpy<2` ? Also I think then you have the risk of having for example pandas saying `numpy<2` and scipy saying `numpylts` and now the packages are incompatible ? -- M On Mon, Nov 13, 2017 at 12:11 AM, Daπid wrote: > On 10 November 2017 at 23:03, Robert McLeod wrote: >> >> E.g. in `requirements.txt`: >> >> numpy; python_version>"3.0" >> numpylts; python_version<"3.0" >> >> In both cases you still call `import numpy` in the code. > > > For this to be efficient, it should be done soon enough to allow downstream > projects to adapt their requirements.txt. > > Release managers: how much more effort would it be to upload current numpy > to both numpy and numpylts? > > /David. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Mon Nov 13 13:04:31 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 13 Nov 2017 10:04:31 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Fri, Nov 10, 2017 at 2:03 PM, Robert McLeod wrote: > Pip repo names and actual module names don't have to be the same. One > potential work-around would be to make a 'numpylts' repo on PyPi which is > the 1.17 version with support for Python 2.7 and bug-fix releases as > required. This will still cause regressions but it's a matter of modifying > `requirements.txt` in downstream Python 2.7 packages and not much else. > > E.g. in `requirements.txt`: > > numpy; python_version>"3.0" > numpylts; python_version<"3.0" > Can't we handle this with numpy versioning? IIUC, numpy (py3 only) and numpy (LTS) will not only support different platforms, but also be different versions. So if you have py2 or py2+3 code that uses numpy, it will have to specify a <= version number anyway. Also -- I think Nathaniel's point was that wheels have the python version baked in, so pip, when run from py2, should find the latest py2 compatible numpy automagically. And thanks for writing this up -- LGTM -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL:
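To spell out the version-based alternative Chris describes: a downstream project can use ordinary version pins with PEP 508 environment markers, rather than a second package name. A hypothetical downstream setup.py, where the 1.18 cutoff is a placeholder for whatever release actually ends up being the first without Python 2 support:

    from setuptools import setup

    setup(
        name="downstream-project",  # illustrative
        install_requires=[
            'numpy ; python_version >= "3"',      # any current numpy on Python 3
            'numpy<1.18 ; python_version < "3"',  # stay on the LTS line on Python 2
        ],
    )

Both branches install the same 'numpy' distribution, so `import numpy` keeps working unchanged and no separate numpylts name is needed.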
From robbmcleod at gmail.com Mon Nov 13 13:08:03 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Mon, 13 Nov 2017 10:08:03 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Mon, Nov 13, 2017 at 7:47 AM, Matthias Bussonnier < bussonniermatthias at gmail.com> wrote: > > For this to be efficient, it should be done soon enough to allow > downstream projects to adapt their requirements.txt. > > Release managers: how much more effort would it be to upload current > numpy to both numpy and numpylts? > > I'm not quite sure I see the point. You would ask downstream to change > `numpy` to `numpylts` instead of `numpy` to `numpy<2` ? > > Also I think then you have the risk of having for example pandas > saying `numpy<2` and scipy saying `numpylts` and now the packages are > incompatible ? The trouble is PyPi doesn't allow multiple branches. So if you upload NumPy 2.0 wheels, then you cannot turn around and upload 1.18.X bug-fix patches. At least, this is my understanding of PyPi. -- Robert McLeod, Ph.D. robbmcleod at gmail.com robbmcleod at protonmail.com www.entropyreduction.al -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 13:10:35 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 10:10:35 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: > The trouble is PyPi doesn't allow multiple branches. So if you upload NumPy 2.0 wheels, then you cannot turn around and upload 1.18.X bug-fix patches. At least, this is my understanding of PyPi. That's perfectly feasible. We've been maintaining a 6.x (Python 3 only) and a 5.x (Python 2+3) series of IPython for about a year now. -- M On Mon, Nov 13, 2017 at 10:08 AM, Robert McLeod wrote: > > > On Mon, Nov 13, 2017 at 7:47 AM, Matthias Bussonnier > wrote: >> >> > For this to be efficient, it should be done soon enough to allow >> > downstream projects to adapt their requirements.txt. >> > Release managers: how much more effort would it be to upload current >> > numpy to both numpy and numpylts? >> >> I'm not quite sure I see the point. You would ask downstream to change >> `numpy` to `numpylts` instead of `numpy` to `numpy<2` ? >> >> Also I think then you have the risk of having for example pandas >> saying `numpy<2` and scipy saying `numpylts` and now the packages are >> incompatible ? > > > The trouble is PyPi doesn't allow > multiple branches. So if you upload > NumPy 2.0 wheels, then you cannot turn around and upload 1.18.X bug-fix > patches. At least, this is my understanding of PyPi. > > > > -- > Robert McLeod, Ph.D.
> robbmcleod at gmail.com > robbmcleod at protonmail.com > www.entropyreduction.al > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From olivier.grisel at ensta.org Mon Nov 13 13:14:39 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 13 Nov 2017 19:14:39 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: If a wheel is not available for the client platform, pip will try to install the latest version of the source distribution (.tar.gz or .zip) which I think is the cause of the problem here. -- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 13:26:31 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 10:26:31 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: > If a wheel is not available for the client platform, pip will try to install the latest version of the source distribution (.tar.gz or .zip) which I think is the cause of the problem here. Unless the sdist is tagged with require_python and users have recent-enough pip. Which is what was referred to earlier as "Automagically". This behavior is "new" (Nov/Dec 2016). The upstream patches were written (in part) by the IPython/Jupyter team, for this exact purpose, to not install incompatible sdists. (Works great; I can share download graphs for IPython[0]) It _does_ require to have a version of pip which is not decades old though, and may not work if you use a pypi proxy which is not pep 503 compliant (which happens, we got bug reports, users then complained to IT who fixed it). -- M On Mon, Nov 13, 2017 at 10:14 AM, Olivier Grisel wrote: > If a wheel is not available for the client platform, pip will try to install > the latest version of the source distribution (.tar.gz or .zip) which I > think is the cause of the problem here. > > -- > Olivier > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From stefanv at berkeley.edu Mon Nov 13 13:31:08 2017 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Mon, 13 Nov 2017 10:31:08 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <1510597868.1331286.1171102248.044F3095@webmail.messagingengine.com> On Mon, Nov 13, 2017, at 10:26, Matthias Bussonnier wrote: > Unless the sdist is tagged with require_python and users have > recent-enough pip. Is this documented anywhere? I couldn't find it via Google, and suspect it may be widely useful in the next few months.
Stéfan From bussonniermatthias at gmail.com Mon Nov 13 13:42:29 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 10:42:29 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <1510597868.1331286.1171102248.044F3095@webmail.messagingengine.com> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510597868.1331286.1171102248.044F3095@webmail.messagingengine.com> Message-ID: On Mon, Nov 13, 2017 at 10:31 AM, Stefan van der Walt wrote: > > Is this documented anywhere? I couldn't find it via Google, and suspect > it may be widely useful in the next few months. Everything you need to know is on the Python3Statement practicality page: http://www.python3statement.org/practicalities/ (If it's not, or is unclear, complain to me or TK, yes we should make it more visible) M Pacer and I also gave a talk at Pycon https://www.youtube.com/watch?v=2DkfPzWWC2Q, slides https://carreau.github.io/pycon2017/#/ and Pybay https://www.youtube.com/watch?v=3i6n1RwqQCo, slides http://carreau.github.io/talks/2017-08-13-pybay/docs/index.html#/ Raw data for the graphs https://github.com/Carreau/talks/blob/master/2017-08-13-pybay/IPython-dls_2.ipynb -- M > > Stéfan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From mmanu.chaturvedi at kitware.com Mon Nov 13 13:53:32 2017 From: mmanu.chaturvedi at kitware.com (Mmanu Chaturvedi) Date: Mon, 13 Nov 2017 13:53:32 -0500 Subject: [Numpy-discussion] PyArray_GETITEM and PyArray_SETITEM Message-ID: Hello All, I need to make use of the limited numpy API access Pybind11 gives, in order to add a feature to it. It seems to give access to functions from numpy_api.py [1]. I need to use PyArray_GETITEM and PyArray_SETITEM in order to get and set array elements [2]; these functions / macros are not exposed via numpy_api.py, but are in `numpy/ndarraytypes.h`. We were wondering why PyArray_GETITEM and PyArray_SETITEM aren't exposed like the rest of the numpy API. Is it possible to replicate the behavior using the members exposed in numpy_api.py? Any help would be appreciated. Mmanu [1] https://github.com/numpy/numpy/blob/1368cbb696ae27b849eed67b4fd31c550a55dad5/numpy/core/code_generators/numpy_api.py [2] https://github.com/pybind/pybind11/pull/1152/files#diff-52f1945d779be1e60903590907bb9326R241 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Mon Nov 13 13:58:09 2017 From: tcaswell at gmail.com (Thomas Caswell) Date: Mon, 13 Nov 2017 18:58:09 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: I am very supportive of this plan. For Matplotlib the intention is to do a mpl2.2LTS early 2018 and a mpl3.0 (no major API breaks other than dropping py2 support) summer 2018 with the same meaning of LTS. I had also thought about bumping the minimum numpy version of Matplotlib to the first py3 only version when it is out. There is no technical reason, but it seems nicely symmetric. In general we all need to get better about dropping support for old versions of dependencies (I am throwing stones from inside my glass house). The prolonged support of py2 has warped our idea of how long old versions of things need to be supported and it imposes real costs up and down the stack.
Tom On Mon, Nov 13, 2017 at 1:26 PM Matthias Bussonnier < bussonniermatthias at gmail.com> wrote: > > If a wheel is not available for the client platform, pip will try to > install the latest version of the source distribution (.tar.gz or .zip) > which I think is the cause of the problem here. > > Unless the sdist is tagged with require_python and users have > recent-enough pip. Which is what was referred to earlier as > "Automagically". > This behavior is "new" (Nov/Dec 2016). The upstream patches were > written (in part) by the IPython/Jupyter team, for this exact purpose, > to not install incompatible sdists. > (Works great; I can share download graphs for IPython[0]) > > It _does_ require to have a version of pip which is not decades old > though, and may not work if you use a pypi proxy which is not pep 503 > compliant (which happens, we got bug reports, users then complained to > IT who fixed it). > -- > M > > On Mon, Nov 13, 2017 at 10:14 AM, Olivier Grisel > wrote: > > If a wheel is not available for the client platform, pip will try to > install > > the latest version of the source distribution (.tar.gz or .zip) which I > > think is the cause of the problem here. > > > > -- > > Olivier > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Mon Nov 13 15:01:43 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 13 Nov 2017 21:01:43 +0100 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: <20171113200143.GB4178815@phare.normalesup.org> On Mon, Nov 13, 2017 at 10:26:31AM -0800, Matthias Bussonnier wrote: > This behavior is "new" (Nov/Dec 2016). [snip] > It _does_ require to have a version of pip which is not decades old Just to check that I am not misunderstanding: the version of pip should not be more than a year old; "decades old" is just French hyperbole? Do I understand right? Gaël From njs at pobox.com Mon Nov 13 16:33:43 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 13 Nov 2017 13:33:43 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: <20171113200143.GB4178815@phare.normalesup.org> References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <20171113200143.GB4178815@phare.normalesup.org> Message-ID: On Nov 13, 2017 12:03, "Gael Varoquaux" wrote: On Mon, Nov 13, 2017 at 10:26:31AM -0800, Matthias Bussonnier wrote: > This behavior is "new" (Nov/Dec 2016). [snip] > It _does_ require to have a version of pip which is not decades old Just to check that I am not misunderstanding: the version of pip should not be more than a year old; "decades old" is just French hyperbole? Do I understand right? Right, the requirement is pip 9, which is currently one year old and will be >2 years old by the time this matters for numpy. It does turn out that there's a bimodal distribution in the wild, where people tend to either use an up to date pip, or else use some truly ancient pip that some Linux LTS distro shipped 5 years ago.
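To make the mechanism concrete, there are two layers here: the python_requires metadata, which pip 9+ consults before selecting a release, and a loud guard in setup.py for everyone running an older pip. A rough sketch; all version numbers and message wording below are illustrative, not decided:

    import sys

    from setuptools import setup

    if sys.version_info[0] < 3:
        # Only reached by old pip (< 9) or direct setup.py runs; newer
        # pip reads python_requires below and never selects this sdist
        # on Python 2.
        raise RuntimeError(
            "This release of numpy supports Python 3 only. Upgrade pip "
            "(>= 9) so that the python_requires metadata is honored, or "
            "pin an older release explicitly, e.g. numpy<1.17.")

    setup(
        name="numpy",
        version="1.17.0",          # hypothetical first Python-3-only release
        python_requires=">=3.5",   # the metadata that pip 9+ enforces
        # ... the rest of the usual setup arguments ...
    )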
Numpy isn't the only project that will be forcing people to upgrade, though, so I think this will work itself out. Especially since in the broken case what happens is that users end up running our setup.py on an unsupported version of python, so we'll be able to detect that and print some loud and informative message. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Mon Nov 13 16:55:22 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 13 Nov 2017 13:55:22 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <20171113200143.GB4178815@phare.normalesup.org> Message-ID: > Just to check that I am not misunderstanding: the version of pip should > not be more than a year old; "decades old" is just French hyperbole? Do I > understand right? Yes, sorry if you can't hear my french accent in writing, I can hear yours :-) There is also a "softer" requirement on setuptools, which needs to be recent enough to 1) understand requires_python on the machine that will _create_ the sdist/wheel, or 2) accept requires_python as a kwarg (even if it does nothing), for linux systems that will install from sdist. But by end of 2018 that will be a 3- or 4-year-old setuptools. > Right, the requirement is pip 9, which is currently one year old and will be >2 years old by the time this matters for numpy. > It does turn out that there's a bimodal distribution in the wild, where people tend to either use an up to date pip, or else use some truly ancient pip that some Linux LTS distro shipped 5 years ago. Numpy isn't the only project that > will be forcing people to upgrade, though, so I think this will work itself out. Especially since in the broken case what happens is that users end up running our setup.py on an unsupported version of python, so we'll be able to > detect that and print some loud and informative message. Correct, we did that for IPython, got a large spike of sdist downloads from Py2+old_pip when we released a Py3-only version, and the spike disappeared after a few days. We still had a handful of bug reports from people thinking the "You must upgrade pip" message was not relevant, and we realised people pinned ipython with IPython==5.0.0 instead of IPython<6. So the "Loud informative message" should also tell users how to pin numpy if they can't upgrade pip. -- Matthias On Mon, Nov 13, 2017 at 1:33 PM, Nathaniel Smith wrote: > On Nov 13, 2017 12:03, "Gael Varoquaux" > wrote: > > On Mon, Nov 13, 2017 at 10:26:31AM -0800, Matthias Bussonnier wrote: >> This behavior is "new" (Nov/Dec 2016). [snip] >> It _does_ require to have a version of pip which is not decades old > > Just to check that I am not misunderstanding: the version of pip should > not be more than a year old; "decades old" is just French hyperbole? Do I > understand right? > > > Right, the requirement is pip 9, which is currently one year old and will be >>2 years old by the time this matters for numpy. > > It does turn out that there's a bimodal distribution in the wild, where > people tend to either use an up to date pip, or else use some truly ancient > pip that some Linux LTS distro shipped 5 years ago. Numpy isn't the only > project that will be forcing people to upgrade, though, so I think this will
Especially since in the broken case what happens is that > users end up running our setup.py on an unsupported version of python, so > we'll be able to detect that and print some loud and informative message. > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Mon Nov 13 19:48:14 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 13 Nov 2017 16:48:14 -0800 Subject: [Numpy-discussion] numpy grant update In-Reply-To: References: Message-ID: On Thu, Oct 26, 2017 at 12:40 PM, Nathaniel Smith wrote: > On Wed, Oct 18, 2017 at 10:24 PM, Nathaniel Smith wrote: >> I'll also be giving a lunch talk at BIDS tomorrow to let folks locally >> know about what's going on, which I think will be recorded ? I'll send >> around a link after in case others are interested. > > Here's that link: https://www.youtube.com/watch?v=fowHwlpGb34 Still no update on that job ad (though we're learning interesting things about Berkeley's HR system!), but we did make a little scratch repo to start brainstorming. This is mostly for getting our own thoughts in order, but if anyone's curious then here it is: https://github.com/njsmith/numpy-grant-planning/ -n -- Nathaniel J. Smith -- https://vorpus.org From ralf.gommers at gmail.com Tue Nov 14 01:57:56 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 14 Nov 2017 19:57:56 +1300 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> Message-ID: On Tue, Nov 14, 2017 at 7:58 AM, Thomas Caswell wrote: > I am in very supportive of this plan. > > For Matplotlib the intention is to do a mpl2.2LTS early 2018 and a mpl3.0 > (no major API breaks other than dropping py2 support) summer 2018 with the > same meaning of LTS. > > I also had thought about bumping the minimum numpy version of Matplotlib > to the first py3 only version when it is out. There is no technical > reason, but it seems nicely symmetric. > > In general we all need to get better about dropping support for old > versions of dependencies (I am throwing stones from inside my glass > house). The prolonged support of py2 has warped our idea of how long old > versions of things need to be supported and it imposes real costs up and > down the stack. > My $2c: dropping support for all-but-the-latest numpy is not a great idea. There's no need to support numpy versions that are >3 years old, but supporting 2-4 versions back is something most projects have consistently done, and it has real value. Both in terms of not forcing users to upgrade multiple packages in lock-step, and for things like debugging (is it a new numpy or an mpl bug? --> check if the failure disappears with older numpy). Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Tue Nov 14 21:19:33 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Nov 2017 18:19:33 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: Apparently this is actually uncontroversial, the discussion's died down (see also the comments on Chuck's PR [1]), and anyone who wanted to object has had more than a week to do so, so... I guess we can say this is what's happening and start publicizing it to our users! A direct link to the rendered NEP in the repo is: https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst (I guess that at some point it will also show up on docs.scipy.org.) -n [1] https://github.com/numpy/numpy/pull/10006 On Thu, Nov 9, 2017 at 5:52 PM, Nathaniel Smith wrote: > Fortunately we can wait until we're a bit closer before we have to > make any final decision on the version numbering :-) > > Right now though it would be good to start communicating to > users/downstreams about whatever our plans our though, so they can > make plans. Here's a first attempt at some text we can put in the > documentation and point people to -- any thoughts, on either the plan > or the wording? > > ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- > > The Python core team plans to stop supporting Python 2 in 2020. The > NumPy project has supported both Python 2 and Python 3 in parallel > since 2010, and has found that supporting Python 2 is an increasing > burden on our limited resources; thus, we plan to eventually drop > Python 2 support as well. Now that we're entering the final years of > community-supported Python 2, the NumPy project wants to clarify our > plans, with the goal of to helping our downstream ecosystem make plans > and accomplish the transition with as little disruption as possible. > > Our current plan is as follows: > > Until **December 31, 2018**, all NumPy releases will fully support > both Python 2 and Python 3. > > Starting on **January 1, 2019**, any new feature releases will support > only Python 3. > > The last Python-2-supporting release will be designated as a long-term > support (LTS) release, meaning that we will continue to merge > bug-fixes and make bug-fix releases for a longer period than usual. > Specifically, it will be supported by the community until **December > 31, 2019**. > > On **January 1, 2020** we will raise a toast to Python 2, and > community support for the last Python-2-supporting release will come > to an end. However, it will continue to be available on PyPI > indefinitely, and if any commercial vendors wish to extend the LTS > support past this point then we are open to letting them use the LTS > branch in the official NumPy repository to coordinate that. > > If you are a NumPy user who requires ongoing Python 2 support in 2020 > or later, then please contact your vendor. If you are a vendor who > wishes to continue to support NumPy on Python 2 in 2020+, please get > in touch; ideally we'd like you to get involved in maintaining the LTS > before it actually hits end-of-life, so we can make a clean handoff. 
> > To minimize disruption, running 'pip install numpy' on Python 2 will > continue to give the last working release in perpetuity; but after > January 1, 2019 it may not contain the latest features, and after > January 1, 2020 it may not contain the latest bug fixes. > > For more information on the scientific Python ecosystem's transition > to Python-3-only, see: http://www.python3statement.org/ > > For more information on porting your code to run on Python 3, see: > https://docs.python.org/3/howto/pyporting.html > > ---- > > Thoughts? > > -n > > On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk > wrote: >> In astropy we had a similar discussion about version numbers, and >> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the >> first that does not. If we're discussing jumping a major number, we >> could do the same for numpy. (Admittedly, it made a bit more sense >> with the numbering scheme astropy had adopted anyway.) -- Marten >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > > -- > Nathaniel J. Smith -- https://vorpus.org -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Tue Nov 14 22:37:59 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Nov 2017 19:37:59 -0800 Subject: [Numpy-discussion] Upcoming revision of the BLAS standard Message-ID: Hi NumPy and SciPy developers, Apparently there is some work afoot to update the BLAS standard, with a working document here: https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit This seems like something where we might want to get involved in, so that the new standard works for us, and James Demmel (the first author on that proposal and a professor here at Berkeley) suggested they'd be interested to hear our thoughts. I'm not sure exactly what the process is here -- apparently there have been some workshops, and there was going to be a BoF today at Supercomputing, but I don't know what the schedule is or how they'll be making decisions. It's possible for anyone interested to click on that google doc above and make "suggestions", but it seems like maybe it would be useful for the NumPy/SciPy teams to come up with some sort of shared document on what we want? I'm really, really not the biggest linear algebra expert on these lists, so I'm hoping those with more experience will jump in, but to get started here are some initial ideas for things we might want to ask for: - Support for arbitrary strided memory layout - Replacing xerbla with proper error codes (already in that proposal) - There's some discussion about NaN handling where I think we might have opinions. (Am I remember right that currently we have to check for NaNs ourselves all the time because there are libraries that blow up if we don't, and we don't know which ones those are?) - Where the spec ends up giving implementors flexibility, some way to detect at compile time what options they chose. -n -- Nathaniel J. 
Smith -- https://vorpus.org From charlesr.harris at gmail.com Wed Nov 15 10:25:31 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Nov 2017 08:25:31 -0700 Subject: [Numpy-discussion] [SciPy-Dev] Upcoming revision of the BLAS standard In-Reply-To: References: Message-ID: On Tue, Nov 14, 2017 at 8:37 PM, Nathaniel Smith wrote: > Hi NumPy and SciPy developers, > > Apparently there is some work afoot to update the BLAS standard, with > a working document here: > > https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdB > DvtD5I14QHp9OE/edit > > This seems like something where we might want to get involved in, so > that the new standard works for us, and James Demmel (the first author > on that proposal and a professor here at Berkeley) suggested they'd be > interested to hear our thoughts. > > I'm not sure exactly what the process is here -- apparently there have > been some workshops, and there was going to be a BoF today at > Supercomputing, but I don't know what the schedule is or how they'll > be making decisions. It's possible for anyone interested to click on > that google doc above and make "suggestions", but it seems like maybe > it would be useful for the NumPy/SciPy teams to come up with some sort > of shared document on what we want? > > I'm really, really not the biggest linear algebra expert on these > lists, so I'm hoping those with more experience will jump in, but to > get started here are some initial ideas for things we might want to > ask for: > > - Support for arbitrary strided memory layout > - Replacing xerbla with proper error codes (already in that proposal) > - There's some discussion about NaN handling where I think we might > have opinions. (Am I remember right that currently we have to check > for NaNs ourselves all the time because there are libraries that blow > up if we don't, and we don't know which ones those are?) > - Where the spec ends up giving implementors flexibility, some way to > detect at compile time what options they chose. > Somewhat unrelated, but it would be nice to have 64 bit integers. That is already possible with compiler flags, but it would help if there was an easy way to tell what the compiled library was using. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Fri Nov 17 07:35:49 2017 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Nov 2017 12:35:49 +0000 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: Since Konrad Hinsen no longer follows the NumPy discussion list for lack of time, he has not posted here - but he has commented about this on Twitter and written up a good blog post: http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/ In a field where scientific code is expected to last and be developed on a timescale of decades, the change of pace with Python 2 and 3 is harder to handle. Regards, Peter On Wed, Nov 15, 2017 at 2:19 AM, Nathaniel Smith wrote: > Apparently this is actually uncontroversial, the discussion's died > down (see also the comments on Chuck's PR [1]), and anyone who wanted > to object has had more than a week to do so, so... I guess we can say > this is what's happening and start publicizing it to our users! 
> > A direct link to the rendered NEP in the repo is: > https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst > > (I guess that at some point it will also show up on docs.scipy.org.) > > -n > > [1] https://github.com/numpy/numpy/pull/10006 > > On Thu, Nov 9, 2017 at 5:52 PM, Nathaniel Smith wrote: >> Fortunately we can wait until we're a bit closer before we have to >> make any final decision on the version numbering :-) >> >> Right now though it would be good to start communicating to >> users/downstreams about whatever our plans our though, so they can >> make plans. Here's a first attempt at some text we can put in the >> documentation and point people to -- any thoughts, on either the plan >> or the wording? >> >> ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK? OK ---- >> >> The Python core team plans to stop supporting Python 2 in 2020. The >> NumPy project has supported both Python 2 and Python 3 in parallel >> since 2010, and has found that supporting Python 2 is an increasing >> burden on our limited resources; thus, we plan to eventually drop >> Python 2 support as well. Now that we're entering the final years of >> community-supported Python 2, the NumPy project wants to clarify our >> plans, with the goal of to helping our downstream ecosystem make plans >> and accomplish the transition with as little disruption as possible. >> >> Our current plan is as follows: >> >> Until **December 31, 2018**, all NumPy releases will fully support >> both Python 2 and Python 3. >> >> Starting on **January 1, 2019**, any new feature releases will support >> only Python 3. >> >> The last Python-2-supporting release will be designated as a long-term >> support (LTS) release, meaning that we will continue to merge >> bug-fixes and make bug-fix releases for a longer period than usual. >> Specifically, it will be supported by the community until **December >> 31, 2019**. >> >> On **January 1, 2020** we will raise a toast to Python 2, and >> community support for the last Python-2-supporting release will come >> to an end. However, it will continue to be available on PyPI >> indefinitely, and if any commercial vendors wish to extend the LTS >> support past this point then we are open to letting them use the LTS >> branch in the official NumPy repository to coordinate that. >> >> If you are a NumPy user who requires ongoing Python 2 support in 2020 >> or later, then please contact your vendor. If you are a vendor who >> wishes to continue to support NumPy on Python 2 in 2020+, please get >> in touch; ideally we'd like you to get involved in maintaining the LTS >> before it actually hits end-of-life, so we can make a clean handoff. >> >> To minimize disruption, running 'pip install numpy' on Python 2 will >> continue to give the last working release in perpetuity; but after >> January 1, 2019 it may not contain the latest features, and after >> January 1, 2020 it may not contain the latest bug fixes. >> >> For more information on the scientific Python ecosystem's transition >> to Python-3-only, see: http://www.python3statement.org/ >> >> For more information on porting your code to run on Python 3, see: >> https://docs.python.org/3/howto/pyporting.html >> >> ---- >> >> Thoughts? >> >> -n >> >> On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk >> wrote: >>> In astropy we had a similar discussion about version numbers, and >>> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the >>> first that does not. 
If we're discussing jumping a major number, we
>>> could do the same for numpy. (Admittedly, it made a bit more sense
>>> with the numbering scheme astropy had adopted anyway.) -- Marten
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From ilhanpolat at gmail.com Fri Nov 17 08:33:31 2017
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Fri, 17 Nov 2017 14:33:31 +0100
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com>
Message-ID:

I've actually engaged with him on Twitter too, but just to repeat one
part here: scarce academic resources for maintaining code is not an
argument. Of all places, it is academia that should have come up with,
or at least contributed greatly to, open source, instead of engaging in
a paper-writing frenzy. As many people have already written in blog
posts, tweets, etc., academia does not value software as a scientific
product, yet demands software continuously. As an ex-academician I can
safely ignore that argument: scientific code is expected to be
maintained properly. I understand the sentiment, but blocking progress
because of legacy code is a burden on posterity and a luxury for the
past.

On Fri, Nov 17, 2017 at 1:35 PM, Peter Cock wrote:

> Since Konrad Hinsen no longer follows the NumPy discussion list
> for lack of time, he has not posted here - but he has commented
> about this on Twitter and written up a good blog post:
>
> http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/
>
> In a field where scientific code is expected to last and be developed
> on a timescale of decades, the change of pace with Python 2 and 3
> is harder to handle.
>
> Regards,
>
> Peter
>
> On Wed, Nov 15, 2017 at 2:19 AM, Nathaniel Smith wrote:
> > Apparently this is actually uncontroversial, the discussion's died
> > down (see also the comments on Chuck's PR [1]), and anyone who wanted
> > to object has had more than a week to do so, so... I guess we can say
> > this is what's happening and start publicizing it to our users!
> >
> > A direct link to the rendered NEP in the repo is:
> > https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst
> >
> > (I guess that at some point it will also show up on docs.scipy.org.)
> >
> > -n
> >
> > [1] https://github.com/numpy/numpy/pull/10006
> >
> > On Thu, Nov 9, 2017 at 5:52 PM, Nathaniel Smith wrote:
> >> Fortunately we can wait until we're a bit closer before we have to
> >> make any final decision on the version numbering :-)
> >>
> >> Right now though it would be good to start communicating to
> >> users/downstreams about what our plans are, so they can
> >> make plans. Here's a first attempt at some text we can put in the
> >> documentation and point people to -- any thoughts, on either the plan
> >> or the wording?
> >>
> >> ---- DRAFT TEXT - NOT FINAL - DO NOT POST THIS TO HACKERNEWS OK?
OK ---- > >> > >> The Python core team plans to stop supporting Python 2 in 2020. The > >> NumPy project has supported both Python 2 and Python 3 in parallel > >> since 2010, and has found that supporting Python 2 is an increasing > >> burden on our limited resources; thus, we plan to eventually drop > >> Python 2 support as well. Now that we're entering the final years of > >> community-supported Python 2, the NumPy project wants to clarify our > >> plans, with the goal of to helping our downstream ecosystem make plans > >> and accomplish the transition with as little disruption as possible. > >> > >> Our current plan is as follows: > >> > >> Until **December 31, 2018**, all NumPy releases will fully support > >> both Python 2 and Python 3. > >> > >> Starting on **January 1, 2019**, any new feature releases will support > >> only Python 3. > >> > >> The last Python-2-supporting release will be designated as a long-term > >> support (LTS) release, meaning that we will continue to merge > >> bug-fixes and make bug-fix releases for a longer period than usual. > >> Specifically, it will be supported by the community until **December > >> 31, 2019**. > >> > >> On **January 1, 2020** we will raise a toast to Python 2, and > >> community support for the last Python-2-supporting release will come > >> to an end. However, it will continue to be available on PyPI > >> indefinitely, and if any commercial vendors wish to extend the LTS > >> support past this point then we are open to letting them use the LTS > >> branch in the official NumPy repository to coordinate that. > >> > >> If you are a NumPy user who requires ongoing Python 2 support in 2020 > >> or later, then please contact your vendor. If you are a vendor who > >> wishes to continue to support NumPy on Python 2 in 2020+, please get > >> in touch; ideally we'd like you to get involved in maintaining the LTS > >> before it actually hits end-of-life, so we can make a clean handoff. > >> > >> To minimize disruption, running 'pip install numpy' on Python 2 will > >> continue to give the last working release in perpetuity; but after > >> January 1, 2019 it may not contain the latest features, and after > >> January 1, 2020 it may not contain the latest bug fixes. > >> > >> For more information on the scientific Python ecosystem's transition > >> to Python-3-only, see: http://www.python3statement.org/ > >> > >> For more information on porting your code to run on Python 3, see: > >> https://docs.python.org/3/howto/pyporting.html > >> > >> ---- > >> > >> Thoughts? > >> > >> -n > >> > >> On Thu, Nov 9, 2017 at 12:53 PM, Marten van Kerkwijk > >> wrote: > >>> In astropy we had a similar discussion about version numbers, and > >>> decided to make 2.0 the LTS that still supports python 2.7 and 3.0 the > >>> first that does not. If we're discussing jumping a major number, we > >>> could do the same for numpy. (Admittedly, it made a bit more sense > >>> with the numbering scheme astropy had adopted anyway.) -- Marten > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at python.org > >>> https://mail.python.org/mailman/listinfo/numpy-discussion > >> > >> > >> > >> -- > >> Nathaniel J. Smith -- https://vorpus.org > > > > > > > > -- > > Nathaniel J. 
Smith -- https://vorpus.org
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Fri Nov 17 10:35:18 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 17 Nov 2017 08:35:18 -0700
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com>
Message-ID:

On Fri, Nov 17, 2017 at 5:35 AM, Peter Cock wrote:

> Since Konrad Hinsen no longer follows the NumPy discussion list
> for lack of time, he has not posted here - but he has commented
> about this on Twitter and written up a good blog post:
>
> http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/
>
> In a field where scientific code is expected to last and be developed
> on a timescale of decades, the change of pace with Python 2 and 3
> is harder to handle.
>
> Regards,
>
> Peter
>

Konrad has been making that argument for a long time, and I don't know
what the long term solution is. However, the use of Fortran as a
benchmark of stability is a bit misleading. Fortran was undergoing rapid
development in the years before Fortran 77, with Fortran II, DEC
Fortran, and Rational Fortran being somewhat incompatible variations on
the theme, and it was only with the standardization of the language that
stability could be assumed. And if you wrote in a language that didn't
survive the winnowing, Algol-68 for instance, you were in trouble.

But even apart from the languages, the hardware was changing, with
different floating point formats on different hardware, so that prior to
IEEE-754 the results of computations carried out on one machine were not
always the same as the results on another. Such differences still
persist, with dependencies on math library and compiler versions, choice
of rounding, and hardware, although much reduced in effect. The C
language is another example: the lack of a maintained, C99-compliant
compiler on Windows for Python 2.7 was one of the considerations in
dropping support for Python 2. And let us not overlook C++, which, IMHO,
has only reached its Fortran 77 equivalent with C++11.

I think the takeaway here is that we are still in the very early days of
scientific computing with Python; it has really only been coming on
strong for maybe five years. Those of us, including Konrad, who were
early adopters scouting the terrain were bound to end up with a few
arrows in our backs. Early adoption is always a gamble, with a tradeoff
between the risk of choosing a language that mutates or dies, versus the
payoff of using a language that blossoms and makes life easier. In my
mind, Python 3.5 is the rough equivalent of Fortran 77, or maybe Fortran
95, and I don't know when the Python scientific stack will truly settle,
but I expect it will be sometime in the next 5-10 years.
At that point, we may want to look at having a "reference" version of NumPy, but I think it is still too early to make such a guarantee, although we do try to avoid being too disruptive while still making progress. These considerations are probably cold comfort to folks like Konrad who have extensive code bases, some probably dating back to Numeric, that they need to maintain, but I do think things will get better. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Fri Nov 17 16:11:39 2017 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 17 Nov 2017 16:11:39 -0500 Subject: [Numpy-discussion] PyArray_GETITEM and PyArray_SETITEM In-Reply-To: References: Message-ID: <0d5407ff-d085-a5ae-65e9-ef3060fdd15e@gmail.com> On 11/13/2017 01:53 PM, Mmanu Chaturvedi wrote: > Hello All, > > I need to make use of the limited numpy API access Pybind11 gives, in order > to add a feature to it. It seems to give access to functions from > numpy_api.py [1]. I need to use PyArray_GETITEM and PyArray_SETITEM in > order to get and set array elements [2], these functions / macros are not > exposed via numpy_api.py, but are in `numpy/ndarraytypes.h`. > > We were wondering why aren't PyArray_GETITEM and PyArray_SETITEM exposed > like the rest of numpy API? Is it possible to replicate the behavior using > the members exposed in numpy_api.py ? Any help would be appreciated. > > Mmanu It looks like that was the plan. There are comments there saying they would become part of the API in "numpy 2.0" (which hasn't happened yet). In the meantime, maybe you can use PySequence_SetItem? I expect that there is only very minimal overhead in using that vs PyArray_SETITEM. Allan From chris.barker at noaa.gov Fri Nov 17 16:12:43 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 17 Nov 2017 13:12:43 -0800 Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com> Message-ID: On Fri, Nov 17, 2017 at 4:35 AM, Peter Cock wrote: > Since Konrad Hinsen no longer follows the NumPy discussion list > for lack of time, he has not posted here - but he has commented > about this on Twitter and written up a good blog post: > > http://blog.khinsen.net/posts/2017/11/16/a-plea-for- > stability-in-the-scipy-ecosystem/ > > In a field where scientific code is expected to last and be developed > on a timescale of decades, the change of pace with Python 2 and 3 > is harder to handle. > sure -- but I do not get what the problem is here! from his post: """ The disappearance of Python 2 will leave much scientific software orphaned, and many published results irreproducible. """ This is an issue we should all be concerned about, and, in fact, the scipy community has been particularly active in the reproducibility realm. BUT: that statement makes NO SENSE. dropping Python2 support in numpy (or any other package) means that newer versions of numpy will not run on py2 -- but if you want to reproduce results, you need to run the code WITH THE VERSION THAT WAS USED IN THE FIRST PLACE. So if someone publishes something based on code written in python2.7 and numpy 1.13, then it is not helpful for reproducibility at all for numpy 1.18 (or 2.*, or whatever we call it) to run on python2. So there is no issue here. 
Potential issues will arise post 2020, when maybe python2.7 (and numpy
1.13) will no longer run on an up to date OS. But the OS vendors do a
pretty good job of backward compatibility -- so we've got quite a few
years to go on that. And it will also be important that older versions
of packages are available -- but as long as we don't delete the
archives, that should be the case for a good long while.

So I'm not sure what the problem is here.

Not really relevant for reproducibility, but I have always been puzzled
that folks often desperately want to run the very latest numpy on an old
Python (2.6, 1.5, ....) -- if you can update your numpy, update your
darn Python too!

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wieser.eric+numpy at gmail.com Fri Nov 17 16:44:29 2017
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Fri, 17 Nov 2017 21:44:29 +0000
Subject: [Numpy-discussion] PyArray_GETITEM and PyArray_SETITEM
In-Reply-To: <0d5407ff-d085-a5ae-65e9-ef3060fdd15e@gmail.com>
References: <0d5407ff-d085-a5ae-65e9-ef3060fdd15e@gmail.com>
Message-ID:

It's worth noting that PyArray_GETITEM is the equivalent of
arr[...].item(), not arr[...]. If you want the behavior of the latter,
use PyArray_Scalar instead. Similarly, PyArray_SETITEM is only
guaranteed to be equivalent to arr[...] = x when isinstance(x,
np.generic) is false.

I don't think these belong in public API yet, because they don't expose
the interface that most people might expect. Their names are based
solely on the names of descr->f->getitem.

Eric

On Fri, 17 Nov 2017 at 13:12 Allan Haldane wrote:

> On 11/13/2017 01:53 PM, Mmanu Chaturvedi wrote:
> > Hello All,
> >
> > I need to make use of the limited numpy API access Pybind11 gives, in
> > order to add a feature to it. It seems to give access to functions
> > from numpy_api.py [1]. I need to use PyArray_GETITEM and
> > PyArray_SETITEM in order to get and set array elements [2], these
> > functions / macros are not exposed via numpy_api.py, but are in
> > `numpy/ndarraytypes.h`.
> >
> > We were wondering why aren't PyArray_GETITEM and PyArray_SETITEM
> > exposed like the rest of numpy API? Is it possible to replicate the
> > behavior using the members exposed in numpy_api.py ? Any help would
> > be appreciated.
> >
> > Mmanu
>
> It looks like that was the plan. There are comments there saying they
> would become part of the API in "numpy 2.0" (which hasn't happened yet).
>
> In the meantime, maybe you can use PySequence_SetItem? I expect that
> there is only very minimal overhead in using that vs PyArray_SETITEM.
>
> Allan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu Fri Nov 17 17:43:02 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Fri, 17 Nov 2017 14:43:02 -0800
Subject: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
In-Reply-To: References: <2ee331c1-b41d-83fb-f47b-500d4736b06a@googlemail.com> <1510219438.3486161.1166721632.4F745ACF@webmail.messagingengine.com> <04518D5F-DCF0-41E4-AD85-D5B2BFBB8E17@anaconda.com>
Message-ID: <1510958582.456543.1176394760.02D87D4A@webmail.messagingengine.com>

On Fri, Nov 17, 2017, at 13:12, Chris Barker wrote:
> On Fri, Nov 17, 2017 at 4:35 AM, Peter Cock wrote:
>> Since Konrad Hinsen no longer follows the NumPy discussion list
>> for lack of time, he has not posted here - but he has commented
>> about this on Twitter and written up a good blog post:
>>
>> http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/

I don't agree with the general gist of Konrad's post. There are multiple
viewpoints on the issue, of course, such as that of developers that are
already invested in NumPy or SciPy's APIs, those that will rely on it in
the future, and those that are still undecided about whether to use
these tools.

For those heavily invested such as Konrad, API changes and a language
upgrade may seem like a particularly bad situation. Heck, none of us
enjoyed having to port all of our code to Python 3, but in reality the
changes required were much fewer than commonly imagined and are
documented. But in the same way you cause some pain by changing APIs,
*not* changing APIs carries a penalty too, more for the other groups I
mentioned. The ability to change APIs, albeit slowly, allows cleaner and
more intuitive future code, fewer surprises, and makes the environment
much more enjoyable to use.

We can do a better job of advertising NumPy's deprecation policy. A
quick Google search for "x deprecation policy" didn't manage to find it,
but did pick up:

- http://scikit-learn.org/stable/developers/contributing.html#deprecation
- http://scikit-image.org/docs/dev/contribute.html#deprecation-cycle
- https://docs.scipy.org/doc/scipy-1.0.0/reference/dev/deprecations.html

All the above packages, as well as NumPy, include a section on API
changes in their release notes. We may benefit from standardizing
deprecation conventions across the community, so that there is a very
clear expectation on how often to run your code to be able to see all
relevant warnings and fix them.

Best regards
Stéfan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com Sat Nov 25 03:14:56 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sat, 25 Nov 2017 08:14:56 +0000
Subject: [Numpy-discussion] Type annotations for NumPy
Message-ID:

There's been growing interest in supporting PEP-484 style type
annotations in NumPy: https://github.com/numpy/numpy/issues/7370

This would allow NumPy users to add type-annotations to their code that
uses NumPy, which they could check with mypy, pycharm or pytype. For
example:

def f(x: np.ndarray) -> np.ndarray:
    """Identity function on a NumPy array."""
    return x

Eventually, we could include data types and potentially array shapes as
part of the type. This gets quite a bit more complicated, and to do in a
really satisfying way would require new features in Python's typing
system.
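For instance, even the basic stubs from step 1 below would let a checker
catch mistakes like this (the commented-out spelling at the end is only
one possible syntax for the dtype extension sketched in step 2, not
something that works today):

import numpy as np

def f(x: np.ndarray) -> np.ndarray:
    """Identity function on a NumPy array."""
    return x

f(np.zeros(3))      # fine: the argument is an ndarray
f([0.0, 0.0, 0.0])  # with ndarray stubs, mypy would flag this call,
                    # since a plain list is not an np.ndarray

# Hypothetical future spelling, for illustration only:
# def f(x: np.ndarray[np.float64]) -> np.ndarray[np.float64]: ...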
To help guide discussion, I wrote a doc describing use-cases and needs for typing array shapes in more detail: https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY Nathaniel Smith and I recently met with group in San Francisco interested in this topic, including several mypy/typeshed developers (Jelle Zijlstra and Ethan Smith). We discussed and came up with a plan for moving forward: 1. Release basic type stubs for numpy.ndarray without dtypes or shapes, as separate "numpy_stubs" package on PyPI per PEP 561. This will let us iterate rapidly on (experimental) type annotations without coupling to NumPy's release cycle. 2. Add support for dtypes in ndarray type-annotations. This might be as simple as writing np.ndarray[np.float64], but will need a decision about appropriate syntax for shape typing to ensure that this is forwards compatible with typing shapes. Note: this will likely require minor changes to NumPy itself, e.g., to add __class_getitem__ per PEP 560. 3. Add support for shapes in ndarray type-annotations, and define a broader standard for typing array shapes. This will require collaboration with type-checker developers on the required typing features (for details, see my doc above). Eventually, this may entail writing a PEP. I'm writing to gauge support for this general plan, and specifically to get support for step 1. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sat Nov 25 10:21:33 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 25 Nov 2017 10:21:33 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Hi Stephan, A question of perhaps broader scope than what you were asking for, and more out of curiosity than anything else, but can one mix type annotations with others? E.g., in astropy, we have a decorator that looks for units in the annotations (not dissimilar from dtype, I guess). Could one mix annotations or does one have to stick with one purpose? All the best, Marten From charlesr.harris at gmail.com Sat Nov 25 11:53:38 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 25 Nov 2017 09:53:38 -0700 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Sat, Nov 25, 2017 at 1:14 AM, Stephan Hoyer wrote: > There's been growing interest in supporting PEP-484 style type annotations > in NumPy: https://github.com/numpy/numpy/issues/7370 > > This would allow NumPy users to add type-annotations to their code that > uses NumPy, which they could check with mypy, pycharm or pytype. For > example: > > def f(x: np.ndarray) -> np.ndarray: > """Identity function on a NumPy array.""" > return x > > Eventually, we could include data types and potentially array shapes as > part of the type. This gets quite a bit more complicated, and to do in a > really satisfying way would require new features in Python's typing system. > To help guide discussion, I wrote a doc describing use-cases and needs for > typing array shapes in more detail: https://docs.google.com/document/d/ > 1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY > > Nathaniel Smith and I recently met with group in San Francisco interested > in this topic, including several mypy/typeshed developers (Jelle Zijlstra > and Ethan Smith). We discussed and came up with a plan for moving forward: > 1. 
Release basic type stubs for numpy.ndarray without dtypes or shapes, as > separate "numpy_stubs" package on PyPI per PEP 561. This will let us > iterate rapidly on (experimental) type annotations without coupling to > NumPy's release cycle. > 2. Add support for dtypes in ndarray type-annotations. This might be as > simple as writing np.ndarray[np.float64], but will need a decision about > appropriate syntax for shape typing to ensure that this is forwards > compatible with typing shapes. Note: this will likely require minor changes > to NumPy itself, e.g., to add __class_getitem__ per PEP 560. > 3. Add support for shapes in ndarray type-annotations, and define a > broader standard for typing array shapes. This will require collaboration > with type-checker developers on the required typing features (for details, > see my doc above). Eventually, this may entail writing a PEP. > > Can you make a case for the usefulness numpy annotations? What benefits to you want to achieve and how will annotation aid in getting there. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Sat Nov 25 18:09:18 2017 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Sun, 26 Nov 2017 10:09:18 +1100 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> This is a complete outsider?s perspective but (a) it would be good if NumPy type annotations could include an ?array_like? type that allows lists, tuples, etc. (b) I?ve always thought (since PEP561) that it would be cool for type annotations to replace compiler type annotations for e.g. Cython and Numba. Is this in the realm of possibility for the future? Juan. On 26 Nov 2017, 3:54 AM +1100, Charles R Harris , wrote: > > > > On Sat, Nov 25, 2017 at 1:14 AM, Stephan Hoyer wrote: > > > There's been growing interest in supporting PEP-484 style type annotations in NumPy: https://github.com/numpy/numpy/issues/7370 > > > > > > This would allow NumPy users to add type-annotations to their code that uses NumPy, which they could check with mypy, pycharm or pytype. For example: > > > > > > def f(x: np.ndarray) -> np.ndarray: > > > ? ? """Identity function on a NumPy array.""" > > > ? ? return x > > > > > > Eventually, we could include data types and potentially array shapes as part of the type. This gets quite a bit more complicated, and to do in a really satisfying way would require new features in Python's typing system. To help guide discussion, I wrote a doc describing use-cases and needs for typing array shapes in more detail: https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY > > > > > > Nathaniel Smith and I recently met with group in San Francisco interested in this topic, including several mypy/typeshed developers (Jelle Zijlstra and Ethan Smith). We discussed and came up with a plan for moving forward: > > > 1. Release basic type stubs for numpy.ndarray without dtypes or shapes, as separate "numpy_stubs" package on PyPI per PEP 561. This will let us iterate rapidly on (experimental) type annotations without coupling to NumPy's release cycle. > > > 2. Add support for dtypes in ndarray type-annotations. This might be as simple as writing np.ndarray[np.float64], but will need a decision about appropriate syntax for shape typing to ensure that this is forwards compatible with typing shapes. Note: this will likely require minor changes to NumPy itself, e.g., to add __class_getitem__ per PEP 560. 
> > > 3. Add support for shapes in ndarray type-annotations, and define a broader standard for typing array shapes. This will require collaboration with type-checker developers on the required typing features (for details, see my doc above). Eventually, this may entail writing a PEP. > > > > > > > Can you make a case for the usefulness numpy annotations? What benefits to you want to achieve and how will annotation aid in getting there. > > > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sat Nov 25 18:12:16 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Sat, 25 Nov 2017 23:12:16 +0000 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Sat, Nov 25, 2017 at 7:21 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > A question of perhaps broader scope than what you were asking for, and > more out of curiosity than anything else, but can one mix type > annotations with others? E.g., in astropy, we have a decorator that > looks for units in the annotations (not dissimilar from dtype, I > guess). Could one mix annotations or does one have to stick with one > purpose? > Hi Marten, I took a look at Astropy's units decorator: http://docs.astropy.org/en/stable/api/astropy.units.quantity_input.html Annotations for return values that "coerce" units would be hard to make compatible with typing, because type annotations are used to check programs, not change runtime semantics. But in principle, I think you could even make a physical units library that relies entirely on static type checking for correctness, imposing almost no run-time overhead at all. There are several examples for Haskell: https://wiki.haskell.org/Physical_units I don't see any obvious way to support to mixing of annotations for typing and runtime effects in the same function, though doing so in the same program might be possible. My guess is that the preferred way to do this would be to use decorators for runtime changes to arguments, and keep annotations for typing. The Python community seems to be standardizing on using annotations for typing: https://www.python.org/dev/peps/pep-0563/#non-typing-usage-of-annotations Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrocklin at gmail.com Sat Nov 25 18:31:13 2017 From: mrocklin at gmail.com (Matthew Rocklin) Date: Sat, 25 Nov 2017 18:31:13 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Can you make a case for the usefulness numpy annotations? What benefits to you want to achieve and how will annotation aid in getting there. 1. Error checking on large codebases with systems like MyPy 2. Hinting and error checking at code-writing time with systems like Jedi "Hey, this function expects a 2-d square array but you just passed in a 3d array with irregular sizes" 3. 
Supporting systems like the Cython compiler with type information, allowing them to speedup pure-python code without switching to the Cython language On Sat, Nov 25, 2017 at 6:12 PM, Stephan Hoyer wrote: > On Sat, Nov 25, 2017 at 7:21 AM Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> A question of perhaps broader scope than what you were asking for, and >> more out of curiosity than anything else, but can one mix type >> annotations with others? E.g., in astropy, we have a decorator that >> looks for units in the annotations (not dissimilar from dtype, I >> guess). Could one mix annotations or does one have to stick with one >> purpose? >> > > Hi Marten, > > I took a look at Astropy's units decorator: > http://docs.astropy.org/en/stable/api/astropy.units.quantity_input.html > > Annotations for return values that "coerce" units would be hard to make > compatible with typing, because type annotations are used to check > programs, not change runtime semantics. But in principle, I think you could > even make a physical units library that relies entirely on static type > checking for correctness, imposing almost no run-time overhead at all. > There are several examples for Haskell: > https://wiki.haskell.org/Physical_units > > I don't see any obvious way to support to mixing of annotations for typing > and runtime effects in the same function, though doing so in the same > program might be possible. My guess is that the preferred way to do this > would be to use decorators for runtime changes to arguments, and keep > annotations for typing. The Python community seems to be standardizing on > using annotations for typing: > https://www.python.org/dev/peps/pep-0563/#non-typing-usage-of-annotations > > Cheers, > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrocklin at gmail.com Sat Nov 25 18:33:49 2017 From: mrocklin at gmail.com (Matthew Rocklin) Date: Sat, 25 Nov 2017 18:33:49 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Thoughts on basing this on a more generic Array type rather than the np.ndarray? I can imagine other nd-array libraries (XArray, Tensorflow, Dask.array) wanting to reuse this work. For dask.array in particular we would want to copy this entirely, but we probably can't specify that dask.arrays are np.ndarrays. It would be nice to ensure that the container type was swappable. On Sat, Nov 25, 2017 at 6:31 PM, Matthew Rocklin wrote: > Can you make a case for the usefulness numpy annotations? What benefits to > you want to achieve and how will annotation aid in getting there. > > > 1. Error checking on large codebases with systems like MyPy > 2. Hinting and error checking at code-writing time with systems like > Jedi "Hey, this function expects a 2-d square array but you just passed in > a 3d array with irregular sizes" > 3. 
Supporting systems like the Cython compiler with type information, > allowing them to speedup pure-python code without switching to the Cython > language > > > > On Sat, Nov 25, 2017 at 6:12 PM, Stephan Hoyer wrote: > >> On Sat, Nov 25, 2017 at 7:21 AM Marten van Kerkwijk < >> m.h.vankerkwijk at gmail.com> wrote: >> >>> A question of perhaps broader scope than what you were asking for, and >>> more out of curiosity than anything else, but can one mix type >>> annotations with others? E.g., in astropy, we have a decorator that >>> looks for units in the annotations (not dissimilar from dtype, I >>> guess). Could one mix annotations or does one have to stick with one >>> purpose? >>> >> >> Hi Marten, >> >> I took a look at Astropy's units decorator: >> http://docs.astropy.org/en/stable/api/astropy.units.quantity_input.html >> >> Annotations for return values that "coerce" units would be hard to make >> compatible with typing, because type annotations are used to check >> programs, not change runtime semantics. But in principle, I think you could >> even make a physical units library that relies entirely on static type >> checking for correctness, imposing almost no run-time overhead at all. >> There are several examples for Haskell: >> https://wiki.haskell.org/Physical_units >> >> I don't see any obvious way to support to mixing of annotations for >> typing and runtime effects in the same function, though doing so in the >> same program might be possible. My guess is that the preferred way to do >> this would be to use decorators for runtime changes to arguments, and keep >> annotations for typing. The Python community seems to be standardizing on >> using annotations for typing: >> https://www.python.org/dev/peps/pep-0563/#non-typing-usage-of-annotations >> >> Cheers, >> Stephan >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Nov 25 20:24:55 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 25 Nov 2017 17:24:55 -0800 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> Message-ID: On Sat, Nov 25, 2017 at 3:09 PM, Juan Nunez-Iglesias wrote: > This is a complete outsider?s perspective but > > (a) it would be good if NumPy type annotations could include an ?array_like? > type that allows lists, tuples, etc. I'm sure this will exist. > (b) I?ve always thought (since PEP561) that it would be cool for type > annotations to replace compiler type annotations for e.g. Cython and Numba. > Is this in the realm of possibility for the future? It turns out that the PEP 484 type system is *mostly* not useful for this. They're really designed for checking consistency across a large code-base, not for enabling compiler speedups. For example, if you annotate something as an int, that means "this object is a subclass of int". This is enough to let mypy catch your mistake if you accidentally pass in a float instead, but it's not enough to tell you anything at all about the object's behavior -- you could make a wacky int subclass that acts like a string or something. Probably there are some benefits that compilers can get from PEP 484 annotations, but you should think of them as largely an orthogonal thing. -n -- Nathaniel J. 
Smith -- https://vorpus.org

From jni.soma at gmail.com Sat Nov 25 20:31:07 2017
From: jni.soma at gmail.com (Juan Nunez-Iglesias)
Date: Sun, 26 Nov 2017 12:31:07 +1100
Subject: [Numpy-discussion] Type annotations for NumPy
In-Reply-To: References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark>
Message-ID: <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark>

On 26 Nov 2017, 12:27 PM +1100, Nathaniel Smith , wrote:
> It turns out that the PEP 484 type system is *mostly* not useful for
> this. They're really designed for checking consistency across a large
> code-base, not for enabling compiler speedups. For example, if you
> annotate something as an int, that means "this object is a subclass of
> int". This is enough to let mypy catch your mistake if you
> accidentally pass in a float instead, but it's not enough to tell you
> anything at all about the object's behavior -- you could make a wacky
> int subclass that acts like a string or something.

But doesn't Cython do all kinds of type conversions and implied
equivalences that could be applied here? e.g. I'm going to annotate this
as int, which might mean whatever in mypy, but if I pass this .py file
to a newfangled Cython 0.35 compiler, the compiler will understand this
to mean "actually, really, this is an int"?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kirillbalunov at gmail.com Sun Nov 26 06:00:53 2017
From: kirillbalunov at gmail.com (Kirill Balunov)
Date: Sun, 26 Nov 2017 14:00:53 +0300
Subject: [Numpy-discussion] Type annotations for NumPy
In-Reply-To: <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark>
References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark>
Message-ID:

Hi!

2017-11-26 4:31 GMT+03:00 Juan Nunez-Iglesias :
>
> On 26 Nov 2017, 12:27 PM +1100, Nathaniel Smith , wrote:
>
> It turns out that the PEP 484 type system is *mostly* not useful for
> this. They're really designed for checking consistency across a large
> code-base, not for enabling compiler speedups. For example, if you
> annotate something as an int, that means "this object is a subclass of
> int". This is enough to let mypy catch your mistake if you
> accidentally pass in a float instead, but it's not enough to tell you
> anything at all about the object's behavior -- you could make a wacky
> int subclass that acts like a string or something.
>

I have subscribed to many lists, although I am not an active participant
in them. Nevertheless, the topic of using type annotations in these
projects has been discussed several times on all Cython-like channels
(and it has become much more acute nowadays). "Misconceptions" arise
both for ordinary users and developers, but I have never seen anyone
write clearly why the application of type annotations in Cython (and
similar projects) is impossible or unreasonable. Maybe someone close to
the topic has the time and energy to sum up and write a brief summary of
how to perceive them and why they should be viewed as "orthogonal"?

Maybe I'm looking too superficially at this topic. But both Mypy and
Cython perform type checking. From the Cython point of view I do not see
any pitfalls; type checking and type conversions are what Cython is
doing right now during compilation (and looks at types as strictly as
necessary).
>From Mypy's point of view, it's possible that it can delegate all this stuff, using a certain option, on a project's related type checker (which can be much stricter in its assumptions) With kind regards, -gdg -------------- next part -------------- An HTML attachment was scrubbed... URL: From jelle.zijlstra at gmail.com Sun Nov 26 10:04:16 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Sun, 26 Nov 2017 10:04:16 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> <0c7b7112-6acb-484a-a391-adf79bd47db6@Spark> Message-ID: 2017-11-26 6:00 GMT-05:00 Kirill Balunov : > Hi! > > 2017-11-26 4:31 GMT+03:00 Juan Nunez-Iglesias : > >> >> On 26 Nov 2017, 12:27 PM +1100, Nathaniel Smith , wrote: >> >> It turns out that the PEP 484 type system is *mostly* not useful for >> this. They're really designed for checking consistency across a large >> code-base, not for enabling compiler speedups. For example, if you >> annotate something as an int, that means "this object is a subclass of >> int". This is enough to let mypy catch your mistake if you >> accidentally pass in a float instead, but it's not enough to tell you >> anything at all about the object's behavior -- you could make a wacky >> int subclass that acts like a string or something. >> >> > I have subscribed to many lists, although I am not an active participant > in them. Nevertheless this topic of using the type annotation in their > projects was discussed several times on all Cython-like channels (and it > becomes much more acute now days). "Misconceptions" arise both for ordinary > users and developers, but I have never seen anyone to write clearly why the > application of type annotation in Cython (and similar projects) is > impossible or not reasonable. Maybe someone close to the topic has the time > and energy to sum up and write a brief summary of how to perceive them and > why they should be viewed "orthogonal"? > > Maybe I'm looking too superficially at this topic. But both Mypy and > Cython perform type checking. From the Cython point of view I do not see > any pitfalls, type checking and type conversions are what Cython is doing > right now during compilation (and looks at types as strictly as necessary). > From Mypy's point of view, it's possible that it can delegate all this > stuff, using a certain option, on a project's related type checker (which > can be much stricter in its assumptions) > The main (perceived) difficulty is that the type systems are different. If Cython has a list-typed argument, it wants exactly a list so it can use specialized code for lists, but to mypy it means "list or a subclass of list", which is not as easily optimized because the subclass may do things differently from the base class. Similarly, to Cython an int means a C int, and in Python it may mean an arbitrary-precision integer. However, Cython managed to overcome the problem and actually added support for type annotations recently; see https://github.com/cython/cython/issues/1672 and https://github.com/cython/cython/issues/1850. I haven't used the support myself and there are probably still details to be worked out, but in principle it should be possible to use both Cython and mypy on a codebase. 
> > With kind regards, -gdg > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Nov 26 13:58:55 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 26 Nov 2017 18:58:55 +0000 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Sat, Nov 25, 2017 at 3:34 PM Matthew Rocklin wrote: > Thoughts on basing this on a more generic Array type rather than the > np.ndarray? I can imagine other nd-array libraries (XArray, Tensorflow, > Dask.array) wanting to reuse this work. For dask.array in particular we > would want to copy this entirely, but we probably can't specify that > dask.arrays are np.ndarrays. It would be nice to ensure that the container > type was swappable. > Yes, absolutely. I do briefly mention this in my longer doc (see the "Syntax" section). This is also one of my personal goals for this project. This will be most relevant when we start working on typing support for array shapes and broadcasting: details like data types can be more library specific, and can probably be expressed with the existing generics system in the typing module. After we do some experimentation to figure out appropriate syntax and semantics for array shape typing, I would like to standardize the rules for typing multi-dimensional arrays in Python. This will probably entail writing a PEP, so we can add appropriate base classes in the typing module. I view this as the natural complement to existing standard library features that make it easier to interchange between multiple multi-dimensional array libraries, such as memory views and the buffer protocol. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Tue Nov 28 12:09:31 2017 From: rmcgibbo at gmail.com (Robert T. McGibbon) Date: Tue, 28 Nov 2017 12:09:31 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: I'm strongly in support of this proposal. Type annotations have really helped me write more correct code. I started working on numpy type stubs a few months ago. I needed a mypy plugin to support shape-aware functions. Those whole thing is pretty tricky. Still very WIP, but I'll clean them up a little bit and opensource it shortly. -Robert On Sun, Nov 26, 2017 at 1:58 PM, Stephan Hoyer wrote: > On Sat, Nov 25, 2017 at 3:34 PM Matthew Rocklin > wrote: > >> Thoughts on basing this on a more generic Array type rather than the >> np.ndarray? I can imagine other nd-array libraries (XArray, Tensorflow, >> Dask.array) wanting to reuse this work. For dask.array in particular we >> would want to copy this entirely, but we probably can't specify that >> dask.arrays are np.ndarrays. It would be nice to ensure that the container >> type was swappable. >> > > Yes, absolutely. I do briefly mention this in my longer doc (see the > "Syntax" section). This is also one of my personal goals for this project. > > This will be most relevant when we start working on typing support for > array shapes and broadcasting: details like data types can be more library > specific, and can probably be expressed with the existing generics system > in the typing module. 
> > After we do some experimentation to figure out appropriate syntax and > semantics for array shape typing, I would like to standardize the rules for > typing multi-dimensional arrays in Python. This will probably entail > writing a PEP, so we can add appropriate base classes in the typing module. > I view this as the natural complement to existing standard library features > that make it easier to interchange between multiple multi-dimensional array > libraries, such as memory views and the buffer protocol. > >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -- -Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Nov 28 14:04:07 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 28 Nov 2017 19:04:07 +0000 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: On Tue, Nov 28, 2017 at 5:11 PM Robert T. McGibbon wrote: > I'm strongly in support of this proposal. Type annotations have really > helped me write more correct code. > > I started working on numpy type stubs a few months ago. I needed a mypy > plugin to support shape-aware functions. Those whole thing is pretty > tricky. Still very WIP, but I'll clean them up a little bit and opensource > it shortly. > Great to hear -- I'd love to see what this looks like, or hear any lessons you learned from the experience! Actual experience using and writing such a type checker gives you a valuable perspective to share, as opposed to my speculation. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Tue Nov 28 17:35:55 2017 From: rmcgibbo at gmail.com (Robert T. McGibbon) Date: Tue, 28 Nov 2017 17:35:55 -0500 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: Here's the code: https://github.com/rmcgibbo/numpy-mypy. It's not 100% working yet, but it can do simple stuff, like inferring the shape of arrays created from np.zeros(literal_tuple), and fixing out the shape of the result of an indexing operation (i.e. https://github.com/rmcgibbo/numpy-mypy/blob/master/tests/test_indexing.py). To implement it, I have the beginnings of the stubs that you'd expect, borrowed from https://github.com/machinalis/mypy-data and then revised. Then, on top of that, I wrote some special type-level functions that are implemented inside of a mypy plugin. So, for example, the stub's signature for np.sum is def sum(a: ndarray[_S, _D], axis: AxesType=None, dtype: DtypeType=None, out: ndarray=None, keepdims: bool=False) -> ndarray[_InferDtypeWithDefault[_S], _InferNdimsReduction[_D]]: ... When the stub is applied, the resut's dtype is determined application of the _InferDtypeWithDefault type function, which defaults, as expected, to the dtype of the input array but checks of that was overridden dtype=None kwarg as well. And the _InferNdimsReduction type function has to check the axis and keepdims arguments as well. It's by no means ready for real users, but I hope this is a useful place to build from. Any feedback or contributions would be appreciated. -Robert On Tue, Nov 28, 2017 at 2:04 PM, Stephan Hoyer wrote: > On Tue, Nov 28, 2017 at 5:11 PM Robert T. McGibbon > wrote: > >> I'm strongly in support of this proposal. Type annotations have really >> helped me write more correct code. 
>> >> I started working on numpy type stubs a few months ago. I needed a mypy >> plugin to support shape-aware functions. Those whole thing is pretty >> tricky. Still very WIP, but I'll clean them up a little bit and opensource >> it shortly. >> > > Great to hear -- I'd love to see what this looks like, or hear any lessons > you learned from the experience! > > Actual experience using and writing such a type checker gives you a > valuable perspective to share, as opposed to my speculation. > > Cheers, > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -- -Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 28 19:02:12 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 28 Nov 2017 16:02:12 -0800 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: References: Message-ID: <2546525129507284495@unknownmsgid> On Nov 25, 2017, at 3:35 PM, Matthew Rocklin wrote: Thoughts on basing this on a more generic Array type rather than the np.ndarray? This would actually be more consistent with the current python typing approach. I can imagine other nd-array libraries (XArray, Tensorflow, Dask.array) wanting to reuse this work. It may be tough to come up with the right ABC though? see another recent thread on this list. -CHB -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 28 18:59:14 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 28 Nov 2017 15:59:14 -0800 Subject: [Numpy-discussion] Type annotations for NumPy In-Reply-To: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> References: <70c5a8e1-6b57-4019-be7a-65081fecf646@Spark> Message-ID: <-4087092425119403979@unknownmsgid> (a) it would be good if NumPy type annotations could include an ?array_like? type that allows lists, tuples, etc. I think that would be a sequence ? already supported by the Typing system. (b) I?ve always thought (since PEP561) that it would be cool for type annotations to replace compiler type annotations for e.g. Cython and Numba. Is this in the realm of possibility for the future? Well, this was brought up early in the Typing discussion, and it was made clear that these kinds of truly static types, as needed by Cython, was a non-goal of the project. That being said, perhaps it could be made to work with a bunch of additional type objects. And we should lol lol to Cython for ideas about how to type numpy arrays. One note: in addition to shape (rank) and types, there is contiguous and C or F order. That may want to be considered. -CHB -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhuoql at yahoo.com Wed Nov 29 09:56:28 2017 From: zhuoql at yahoo.com (ZHUO QL (KDr2)) Date: Wed, 29 Nov 2017 14:56:28 +0000 (UTC) Subject: [Numpy-discussion] Is there a way that indexing a matrix of data with a matrix of indices? References: <360382279.3966234.1511967388050.ref@mail.yahoo.com> Message-ID: <360382279.3966234.1511967388050@mail.yahoo.com> Hi, all suppose: - D, is the data matrix, its shape is? M x N- I, is the indices matrix, its shape is M x K,? K<=N Is there a efficient way to get a Matrix R with the same shape of I so that R[x,y] = D[x, I[x,y]] ? A nested for-loop or list-comprehension is too slow for me.?? Thanks. 
From sebastian at sipsolutions.net  Wed Nov 29 12:31:54 2017
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 29 Nov 2017 18:31:54 +0100
Subject: [Numpy-discussion] Is there a way that indexing a matrix of data
 with a matrix of indices?
In-Reply-To: <360382279.3966234.1511967388050@mail.yahoo.com>
References: <360382279.3966234.1511967388050.ref@mail.yahoo.com>
 <360382279.3966234.1511967388050@mail.yahoo.com>
Message-ID: <1511976714.11811.1.camel@sipsolutions.net>

On Wed, 2017-11-29 at 14:56 +0000, ZHUO QL (KDr2) wrote:
> Hi, all
>
> Suppose:
>
> - D is the data matrix; its shape is M x N
> - I is the indices matrix; its shape is M x K, K <= N
>
> Is there an efficient way to get a matrix R with the same shape as I
> so that R[x,y] = D[x, I[x,y]]?
>
> A nested for-loop or list-comprehension is too slow for me.
>

Advanced indexing can do any odd thing you might want to do. I would not
suggest using the matrix class, though; always use the array class if you
are doing this kind of thing.

This should do the trick; I will refer to the documentation for how it
works, except to note that it is basically:

    R[x,y] = D[I1[x, y], I2[x, y]]

    R = D[np.arange(I.shape[0])[:, np.newaxis], I]

- Sebastian

> Thanks.
>
> ----
> ZHUO QL (KDr2) http://kdr2.com
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: This is a digitally signed message part
URL:

From ehermes at chem.wisc.edu  Wed Nov 29 12:25:48 2017
From: ehermes at chem.wisc.edu (Eric Hermes)
Date: Wed, 29 Nov 2017 17:25:48 +0000
Subject: [Numpy-discussion] Is there a way that indexing a matrix of data
 with a matrix of indices?
In-Reply-To:
References:
Message-ID: <1511976347.4041.9.camel@wisc.edu>

On Wed, 2017-11-29 at 12:00 -0500, numpy-discussion-request at python.org
wrote:
> Date: Wed, 29 Nov 2017 14:56:28 +0000 (UTC)
> From: "ZHUO QL (KDr2)"
> To: Discussion of Numerical Python
> Subject: [Numpy-discussion] Is there a way that indexing a matrix of
>  data with a matrix of indices?
> Message-ID: <360382279.3966234.1511967388050 at mail.yahoo.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi, all
>
> Suppose:
>
> - D is the data matrix; its shape is M x N
> - I is the indices matrix; its shape is M x K, K <= N
>
> Is there an efficient way to get a matrix R with the same shape as I
> so that R[x,y] = D[x, I[x,y]]?
>
> A nested for-loop or list-comprehension is too slow for me.
> Thanks.

I don't know if this will be substantially faster, but you can try the
following:

    I += np.array(range(M))[:, np.newaxis] * N
    R = D.ravel()[I.ravel()].reshape((M, K))

Eric

> ----
> ZHUO QL (KDr2) http://kdr2.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: discussion/attachments/20171129/baeaddc0/attachment-0001.html>
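Both replies compute the same thing. A quick sanity check, with made-up
sizes, that they agree with the explicit loop (Eric's variant is written
on a separate index array here so the original I is not modified in
place):

    import numpy as np

    M, N, K = 4, 5, 3
    D = np.random.randint(0, 100, size=(M, N))
    I = np.random.randint(0, N, size=(M, K))

    # reference: the explicit loop
    R0 = np.array([[D[x, I[x, y]] for y in range(K)] for x in range(M)])

    # advanced indexing: broadcast row numbers against columns of indices
    R1 = D[np.arange(M)[:, np.newaxis], I]

    # flat indexing: turn (row, col) pairs into indices into D.ravel()
    J = I + np.arange(M)[:, np.newaxis] * N
    R2 = D.ravel()[J.ravel()].reshape((M, K))

    assert (R0 == R1).all() and (R0 == R2).all()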
From zhuoql at yahoo.com  Thu Nov 30 00:56:57 2017
From: zhuoql at yahoo.com (ZHUO QL (KDr2))
Date: Thu, 30 Nov 2017 05:56:57 +0000 (UTC)
Subject: [Numpy-discussion] Is there a way that indexing a matrix of data
 with a matrix of indices?
In-Reply-To: <1511976347.4041.9.camel@wisc.edu>
References: <1511976347.4041.9.camel@wisc.edu>
Message-ID: <1491064594.4481888.1512021417324@mail.yahoo.com>

Thank you all, all of these methods work well :)

----
ZHUO QL (KDr2) http://kdr2.com

On Thursday, November 30, 2017, 2:26:16 AM GMT+8, Eric Hermes wrote:

On Wed, 2017-11-29 at 12:00 -0500, numpy-discussion-request at python.org
wrote:
> Date: Wed, 29 Nov 2017 14:56:28 +0000 (UTC)
> From: "ZHUO QL (KDr2)"
> To: Discussion of Numerical Python
> Subject: [Numpy-discussion] Is there a way that indexing a matrix of
>  data with a matrix of indices?
> Message-ID: <360382279.3966234.1511967388050 at mail.yahoo.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi, all
>
> Suppose:
>
> - D is the data matrix; its shape is M x N
> - I is the indices matrix; its shape is M x K, K <= N
>
> Is there an efficient way to get a matrix R with the same shape as I
> so that R[x,y] = D[x, I[x,y]]?
>
> A nested for-loop or list-comprehension is too slow for me.
> Thanks.

I don't know if this will be substantially faster, but you can try the
following:

    I += np.array(range(M))[:, np.newaxis] * N
    R = D.ravel()[I.ravel()].reshape((M, K))

Eric

> ----
> ZHUO QL (KDr2) http://kdr2.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: discussion/attachments/20171129/baeaddc0/attachment-0001.html>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From m.h.vankerkwijk at gmail.com  Thu Nov 30 09:23:38 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 09:23:38 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
Message-ID:

Hi All,

I wondered if the move to python3-only starting with numpy 1.17 would
be a good reason to act on what we all seem to agree: that the matrix
class was a bad idea, with its overriding of multiplication and lack
of support for stacks of matrices.
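To make that concrete, a minimal illustration of the semantics at issue
(expected outputs, as of NumPy at the time, noted in the comments):

    import numpy as np

    m = np.matrix([[1, 2], [3, 4]])
    a = np.array([[1, 2], [3, 4]])

    print(m * m)       # matrix product:  [[ 7 10] [15 22]]
    print(a * a)       # elementwise:     [[ 1  4] [ 9 16]]

    # matrix objects insist on staying 2-D:
    print(m[0].shape)  # (1, 2) -- a row matrix, not a 1-D vector
    print(a[0].shape)  # (2,)

    # and a 3-D "stack of matrices" is simply not representable:
    stack = np.arange(8).reshape(2, 2, 2)  # fine as an ndarray
    # np.matrix(stack) raises ValueError, since a matrix must be 2-D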
For 1.17, minimum python supposedly
is >=3.5, so we will be guaranteed to have the matrix multiply
operator @ available, and hence there is arguably even less of a case
for keeping the matrix class; removing it would allow taking out quite
a bit of accumulated special-casing (the immediate reasons for writing
this were gh-10123 and 10132).

What do people think? If we do go in this direction, we might want to
add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
python3; for python2 matrix would never disappear).

All the best,

Marten

From toddrjen at gmail.com  Thu Nov 30 11:23:44 2017
From: toddrjen at gmail.com (Todd)
Date: Thu, 30 Nov 2017 11:23:44 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Nov 30, 2017 09:24, "Marten van Kerkwijk" wrote:

Hi All,

I wondered if the move to python3-only starting with numpy 1.17 would
be a good reason to act on what we all seem to agree: that the matrix
class was a bad idea, with its overriding of multiplication and lack
of support for stacks of matrices. For 1.17, minimum python supposedly
is >=3.5, so we will be guaranteed to have the matrix multiply
operator @ available, and hence there is arguably even less of a case
for keeping the matrix class; removing it would allow taking out quite
a bit of accumulated special-casing (the immediate reasons for writing
this were gh-10123 and 10132).

What do people think? If we do go in this direction, we might want to
add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
python3; for python2 matrix would never disappear).

All the best,

Marten

I still think moving it out into its own package would be better, making
it clear that anyone who cares about the class should step up, because
numpy developers will not do any additional work on it. Similar to how
weave was handled with scipy.

So, simultaneously with the deprecation, you release a package with the
matrix class. Then people have until the deprecation period is over to
port (which should just be a matter of changing the imports).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.laumann at gmail.com  Thu Nov 30 11:20:09 2017
From: chris.laumann at gmail.com (Chris Laumann)
Date: Thu, 30 Nov 2017 11:20:09 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>

+1 (not that my lurking vote should necessarily carry much weight). Rip it
out asap.

The existence of the matrix class has been literally the single biggest
source of confusion and subtle bugs in my and my students' codes for
years.

Best, Chris

> On Nov 30, 2017, at 9:23 AM, Marten van Kerkwijk wrote:
>
> Hi All,
>
> I wondered if the move to python3-only starting with numpy 1.17 would
> be a good reason to act on what we all seem to agree: that the matrix
> class was a bad idea, with its overriding of multiplication and lack
> of support for stacks of matrices. For 1.17, minimum python supposedly
> is >=3.5, so we will be guaranteed to have the matrix multiply
> operator @ available, and hence there is arguably even less of a case
> for keeping the matrix class; removing it would allow taking out quite
> a bit of accumulated special-casing (the immediate reasons for writing
> this were gh-10123 and 10132).
>
> What do people think? If we do go in this direction, we might want to
> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
> python3; for python2 matrix would never disappear).
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From bryanv at anaconda.com  Thu Nov 30 11:33:41 2017
From: bryanv at anaconda.com (Bryan Van de ven)
Date: Thu, 30 Nov 2017 10:33:41 -0600
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
References: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
Message-ID: <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>

This is exactly what we did with the bokeh.charts deprecation. Moving to a
separate project was both a huge relief for the developers and a great way
to focus and clarify expectations for users.

Bryan

> On Nov 30, 2017, at 10:20, Chris Laumann wrote:
>
> +1 (not that my lurking vote should necessarily carry much weight). Rip
> it out asap.
> The existence of the matrix class has been literally the single biggest
> source of confusion and subtle bugs in my and my students' codes for
> years.
>
> Best, Chris
>
>> On Nov 30, 2017, at 9:23 AM, Marten van Kerkwijk wrote:
>>
>> Hi All,
>>
>> I wondered if the move to python3-only starting with numpy 1.17 would
>> be a good reason to act on what we all seem to agree: that the matrix
>> class was a bad idea, with its overriding of multiplication and lack
>> of support for stacks of matrices. For 1.17, minimum python supposedly
>> is >=3.5, so we will be guaranteed to have the matrix multiply
>> operator @ available, and hence there is arguably even less of a case
>> for keeping the matrix class; removing it would allow taking out quite
>> a bit of accumulated special-casing (the immediate reasons for writing
>> this were gh-10123 and 10132).
>>
>> What do people think? If we do go in this direction, we might want to
>> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
>> python3; for python2 matrix would never disappear).
>>
>> All the best,
>>
>> Marten
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From m.h.vankerkwijk at gmail.com  Thu Nov 30 12:00:11 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 12:00:11 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>
References: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
 <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>
Message-ID:

Moving to a subpackage may indeed make more sense, though it might not
help as much with getting rid of the hacks inside other parts of numpy
to keep matrix working. In that respect it seems a bit different at
least from weave.

Then again, independently of whether we remove or release a separate
package, it is probably best to start by moving all tests involving
matrix to matrixlib/tests, so we can at least get a sense of what
hacks are actually present.

-- Marten

From ilhanpolat at gmail.com  Thu Nov 30 12:14:13 2017
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Thu, 30 Nov 2017 18:14:13 +0100
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References: <01799B6B-EF31-40FA-B843-261DDFCA99F0@gmail.com>
 <97947CFD-4891-416A-829E-CE1D661CF2E8@anaconda.com>
Message-ID:

This would be a really good way to remove the apparent confusion.
Moreover, I think cleanly explaining why using "np.matrix" is not a good
idea *before* announcing the news would encourage people to accept this
decision along the way. That would greatly reduce the sporadic "the devs
are deprecating stuff as they see fit without asking us" sentiment.

On Thu, Nov 30, 2017 at 6:00 PM, Marten van Kerkwijk
< m.h.vankerkwijk at gmail.com> wrote:

> Moving to a subpackage may indeed make more sense, though it might not
> help as much with getting rid of the hacks inside other parts of numpy
> to keep matrix working. In that respect it seems a bit different at
> least from weave.
> Then again, independently of whether we remove or release a separate
> package, it is probably best to start by moving all tests involving
> matrix to matrixlib/tests, so we can at least get a sense of what
> hacks are actually present.
>
> -- Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Thu Nov 30 13:13:58 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 30 Nov 2017 13:13:58 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
< m.h.vankerkwijk at gmail.com> wrote:

> Hi All,
>
> I wondered if the move to python3-only starting with numpy 1.17 would
> be a good reason to act on what we all seem to agree: that the matrix
> class was a bad idea, with its overriding of multiplication and lack
> of support for stacks of matrices.

I don't think the matrix class was a bad idea at the time.

numpy was the underdog, I came from GAUSS and Matlab and numpy
arrays were just weird, especially losing a dimension all the time
and the required heavy use of np.newaxis.
I guess nowadays kids don't learn `matrix` languages first anymore.

recarrays are another half-hearted feature in numpy that is mostly
obsolete with pandas and pandas-like DataFrames in other packages.

(I don't mind the changes, but the deprecation cycle is often short,
especially for users like me that update numpy only about every 3 main
versions.)

Josef

> For 1.17, minimum python supposedly
> is >=3.5, so we will be guaranteed to have the matrix multiply
> operator @ available, and hence there is arguably even less of a case
> for keeping the matrix class; removing it would allow taking out quite
> a bit of accumulated special-casing (the immediate reasons for writing
> this were gh-10123 and 10132).
>
> What do people think? If we do go in this direction, we might want to
> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
> python3; for python2 matrix would never disappear).
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mrocklin at gmail.com  Thu Nov 30 13:17:40 2017
From: mrocklin at gmail.com (Matthew Rocklin)
Date: Thu, 30 Nov 2017 13:17:40 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

How would the community handle the scipy.sparse matrix subclasses? These
are still in common use.

Somewhat related: https://github.com/scipy/scipy/issues/8162

On Thu, Nov 30, 2017 at 1:13 PM, wrote:

> On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
> < m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi All,
>>
>> I wondered if the move to python3-only starting with numpy 1.17 would
>> be a good reason to act on what we all seem to agree: that the matrix
>> class was a bad idea, with its overriding of multiplication and lack
>> of support for stacks of matrices.
>
> I don't think the matrix class was a bad idea at the time.
> numpy was the underdog, I came from GAUSS and Matlab and numpy
> arrays were just weird, especially losing a dimension all the time
> and the required heavy use of np.newaxis.
> I guess nowadays kids don't learn `matrix` languages first anymore.
>
> recarrays are another half-hearted feature in numpy that is mostly
> obsolete with pandas and pandas-like DataFrames in other packages.
>
> (I don't mind the changes, but the deprecation cycle is often short,
> especially for users like me that update numpy only about every 3 main
> versions.)
>
> Josef
>
>> For 1.17, minimum python supposedly
>> is >=3.5, so we will be guaranteed to have the matrix multiply
>> operator @ available, and hence there is arguably even less of a case
>> for keeping the matrix class; removing it would allow taking out quite
>> a bit of accumulated special-casing (the immediate reasons for writing
>> this were gh-10123 and 10132).
>>
>> What do people think? If we do go in this direction, we might want to
>> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
>> python3; for python2 matrix would never disappear).
>>
>> All the best,
>>
>> Marten
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com  Thu Nov 30 13:43:38 2017
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 1 Dec 2017 07:43:38 +1300
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Fri, Dec 1, 2017 at 7:17 AM, Matthew Rocklin wrote:

> How would the community handle the scipy.sparse matrix subclasses? These
> are still in common use.
>

They're not going anywhere for quite a while (until the sparse ndarrays
materialize at least). Hence np.matrix needs to be moved, not deleted. We
discussed this earlier this year:
https://mail.python.org/pipermail/numpy-discussion/2017-January/076332.html

> Somewhat related: https://github.com/scipy/scipy/issues/8162
>
> On Thu, Nov 30, 2017 at 1:13 PM, wrote:
>
>> On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
>> < m.h.vankerkwijk at gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I wondered if the move to python3-only starting with numpy 1.17 would
>>> be a good reason to act on what we all seem to agree: that the matrix
>>> class was a bad idea, with its overriding of multiplication and lack
>>> of support for stacks of matrices.
>>

I'd suggest any release in the next couple of years is fine, but the one
where we drop Python 2 support is probably the worst choice. That's one of
the few things the core Python devs got 100% right with the Python 3 move:
advocating that in the 2->3 transition packages would not make any API
changes, in order to make porting the least painful.

Ralf

>> I don't think the matrix class was a bad idea at the time.
>>
>> numpy was the underdog, I came from GAUSS and Matlab and numpy
>> arrays were just weird, especially losing a dimension all the time
>> and the required heavy use of np.newaxis.
>> I guess nowadays kids don't learn `matrix` languages first anymore.
>> recarrays are another half-hearted feature in numpy that is mostly
>> obsolete with pandas and pandas-like DataFrames in other packages.
>>
>> (I don't mind the changes, but the deprecation cycle is often short,
>> especially for users like me that update numpy only about every 3 main
>> versions.)
>>
>> Josef
>>
>>> For 1.17, minimum python supposedly
>>> is >=3.5, so we will be guaranteed to have the matrix multiply
>>> operator @ available, and hence there is arguably even less of a case
>>> for keeping the matrix class; removing it would allow taking out quite
>>> a bit of accumulated special-casing (the immediate reasons for writing
>>> this were gh-10123 and 10132).
>>>
>>> What do people think? If we do go in this direction, we might want to
>>> add PendingDeprecationWarning for 1.15 (maybe DeprecationWarning for
>>> python3; for python2 matrix would never disappear).
>>>
>>> All the best,
>>>
>>> Marten
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Thu Nov 30 14:39:48 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 30 Nov 2017 12:39:48 -0700
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 30, 2017 at 11:43 AM, Ralf Gommers wrote:

> On Fri, Dec 1, 2017 at 7:17 AM, Matthew Rocklin wrote:
>
>> How would the community handle the scipy.sparse matrix subclasses? These
>> are still in common use.
>>
>
> They're not going anywhere for quite a while (until the sparse ndarrays
> materialize at least). Hence np.matrix needs to be moved, not deleted. We
> discussed this earlier this year:
> https://mail.python.org/pipermail/numpy-discussion/2017-January/076332.html
>
>> Somewhat related: https://github.com/scipy/scipy/issues/8162
>>
>> On Thu, Nov 30, 2017 at 1:13 PM, wrote:
>>
>>> On Thu, Nov 30, 2017 at 9:23 AM, Marten van Kerkwijk
>>> < m.h.vankerkwijk at gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I wondered if the move to python3-only starting with numpy 1.17 would
>>>> be a good reason to act on what we all seem to agree: that the matrix
>>>> class was a bad idea, with its overriding of multiplication and lack
>>>> of support for stacks of matrices.
>>>
> I'd suggest any release in the next couple of years is fine, but the one
> where we drop Python 2 support is probably the worst choice. That's one
> of the few things the core Python devs got 100% right with the Python 3
> move: advocating that in the 2->3 transition packages would not make any
> API changes, in order to make porting the least painful.
>
> Ralf

Agree, we don't want to pile in too many changes at once. I think the big
sticking point is the sparse matrices in SciPy; even issuing a
DeprecationWarning could be problematic as long as there are sparse
matrices. May I suggest that we put together an NEP for the NumPy side of
things?
Ralf, does SciPy have a mechanism for proposing such changes?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu  Thu Nov 30 14:51:28 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 11:51:28 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 10:13, josef.pktd at gmail.com wrote:

> recarrays are another half-hearted feature in numpy that is mostly
> obsolete with pandas and pandas-like DataFrames in other packages.

I'm fully on board with factoring out np.matrix into a subpackage. But I
would not touch structured arrays; they are quite useful, and sometimes
perform surprisingly well compared to the other solutions around.

Stéfan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu  Thu Nov 30 14:54:23 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 11:54:23 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <1512071663.1426893.1189825680.2E4FA80D@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 11:39, Charles R Harris wrote:

> Agree, we don't want to pile in too many changes at once. I think the
> big sticking point is the sparse matrices in SciPy; even issuing a
> DeprecationWarning could be problematic as long as there are sparse
> matrices.

Could you explain what you mean by SciPy sparse matrices being a big
sticking point?

Stéfan

From m.h.vankerkwijk at gmail.com  Thu Nov 30 14:58:13 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 14:58:13 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

Hi Ralf,

Sorry not to have recalled the previous thread. Your point about not
doing things in the python 2->3 move makes sense; what is handy for me is
no reason to give users an incentive not to move to python3 because their
matrix-dependent code breaks.

It does sound like, given the use of sparse, a separate package - or
perhaps (temporary) inclusion in scipy - would be the way to go. In turn,
collecting the matrix tests and work-arounds together in `matrixlib`
would be the right first step. And, even better, to collect thoughts in a
NEP.

Now if only I had not written this while procrastinating on other
things...

All the best,

Marten

From m.h.vankerkwijk at gmail.com  Thu Nov 30 15:02:08 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 15:02:08 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
Message-ID:

On Thu, Nov 30, 2017 at 2:51 PM, Stefan van der Walt wrote:

> On Thu, Nov 30, 2017, at 10:13, josef.pktd at gmail.com wrote:
>
> recarrays are another half-hearted feature in numpy that is mostly
> obsolete with pandas and pandas-like DataFrames in other packages.
>
> I'm fully on board with factoring out np.matrix into a subpackage. But I
> would not touch structured arrays; they are quite useful, and sometimes
> perform surprisingly well compared to the other solutions around.
I think Josef specifically meant `recarrays`, which give access to
elements of a structured array via attribute access. I'd tend to agree
with him that those turned out not to be such a great idea. But (I think)
nobody is arguing we should get rid of arrays with structured dtypes - I
use them regularly myself too.

-- Marten

From stefanv at berkeley.edu  Thu Nov 30 17:00:51 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 14:00:51 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
Message-ID: <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 12:02, Marten van Kerkwijk wrote:

> I think Josef specifically meant `recarrays`, which give access to
> elements of a structured array via attribute access. I'd tend to agree
> with him that those turned out not to be such a great idea. But (I
> think) nobody is arguing we should get rid of arrays with structured
> dtypes - I use them regularly myself too.

Ah, okay, that makes sense!

Which reminds me: while these are quite useful, they're not always
particularly pleasant to use. A good first improvement would be to allow
columnar printing, and a few utility functions to give you some of the
basic functionality of pandas (calculating descriptive statistics like
mean, dropping NaN rows, some equivalent of groupby). All these are only
a few lines of Python, but can be annoying to figure out. If this sounds
appealing, I'd be willing to put together a small NEP.

Stéfan

From efiring at hawaii.edu  Thu Nov 30 17:08:11 2017
From: efiring at hawaii.edu (Eric Firing)
Date: Thu, 30 Nov 2017 12:08:11 -1000
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
 <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
Message-ID:

On 2017/11/30 12:00 PM, Stefan van der Walt wrote:

> I think Josef specifically meant `recarrays`, which give access to
> elements of a structured array via attribute access. I'd tend to agree
> with him that those turned out not to be such a great idea. But (I

I have found recarrays to be useful, providing an alternative view that
can be convenient. What is the problem with them?

Eric

From m.h.vankerkwijk at gmail.com  Thu Nov 30 17:11:42 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 17:11:42 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
 <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
Message-ID:

Unlike for matrix, it is not so much a problem as an unclear use case -
the main thing they bring to structured dtype arrays is access by
attribute, which is slower than just getting the field by its key.

Anyway, I don't think anybody is suggesting removing them - they're not a
problem in the way matrix is, with its shape-mangling, etc.

-- Marten

From rainwoodman at gmail.com  Thu Nov 30 18:01:36 2017
From: rainwoodman at gmail.com (Feng Yu)
Date: Thu, 30 Nov 2017 15:01:36 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
References: <1512071488.946400.1189823584.6A563831@webmail.messagingengine.com>
 <1512079251.3817341.1189956848.73BD4471@webmail.messagingengine.com>
Message-ID:

An NEP on utility functions for structured arrays definitely sounds
appealing to me.

On Thu, Nov 30, 2017 at 2:00 PM, Stefan van der Walt wrote:

> On Thu, Nov 30, 2017, at 12:02, Marten van Kerkwijk wrote:
>> I think Josef specifically meant `recarrays`, which give access to
>> elements of a structured array via attribute access. I'd tend to agree
>> with him that those turned out not to be such a great idea. But (I
>> think) nobody is arguing we should get rid of arrays with structured
>> dtypes - I use them regularly myself too.
>
> Ah, okay, that makes sense!
>
> Which reminds me: while these are quite useful, they're not always
> particularly pleasant to use. A good first improvement would be to
> allow columnar printing, and a few utility functions to give you some of
> the basic functionality of pandas (calculating descriptive statistics
> like mean, dropping NaN rows, some equivalent of groupby). All these
> are only a few lines of Python, but can be annoying to figure out. If
> this sounds appealing, I'd be willing to put together a small NEP.
>
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From njs at pobox.com  Thu Nov 30 19:15:59 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 30 Nov 2017 16:15:59 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 30, 2017 at 11:39 AM, Charles R Harris wrote:
>
> On Thu, Nov 30, 2017 at 11:43 AM, Ralf Gommers wrote:
>> I'd suggest any release in the next couple of years is fine, but the one
>> where we drop Python 2 support is probably the worst choice. That's one
>> of the few things the core Python devs got 100% right with the Python 3
>> move: advocating that in the 2->3 transition packages would not make
>> any API changes, in order to make porting the least painful.
>
> Agree, we don't want to pile in too many changes at once. I think the big
> sticking point is the sparse matrices in SciPy; even issuing a
> DeprecationWarning could be problematic as long as there are sparse
> matrices. May I suggest that we put together an NEP for the NumPy side
> of things? Ralf, does SciPy have a mechanism for proposing such changes?

Agreed here as well... while I want to get rid of np.matrix as much as
anyone, doing that anytime soon would be *really* disruptive.

- There are tons of little scripts out there written by people who didn't
know better; we do want them to learn not to use np.matrix, but breaking
all their scripts is a painful way to do that.

- There are major projects like scikit-learn that simply have no
alternative to using np.matrix, because of scipy.sparse.

So I think the way forward is something like:

- Now or whenever someone gets together a PR: issue a
PendingDeprecationWarning in np.matrix.__init__ (unless it kills
performance for scikit-learn and friends), and put a big warning box at
the top of the docs. The idea here is to not actually break anyone's
code, but start to get out the message that we definitely don't think
anyone should use this if they have any alternative.
- After there's an alternative to scipy.sparse: ramp up the warnings,
possibly all the way to FutureWarning, so that existing scripts don't
break but they do get noisy warnings.

- Eventually, if we think it will reduce maintenance costs: split it into
a subpackage.

I expect that one way or another we'll be maintaining matrix for quite
some time, and I agree with whoever said that most of the burden seems to
be in keeping the rest of numpy working sensibly with it, so I don't
think moving it into a subpackage is itself going to make a big
difference either way. To me the logic is more like, if/when we decide to
actually break everyone's code by making `np.matrix` raise
AttributeError, then we should probably provide some package they can
import to get their code limping along again, and if we're going to do
that anyway then probably we should split it out first and shake out any
bugs before we make `np.matrix` start raising errors. But it's going to
be quite some time until we reach the "break everyone's code" stage,
given just how much code is out there using matrix, so there's no point
in making detailed plans right now.

-n

--
Nathaniel J. Smith -- https://vorpus.org

From stefanv at berkeley.edu  Thu Nov 30 20:10:42 2017
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 30 Nov 2017 17:10:42 -0800
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To:
References:
Message-ID: <1512090642.419196.1190126384.5FCB7BEE@webmail.messagingengine.com>

On Thu, Nov 30, 2017, at 16:15, Nathaniel Smith wrote:

> PendingDeprecationWarning in np.matrix.__init__ (unless it kills
> performance for scikit-learn and friends), and put a big warning box
> at the top of the docs. The idea here is to not actually break
> anyone's code, but start to get out the message that we definitely
> don't think anyone should use this if they have any alternative.
>
> - After there's an alternative to scipy.sparse: ramp up the warnings,
> possibly all the way to FutureWarning so that existing scripts don't
> break but they do get noisy warnings
>
> - Eventually, if we think it will reduce maintenance costs: split it
> into a subpackage

Can't we make `np.matrix` into a new package right now, and have NumPy
depend on it internally? At that point, start warning users that they
should also be using the external package, and eventually just remove the
shim in NumPy.

Stéfan

From m.h.vankerkwijk at gmail.com  Thu Nov 30 21:02:36 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 30 Nov 2017 21:02:36 -0500
Subject: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?
In-Reply-To: <1512090642.419196.1190126384.5FCB7BEE@webmail.messagingengine.com>
References: <1512090642.419196.1190126384.5FCB7BEE@webmail.messagingengine.com>
Message-ID:

Hi Nathaniel,

Thanks for the concrete suggestion: see
https://github.com/numpy/numpy/pull/10142

I think this is useful independent of exactly how the eventual move to a
new package would work; the next step might be to collect all matrix
tests in the `matrixlib` sub-module.

All the best,

Marten
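For reference, a minimal sketch of the first step Nathaniel describes --
emitting a PendingDeprecationWarning on construction. The helper name is
made up for illustration; the real change would live in np.matrix itself
(under numpy/matrixlib):

    import warnings

    def make_matrix(data):
        # stand-in for what np.matrix construction might do
        warnings.warn(
            "the matrix subclass is pending deprecation; use a regular "
            "2-D ndarray and the @ operator instead",
            PendingDeprecationWarning, stacklevel=2)
        return data

    # PendingDeprecationWarning is silenced by default, so existing
    # scripts stay quiet; developers opt in to see (or trap) it:
    warnings.simplefilter("error", PendingDeprecationWarning)
    try:
        make_matrix([[1, 2], [3, 4]])
    except PendingDeprecationWarning as err:
        print("caught:", err)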