Hello all,
There are various updates to array printing in preparation for numpy 1.14. See https://github.com/numpy/numpy/pull/9139/
Some are quite likely to break other projects' doc-tests which expect a particular str or repr of arrays, so I'd like to warn the list in case anyone has opinions.
The current proposed changes, from most to least painful by my reckoning, are:
1. For float arrays, an extra space previously used for the sign position will now be omitted in many cases. E.g., `repr(arange(4.))` will now return 'array([0., 1., 2., 3.])' instead of 'array([ 0.,  1.,  2.,  3.])'.
2. The printing of 0d arrays is overhauled. This is a bit finicky to describe; please see the release note in the PR. As an example of the effect, `repr(np.array(0.))` now returns 'array(0.)' instead of 'array(0.0)'. Also, the repr of 0d datetime arrays is now like "array('2005-04-04', dtype='datetime64[D]')" instead of "array(datetime.date(2005, 4, 4), dtype='datetime64[D]')".
3. User-defined dtypes which did not properly implement their `repr` (and `str`) should do so now. Otherwise printing now falls back to `object.__repr__`, which returns something ugly like `<mytype object at 0x7f37f1b4e918>`. (Previously you could get away with implementing only the `item` method, and the repr of that would be printed. This no longer works, because it risks infinite recursion.)
4. Bool arrays of size 1 with a 'True' value will now omit a space, so that `repr(array([True]))` is now 'array([True])' instead of 'array([ True])'.
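To illustrate point 3 in plain Python (no NumPy involved), here is a minimal sketch of what the `object.__repr__` fallback looks like and how defining `__repr__` and `__str__` avoids it. `MyType` and `MyTypeWithRepr` are made-up names, and this only mirrors the fallback behaviour described above; it does not show how a user-defined dtype is actually registered.

```python
class MyType:
    """Hypothetical scalar type that defines no __repr__ or __str__."""

    def __init__(self, value):
        self.value = value


class MyTypeWithRepr(MyType):
    """Same hypothetical type, with an explicit repr and str."""

    def __repr__(self):
        return "MyTypeWithRepr({!r})".format(self.value)

    def __str__(self):
        return str(self.value)


plain = MyType(3)
fixed = MyTypeWithRepr(3)

# Falls back to object.__repr__, i.e. "<... MyType object at 0x...>".
print(repr(plain))
# Uses the explicitly defined methods instead.
print(repr(fixed), str(fixed))
```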
Allan
To add to Allan's message: point (2), the printing of 0-d arrays, is the one that is the most important in the sense that it rectifies a really strange situation, where the printing cannot be logically controlled by the same mechanism that controls >=1-d arrays (see PR).
While point 3 can also be considered a bug fix, points 1 and 4 are at some level matters of taste; my own reason for supporting their implementation now is that the 0-d array change already forces me (or, specifically, astropy) to rewrite quite a few doctests, and I'd rather have everything in one go -- in this respect, it is a pity that this is separate from the earlier change in printing for structured arrays (which was also much for the better, but broke a lot of doctests).
-- Marten
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
To reiterate my point on a previous thread, I don't think this should happen until NumPy 2.0. This *will* break a massive number of doctests, and what's worse, it will do so in a way that makes it difficult to support doctesting for both 1.13 and 1.14. I don't see a big enough benefit to these changes to justify breaking everyone's tests before an API-breaking version bump.
On Fri, 2017-06-30 at 17:55 +1000, Juan Nunez-Iglesias wrote:
To reiterate my point on a previous thread, I don't think this should happen until NumPy 2.0. This *will* break a massive number of doctests, and what's worse, it will do so in a way that makes it difficult to support doctesting for both 1.13 and 1.14. I don't see a big enough benefit to these changes to justify breaking everyone's tests before an API-breaking version bump.
Just so we are on the same page, nobody is planning a NumPy 2.0, so insisting on not changing anything until a possible NumPy 2.0 is almost like saying it should never happen. Of course we could amass deprecations, do many of them at once at some point, and call it 2.0, but I am not sure that helps anyone compared to saying that we do deprecations for at least 1-2 years, and longer if someone complains.
The question is, do you really see a big advantage in fixing a gazillion tests at once over doing a small part of the fixes one after another? The "big step" thing did not work too well for Python 3....
- Sebastian
Indeed, for scikit-learn, this would be a major problem.
Gaël
On 06/30/2017 09:17 AM, Gael Varoquaux wrote:
Indeed, for scikit-learn, this would be a major problem.
Gaël
I just ran the scikit-learn tests.
With the new behavior (removed whitespace), I do get 70 total failures:
    $ make test-doc
    Ran 39 tests in 39.503s
    FAILED (SKIP=3, failures=19)

    $ make test
    Ran 8122 tests in 387.650s
    FAILED (SKIP=58, failures=51)
After setting `np.set_printoptions(pad_sign=True)` (see other email) I get only 1 failure in total, which is due to the presence of a 0d array in gaussian_process.rst.
So it looks like the pad_sign option as currently implemented is good enough to avoid almost all doctest errors.
Allan
One problem is that it becomes hard (impossible?) for downstream packages such as scikit-learn to doctest under multiple versions of numpy. Past experience has shown that this can be useful.
Gaël
On Fri, Jun 30, 2017 at 4:47 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
One problem is that it becomes hard (impossible?) for downstream packages such as scikit-learn to doctest under multiple versions of numpy. Past experience has shown that this can be useful.
It's not that hard: wrap the new `set_printoptions(pad_sign=True)` call in a `try:` block to catch the error under old versions.
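A rough sketch of that `try:` pattern, written without NumPy so it runs anywhere. `call_if_supported` and the two stand-in functions are hypothetical, and `pad_sign` is only the option name used in Allan's message; it may not be what ends up in a release. The sketch relies on the fact that a Python function called with a keyword it does not accept raises `TypeError`.

```python
def call_if_supported(func, **kwargs):
    """Call func(**kwargs); return False if func rejects the keywords.

    Caveat: a TypeError raised inside func for some other reason is
    swallowed as well, so this is only suitable for simple setters.
    """
    try:
        func(**kwargs)
    except TypeError:
        return False
    return True


def old_set_printoptions(precision=None):
    """Stand-in for a release that predates the new option."""


def new_set_printoptions(precision=None, pad_sign=False):
    """Stand-in for a release that accepts the new option."""


print(call_if_supported(old_set_printoptions, pad_sign=True))  # False
print(call_if_supported(new_set_printoptions, pad_sign=True))  # True
```

In a real test suite the call would presumably be `call_if_supported(np.set_printoptions, pad_sign=True)` in whatever setup hook runs before the doctests.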
-- Robert Kern
I agree that shipping a sane/sanitising doctest runner would go 95% of the way to alleviating my concerns.
Regarding 2.0, this is the whole point of semantic versioning: downstream packages can pin their dependency as 1.x and know that they
- will continue to work with any updates
- won’t make their users choose between new NumPy 1.x features and running their software.
The Python 3.x transition was a huge fail, but the version numbering was not the problem.
I do have sympathy for Ralf’s argument that "exact repr's are not part of the NumPy (or Python for that matter) backwards compatibility guarantees". But it is such a foundational project in Scientific Python that I think extreme care is warranted, beyond any official guarantees. (Hence this thread, yes. Thank you!)
Incidentally, I don’t think "array( 1.)" is such a tragic repr fail. I actually would welcome it because I’ve tried to JSON-serialise these buggers quite a few times because I didn’t realise they were 0d arrays instead of floats. So why exactly is it so bad that there is a space there?
Anyway, all this is (mostly) moot if the next NumPy ships with this doctest++ thingy. That would be an enormously valuable contribution to the whole ecosystem.
Thanks,
Juan.
On Fri, Jun 30, 2017 at 7:23 PM, Juan Nunez-Iglesias jni.soma@gmail.com wrote:
I agree that shipping a sane/sanitising doctest runner would go 95% of the way to alleviating my concerns.
Regarding 2.0, this is the whole point of semantic versioning: downstream packages can pin their dependency as 1.x and know that they
- will continue to work with any updates
- won’t make their users choose between new NumPy 1.x features and running their software.
Semantic versioning is somewhere between useless and harmful for non-trivial projects. It's a lovely idea, it would be lovely if it worked, but in practice it either means you make every release a major release, which doesn't help anything, or else you never make a major release until eventually everyone gets so frustrated that they fork the project or do a python 3 style break-everything major release, which is a cure that's worse than the original disease.
NumPy's strategy instead is to make small, controlled, rolling breaking changes in 1.x releases. Every release breaks something for someone somewhere, but ideally only after debate and appropriate warning, and hopefully most releases don't break things for *you*. Change is going to happen one way or another, and it's easier to manage a small amount of breakage every few releases than to manage a giant chunk all at once. (The latter just seems easier because it's in the future, so your brain is like "eh I'm sure I'll be fine" until you get there and realize how doomed you are.)
Plus, the reality is that every numpy release ever made has accidentally broken something for someone somewhere, so instead of lying to ourselves and pretending that we can keep things perfectly backwards compatible at all times, we might as well acknowledge that and try to manage the cost of breakage and make them worthwhile. Heck, even bug fixes are frequently compatibility-breaking changes in reality, and here we are debating whether tweaking whitespace in reprs is a compatibility-breaking change. There's no line of demarcation between breaking changes and non-breaking changes, just shades of grey, and we can do better engineering if our processes acknowledge that.
Another critique of semantic versioning: https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e
The Google philosophy of "error budgets", which is somewhat analogous to the argument I'm making for a compatibility-breakage budget: https://www.usenix.org/node/189332 https://landing.google.com/sre/book/chapters/service-level-objectives.html#x...
-n
On Fri, Jun 30, 2017 at 7:23 PM, Juan Nunez-Iglesias jni.soma@gmail.com wrote:
I do have sympathy for Ralf’s argument that "exact repr's are not part of the NumPy (or Python for that matter) backwards compatibility guarantees". But it is such a foundational project in Scientific Python that I think extreme care is warranted, beyond any official guarantees. (Hence this thread, yes. Thank you!)
I would also like to make another distinction here: I don't think anyone's actual *code* has broken because of this change. To my knowledge, it is only downstream projects' *doctests* that break. This might deserve *some* care on our part (beyond notification and keeping it out of a 1.x.y bugfix release), but "extreme care" is just not warranted.
Anyway, all this is (mostly) moot if the next NumPy ships with this doctest++ thingy. That would be an enormously valuable contribution to the whole ecosystem.
I'd recommend just making an independent project on GitHub and posting it as its own project to PyPI when you think it's ready. We'll link to it in our documentation. I don't think that it ought to be part of numpy and stuck on our release cadence.
-- Robert Kern
On 06/30/2017 03:55 AM, Juan Nunez-Iglesias wrote:
To reiterate my point on a previous thread, I don't think this should happen until NumPy 2.0. This *will* break a massive number of doctests, and what's worse, it will do so in a way that makes it difficult to support doctesting for both 1.13 and 1.14. I don't see a big enough benefit to these changes to justify breaking everyone's tests before an API-breaking version bump.
I am still on the fence about exactly how annoying this change would be, and it is good to hear whether this affects you and how badly.
Yes, someone would have to spend an hour removing a hundred spaces in doctests, and the 1.13 to 1.14 period is trickier (but virtualenv helps). But none of your end users are going to have their scripts break; there are no new warnings or exceptions.
A follow-up question is: to what degree can we compromise? Would it be acceptable to skip the big change #1, but keep the other 3 changes? I expect they affect far fewer doctests. Or, for instance, I could scale back #1 so it only affects size-1 (or perhaps, only size-0) arrays. What amount of change would be OK, and how is changing a small number of doctests different from changing more?
Also, let me clarify the motivations for the changes. As Marten noted, change #2 is what motivated all the other changes. Currently 0d arrays print in their own special way which was making it very hard to implement fixes to voidtype str/repr, and the datetime and other 0d reprs are weird. The fix is to make 0d arrays print using the same code-path as higher-d ndarrays, but then we ended up with reprs like "array( 1.)" because of the space for the sign position. So I removed the space from the sign position for all float arrays. But as I noted I probably could remove it for only size-1 or 0d arrays and still fix my problem, even though I think it might be pretty hacky to implement in the numpy code.
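To make the sign-position issue concrete, here is a small pure-Python sketch. It is not numpy's formatting code: `format_floats` and its `pad_sign` argument are hypothetical names (the latter borrowed from the option discussed in this thread), and it models only the one rule at issue, namely whether a leading column is reserved for a sign.

```python
def format_floats(values, pad_sign):
    """Format floats as a bracketed list, modelling only the sign column.

    pad_sign=True always reserves a leading column for the sign;
    pad_sign=False reserves it only when some value is negative.
    """
    reserve = pad_sign or any(v < 0 for v in values)
    items = []
    for v in values:
        text = str(abs(float(v)))
        if v < 0:
            text = "-" + text
        elif reserve:
            text = " " + text
        items.append(text)
    return "[" + ", ".join(items) + "]"


print(format_floats([0.0, 1.0, 2.0], pad_sign=True))    # [ 0.0,  1.0,  2.0]
print(format_floats([0.0, 1.0, 2.0], pad_sign=False))   # [0.0, 1.0, 2.0]
print(format_floats([-1.0, 1.0], pad_sign=False))       # [-1.0,  1.0]

# Pushing a single value through the same rule gives the odd 0d result.
print("array(" + format_floats([1.0], pad_sign=True)[1:-1] + ")")  # array( 1.0)
```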
Allan
Is it feasible/desirable to provide a doctest runner that ignores whitespace? That would allow downstream projects to fix their doctests on 1.14+ with a one-line change, without breaking tests on 1.13.
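A whitespace-insensitive runner of the kind asked about here can be sketched with the stdlib alone. The following is a minimal sketch, not anything shipped by NumPy: `IgnoreWhitespaceChecker`, `run_docstring` and `OLD_STYLE` are hypothetical names introduced for illustration. Note that the stdlib flag `doctest.NORMALIZE_WHITESPACE` only collapses runs of whitespace and does not treat `[ 0.` and `[0.` as equal, which is why a custom `OutputChecker` is used.

```python
import doctest


class IgnoreWhitespaceChecker(doctest.OutputChecker):
    """Hypothetical checker: compare outputs with all whitespace removed."""

    def check_output(self, want, got, optionflags):
        # Try the standard comparison first so exact matches still pass.
        if super().check_output(want, got, optionflags):
            return True
        # Fall back to comparing with every whitespace character removed.
        return "".join(want.split()) == "".join(got.split())


def run_docstring(docstring, name="example"):
    """Run the examples in `docstring`; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, name, None, 0)
    runner = doctest.DocTestRunner(checker=IgnoreWhitespaceChecker(),
                                   verbose=False)
    # Discard the failure report text; only the counts are of interest here.
    result = runner.run(test, out=lambda text: None)
    return result.failed, result.attempted


# Expected output written in the old padded style, actual output unpadded.
OLD_STYLE = """
>>> print('array([0., 1., 2., 3.])')
array([ 0.,  1.,  2.,  3.])
"""

# A genuinely wrong value must still fail.
WRONG_VALUE = """
>>> print('array([0., 1.])')
array([ 0.,  2.])
"""
```

With this checker the old-style expectation passes against the new output, while a wrong value is still reported. The trade-off is that whitespace differences inside printed strings are also ignored.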
On 06/30/2017 03:04 PM, CJ Carey wrote:
Is it feasible/desirable to provide a doctest runner that ignores whitespace? That would allow downstream projects to fix their doctests on 1.14+ with a one-line change, without breaking tests on 1.13.
Good idea. I have already implemented this actually, see the updated PR. https://github.com/numpy/numpy/pull/9139/
Whether or not the sign position is padded can now be controlled by setting
>>> np.set_printoptions(pad_sign=True)
>>> np.set_printoptions(pad_sign=False)
When pad_sign is True, it gives the old behavior, except for size-1 arrays where it still omits the sign position. (Maybe I should limit it even more, to 0d arrays?)
When pad_sign is False (currently default in the PR), it removes the sign padding everywhere if possible.
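The two modes described above can be illustrated with a toy formatter in pure Python. This is only a sketch of the rule as stated in this message, assuming "if possible" means "when no element needs a minus sign"; `format_float_array` is a hypothetical helper, not NumPy code, and it handles only simple 1-d float input.

```python
def format_float_array(values, pad_sign=False):
    """Toy model of the proposed option; not NumPy's actual implementation."""
    values = [float(v) for v in values]
    needs_minus = any(v < 0 for v in values)
    # Keep a sign column when a minus sign is present, or when the old
    # behavior is requested for an array with more than one element.
    reserve_sign = needs_minus or (pad_sign and len(values) > 1)

    items = []
    for v in values:
        digits = repr(abs(v))
        if digits.endswith(".0"):
            digits = digits[:-1]  # mimic the "1." style for whole numbers
        if v < 0:
            sign = "-"
        else:
            sign = " " if reserve_sign else ""
        items.append(sign + digits)
    return "array([" + ", ".join(items) + "])"


print(format_float_array([0, 1, 2, 3], pad_sign=False))  # array([0., 1., 2., 3.])
print(format_float_array([0, 1, 2, 3], pad_sign=True))   # array([ 0.,  1.,  2.,  3.])
print(format_float_array([1], pad_sign=True))            # array([1.])
print(format_float_array([-1, 2], pad_sign=False))       # array([-1.,  2.])
```

In the padded case the ", " separator plus the reserved sign column gives two spaces between elements. An array containing a negative value keeps the sign column in both modes, since the column is then actually needed for alignment.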
Allan
On Sat, Jul 1, 2017 at 7:04 AM, CJ Carey perimosocordiae@gmail.com wrote:
Is it feasible/desirable to provide a doctest runner that ignores whitespace?
Yes, and yes. Because doctest is in the stdlib, a fix there would take forever to have any effect, though; a separate our-sane-doctest module would be the way to ship this, I think.
And not only whitespace: it should also provide sane floating point comparison behavior (AstroPy has something for that which can be reused: https://github.com/astropy/astropy/issues/6312), as well as things a bit more specific to the needs of scientific Python projects, like ignoring the hashes in returned matplotlib objects.
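As a rough illustration of what tolerant floating point comparison could look like in such a module, here is a minimal stdlib-only sketch. It is not the AstroPy implementation linked above; `ApproxFloatChecker`, the `NUMBER` pattern and the default tolerances are assumptions made for this example.

```python
import doctest
import math
import re

# Matches plain decimal literals such as 1, -2.5, 3. or 1e-3 (a simplification).
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


class ApproxFloatChecker(doctest.OutputChecker):
    """Hypothetical checker: ignore whitespace, compare numbers approximately."""

    def __init__(self, rel_tol=1e-6, abs_tol=1e-12):
        self.rel_tol = rel_tol
        self.abs_tol = abs_tol

    def check_output(self, want, got, optionflags):
        # Exact matches (and the stdlib option flags) are handled as usual.
        if super().check_output(want, got, optionflags):
            return True
        # The non-numeric text must agree once numbers are replaced by a
        # placeholder and all whitespace is removed.
        want_text = "".join(NUMBER.sub("#", want).split())
        got_text = "".join(NUMBER.sub("#", got).split())
        if want_text != got_text:
            return False
        # The numbers themselves must agree pairwise within the tolerances.
        want_nums = [float(m) for m in NUMBER.findall(want)]
        got_nums = [float(m) for m in NUMBER.findall(got)]
        return len(want_nums) == len(got_nums) and all(
            math.isclose(w, g, rel_tol=self.rel_tol, abs_tol=self.abs_tol)
            for w, g in zip(want_nums, got_nums)
        )
```

An instance can be passed as `checker=` to `doctest.DocTestRunner`. Because every digit run is treated as a number, this sketch also compares things like the `64` in `float64` numerically, which is harmless here but shows that a polished version would need more careful parsing.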
That would allow downstream projects to fix their doctests on 1.14+ with a one-line change, without breaking tests on 1.13.
It's worth reading https://docs.python.org/2/library/doctest.html#soapbox. At least the first 2 paragraphs; the rest is mainly an illustration of why doctest default behavior is evil ("doctest also makes an excellent tool for regression testing" - eh, no). The only valid reason nowadays to use doctests is to test that doc examples run and are correct. None of {whitespace, blank lines, small floating point differences between platforms/libs, hashes} are valid reasons to get a test failure.
At the moment there's no polished alternative to using stdlib doctest, so I'm sympathetic to the argument of "this causes a lot of work". On the other hand, exact reprs are not part of the NumPy (or Python, for that matter) backwards compatibility guarantees. So imho we should provide that alternative to doctest, and then no longer worry about these kinds of changes and just make them.
Until we have that alternative, I think https://github.com/scipy/scipy/blob/master/tools/refguide_check.py may be useful to other projects - it checks that your examples are not broken, without doing the detailed string comparisons that are so fragile.
Ralf