proposed changes to array printing in 1.14
Hello all,
There are various updates to array printing in preparation for numpy 1.14. See https://github.com/numpy/numpy/pull/9139/
Some are quite likely to break other projects' doctests which expect a particular str or repr of arrays, so I'd like to warn the list in case anyone has opinions.
The current proposed changes, from most to least painful by my reckoning, are:
1. For float arrays, an extra space previously used for the sign position will now be omitted in many cases. E.g., `repr(arange(4.))` will now return 'array([0., 1., 2., 3.])' instead of 'array([ 0.,  1.,  2.,  3.])'.
2. The printing of 0d arrays is overhauled. This is a bit finicky to describe; please see the release note in the PR. As an example of the effect of this, `repr(np.array(0.))` now prints as 'array(0.)' instead of 'array(0.0)'. Also, the repr of 0d datetime arrays is now like "array('2005-04-04', dtype='datetime64[D]')" instead of "array(datetime.date(2005, 4, 4), dtype='datetime64[D]')".
3. User-defined dtypes which did not properly implement their `repr` (and `str`) should do so now. Otherwise it now falls back to `object.__repr__`, which will return something ugly like `<mytype object at 0x7f37f1b4e918>`. (Previously you could depend on only implementing the `item` method and the repr of that would be printed. But no longer, because this risks infinite recursions.)
4. Bool arrays of size 1 with a 'True' value will now omit a space, so that `repr(array([True]))` is now 'array([True])' instead of 'array([ True])'.
Allan
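To make the doctest concern concrete, here is a minimal sketch using only the standard library's `doctest.OutputChecker`. The two strings stand in for the NumPy 1.13 output and the proposed output of `repr(arange(4.))` from item 1, so NumPy itself is not needed; the variable names are illustrative only. It suggests that the stock `NORMALIZE_WHITESPACE` flag would not rescue such a doctest, because that flag collapses runs of whitespace rather than removing them.

```python
import doctest

# Expected output as written in a doctest targeting NumPy 1.13, and the
# output the proposed printing would produce instead (stand-in strings).
want_old = "array([ 0.,  1.,  2.,  3.])\n"
got_new = "array([0., 1., 2., 3.])\n"

checker = doctest.OutputChecker()

# Plain comparison: the strings differ, so the doctest would fail.
exact_match = checker.check_output(want_old, got_new, 0)

# NORMALIZE_WHITESPACE turns "[ 0.," into "[ 0.," (one space kept), which
# still differs from "[0.,", so the doctest would fail here as well.
normalized_match = checker.check_output(
    want_old, got_new, doctest.NORMALIZE_WHITESPACE
)

print(exact_match)       # False
print(normalized_match)  # False
```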
To add to Allan's message: point (2), the printing of 0d arrays, is the one that is the most important in the sense that it rectifies a really strange situation, where the printing cannot be logically controlled by the same mechanism that controls >=1d arrays (see PR).
While point 3 can also be considered a bug fix, 1 & 4 are at some level matters of taste; my own reason for supporting their implementation now is that the 0d array change already forces me (or, specifically, astropy) to rewrite quite a few doctests, and I'd rather have everything in one go. In this respect, it is a pity that this is separate from the earlier change in printing for structured arrays (which was also much for the better, but broke a lot of doctests).
 Marten
_______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@python.org https://mail.python.org/mailman/listinfo/numpydiscussion
To reiterate my point on a previous thread, I don't think this should happen until NumPy 2.0. This *will* break a massive number of doctests, and what's worse, it will do so in a way that makes it difficult to support doctesting for both 1.13 and 1.14. I don't see a big enough benefit to these changes to justify breaking everyone's tests before an API-breaking version bump.
On Fri, 2017-06-30 at 17:55 +1000, Juan Nunez-Iglesias wrote:
To reiterate my point on a previous thread, I don't think this should happen until NumPy 2.0. This *will* break a massive number of doctests, and what's worse, it will do so in a way that makes it difficult to support doctesting for both 1.13 and 1.14. I don't see a big enough benefit to these changes to justify breaking everyone's tests before an API-breaking version bump.
Just so we are on the same page, nobody is planning a NumPy 2.0, so insisting on not changing anything until a possible NumPy 2.0 is almost like saying it should never happen. Of course we could amass deprecations and at some point do many at once and call it 2.0, but I am not sure that helps anyone, compared to saying that we do deprecations for 1-2 years at least, and longer if someone complains.
The question is, do you really see a big advantage in fixing a gazillion tests at once over doing a small part of the fixes one after another? The "big step" thing did not work too well for Python 3....
 Sebastian
Indeed, for scikit-learn, this would be a major problem.
Gaël
On 06/30/2017 09:17 AM, Gael Varoquaux wrote:
Indeed, for scikit-learn, this would be a major problem.
Gaël
I just ran the scikit-learn tests.
With the new behavior (removed whitespace), I do get 70 total failures:
$ make test-doc
Ran 39 tests in 39.503s
FAILED (SKIP=3, failures=19)
$ make test
Ran 8122 tests in 387.650s
FAILED (SKIP=58, failures=51)
After setting `np.set_printoptions(pad_sign=True)` (see other email) I get only 1 failure in total, which is due to the presence of a 0d array in gaussian_process.rst.
So it looks like the pad_sign option as currently implemented is good enough to avoid almost all doctest errors.
Allan
One problem is that it becomes hard (impossible?) for downstream packages such as scikit-learn to doctest under multiple versions of numpy. Past experience has shown that doing so can be useful.
Gaël
On Fri, Jun 30, 2017 at 4:47 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
One problem is that it becomes hard (impossible?) for downstream packages such as scikit-learn to doctest under multiple versions of numpy. Past experience has shown that doing so can be useful.
It's not that hard: wrap the new `set_printoptions(pad=True)` in a `try:` block to catch the error under old versions.
 Robert Kern
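A minimal sketch of the `try:` approach described above. So that it runs without NumPy, two stand-in functions play the role of an older and a newer `set_printoptions`; `apply_if_supported` and both stand-ins are hypothetical names. The `pad` keyword is taken from the message above (elsewhere in this thread it is called `pad_sign`), and it is only a proposal here, so the name in a released NumPy may well differ.

```python
def apply_if_supported(set_printoptions, **options):
    """Call set_printoptions(**options); return False if a keyword is unknown."""
    try:
        set_printoptions(**options)
    except TypeError:
        # A function that does not know the keyword raises TypeError,
        # which is what an older version would do for a newer option.
        return False
    return True


# Stand-ins for the old and new signatures (assumptions for illustration).
def old_set_printoptions(precision=None):
    pass


def new_set_printoptions(precision=None, pad=None):
    pass


old_ok = apply_if_supported(old_set_printoptions, pad=True)
new_ok = apply_if_supported(new_set_printoptions, pad=True)
print(old_ok, new_ok)  # False True
```

With a real installation one would pass `np.set_printoptions` as the first argument, typically once in the test-suite setup. Note that catching `TypeError` this broadly would also hide a wrongly typed option value.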
I agree that shipping a sane/sanitising doctest runner would go 95% of the way to alleviating my concerns.
Regarding 2.0, this is the whole point of semantic versioning: downstream packages can pin their dependency as 1.x and know that they
- will continue to work with any updates
- won’t make their users choose between new NumPy 1.x features and running their software.
The Python 3.x transition was a huge fail, but the version numbering was not the problem.
I do have sympathy for Ralf’s argument that "exact repr's are not part of the NumPy (or Python for that matter) backwards compatibility guarantees". But it is such a foundational project in Scientific Python that I think extreme care is warranted, beyond any official guarantees. (Hence this thread, yes. Thank you!)
Incidentally, I don’t think "array( 1.)" is such a tragic repr fail. I actually would welcome it because I’ve tried to JSON-serialise these buggers quite a few times because I didn’t realise they were 0d arrays instead of floats. So why exactly is it so bad that there is a space there?
Anyway, all this is (mostly) moot if the next NumPy ships with this doctest++ thingy. That would be an enormously valuable contribution to the whole ecosystem.
Thanks,
Juan.
On Fri, Jun 30, 2017 at 7:23 PM, Juan Nunez-Iglesias jni.soma@gmail.com wrote:
I agree that shipping a sane/sanitising doctest runner would go 95% of the way to alleviating my concerns.
Regarding 2.0, this is the whole point of semantic versioning: downstream packages can pin their dependency as 1.x and know that they
- will continue to work with any updates
- won’t make their users choose between new NumPy 1.x features and running their software.
Semantic versioning is somewhere between useless and harmful for nontrivial projects. It's a lovely idea, it would be lovely if it worked, but in practice it either means you make every release a major release, which doesn't help anything, or else you never make a major release until eventually everyone gets so frustrated that they fork the project or do a Python 3 style break-everything major release, which is a cure that's worse than the original disease.
NumPy's strategy instead is to make small, controlled, rolling breaking changes in 1.x releases. Every release breaks something for someone somewhere, but ideally only after debate and appropriate warning, and hopefully most releases don't break things for *you*. Change is going to happen one way or another, and it's easier to manage a small amount of breakage every few releases than to manage a giant chunk all at once. (The latter just seems easier because it's in the future, so your brain is like "eh I'm sure I'll be fine" until you get there and realize how doomed you are.)
Plus, the reality is that every numpy release ever made has accidentally broken something for someone somewhere, so instead of lying to ourselves and pretending that we can keep things perfectly backwards compatible at all times, we might as well acknowledge that and try to manage the cost of breakages and make them worthwhile. Heck, even bug fixes are frequently compatibility-breaking changes in reality, and here we are debating whether tweaking whitespace in reprs is a compatibility-breaking change. There's no line of demarcation between breaking changes and non-breaking changes, just shades of grey, and we can do better engineering if our processes acknowledge that.
Another critique of semantic versioning: https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e
The Google philosophy of "error budgets", which is somewhat analogous to the argument I'm making for a compatibility-breakage budget: https://www.usenix.org/node/189332 https://landing.google.com/sre/book/chapters/servicelevelobjectives.html#x...
n
On Fri, Jun 30, 2017 at 7:23 PM, Juan Nunez-Iglesias jni.soma@gmail.com wrote:
I do have sympathy for Ralf’s argument that "exact repr's are not part of the NumPy (or Python for that matter) backwards compatibility guarantees". But it is such a foundational project in Scientific Python that I think extreme care is warranted, beyond any official guarantees. (Hence this thread, yes. Thank you!)
I would also like to make another distinction here: I don't think anyone's actual *code* has broken because of this change. To my knowledge, it is only downstream projects' *doctests* that break. This might deserve *some* care on our part (beyond notification and keeping it out of a 1.x.y bugfix release), but "extreme care" is just not warranted.
Anyway, all this is (mostly) moot if the next NumPy ships with this doctest++ thingy. That would be an enormously valuable contribution to the whole ecosystem.
I'd recommend just making an independent project on Github and posting it as its own project to PyPI when you think it's ready. We'll link to it in our documentation. I don't think that it ought to be part of numpy and stuck on our release cadence.
 Robert Kern
On 06/30/2017 03:55 AM, Juan Nunez-Iglesias wrote:
To reiterate my point on a previous thread, I don't think this should happen until NumPy 2.0. This *will* break a massive number of doctests, and what's worse, it will do so in a way that makes it difficult to support doctesting for both 1.13 and 1.14. I don't see a big enough benefit to these changes to justify breaking everyone's tests before an API-breaking version bump.
I am still on the fence about exactly how annoying this change would be, and it is good to hear whether this affects you and how badly.
Yes, someone would have to spend an hour removing a hundred spaces in doctests, and the 1.13 to 1.14 period is trickier (but virtualenv helps). But none of your end users are going to have their scripts break, and there are no new warnings or exceptions.
A follow-up question is: to what degree can we compromise? Would it be acceptable to skip the big change #1, but keep the other 3 changes? I expect they affect far fewer doctests. Or, for instance, I could scale back #1 so it only affects size-1 (or perhaps, only size-0) arrays. What amount of change would be OK, and how is changing a small number of doctests different from changing more?
Also, let me clarify the motivations for the changes. As Marten noted, change #2 is what motivated all the other changes. Currently 0d arrays print in their own special way, which was making it very hard to implement fixes to void-type str/repr, and the datetime and other 0d reprs are weird. The fix is to make 0d arrays print using the same codepath as higher-d ndarrays, but then we ended up with reprs like "array( 1.)" because of the space for the sign position. So I removed the space from the sign position for all float arrays. But as I noted, I probably could remove it for only size-1 or 0d arrays and still fix my problem, even though I think it might be pretty hacky to implement in the numpy code.
Allan
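The interaction between the sign position and 0d printing can be illustrated with a toy formatter. This is a simplified sketch, not NumPy's implementation: `format_floats`, `repr_1d` and `repr_0d` are hypothetical helpers, and `pad_sign` only mimics the option name mentioned elsewhere in this thread.

```python
def format_floats(values, pad_sign):
    """Format floats NumPy-style ('1.' rather than '1.0'), right-aligned."""
    texts = []
    for value in values:
        text = repr(float(value))
        if text.endswith(".0"):
            text = text[:-1]       # '1.0' -> '1.'
        if pad_sign and not text.startswith("-"):
            text = " " + text      # always reserve a column for the sign
        texts.append(text)
    # Align all elements to the widest one, so a negative entry still
    # forces padding on its neighbours even when pad_sign is False.
    width = max(len(text) for text in texts)
    return [text.rjust(width) for text in texts]


def repr_1d(values, pad_sign):
    return "array([" + ", ".join(format_floats(values, pad_sign)) + "])"


def repr_0d(value, pad_sign):
    # Same code path as the 1d case, applied to a single element.
    return "array(" + format_floats([value], pad_sign)[0] + ")"


print(repr_1d([0, 1, 2, 3], pad_sign=True))   # array([ 0.,  1.,  2.,  3.])
print(repr_1d([0, 1, 2, 3], pad_sign=False))  # array([0., 1., 2., 3.])
print(repr_0d(1, pad_sign=True))              # array( 1.)
print(repr_0d(1, pad_sign=False))             # array(1.)
print(repr_1d([-1, 2], pad_sign=False))       # array([-1.,  2.])
```

The third line shows how routing 0d arrays through the shared path, with the sign column always reserved, yields the "array( 1.)" form discussed above.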
Is it feasible/desirable to provide a doctest runner that ignores whitespace? That would allow downstream projects to fix their doctests on 1.14+ with a one-line change, without breaking tests on 1.13.
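To make the idea concrete, here is a minimal sketch of such a runner built on the standard library's doctest module. It assumes that "ignoring whitespace" means dropping all whitespace before comparing; the built-in NORMALIZE_WHITESPACE flag only collapses runs of whitespace, so it would still distinguish "[ 0." from "[0.". The names WhitespaceInsensitiveChecker and run_doctest_text are made up for this illustration and are not part of numpy or the PR, and print() stands in for an array repr so that the sketch runs without numpy.

```python
import doctest


class WhitespaceInsensitiveChecker(doctest.OutputChecker):
    """Hypothetical checker: accept outputs that differ only in whitespace."""

    def check_output(self, want, got, optionflags):
        # Keep the stock behaviour (exact match, ELLIPSIS, etc.) first.
        if super().check_output(want, got, optionflags):
            return True
        # Fall back to comparing with every whitespace character removed.
        # Caveat: this also treats "1 2" and "12" as equal.
        return "".join(want.split()) == "".join(got.split())


def run_doctest_text(text, checker):
    """Run the doctest examples found in `text`; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(text, {}, "example", None, 0)
    runner = doctest.DocTestRunner(checker=checker, verbose=False)
    # Discard the failure report; only the counts are of interest here.
    return runner.run(test, out=lambda message: None)


# Expected output written in the old (sign-padded) style, while the
# statement produces the new (unpadded) style.
OLD_STYLE_DOC = """
>>> print("array([0., 1., 2., 3.])")
array([ 0.,  1.,  2.,  3.])
"""

# A real difference in values must still be reported as a failure.
WRONG_VALUE_DOC = """
>>> print("array([0., 1.])")
array([ 0.,  2.])
"""

if __name__ == "__main__":
    print(run_doctest_text(OLD_STYLE_DOC, doctest.OutputChecker()))
    print(run_doctest_text(OLD_STYLE_DOC, WhitespaceInsensitiveChecker()))
```

A project could hand such a checker to its doctest runner so that the same docstrings pass on both 1.13 and 1.14; the cost is that whitespace differences that do matter would be ignored as well.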
On 06/30/2017 03:04 PM, CJ Carey wrote:
Is it feasible/desirable to provide a doctest runner that ignores whitespace? That would allow downstream projects to fix their doctests on 1.14+ with a one-line change, without breaking tests on 1.13.
Good idea. I have already implemented this actually, see the updated PR. https://github.com/numpy/numpy/pull/9139/
Whether or not the sign position is padded can now be controlled by setting
>>> np.set_printoptions(pad_sign=True)
>>> np.set_printoptions(pad_sign=False)
When pad_sign is True, it gives the old behavior, except for size-1 arrays where it still omits the sign position. (Maybe I should limit it even more, to 0d arrays?)
When pad_sign is False (currently default in the PR), it removes the sign padding everywhere if possible.
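The two settings can be illustrated with a small pure-Python sketch. This is not numpy's implementation: format_floats is a hypothetical helper, it uses Python's "0.0" style rather than numpy's "0.", and it encodes one reading of the rules described above (a column is reserved for the sign whenever a negative value is present; otherwise only when pad_sign is True and the array has more than one element).

```python
def format_floats(values, pad_sign):
    """Hypothetical illustration of sign padding; not numpy code."""
    has_negative = any(v < 0 for v in values)
    if has_negative:
        # A minus sign has to be shown, so every entry reserves a sign column.
        reserve_sign_column = True
    elif pad_sign:
        # Old behaviour, except that size-1 arrays drop the padding.
        reserve_sign_column = len(values) > 1
    else:
        # New behaviour: no padding when it is not needed.
        reserve_sign_column = False
    # The " " sign option puts a space in front of non-negative numbers.
    spec = " .1f" if reserve_sign_column else ".1f"
    return "[" + ", ".join(format(v, spec) for v in values) + "]"


if __name__ == "__main__":
    print(format_floats([0.0, 1.0, 2.0], pad_sign=True))
    print(format_floats([0.0, 1.0, 2.0], pad_sign=False))
    print(format_floats([-1.0, 2.0], pad_sign=False))
```

With negative values present the two settings give the same result in this sketch, which is one way to read the "if possible" above.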
Allan
On Sat, Jul 1, 2017 at 7:04 AM, CJ Carey perimosocordiae@gmail.com wrote:
Is it feasible/desirable to provide a doctest runner that ignores whitespace?
Yes, and yes. Because doctest is in the stdlib, changing it there would take forever to have any effect, though; a separate our-sane-doctest module would be the way to ship this, I think.
And not only whitespace: it should also provide sane floating-point comparison behavior (AstroPy has something for that which can be reused: https://github.com/astropy/astropy/issues/6312), as well as things a bit more specific to the needs of scientific Python projects, like ignoring the hashes in returned matplotlib objects.
That would allow downstream projects to fix their doctests on 1.14+ with a one-line change, without breaking tests on 1.13.
It's worth reading https://docs.python.org/2/library/doctest.html#soapbox. At least the first 2 paragraphs; the rest is mainly an illustration of why doctest default behavior is evil ("doctest also makes an excellent tool for regression testing" - eh, no). The only valid reason nowadays to use doctests is to test that doc examples run and are correct. None of {whitespace, blank lines, small floating point differences between platforms/libs, hashes} are valid reasons to get a test failure.
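As a rough sketch of what the floating-point and hash parts could look like on top of stdlib doctest: the checker below compares numbers with a relative tolerance and masks hex addresses such as the "0x7f37f1b4e918" in an object repr. TolerantChecker, count_failures and the default tolerance are assumptions made for this illustration; this is not the AstroPy code linked above, nor an existing numpy API.

```python
import doctest
import math
import re

# Decimal numbers such as 3, 3., .5, 0.25 or 1e-5, with an optional sign.
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")
# Memory addresses as they appear in default object reprs.
_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


class TolerantChecker(doctest.OutputChecker):
    """Hypothetical checker: approximate floats, ignore object addresses."""

    def __init__(self, rel_tol=1e-6):
        self.rel_tol = rel_tol

    def check_output(self, want, got, optionflags):
        if super().check_output(want, got, optionflags):
            return True
        # Addresses change from run to run, so mask them on both sides.
        want = _ADDRESS.sub("ADDR", want)
        got = _ADDRESS.sub("ADDR", got)
        # Everything that is not a number must still match exactly.
        if _NUMBER.sub("#", want) != _NUMBER.sub("#", got):
            return False
        want_numbers = [float(text) for text in _NUMBER.findall(want)]
        got_numbers = [float(text) for text in _NUMBER.findall(got)]
        if len(want_numbers) != len(got_numbers):
            return False
        return all(
            math.isclose(w, g, rel_tol=self.rel_tol)
            for w, g in zip(want_numbers, got_numbers)
        )


def count_failures(text, checker):
    """Run the doctest examples in `text` and return the number of failures."""
    test = doctest.DocTestParser().get_doctest(text, {}, "example", None, 0)
    runner = doctest.DocTestRunner(checker=checker, verbose=False)
    return runner.run(test, out=lambda message: None).failed


FLOAT_DOC = """
>>> print(0.1 + 0.2)
0.3
"""

ADDRESS_DOC = """
>>> print(object())
<object object at 0x7f37f1b4e918>
"""

WRONG_DOC = """
>>> print(1.0)
2.0
"""

if __name__ == "__main__":
    for doc in (FLOAT_DOC, ADDRESS_DOC, WRONG_DOC):
        print(count_failures(doc, TolerantChecker()))
```

The first two examples fail under the default checker and pass here, while a genuinely wrong value is still reported.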
At the moment there's no polished alternative to using stdlib doctest, so I'm sympathetic to the argument of "this causes a lot of work". On the other hand, exact repr's are not part of the NumPy (or Python for that matter) backwards compatibility guarantees. So imho we should provide that alternative to doctest, and then no longer worry about these kinds of changes and just make them.
Until we have that alternative, I think https://github.com/scipy/scipy/blob/master/tools/refguide_check.py may be useful to other projects: it checks that your examples are not broken, without doing the detailed string comparisons that are so fragile.
Ralf
participants (9)
Allan Haldane
CJ Carey
Gael Varoquaux
Juan Nunez-Iglesias
Marten van Kerkwijk
Nathaniel Smith
Ralf Gommers
Robert Kern
Sebastian Berg