[Numpy-discussion] proposed changes to array printing in 1.14

Ralf Gommers ralf.gommers at gmail.com
Fri Jun 30 19:05:08 EDT 2017


On Sat, Jul 1, 2017 at 7:04 AM, CJ Carey <perimosocordiae at gmail.com> wrote:

> Is it feasible/desirable to provide a doctest runner that ignores
> whitespace?
>

Yes, and yes. Because doctest is in the stdlib, though, a fix there would
take forever to have any effect; I think a separate our-sane-doctest module
would be the way to ship this.

And it should not only ignore whitespace, but also provide sane floating
point comparison behavior (AstroPy has something for that which can be
reused: https://github.com/astropy/astropy/issues/6312), as well as things a
bit more specific to the needs of scientific Python projects, like ignoring
the hashes in returned matplotlib objects.
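
To make the idea concrete, here is a rough sketch of what such a checker
might look like on top of the stdlib hooks. The names LenientChecker,
run_lenient and the tolerance values are made up for illustration; this is
not the AstroPy code and not a finished design, just one way it could work.

```python
import doctest
import math
import re

# A token is either a number (1, 1., .5, -2e-3) or any single non-space character.
_TOKEN = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?|\S")


def _as_float(token):
    """Return the token as a float, or None if it is not a number."""
    try:
        return float(token)
    except ValueError:
        return None


class LenientChecker(doctest.OutputChecker):
    """Hypothetical checker: whitespace-blind, tolerant of small float noise."""

    rtol = 1e-7  # assumed tolerance, pick whatever suits the project

    def check_output(self, want, got, optionflags):
        # Keep the stock behavior (ELLIPSIS and friends) when it already passes.
        if super().check_output(want, got, optionflags):
            return True
        want_tokens = _TOKEN.findall(want)
        got_tokens = _TOKEN.findall(got)
        if len(want_tokens) != len(got_tokens):
            return False
        for w, g in zip(want_tokens, got_tokens):
            wf, gf = _as_float(w), _as_float(g)
            if wf is not None and gf is not None:
                # Both tokens are numbers: compare approximately.
                if not math.isclose(wf, gf, rel_tol=self.rtol, abs_tol=1e-12):
                    return False
            elif w != g:
                # Everything else must match exactly.
                return False
        return True


def run_lenient(source, name="example"):
    """Run the doctest examples in `source` with the lenient checker."""
    test = doctest.DocTestParser().get_doctest(source, {}, name, None, 0)
    runner = doctest.DocTestRunner(checker=LenientChecker(), verbose=False)
    return runner.run(test)  # TestResults(failed, attempted)
```

With this, LenientChecker().check_output("array([ 0.,  1.])\n",
"array([0., 1.])\n", 0) passes, while a changed value or a changed number of
elements still fails. Ignoring object hashes would need an extra
normalization step that is not shown here.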


> That would allow downstream projects to fix their doctests on 1.14+ with a
> one-line change, without breaking tests on 1.13.
>

It's worth reading https://docs.python.org/2/library/doctest.html#soapbox.
At least the first 2 paragraphs; the rest is mainly an illustration of why
doctest default behavior is evil ("doctest also makes an excellent tool for
regression testing" - eh, no). The only valid reason nowadays to use
doctests is to test that doc examples run and are correct. None of
{whitespace, blank lines, small floating point differences between
platforms/libs, hashes} are valid reasons to get a test failure.
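
As a small illustration of how fragile the stock comparison is: as far as I
can tell, even the stdlib NORMALIZE_WHITESPACE flag would not cover the
sign-space change, because it only treats runs of whitespace as equivalent
and does not equate "a space" with "no space". The two strings below are
written out by hand to mimic the 1.13-style and proposed 1.14-style reprs;
no NumPy is needed to run this.

```python
import doctest

checker = doctest.OutputChecker()

old = "array([ 0.,  1.,  2.,  3.])\n"  # hand-written 1.13-style repr
new = "array([0., 1., 2., 3.])\n"      # hand-written proposed 1.14-style repr

# Plain string comparison: fails, as expected.
exact = checker.check_output(old, new, 0)

# NORMALIZE_WHITESPACE collapses runs of whitespace, but "[ 0." still
# differs from "[0." after collapsing, so this fails too.
normalized = checker.check_output(old, new, doctest.NORMALIZE_WHITESPACE)

# The flag only helps when both sides have whitespace in the same places.
same_gaps = checker.check_output(
    "array([ 0.,  1.])\n", "array([ 0., 1.])\n", doctest.NORMALIZE_WHITESPACE
)

print(exact, normalized, same_gaps)
```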

At the moment there's no polished alternative to using stdlib doctest, so
I'm sympathetic to the argument of "this causes a lot of work". On the
other hand, exact repr's are not part of the NumPy (or Python for that
matter) backwards compatibility guarantees. So imho we should provide that
alternative to doctest, and then no longer worry about these kinds of
changes and just make them.

Until we have that alternative, I think
https://github.com/scipy/scipy/blob/master/tools/refguide_check.py may be
useful to other projects - it checks that your examples are not broken,
without doing the detailed string comparisons that are so fragile.
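
For projects that want something even simpler in the meantime, a very
minimal stand-in is sketched below: run the examples and only fail when one
of them raises. To be clear, this is not how refguide_check.py is
implemented; AnyOutputChecker and examples_run are hypothetical names, and
accepting any output at all is a much weaker check than what that script
does.

```python
import doctest


class AnyOutputChecker(doctest.OutputChecker):
    """Hypothetical checker that accepts whatever the example printed."""

    def check_output(self, want, got, optionflags):
        return True


def examples_run(source, name="example"):
    """Return True if every example in `source` runs without raising."""
    test = doctest.DocTestParser().get_doctest(source, {}, name, None, 0)
    runner = doctest.DocTestRunner(checker=AnyOutputChecker(), verbose=False)
    # Swallow the failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed == 0
```

So examples_run(">>> 1 + 1\n3\n") passes even though the expected output is
wrong, while examples_run(">>> 1 / 0\n0\n") fails because the example raises.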

Ralf



>
> On Fri, Jun 30, 2017 at 11:11 AM, Allan Haldane <allanhaldane at gmail.com>
> wrote:
>
>> On 06/30/2017 03:55 AM, Juan Nunez-Iglesias wrote:
>>
>>> To reiterate my point on a previous thread, I don't think this should
>>> happen until NumPy 2.0. This *will* break a massive number of doctests, and
>>> what's worse, it will do so in a way that makes it difficult to support
>>> doctesting for both 1.13 and 1.14. I don't see a big enough benefit to
>>> these changes to justify breaking everyone's tests before an API-breaking
>>> version bump.
>>>
>>
>> I am still on the fence about exactly how annoying this change would be,
>> and it is good to hear whether this affects you and how badly.
>>
>> Yes, someone would have to spend an hour removing a hundred spaces in
>> doctests, and the 1.13 to 1.14 period is trickier (but virtualenv helps).
>> But none of your end users are going to have their scripts break, there are
>> no new warnings or exceptions.
>>
>> A follow-up question is: to what degree can we compromise? Would it be
>> acceptable to skip the big change #1, but keep the other 3 changes? I
>> expect they affect far fewer doctests. Or, for instance, I could scale back
>> #1 so it only affects size-1 (or perhaps, only size-0) arrays. What amount
>> of change would be OK, and how is changing a small number of doctests
>> different from changing more?
>>
>> Also, let me clarify the motivations for the changes. As Marten noted,
>> change #2 is what motivated all the other changes. Currently 0d arrays
>> print in their own special way which was making it very hard to implement
>> fixes to voidtype str/repr, and the datetime and other 0d reprs are weird.
>> The fix is to make 0d arrays print using the same code-path as higher-d
>> ndarrays, but then we ended up with reprs like "array( 1.)" because of the
>> space for the sign position. So I removed the space from the sign position
>> for all float arrays. But as I noted I probably could remove it for only
>> size-1 or 0d arrays and still fix my problem, even though I think it might
>> be pretty hacky to implement in the numpy code.
>>
>> Allan
>>
>>
>>
>>
>>
>>> On 30 Jun 2017, 6:42 AM +1000, Marten van Kerkwijk <
>>> m.h.vankerkwijk at gmail.com>, wrote:
>>>
>>>> To add to Allan's message: point (2), the printing of 0-d arrays, is
>>>> the one that is the most important in the sense that it rectifies a
>>>> really strange situation, where the printing cannot be logically
>>>> controlled by the same mechanism that controls >=1-d arrays (see PR).
>>>>
>>>> While point 3 can also be considered a bug fix, 1 & 4 are at some
>>>> level matters of taste; my own reason for supporting their
>>>> implementation now is that the 0-d array change already forces me (or,
>>>> specifically, astropy) to rewrite quite a few doctests, and I'd rather
>>>> have everything in one go -- in this respect, it is a pity that this
>>>> is separate from the earlier change in printing for structured arrays
>>>> (which was also much for the better, but broke a lot of doctests).
>>>>
>>>> -- Marten
>>>>
>>>>
>>>>
>>>> On Thu, Jun 29, 2017 at 3:38 PM, Allan Haldane <allanhaldane at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> There are various updates to array printing in preparation for numpy
>>>>> 1.14. See https://github.com/numpy/numpy/pull/9139/
>>>>>
>>>>> Some are quite likely to break other projects' doc-tests which expect a
>>>>> particular str or repr of arrays, so I'd like to warn the list in case
>>>>> anyone has opinions.
>>>>>
>>>>> The current proposed changes, from most to least painful by my
>>>>> reckoning, are:
>>>>>
>>>>> 1.
>>>>> For float arrays, an extra space previously used for the sign position
>>>>> will now be omitted in many cases. E.g., `repr(arange(4.))` will now
>>>>> return 'array([0., 1., 2., 3.])' instead of 'array([ 0.,  1.,  2.,  3.])'.
>>>>>
>>>>> 2.
>>>>> The printing of 0d arrays is overhauled. This is a bit finicky to
>>>>> describe; please see the release note in the PR. As an example of the
>>>>> effect of this, `repr(np.array(0.))` now returns 'array(0.)'
>>>>> instead of 'array(0.0)'. Also, the repr of 0d datetime arrays is now
>>>>> like
>>>>> "array('2005-04-04', dtype='datetime64[D]')" instead of
>>>>> "array(datetime.date(2005, 4, 4), dtype='datetime64[D]')".
>>>>>
>>>>> 3.
>>>>> User-defined dtypes which did not properly implement their `repr` (and
>>>>> `str`) should do so now. Otherwise it now falls back to
>>>>> `object.__repr__`, which will return something ugly like
>>>>> `<mytype object at 0x7f37f1b4e918>`. (Previously you could depend on
>>>>> only implementing the `item` method and the repr of that would be
>>>>> printed. But no longer, because this risks infinite recursion.)
>>>>>
>>>>> 4.
>>>>> Bool arrays of size 1 with a 'True' value will now omit a space, so
>>>>> that
>>>>> `repr(array([True]))` is now 'array([True])' instead of
>>>>> 'array([ True])'.
>>>>>
>>>>> Allan
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at python.org
>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion