Fwd: Re: Unambiguous repr for recursive objects
--Guido (mobile)
---------- Forwarded message ----------
From: <python-ideas-owner@python.org>
Date: Dec 26, 2015 9:33 AM
Subject: Re: [Python-ideas] Unambiguous repr for recursive objects
To: <gvanrossum@gmail.com>
Cc:
Your message has been rejected, probably because you are not subscribed to the mailing list and the list's policy is to prohibit non-members from posting to it. If you think that your messages are being rejected in error, contact the mailing list owner at python-ideas-owner@python.org.
---------- Forwarded message ----------
From: Guido van Rossum <gvanrossum@gmail.com>
To: Serhiy Storchaka <storchaka@gmail.com>
Cc: Python-Ideas <python-ideas@python.org>
Date: Sat, 26 Dec 2015 09:33:37 -0700
Subject: Re: [Python-ideas] Unambiguous repr for recursive objects
I disagree. We should not take this guideline too literally. The dots are easily understood and nobody has been fooled by a list containing an ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?)
--Guido (mobile)
So, would I be correct that, because of this regression in __repr__ behavior (ellipsis instead of total information) from 2.x to 3.x, any tests that string-compare __repr__ are now off?
On Dec 26, 2015 11:37 AM, "Guido van Rossum" <guido@python.org> wrote:
--Guido (mobile) ---------- Forwarded message ---------- From: <python-ideas-owner@python.org> Date: Dec 26, 2015 9:33 AM Subject: Re: [Python-ideas] Unambiguous repr for recursive objects To: <gvanrossum@gmail.com> Cc:
Your message has been rejected, probably because you are not subscribed to the mailing list and the list's policy is to prohibit non-members from posting to it. If you think that your messages are being rejected in error, contact the mailing list owner at python-ideas-owner@python.org.
---------- Forwarded message ---------- From: Guido van Rossum <gvanrossum@gmail.com> To: Serhiy Storchaka <storchaka@gmail.com> Cc: Python-Ideas <python-ideas@python.org> Date: Sat, 26 Dec 2015 09:33:37 -0700 Subject: Re: [Python-ideas] Unambiguous repr for recursive objects
I disagree. We should not take this guideline too literally. The dots are easily understood and nobody has been fooled by a list containing an ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?)
--Guido (mobile)
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Saturday, December 26, 2015 8:48 AM, Wes Turner <wes.turner@gmail.com> wrote:
So,
Would I be correct that, because of this regression in __repr__ behavior (ellipsis instead of total information) from 2.x to 3.x, any tests that string-compare __repr__ are now off?
No. 2.x did not provide total information. It used the exact same __repr__ as 3.x. (If you can come up with a way to provide total information that's readable to both humans and the parser, I'm sure everyone would love to see it.)
Any tests that string compare __repr__ to test the equality of two lists in 2.x will get the same results in 3.x. They're still probably bad tests, but no worse than before. The only difference is that the `...` is a valid literal in 3.x, so `[1, 2, [...]]` is a valid list display in 3.x, and it wasn't in 2.x. (Even there, tests that assume __repr__ equality are no more broken than before: a list containing 1, 2, and itself reprs as `[1, 2, [...]]`, while a list containing 1, 2, and a list containing `...` reprs as `[1, 2, [Ellipsis]]`, so they will not be mistakenly compared equal.)
As Serhiy points out, there actually _is_ a regression here: tests that depend on the fact that a circular list will raise a SyntaxError on eval(repr(x)) do break with 3.0. I doubt there were many such tests, given that nobody's noticed the problem until half a decade later, but I suppose that is a regression.
At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does.
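To make the two reprs described above concrete, here is a minimal sketch for Python 3. The variable names are illustrative only and do not come from the thread; the point is just that the two strings differ, and that eval() of the circular repr succeeds but returns something else.

```python
# A list that contains itself, and a list that contains a list holding Ellipsis.
circular = [1, 2]
circular.append(circular)

with_ellipsis = [1, 2, [...]]  # valid syntax in Python 3 only

circular_repr = repr(circular)
ellipsis_repr = repr(with_ellipsis)

print(circular_repr)   # the recursive reference is shown as [...]
print(ellipsis_repr)   # the Ellipsis object is shown by name

# Round-tripping the circular repr through eval() does not raise in 3.x,
# but the result holds Ellipsis instead of a reference to itself.
round_tripped = eval(circular_repr)
print(round_tripped[2] == [Ellipsis])
print(round_tripped[2] is round_tripped)
```

So a string comparison of the two reprs still tells the lists apart; what is lost is only the SyntaxError that 2.x raised on the round trip.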
On Dec 26, 2015 4:05 PM, "Andrew Barnert" <abarnert@yahoo.com> wrote:
On Saturday, December 26, 2015 8:48 AM, Wes Turner <wes.turner@gmail.com> wrote:
So,
Would I be correct that, because of this regression in __repr__ behavior (ellipsis instead of total information) from 2.x to 3.x, any tests that string-compare __repr__ are now off?
No. 2.x did not provide total information. It used the exact same __repr__ as 3.x. (If you can come up with a way to provide total information that's readable to both humans and the parser, I'm sure everyone would love to see it.)
Any tests that string compare __repr__ to test the equality of two lists in 2.x will get the same results in 3.x. They're still probably bad tests, but no worse than before. The only difference is that the `...` is a valid literal in 3.x, so `[1, 2, [...]]` is a valid list display in 3.x, and it wasn't in 2.x. (Even there, tests that assume __repr__ equality are no more broken than before: a list containing 1, 2, and itself reprs as `[1, 2, [...]]`, while a list containing 1, 2, and a list containing `...` reprs as `[1, 2, [Ellipsis]]`, so they will not be mistakenly compared equal.)
Got it, thanks!
* https://docs.python.org/3/library/constants.html#Ellipsis
* http://python-reference.readthedocs.org/ewhat-does-the-python-ellipsis-objec...
* http://stackoverflow.com/questions/772124/what-does-the-python-ellipsis-obje...
As Serhiy points out, there actually _is_ a regression here: tests that depend on the fact that a circular list will raise a SyntaxError on eval(repr(x)) do break with 3.0. I doubt there were many such tests, given that nobody's noticed the problem until half a decade later, but I suppose that is a regression.
At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does.
Andrew Barnert writes:
No. 2.x did not provide total information. It used the exact same __repr__ as 3.x. (If you can come up with a way to provide total information that's readable to both humans and the parser, I'm sure everyone would love to see it.)
Emacs Lisp has an option (and some versions of JavaScript used to borrow the same syntax) to represent circular and duplicate references with a syntax in which the first reference is written #N=(whatever) and later references are written #N#. So a circular list would be #1=(#1#); a list containing a reference to itself and two references to another list would be #1=(#1# #2=(3) #2#), etc. Emacs' parser supports this syntax; JavaScript's never did, even in the versions that could produce the format. When the option is turned off, Emacs substitutes circular references, but not duplicate references, with #N, where N appears to be the level of nesting, counted from the top of the expression, at which the outermost copy of the reference appears. Its parser does not support that syntax.
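As a rough illustration of the #N= / #N# idea applied to Python lists, here is a sketch. The function labeled_repr is hypothetical (it is not part of Python or of any proposal in this thread), it handles only lists, and it prints Python-style brackets rather than Lisp parentheses.

```python
def labeled_repr(obj):
    """Hypothetical helper: repr for nested lists using #N= / #N# labels."""
    seen = set()    # ids of lists visited during the first pass
    shared = set()  # ids of lists reached more than once (cycles or duplicates)

    def scan(o):
        # First pass: find every list that is referenced more than once.
        if not isinstance(o, list):
            return
        if id(o) in seen:
            shared.add(id(o))
            return
        seen.add(id(o))
        for item in o:
            scan(item)

    scan(obj)

    labels = {}  # id -> label number, assigned when the list is first emitted

    def emit(o):
        # Second pass: write #N= on first use of a shared list, #N# afterwards.
        if not isinstance(o, list):
            return repr(o)
        if id(o) in labels:
            return "#%d#" % labels[id(o)]
        prefix = ""
        if id(o) in shared:
            labels[id(o)] = len(labels) + 1
            prefix = "#%d=" % labels[id(o)]
        return prefix + "[" + ", ".join(emit(item) for item in o) + "]"

    return emit(obj)


# A list containing only itself.
circular = []
circular.append(circular)
print(labeled_repr(circular))

# A list containing itself and two references to another list.
inner = [3]
outer = []
outer.extend([outer, inner, inner])
print(labeled_repr(outer))
```

The output is not valid Python, so it cannot be fed to eval(); it only shows how the labelling distinguishes circular and duplicate references.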
On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does.
Right, this is the reason I think it's reasonable to suggest changing the recursive repr - the current form is one that *humans* who have only learned Python 3 are likely to misinterpret, since the fact that "repr(...)" produces "Ellipsis" rather than "..." is itself a quirk originating in the fact that "..." is restricted to subscripts in Python 2.
I don't think it's a major problem (as recursive container representations aren't something that comes up every day), but switching to "<...>" does have the advantage of allowing for a consistent recursive reference representation across all container types, regardless of whether they have native syntax or not.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
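For container types without native syntax, the standard library's reprlib.recursive_repr decorator already lets a class pick its own placeholder. The sketch below uses a hypothetical Node class (not from the thread) and passes fillvalue="<...>" to show what the suggested marker would look like; it does not change how list or dict behave.

```python
from reprlib import recursive_repr


class Node:
    """Hypothetical container, used only for illustration."""

    def __init__(self, *children):
        self.children = list(children)

    # recursive_repr returns the fill value when __repr__ is re-entered
    # for the same object, instead of recursing forever.
    @recursive_repr(fillvalue="<...>")
    def __repr__(self):
        return "Node(%s)" % ", ".join(map(repr, self.children))


node = Node(1, 2)
node.children.append(node)  # make the node contain itself
node_repr = repr(node)
print(node_repr)
```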
On Sat, Dec 26, 2015 at 10:19 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does.
Right, this is the reason I think it's reasonable to suggest changing the recursive repr - the current form is one that *humans* who have only learned Python 3 are likely to misinterpret, since the fact that "repr(...)" produces "Ellipsis" rather than "..." is itself a quirk originating in the fact that "..." is restricted to subscripts in Python 2.
I don't think it's a major problem (as recursive container representations aren't something that comes up every day), but switching to "<...>" does have the advantage of allowing for a consistent recursive reference representation across all container types, regardless of whether they have native syntax or not.
I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual.
-- --Guido van Rossum (python.org/~guido)
On 2015-12-27 17:16, Guido van Rossum wrote:
On Sat, Dec 26, 2015 at 10:19 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does.
Right, this is the reason I think it's reasonable to suggest changing the recursive repr - the current form is one that *humans* who have only learned Python 3 are likely to misinterpret, since the fact that "repr(...)" produces "Ellipsis" rather than "..." is itself a quirk originating in the fact that "..." is restricted to subscripts in Python 2.
I don't think it's a major problem (as recursive container representations aren't something that comes up every day), but switching to "<...>" does have the advantage of allowing for a consistent recursive reference representation across all container types, regardless of whether they have native syntax or not.
I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual.
We could always just use 4 dots instead.
On 12/27/2015 06:16 PM, Guido van Rossum wrote:
On Sat, Dec 26, 2015 at 10:19 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 27 December 2015 at 07:05, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
At any rate, I think what people are actually worried about here is not the theoretical chance that such a regression might have happened 5 years ago, but the more practical fact that 3.x might be misleading to human beings in cases where 2.x wasn't. For example, if you mostly do 3.x NumPy stuff, you're used to passing ellipses around, and maybe even storing them in index arrays, but you rarely if ever see a circular list. So, when you see, say, `[[1], [2], [...]]` on the REPL, you may misinterpret it as meaning something different from what it does.
Right, this is the reason I think it's reasonable to suggest changing the recursive repr - the current form is one that *humans* who have only learned Python 3 are likely to misinterpret, since the fact that "repr(...)" produces "Ellipsis" rather than "..." is itself a quirk originating in the fact that "..." is restricted to subscripts in Python 2.
I don't think it's a major problem (as recursive container representations aren't something that comes up every day), but switching to "<...>" does have the advantage of allowing for a consistent recursive reference representation across all container types, regardless of whether they have native syntax or not.
I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual.
I'm not sure. As a newcomer, I would see the "..." ellipsis as "something has been left out" (possibly because of printout length etc., exactly as it is used in numpy, BTW), not "you made a recursive structure". Explicit (and still un-evalable) would be e.g. "<recursive>"
Georg
On 27.12.15 19:16, Guido van Rossum wrote:
I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual.
My second alternative was to use full object.__repr__. E.g. <list object at 0xb7111498>. Or, if this is considered too long, shorter form: <list>. Or, as Georg suggested, use the word "recursive" for clearness: <recursive>. Or combine type name and the word "recursive": <recursive list>.
On Sun, Dec 27, 2015 at 1:20 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 27.12.15 19:16, Guido van Rossum wrote:
I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual.
My second alternative was to use full object.__repr__. E.g. <list object at 0xb7111498>.
The problem isn't that it's too long (though it is) but that it just poses the question "why is this not using the regular [etc] notation?"
Or, if this is considered too long, shorter form: <list>.
Same here.
Or, as Georg suggested, use the word "recursive" for clearness: <recursive>. Or combine type name and the word "recursive": <recursive list>.
Sure, but I still am curious what problem you are really trying to solve. The problem seems to be purely in your mind. You also seem to be taking the guideline that the repr() of an object should be eval()-able way too strictly. It is just a guideline to help class authors decide what their repr() should look like if they don't have a better idea. And the guideline encourages writing repr()s that are intuitive to readers. Beyond that there's nothing of value -- it just reduces guesswork on both sides.
-- --Guido van Rossum (python.org/~guido)
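As a small illustration of the guideline in its intended role, helping a class author pick a readable repr, here is a sketch with a hypothetical Point class (not from the thread). The repr happens to be eval()-able, but its main job is to tell the reader what value they have.

```python
class Point:
    """Hypothetical class whose repr follows the eval()-able guideline."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Looks like the constructor call that would recreate the object.
        return "Point(x=%r, y=%r)" % (self.x, self.y)

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
p_repr = repr(p)
print(p_repr)
print(eval(p_repr) == p)
```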
On 27.12.15 22:41, Guido van Rossum wrote:
Sure, but I still am curious what problem you are really trying to solve. The problem seems to be purely in your mind. You also seem to be taking the guideline that the repr() of an object should be eval()-able way too strictly. It is just a guideline to help class authors decide what their repr() should look like if they don't have a better idea. And the guideline encourages writing repr()s that are intuitive to readers. Beyond that there's nothing of value -- it just reduces guesswork on both sides.
Thank you, now I understand this.
On Sun, Dec 27, 2015 at 10:16:34AM -0700, Guido van Rossum wrote:
I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual.
As a data point, or perhaps an anecdote point, I've been a regular on the tutor@ and python-list@ lists for many years now, and I don't recall seeing recursive lists being an issue. I can't categorically say that it has *never* come up, but it certainly isn't common.
My sense is that the not-really-an-invariant-more-of-a-guideline that eval'ing the repr of an object returns the object is not that important here. There are many things you can put in a list which will break the invariant. It is a little unfortunate that [...] is no longer a syntax error, giving us this:
eval("[[...]]") == [[Ellipsis]]
but I don't see that as a problem worth fixing.
I think the repr of OrderedDict is fine the way it is, and I like the fact that it uses a bare ... to refer to itself rather than wrapping it in braces like regular dicts. It just looks nicer in the OrderedDict repr:
OrderedDict([('key', ...)])
versus
OrderedDict([('key', {...})])
I thought I would generate an extreme example, an OrderedDict with multiple references to itself in values which contain references to themselves as well:
py> from collections import OrderedDict
py> o = OrderedDict([(1, []), (2, {}), (3, ('a', []))])
py> o[1].append(o[1])
py> o[1].append(o)
py> o[2]['x'] = o[2]
py> o[2]['y'] = o
py> o[3][-1].append(o[3])
py> o[3][-1].append(o)
py> o[4] = o
py> o
OrderedDict([(1, [[...], ...]), (2, {'y': ..., 'x': {...}}), (3, ('a', [(...), ...])), (4, ...)])
As an extreme case, I would hope that I would never need to debug something this complex in real life, but I think it is useful to see all the different kinds of recursive reprs in one place. I think it is useful that they are all slightly different. If it looked like this:
OrderedDict([(1, [<...>, <...>]), (2, {'y': <...>, 'x': <...>}), (3, ('a', [<...>, <...>])), (4, <...>)])
we would lose valuable hints about the types, and if they all used object __repr__ the amount of visual noise would be overwhelming:
OrderedDict([(1, [<list object at 0xb7bcf7cc>, <collections.OrderedDict object at 0xb7bcbc5c>]), (2, {'y': <collections.OrderedDict object at 0xb7bcbc5c>, 'x': <dict object at 0xb7c6f96c>}), (3, ('a', [<tuple object at 0xb7bb320c>, <collections.OrderedDict object at 0xb7bcbc5c>])), (4, <collections.OrderedDict object at 0xb7bcbc5c>)])
Given the risk that any such change will break doctests, I don't think this is a problem worth fixing:
+1 on keeping the status quo
-1 on using the verbose object.__repr__
-0.5 on consistently using <...> for all types
-0.5 on changing the repr of recursive OrderedDicts to be more like dict
-- Steve
Newlines in OrderedDict.__repr__ and/or pprint(OrderedDict) would be helpful:
* http://stackoverflow.com/questions/4301069/any-way-to-properly-pretty-print-...
* http://bugs.python.org/issue10592 closed; superseded by:
* http://bugs.python.org/issue7434 "general pprint rewrite"
As a workaround, a suitable JSONEncoder and json.dumps(obj, indent=2) works alright.
On Dec 27, 2015 7:55 PM, "Steven D'Aprano" <steve@pearwood.info> wrote:
On Sun, Dec 27, 2015 at 10:16:34AM -0700, Guido van Rossum wrote:
I really feel you all are overworrying and overthinking this. A downside to me is that <...> isn't clear about what the type of the object is. The use case here is not sophisticated users, it's beginners who have accidentally managed to create a recursive list or dict. They have most likely not even encountered Ellipsis objects yet. There's nothing clearer than the current notation to help them see that they've done something unusual.
As a data point, or perhaps an anecdote point, I've been a regular on the tutor@ and python-list@ lists for many years now, and I don't recall seeing recursive lists being an issue. I can't categorically say that it has *never* come up, but it certainly isn't common.
My sense is that the not-really-an-invariant-more-of-a-guideline that eval'ing the repr of an object returns the object is not that important here. There are many things you can put in a list which will break the invariant. It is a little unfortunate that [...] is no longer a syntax error, giving us this:
eval("[[...]]") == [[Ellipsis]]
but I don't see that as a problem worth fixing.
I think the repr of OrderedDict is fine the way it is, and I like the fact that it uses a bare ... to refer to itself rather than wrapping it in braces like regular dicts. It just looks nicer in the OrderedDict repr:
OrderedDict([('key', ...)])
versus
OrderedDict([('key', {...})])
I thought I would generate an extreme example, an OrderedDict with multiple references to itself in values which contain references to themselves as well:
py> from collections import OrderedDict
py> o = OrderedDict([(1, []), (2, {}), (3, ('a', []))])
py> o[1].append(o[1])
py> o[1].append(o)
py> o[2]['x'] = o[2]
py> o[2]['y'] = o
py> o[3][-1].append(o[3])
py> o[3][-1].append(o)
py> o[4] = o
py> o
OrderedDict([(1, [[...], ...]), (2, {'y': ..., 'x': {...}}), (3, ('a', [(...), ...])), (4, ...)])
As an extreme case, I would hope that I would never need to debug something this complex in real life, but I think it is useful to see all the different kinds of recursive reprs in one place. I think it is useful that they are all slightly different. If it looked like this:
OrderedDict([(1, [<...>, <...>]), (2, {'y': <...>, 'x': <...>}), (3, ('a', [<...>, <...>])), (4, <...>)])
we would lose valuable hints about the types, and if they all used object __repr__ the amount of visual noise would be overwhelming:
OrderedDict([(1, [<list object at 0xb7bcf7cc>, <collections.OrderedDict object at 0xb7bcbc5c>]), (2, {'y': <collections.OrderedDict object at 0xb7bcbc5c>, 'x': <dict object at 0xb7c6f96c>}), (3, ('a', [<tuple object at 0xb7bb320c>, <collections.OrderedDict object at 0xb7bcbc5c>])), (4, <collections.OrderedDict object at 0xb7bcbc5c>)])
Given the risk that any such change will break doctests, I don't think this is a problem worth fixing:
+1 on keeping the status quo
-1 on using the verbose object.__repr__
-0.5 on consistently using <...> for all types
-0.5 on changing the repr of recursive OrderedDicts to be more like dict
-- Steve
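The json.dumps workaround mentioned at the top of this message can be sketched as follows. This is only an illustration under stated assumptions: the sample data is made up, the default encoder is used because every value is JSON-native (a custom JSONEncoder would be needed otherwise), and the approach does not help with the recursive case, since json.dumps raises ValueError on a circular reference by default.

```python
import json
from collections import OrderedDict

# Made-up sample data: a nested OrderedDict with JSON-native values only.
data = OrderedDict([("a", 1), ("b", OrderedDict([("c", [1, 2])]))])

# indent=2 puts every key and list item on its own line.
pretty = json.dumps(data, indent=2)
print(pretty)

# A self-referencing OrderedDict cannot be dumped this way.
data["self"] = data
try:
    json.dumps(data, indent=2)
    circular_error = None
except ValueError as exc:
    circular_error = exc
print(type(circular_error).__name__)
```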
On 28 December 2015 at 10:55, Steven D'Aprano <steve@pearwood.info> wrote:
Given the risk that any such change will break doctests, I don't think this is a problem worth fixing:
+1 on keeping the status quo
-1 on using the verbose object.__repr__
-0.5 on consistently using <...> for all types
-0.5 on changing the repr of recursive OrderedDicts to be more like dict
+1 here - I've been persuaded that changing this behaviour isn't worth the disruption (to existing third-party documentation, if nothing else).
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I disagree. We should not take this guideline too literally. The dots are easily understood and nobody has been fooled by a list containing an ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?)
Yes, the repr of an ellipsis is the string 'Ellipsis'. But the repr of a recursive list still looks like a Python expression, and if somebody does a repr/eval round trip, they will silently get a wrong result instead of an error.
Well, that is not the point of the repr() guideline. It is so you can understand what value you got. There are plenty of other cases where eval() of the repr silently gives something different, e.g. when the same object occurs multiple times. Neither proposal is clearer to understand.
--Guido (mobile)
On Dec 26, 2015 9:59 AM, "Serhiy Storchaka" <storchaka@gmail.com> wrote:
I disagree. We should not take this guideline too literally. The dots are easily understood and nobody has been fooled by a list containing an ellipsis yet. (Also, isn't the repr of an ellipsis the string 'Ellipsis'?)
Yes, the repr of an ellipsis is the string 'Ellipsis'. But the repr of a recursive list still looks like a Python expression, and if somebody does a repr/eval round trip, they will silently get a wrong result instead of an error.
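The case Guido mentions in this message, the same object occurring multiple times, can be shown with a short sketch (variable names are illustrative only). The round trip produces an equal list, but the shared identity is silently lost.

```python
inner = []
original = [inner, inner]  # both slots refer to the same list object

rebuilt = eval(repr(original))

print(repr(original))
print(original[0] is original[1])  # shared in the original
print(rebuilt[0] is rebuilt[1])    # two separate lists after the round trip
print(rebuilt == original)         # still compares equal

# Mutating through one slot shows the difference in behaviour.
original[0].append("x")
rebuilt[0].append("x")
print(original)
print(rebuilt)
```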
participants (9)
- Andrew Barnert
- Georg Brandl
- Guido van Rossum
- MRAB
- Nick Coghlan
- Random832
- Serhiy Storchaka
- Steven D'Aprano
- Wes Turner