[Python-3000] Recursive str

Stephen J. Turnbull stephen at xemacs.org
Tue Apr 15 19:29:56 CEST 2008


Guido van Rossum writes:

 > A complaint about this would carry more weight when it came from
 > someone who actually has to deal with the issue than coming from a
 > purely theoretical perspective (unless I'm wrong and you actually read
 > Japanese).

This *is* a problem.  In my experience a lot of string bugs are "off
by one" bugs (UTF signatures inserted in the middle of a string where
they don't belong, fencepost errors, etc.), which stick out like a sore
thumb when printed readably.  But they're very hard to diagnose when
what I'm seeing looks like output from "cat /dev/random".

I don't suffer from it particularly because most of my test data is
ASCII, and even when I do use Japanese, Emacs has commands to "wash" a
portion of the buffer as needed.  On the other hand, Japanese is my
second language.  I suppose a native speaker might be really bothered
that the strings are not readable without extra effort.

 > Another issue is that repr() is supposed to return an 8-bit string. I
 > don't think we should put non-ASCII characters in the output in some
 > encoding.

No, we should not put non-ASCII characters in the output of repr() for
2.x.  It's not worth the effort to expand it to allow ISO 8859-1.  And
anything locale-specific is right out: you'll have buildbots going red
across the globe, no doubt.  Not just once, either.  It is very hard to
enforce consistency on locale-specific stuff.

 > In Py3k we may be able to do something else though -- instead of
 > insisting on ASCII we could allow a much larger set of characters to
 > be unescaped.

Yes.  The implications of the PEP 3131 discussions about Unicode
identifiers should be considered carefully.  E.g., consider the
potential for confusing ASCII 'A' with Cyrillic 'A'.  I'm very unhappy
with the idea of having Cyrillic 'A' \u-escaped when calling repr() on
objects in a Russian's program, but I don't like the alternative any
better: having "print repr(bogus)" be no more informative than "print
bogus" in this situation.
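
A small sketch of the trade-off, again assuming a current Python 3
interpreter.  describe() is a hypothetical helper, not an existing API;
it only shows one way to get the informative view without escaping
everything.

```python
import unicodedata

latin = "A"          # U+0041
cyrillic = "\u0410"  # U+0410, rendered identically in most fonts

# Unescaped output: readable for a Russian speaker, but useless for
# telling the two letters apart.
unescaped = repr(cyrillic)   # a quoted glyph that looks like 'A'

# Escaped output: unambiguous, but unreadable for ordinary text.
escaped = ascii(cyrillic)    # '\u0410'


def describe(s):
    """Return one 'U+XXXX NAME' entry per character (hypothetical helper)."""
    return ["U+%04X %s" % (ord(c), unicodedata.name(c, "<unnamed>"))
            for c in s]


print(describe(latin))     # ['U+0041 LATIN CAPITAL LETTER A']
print(describe(cyrillic))  # ['U+0410 CYRILLIC CAPITAL LETTER A']
```

Neither repr-style output resolves the confusion by itself; spelling
out the character names does, at the cost of being far too verbose to
be a default.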

