[Python-3000] Recursive str

Wed Apr 16 03:10:07 CEST 2008

On Tue, Apr 15, 2008 at 7:06 PM, atsuo ishimoto <ishimoto at gembook.org> wrote:
> For debugging, I think patch http://bugs.python.org/issue2630 is
> practical enough if error handler of sys.stdout is 'backslashescape'.
>
> If you are Russian and you want to print list of Cyrillic string, you
> can print repr(listOfRussian). If you want to see more detailed
> information of specific string, you can print
> repr(russianStr).encode("ascii", "backslashreplace"). Latter gives
> you a same result as Python2's repr(russianStr).
> If you are not Russian and working on ASCII console,
> print(repr(listOfRussian)) give you a same result as Python2.

I agree that this is enough. I see two main uses for repr when it
comes to strings: to put quotes around the contents, and to replace
control characters with safe representations the interpreter
understands. The third use, to represent strings unambiguously, is not
a major point, and is clearly not served, since you cannot tell via
repr if string1 *is* string2; only that they are equal.
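
As a rough illustration of those uses, here is a small sketch. The
names a and b are made up for the example, and it assumes Python 3
string semantics:

```python
# Two equal strings built in different ways.
a = "tab\there, comma"
b = "".join(["tab\there", ", comma"])

# Use 1: quotes delimit the contents. Use 2: the tab control
# character is shown as a backslash escape instead of raw.
print(repr(a))

# Inside a list, the quotes keep the embedded comma from looking
# like an element separator.
print([a, "x"])

# Equal values give identical reprs, so repr cannot tell you
# whether a *is* b; only the "is" operator answers that.
print(a == b, repr(a) == repr(b))
print(a is b)
```

The last line is the only one that inspects identity, and its result
depends on the interpreter, which is why repr cannot stand in for it.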

The first (quotes) disambiguates values in lists containing strings
with commas, and the second (backslash replaced control characters)
avoids using characters with special meanings. The latter also
historically disambiguates everything beyond ASCII, but in practice,
just as it's more useful to have 'mystring' than <str object at
0x12345678>, it's more useful to have '日本語' than to have
'\u1234\u5678\u9abc'. While there are cases where this can become
visually ambiguous, it will still pass the ideal case
s == eval(repr(s)).
Finally, Atsuo Ishimoto's .encode("ascii", "backslashreplace") is much
more explicit about expectations, and handles identifying whether you
have a combined character, or a base and combining diacritic, etc.
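
To make the combining-character case concrete, here is a minimal
sketch. It assumes a repr that leaves printable non-ASCII characters
unescaped, as the patch proposes; ascii_repr is a hypothetical helper
name, not something from the patch:

```python
def ascii_repr(text):
    """repr() with every non-ASCII character backslash-escaped."""
    return repr(text).encode("ascii", "backslashreplace").decode("ascii")

s = "日本語"
# The round trip holds whether or not repr escapes the characters.
assert eval(repr(s)) == s

composed = "\u00e9"     # e-acute as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# The two render alike on screen but are different strings.
print(composed == decomposed)

# The explicit ASCII form shows which one you actually have.
print(ascii_repr(composed))
print(ascii_repr(decomposed))
```

The explicit encode step is what exposes the difference: the composed
form comes out as a single escape, the decomposed one as a plain "e"
followed by the escape for the combining mark.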

What should the string_escape codec do when repr has been changed
(assuming it's not internally linked directly to repr)? I can see
benefits to matching repr and benefits to being more like
ASCII+backslashreplace, and don't have a strong preference like I do
for repr.

[Apologies for hitting reply on the unicodedata suggestion yesterday.]

Michael
-- 
Michael Urman