[Python-Dev] repr vs. str and locales again
Guido van Rossum
guido@python.org
Sun, 21 May 2000 20:47:16 -0700
Let's reboot this thread. Never mind the details of the actual patch,
or why it would affect a particular index.
Obviously if we're going to patch string_print() we're also going to
patch string_repr() (and vice versa) -- the former (without the
Py_PRINT_RAW flag) is supposed to be an optimization of the latter.
(I hadn't read the patch far enough to realize that it only did one
and not the other.)
The point is simply this.
The repr() function for a string turns it into a valid string literal.
There's considerable freedom allowed in this conversion, some of which
is taken (e.g. it prefers single quotes but will use double quotes
when the string contains single quotes).
For safety reasons, control characters are replaced by their octal
escapes. This is also done for non-ASCII characters.
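For readers who never saw the old behavior, here is a rough Python model of the rule described above, applied to an 8-bit string (represented as `bytes`). It is a sketch, not the code in stringobject.c. The name `old_style_repr` is hypothetical, and the sketch assumes that every byte below 0x20 or at or above 0x7F gets a three-digit octal escape.

```python
def old_style_repr(data: bytes) -> str:
    """Rough model of the pre-patch repr() rule for an 8-bit string.

    Assumption: every byte below 0x20 or at/above 0x7F becomes a
    three-digit octal escape, as described in the text.
    """
    # Prefer single quotes; use double quotes only when the data contains
    # a single quote and no double quote.
    quote = '"' if (b"'" in data and b'"' not in data) else "'"
    out = [quote]
    for byte in data:
        ch = chr(byte)
        if ch in (quote, "\\"):
            out.append("\\" + ch)  # escape the delimiter and the backslash
        elif byte < 0x20 or byte >= 0x7F:
            out.append("\\%03o" % byte)  # control or non-ASCII: octal escape
        else:
            out.append(ch)  # printable ASCII passes through unchanged
    out.append(quote)
    return "".join(out)
```

Under this rule, `old_style_repr(b"caf\xe9")` returns `'caf\351'` and `old_style_repr(b"it's")` returns `"it's"`. The first result is exactly what Latin-1 users object to.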
Lots of people, most of them living in countries where Latin-1 (or
another 8-bit ASCII superset) is in actual use, would prefer that
non-ASCII characters be left alone rather than changed into
octal escapes. I think it's not unreasonable to ask that what they
consider printable characters aren't treated as control characters.
I think that using the locale to guide this is reasonable. If the
locale is set to imply Latin-1, then we can assume that most output
devices are capable of displaying those characters. What good does
converting those characters to octal escapes do us then? If the input
string was in fact binary goop, then the output will be unreadable
goop -- but it won't screw up the output device (as control characters
are wont to do, which is the main reason to turn them into octal
escapes).
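One way to picture the proposal is to let the locale's character set decide what counts as printable. The sketch below is hypothetical: the name `locale_repr` and its `codeset` parameter are illustrative. It approximates the C library's locale-dependent `isprint()` by decoding each byte with a Python codec and calling `str.isprintable()`, and it returns a Unicode `str` rather than an 8-bit string.

```python
import codecs


def locale_repr(data: bytes, codeset: str = "ascii") -> str:
    """Hypothetical locale-aware repr for an 8-bit string.

    A byte is shown as-is only if it decodes, under `codeset`, to a single
    printable character. This stands in for the C library's
    locale-dependent isprint() (an assumption of this sketch).
    """
    decode = codecs.lookup(codeset).decode
    # Same quote choice as before: single quotes unless the data contains
    # a single quote and no double quote.
    quote = '"' if (b"'" in data and b'"' not in data) else "'"
    out = [quote]
    for byte in data:
        if chr(byte) in (quote, "\\"):
            out.append("\\" + chr(byte))  # escape the delimiter and backslash
            continue
        try:
            text, _ = decode(bytes([byte]))
        except UnicodeDecodeError:
            text = ""  # byte is not a complete character in this codeset
        if len(text) == 1 and text.isprintable():
            out.append(text)  # printable in this codeset: leave it alone
        else:
            out.append("\\%03o" % byte)  # control or undecodable: octal escape
    out.append(quote)
    return "".join(out)
```

With `codeset="latin-1"`, `b"caf\xe9"` is shown as `'café'`, while `b"\x1b[2J"` still becomes `'\033[2J'`. Binary goop stays unreadable, but the escape byte never reaches the terminal. Bytes 0x80 to 0x9F are C1 control characters in Latin-1, so this sketch escapes them as well.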
So I don't see how the patch can do much harm, I don't expect that it
will break much code, and I see a real value for those who use
Latin-1 or other 8-bit supersets of ASCII.
The one objection could be that the locale may be obsolescent -- but
I've only heard /F vent an opinion about that; personally, I doubt
that we will be able to remove the locale any time soon, even if we
invent a better way. Plus, I think that "better way" should address
this issue anyway. If the locale eventually disappears, the feature
automatically disappears with it, because you *have* to make a
locale.setlocale() call before the behavior of repr() changes.
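The opt-in argument above rests on the C rule that a program starts in the "C" locale until it calls `setlocale()`. Suppose the interpreter derived its codeset from the `LC_CTYPE` name. A hypothetical helper like `codeset_for_locale` below would then map "C" and "POSIX" to plain ASCII, so nothing changes until a program opts in. Parsing the `language_TERRITORY.CODESET@modifier` form and treating a name without a codeset as ASCII are both assumptions of this sketch.

```python
import codecs


def codeset_for_locale(name: str) -> str:
    """Hypothetical helper: map an LC_CTYPE locale name to a codec name.

    "C" and "POSIX" (the locale a C program runs in until it calls
    setlocale()) map to plain ASCII, so the old escaping rule applies.
    """
    if name in ("C", "POSIX") or "." not in name:
        # Assumption: a name without an explicit codeset is treated as ASCII.
        return "ascii"
    # Assumed name format: language_TERRITORY.CODESET@modifier
    codeset = name.split(".", 1)[1].split("@", 1)[0]
    try:
        return codecs.lookup(codeset).name
    except LookupError:
        return "ascii"  # unknown codeset: fall back to the conservative rule
```

To inspect the current setting without changing it, pass only the category: `codeset_for_locale(locale.setlocale(locale.LC_CTYPE))`.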
--Guido van Rossum (home page: http://www.python.org/~guido/)