[Python-Dev] Re: Multibyte repr()
Guido van Rossum
guido@python.org
Thu, 10 Oct 2002 08:59:26 -0400
[Guido]
> > Well, if you *want* to see the hex codes for all non-ASCII characters,
> > repr() used to be your friend. No more. If you *want* to see the
> > printable characters, you could always use print.
[Atsuo Ishimoto]
> I'm happy with this.
"This" was ambiguous. Are you happy with what's in current CVS, or
with the old repr()?
> I'm distributing a modified version of the Python Win32
> installer at http://www.python.jp/Zope/download/pythonjpdist. This
> version of Python contains similar modifications for Japanese ShiftJIS
> users.
>
> But this patch has one problem. Because the result of repr() depends on
> the locale setting, we cannot assume a text-form pickle can be restored
> everywhere. For example, under a Japanese ShiftJIS locale,
>
> >>> s = '\x83\x5c' # This is a multi-byte character, third letter of "Python"
> >>> # in Japanese. Note that trailing character is '\'
> >>>
> >>> pickle.dump(s, f)
>
> I assume the CVS version of Python fails to load this pickled object because
> a backslash followed by a quote is illegal. This problem may happen with the
> Japanese ShiftJIS encoding, but I don't know whether other
> encodings cause the same problem.
I tried this, and I could not find any problems with the resulting
pickle. The pickle looks like this:
"S'\\x83\\\\'\np0\n."
I couldn't get this to fail loading in Python 2.1, 2.2 or 2.3 (CVS);
I tried both pickle and cPickle.
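
For anyone who wants to repeat a similar check today, here is a minimal sketch. It assumes Python 3, where a bytes object stands in for the 8-bit str of Python 2.x, so the pickle bytes it prints will not match the text shown above. It only checks that a text-form (protocol 0) pickle of the two bytes round-trips.

```python
import pickle

# Shift JIS encoding of the katakana "so", the third letter of "Python"
# in Japanese. The second byte, 0x5c, is also the ASCII backslash.
s = b'\x83\x5c'

# Protocol 0 is the text-form pickle discussed in this thread.
# Assumption: Python 3 bytes stand in for the Python 2 8-bit str.
data = pickle.dumps(s, protocol=0)
restored = pickle.loads(data)

print(repr(data))      # the exact opcodes differ from the Python 2 output
print(restored == s)   # round-trip check
```

The trailing backslash byte survives the round trip because the pickler escapes it instead of writing it raw.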
> I think this is not a major problem, since we can avoid it by using
> a binary-form pickle, or by using Unicode for text-form pickles. But to
> eliminate this problem entirely, Python could have another slot to get a
> string representation of an object, perhaps named tp_dumps.
> tp_dumps would always return hex codes for all non-ASCII characters
> and would be called whenever a valid Python string literal is required.
I don't think this particular issue (pickling) is a problem. But I
*do* continue to worry that making repr() depend on the locale may be
a bigger problem than what it attempts to solve.
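
To make the worry concrete, here is a rough sketch, again assuming Python 3 bytes as a stand-in for the 2.x str. The `naive` string is a hypothetical locale-style repr that copies the bytes between quotes without escaping them. It is not what the patch in CVS actually emits, only an illustration of the backslash-before-quote case raised above.

```python
import ast

s = b'\x83\x5c'   # Shift JIS "so"; the trailing byte is a backslash

# The escaping repr is pure ASCII and evaluates back to the same bytes.
escaped = repr(s)
roundtrip = ast.literal_eval(escaped)

# Hypothetical unescaped repr (an assumption for illustration only):
# the raw characters are placed between quotes as-is, so the trailing
# backslash escapes the closing quote.
naive = "'" + s.decode('latin-1') + "'"
try:
    ast.literal_eval(naive)
    naive_parses = True
except SyntaxError:
    naive_parses = False

print(escaped)
print(roundtrip == s)
print(naive_parses)
```

The escaped form is the same in every locale, which is the property that text pickles and eval(repr(x)) rely on.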
[Hye-Shik Chang]
> I realized that string_repr's dependence on the locale can cause
> problems in many unexpected situations. What I wanted from this patch was
> just to see the _real_ string even in lists or dictionaries.
> CJKV users and I may be happy even without the string_repr locale patch.
I'm not sure I follow. What is the alternative that you propose?
--Guido van Rossum (home page: http://www.python.org/~guido/)