[Python-Dev] Re: Multibyte repr()

Guido van Rossum guido@python.org
Thu, 10 Oct 2002 08:59:26 -0400


[Guido]
> > Well, if you *want* to see the hex codes for all non-ASCII characters,
> > repr() used to be your friend.  No more.  If you *want* to see the
> > printable characters, you could always use print.

[Atsuo Ishimoto]
> I'm happy with this.

"This" was ambiguous.  Are you happy with what's in current CVS, or
with the old repr()?

> I'm distributing a modified version of the Python Win32
> installer at http://www.python.jp/Zope/download/pythonjpdist. This
> version of Python contains similar modifications for Japanese ShiftJIS
> users.
> 
> But this patch has one problem. Because the result of repr() depends on
> the locale setting, we cannot assume a text-form pickle can be restored
> everywhere. For example, under the Japanese ShiftJIS locale,
> 
> >>> s = '\x83\x5c'  # This is a multi-byte character, third letter of "Python"
> >>>                 # in Japanese. Note that trailing character is '\'
> >>> 
> >>> pickle.dump(s, f)
> 
> I assume the CVS version of Python fails to load this pickled object
> because a backslash followed by a quote is illegal. This problem may
> happen for the Japanese ShiftJIS encoding, but I don't know whether
> other encodings cause the same problem.

I tried this, and I could not find any problems with the resulting
pickle.  The pickle looks like this:

"S'\\x83\\\\'\np0\n."

I couldn't get this to fail loading in Python 2.1, 2.2 or 2.3 (CVS);
I tried both pickle and cPickle.
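
For readers who want to repeat the check on a current interpreter, here is a
minimal sketch. It assumes Python 3, not the 2.x versions tested above; the
encoding="bytes" argument tells pickle.loads to return old 8-bit strings as
bytes objects. The name legacy_pickle is only illustrative.

```python
import pickle

# The text-mode (protocol 0) pickle exactly as shown above: the S opcode,
# a quoted string literal with escaped bytes, a memo "put", and a stop.
legacy_pickle = b"S'\\x83\\\\'\np0\n."

# Assumption: Python 3. encoding="bytes" returns 8-bit strings as bytes.
restored = pickle.loads(legacy_pickle, encoding="bytes")
print(restored == b"\x83\x5c")

# The same two bytes also survive a protocol 0 round trip today.
original = b"\x83\x5c"
round_tripped = pickle.loads(pickle.dumps(original, protocol=0))
print(round_tripped == original)
```

The pickle loads because the trailing 0x5c byte is written as an escaped
backslash, so the closing quote is never swallowed.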

> I think this is not a major problem since we can avoid it by using
> binary-form pickles, or by using Unicode for text-form pickles. But to
> eliminate this problem entirely, Python could have another slot to get a
> string representation of an object, perhaps named tp_dumps.
> tp_dumps would always return hex codes for all non-ASCII characters
> and would be called whenever valid Python string literals are required.

I don't think this particular issue (pickling) is a problem.  But I
*do* continue to worry that making repr() depend on the locale may be
a bigger problem than what it attempts to solve.
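
To make the worry concrete, here is a rough sketch, again assuming Python 3
and using str objects to stand in for the old 8-bit strings. The variables
naive_literal and escaped_literal are hypothetical illustrations, not the
output of any actual patch: the first mimics a repr() that passes bytes
through unescaped, the second is the ordinary fully escaped repr().

```python
import ast

raw = b"\x83\x5c"  # ShiftJIS bytes for one character; the second byte is '\'

# Under ShiftJIS the two bytes decode to a single katakana character.
as_shift_jis = raw.decode("shift_jis")

# Hypothetical unescaped repr: the raw characters placed between quotes.
naive_literal = "'" + raw.decode("latin-1") + "'"

# Ordinary repr: backslashes and non-printable characters are escaped.
escaped_literal = repr(raw.decode("latin-1"))

try:
    ast.literal_eval(naive_literal)
    naive_fails = False
except (SyntaxError, ValueError):
    # The trailing backslash escapes the closing quote.
    naive_fails = True

round_tripped = ast.literal_eval(escaped_literal).encode("latin-1")

print(naive_fails)
print(round_tripped == raw)
```

The escaped form can be read back anywhere, while the unescaped form is only
a valid literal if the reader interprets the bytes the same way the writer
did.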

[Hye-Shik Chang]
> I realized that string_repr's dependence on the locale can cause
> problems in many unexpected situations. What I wanted from this patch
> was just to see the _real_ string even in lists or dictionaries.
> CJKV users and I may be happy even without the string_repr locale patch.

I'm not sure I follow.  What is the alternative that you propose?

--Guido van Rossum (home page: http://www.python.org/~guido/)