[Python-Dev] Re: Multibyte repr()

Atsuo Ishimoto ishimoto@gembook.org
Thu, 10 Oct 2002 14:10:01 +0900


Hello,

On Wed, 09 Oct 2002 17:07:33 -0400
Guido van Rossum <guido@python.org> wrote:

> Well, if you *want* to see the hex codes for all non-ASCII characters,
> repr() used to be your friend.  No more.  If you *want* to see the
> printable characters, you could always use print.

I'm happy with this. I'm distributing a modified version of the Python
Win32 installer at http://www.python.jp/Zope/download/pythonjpdist. This
version of Python contains similar modifications for Japanese ShiftJIS
users.

But this patch has one problem. Because the result of repr() depends on
the locale setting, we cannot assume that a text-form pickle can be
restored everywhere. For example, under a Japanese ShiftJIS locale,

>>> import pickle
>>> f = open('s.pickle', 'w')  # any writable file; the name is arbitrary
>>> s = '\x83\x5c'  # A multi-byte character, the third letter of "Python"
>>>                 # in Japanese. Note that the trailing byte is '\'.
>>> pickle.dump(s, f)

I assume the CVS version of Python fails to load this pickled object,
because a backslash followed by the closing quote is illegal. This
problem can occur with the Japanese ShiftJIS encoding, but I don't know
whether other encodings cause the same problem.
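
The failure described above can be reproduced in a small sketch. It
assumes a modern Python 3 interpreter rather than the 2002 CVS tree, and
raw_quoted() is a hypothetical stand-in for a locale-aware repr() that
leaves the bytes unescaped; it is not the code from the patch.

```python
import ast

# Katakana "so" (U+30BD). Its Shift JIS encoding ends in 0x5c, the ASCII backslash.
raw = "\u30bd".encode("shift_jis")

def raw_quoted(data: bytes) -> str:
    """Hypothetical locale-aware repr: quote the bytes without escaping them."""
    # latin-1 maps every byte to one code point, so the backslash is kept as-is.
    return "'" + data.decode("latin-1") + "'"

def parses(literal: str) -> bool:
    """Return True if `literal` can be read back as a Python literal."""
    try:
        ast.literal_eval(literal)
        return True
    except (SyntaxError, ValueError):
        return False

# The trailing backslash escapes the closing quote, so the literal never ends.
unescaped_ok = parses(raw_quoted(raw))
# The standard repr() hex-escapes the bytes, so the literal round-trips.
escaped_ok = parses(repr(raw))
```

Here unescaped_ok ends up false and escaped_ok true, which is the
difference a text-form pickle would see between the two repr() styles.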

I think this is not a major problem, since we can avoid it by using
binary-form pickle, or by using Unicode for text-form pickle. But to
eliminate the problem entirely, Python could have another slot for
getting a string representation of an object, perhaps named tp_dumps.
tp_dumps would always return hex codes for all non-ASCII characters and
would be called whenever a valid Python string literal is required.
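
As a rough illustration of what such a slot might return, here is a
pure-Python sketch. The function name dumps_repr() is hypothetical, the
real slot would live at the C type level, and the code assumes Python 3
bytes objects in place of the 8-bit strings discussed in this thread.

```python
import ast

def dumps_repr(data: bytes) -> str:
    """Hypothetical stand-in for tp_dumps: an ASCII-only, locale-independent literal."""
    parts = []
    for byte in data:
        ch = chr(byte)
        if ch in "\\'":
            # Escape the backslash and the quote character used as delimiter.
            parts.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            # Printable ASCII is emitted unchanged.
            parts.append(ch)
        else:
            # Control and non-ASCII bytes always become hex escapes.
            parts.append("\\x%02x" % byte)
    return "b'" + "".join(parts) + "'"

literal = dumps_repr(b"\x83\x5c")          # the ShiftJIS character from the example
restored = ast.literal_eval(literal)       # reads back the original two bytes
```

Because the output never depends on the locale, a literal written on a
ShiftJIS machine could be read back on any other machine.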

Regards,

--------------------------
Atsuo Ishimoto
ishimoto@gembook.jp
Homepage:http://www.gembook.jp