[Python-Dev] repr vs. str and locales again
Guido van Rossum
guido@python.org
Fri, 19 May 2000 08:06:52 -0700
The email below suggests a simple solution to a problem that
e.g. François Pinard brought up long ago; repr() of a string turns
all non-ASCII chars into \oct escapes. Jyrki's solution: use
isprint(), which makes it locale-dependent. I can live with this.
It needs a Py_CHARMASK() call but otherwise seems to be fine.
Anybody got an opinion on this? I'm +0. I would even be +0 on a
similar patch for unicode strings (once the ASCII proposal is
implemented).
--Guido van Rossum (home page: http://www.python.org/~guido/)
------- Forwarded Message
Date: Fri, 19 May 2000 10:48:29 +0300
From: Jyrki Kuoppala <jkp@kaapeli.fi>
To: guido@python.org
Subject: python bug?: python 1.5.2 fails to print printable 8-bit characters in strings
I'm not sure whether this is exactly a bug, i.e. whether python 1.5.2 is
supposed to support locales and 8-bit characters. However, on the Debian
"unstable" Linux distribution the diff below makes python 1.5.2
handle printable 8-bit characters as one would expect.
Problem description:
python doesn't properly print printable 8-bit characters for the current locale.
Details:
With no locale set, 8-bit characters in quoted strings print as
backslash-escapes, which I guess is OK:
$ unset LC_ALL
$ python
Python 1.5.2 (#0, Apr 3 2000, 14:46:48) [GCC 2.95.2 20000313 (Debian GNU/Linux)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a=('foo','kääk')
>>> print a
('foo', 'k\344\344k')
>>>
But with a locale with a printable 'ä' character (octal 344) I get:
$ export LC_ALL=fi_FI
$ python
Python 1.5.2 (#0, Apr 3 2000, 14:46:48) [GCC 2.95.2 20000313 (Debian GNU/Linux)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a=('foo','kääk')
>>> print a
('foo', 'k\344\344k')
>>>
I should be getting (output from python patched with the enclosed patch):
$ export LC_ALL=fi_FI
$ python
Python 1.5.2 (#0, May 18 2000, 14:43:46) [GCC 2.95.2 20000313 (Debian GNU/Linux)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a=('foo','kääk')
>>> print a
('foo', 'kääk')
>>>
This hits for example when Zope with squishdot weblog (squishdot
0.3.2-3 with zope 2.1.6-1) creates a text index from posted articles -
strings with valid Latin1 characters get indexed as backslash-escaped
octal codes, and thus become unsearchable.
I am using debian unstable, kernels 2.2.15pre10 and 2.0.36, libc 2.1.3.
I suggest that the test for printability in
python-1.5.2/Objects/stringobject.c be fixed to use isprint(), which
takes the locale into account:
--- python-1.5.2/Objects/stringobject.c.orig Thu Oct 8 05:17:48 1998
+++ python-1.5.2/Objects/stringobject.c Thu May 18 14:36:28 2000
@@ -224,7 +224,7 @@
         c = op->ob_sval[i];
         if (c == quote || c == '\\')
             fprintf(fp, "\\%c", c);
-        else if (c < ' ' || c >= 0177)
+        else if (! isprint (c))
             fprintf(fp, "\\%03o", c & 0377);
         else
             fputc(c, fp);
@@ -260,7 +260,7 @@
         c = op->ob_sval[i];
         if (c == quote || c == '\\')
             *p++ = '\\', *p++ = c;
-        else if (c < ' ' || c >= 0177) {
+        else if (! isprint (c)) {
             sprintf(p, "\\%03o", c & 0377);
             while (*p != '\0')
                 p++;
//Jyrki
------- End of Forwarded Message
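
For reference, here is a minimal standalone sketch of the escaping loop with
the masking that the Py_CHARMASK() remark refers to. It is not the code from
stringobject.c: repr_escape() is a hypothetical helper, and the cast to
unsigned char stands in for Py_CHARMASK(). The reason for the mask is that
where plain char is signed, a byte such as octal 344 is negative, and
isprint() is only defined for EOF and for values representable as an
unsigned char.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not CPython code: write a repr()-style quoted and
   escaped copy of the n bytes at s into out.  out must have room for
   4*n + 3 bytes (every byte as a 4-character octal escape, two quotes,
   and the terminating NUL). */
static void repr_escape(const char *s, size_t n, char quote, char *out)
{
    char *p = out;
    size_t i;

    assert(s != NULL && out != NULL);

    *p++ = quote;
    for (i = 0; i < n; i++) {
        /* Stand-in for Py_CHARMASK(): force the byte into 0..255
           before handing it to isprint(). */
        unsigned char c = (unsigned char)s[i];

        if (c == (unsigned char)quote || c == '\\') {
            *p++ = '\\';
            *p++ = (char)c;
        } else if (!isprint(c)) {
            /* Not printable in the current LC_CTYPE locale:
               emit a three-digit octal escape. */
            p += sprintf(p, "\\%03o", (unsigned)c);
        } else {
            *p++ = (char)c;
        }
    }
    *p++ = quote;
    *p = '\0';
}
```

A caller that has not called setlocale() runs in the "C" locale, where the
two 8-bit bytes of "k\344\344k" should still come out as octal escapes.
After setlocale(LC_CTYPE, "") with LC_ALL=fi_FI in the environment, whether
they are copied through unchanged depends on that locale being installed and
on its character set.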