The email below suggests a simple solution to a problem that e.g. François Pinard brought up long ago: repr() of a string turns all non-ASCII chars into \oct escapes. Jyrki's solution: use isprint(), which makes it locale-dependent. I can live with this.

It needs a Py_CHARMASK() call but otherwise seems to be fine.

Anybody got an opinion on this? I'm +0. I would even be +0 on a similar patch for unicode strings (once the ASCII proposal is implemented).

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Fri, 19 May 2000 10:48:29 +0300
From: Jyrki Kuoppala <jkp@kaapeli.fi>
To: guido@python.org
Subject: python bug?: python 1.5.2 fails to print printable 8-bit characters in strings

I'm not sure if this exactly is a bug, i.e. whether python 1.5.2 is supposed to support locales and 8-bit characters. However, on Linux Debian "unstable" distribution the diff below makes python 1.5.2 handle printable 8-bit characters as one would expect.

Problem description:

python doesn't properly print printable 8-bit characters for the current locale.

Details:

With no locale set, 8-bit characters in quoted strings print as backslash-escapes, which I guess is OK:

$ unset LC_ALL
$ python
Python 1.5.2 (#0, Apr  3 2000, 14:46:48)  [GCC 2.95.2 20000313 (Debian GNU/Linux)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a=('foo','kääk')
>>> print a
('foo', 'k\344\344k')

But with a locale with a printable 'ä' character (octal 344) I get:

$ export LC_ALL=fi_FI
$ python
Python 1.5.2 (#0, Apr  3 2000, 14:46:48)  [GCC 2.95.2 20000313 (Debian GNU/Linux)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a=('foo','kääk')
>>> print a
('foo', 'k\344\344k')

I should be getting (output from python patched with the enclosed patch):

$ export LC_ALL=fi_FI
$ python
Python 1.5.2 (#0, May 18 2000, 14:43:46)  [GCC 2.95.2 20000313 (Debian GNU/Linux)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a=('foo','kääk')
>>> print a
('foo', 'kääk')

This hits for example when Zope with squishdot weblog (squishdot 0.3.2-3 with zope 2.1.6-1) creates a text index from posted articles - strings with valid Latin1 characters get indexed as backslash-escaped octal codes, and thus become unsearchable.

I am using debian unstable, kernels 2.2.15pre10 and 2.0.36, libc 2.1.3.

I suggest that the test for printability in python-1.5.2/Objects/stringobject.c be fixed to use isprint() which takes the locale into account:

--- python-1.5.2/Objects/stringobject.c.orig  Thu Oct  8 05:17:48 1998
+++ python-1.5.2/Objects/stringobject.c       Thu May 18 14:36:28 2000
@@ -224,7 +224,7 @@
                c = op->ob_sval[i];
                if (c == quote || c == '\\')
                        fprintf(fp, "\\%c", c);
-               else if (c < ' ' || c >= 0177)
+               else if (! isprint (c))
                        fprintf(fp, "\\%03o", c & 0377);
                else
                        fputc(c, fp);
@@ -260,7 +260,7 @@
                        c = op->ob_sval[i];
                        if (c == quote || c == '\\')
                                *p++ = '\\', *p++ = c;
-                       else if (c < ' ' || c >= 0177) {
+                       else if (! isprint (c)) {
                                sprintf(p, "\\%03o", c & 0377);
                                while (*p != '\0')
                                        p++;

//Jyrki

------- End of Forwarded Message
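
For anyone who wants to compare the two printability tests from the patch above without rebuilding the interpreter, here is a rough model of the escaping loop, written in modern Python rather than the C of stringobject.c. It is only a sketch: the names escape_string, ascii_printable and latin1_printable are made up for illustration, and latin1_printable is an assumed stand-in for what isprint() might report under a Latin-1 locale such as fi_FI, not the C library's actual table.

```python
def ascii_printable(c):
    """Model of the existing test: escape if c < ' ' or c >= 0177."""
    return 0o40 <= c < 0o177


def latin1_printable(c):
    """ASSUMPTION: rough stand-in for isprint() under a Latin-1 locale."""
    return ascii_printable(c) or 0o240 <= c <= 0o377


def escape_string(data, is_printable=ascii_printable, quote="'"):
    """Quote a byte string, octal-escaping bytes the predicate rejects."""
    out = [quote]
    # Iterating over bytes yields ints in 0..255, so the predicate never
    # sees a negative value. In C, a signed char would have to be masked
    # first, which is the job Py_CHARMASK() does.
    for c in data:
        ch = chr(c)
        if ch == quote or ch == "\\":
            out.append("\\" + ch)
        elif not is_printable(c):
            out.append("\\%03o" % c)
        else:
            out.append(ch)
    out.append(quote)
    return "".join(out)


# 'kääk' encoded as Latin-1; each 'ä' is the single byte 0344 (octal).
sample = b"k\xe4\xe4k"
default_form = escape_string(sample)
locale_form = escape_string(sample, latin1_printable)
```

With the default predicate, default_form holds the octal escapes, as in the unpatched transcript; with the Latin-1 predicate, locale_form keeps the two ä characters as they are, which is the behaviour the patch is after.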