[Patches] [ python-Patches-479898 ] Multibyte string on string::string_print

noreply@sourceforge.net noreply@sourceforge.net
Mon, 07 Oct 2002 06:58:10 -0700


Patches item #479898, was opened at 2001-11-09 08:10
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=479898&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Hye-Shik Chang (perky)
Assigned to: Nobody/Anonymous (nobody)
Summary: Multibyte string on string::string_print

Initial Comment:
Many users of multibyte languages find it difficult to see
native characters when a list, dictionary, etc. is printed,
because they are shown as \xNN escapes.
This patch allows multibyte characters to be printed on
UNIX98-compatible machines; mbtowc() is a standard C API
function (ISO/IEC 9899:1990).
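The idea can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name escape_multibyte and its buffer-based interface are hypothetical, and it assumes the program has already selected a locale with setlocale(). Each step decodes one character with mbtowc(); a complete, printable character is copied byte for byte, and anything else is written as \xNN escapes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: copy s[0..len) into out, keeping printable
   (possibly multibyte) characters as-is and escaping every other byte
   as \xNN. out must have room for 4 * len + 1 bytes.
   Returns the number of bytes written, excluding the final NUL. */
static size_t escape_multibyte(const char *s, size_t len, char *out)
{
    size_t i = 0, o = 0;

    mbtowc(NULL, NULL, 0);                 /* reset the conversion state */
    while (i < len) {
        wchar_t wc;
        int cr = mbtowc(&wc, s + i, len - i);
        /* cr > 0: bytes consumed; cr == 0: NUL byte; cr < 0: invalid or
           incomplete sequence. Only the first case can be printable. */
        size_t n = (cr > 0) ? (size_t)cr : 1;

        if (cr > 0 && iswprint((wint_t)wc)) {
            memcpy(out + o, s + i, n);     /* keep the whole character */
            o += n;
        } else {
            for (size_t k = 0; k < n; k++) /* escape each byte */
                o += (size_t)sprintf(out + o, "\\x%02x",
                                     (unsigned char)s[i + k]);
            if (cr < 0)
                mbtowc(NULL, NULL, 0);     /* state is unspecified after an error */
        }
        i += n;
    }
    out[o] = '\0';
    return o;
}
```

Calling setlocale(LC_ALL, "") first makes mbtowc() follow the user's locale; in the default "C" locale the sketch typically behaves like a plain ASCII test. Because mbtowc() keeps hidden internal state, mbrtowc() with an explicit mbstate_t would be the reentrant choice.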


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-10-07 15:58

Message:
Logged In: YES 
user_id=21627

Thanks for the patch, committed as

configure 1.343;
configure.in 1.354;
pyconfig.h.in 1.51;
stringobject.c 2.190;

I'm not quite sure your correction is correct: if we
invoke iswprint, cr is already guaranteed to be > 0, since
otherwise we would already have jumped to nonprintable.
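The control flow described here can be sketched as below. This is a hypothetical reconstruction for illustration only (the function name printable_len is invented, and the committed stringobject.c code may be arranged differently): both failure results of mbtowc() jump to the nonprintable label first, so by the time iswprint() runs, cr is necessarily positive.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return the number of bytes at s that form one
   printable character, or 0 if the first byte should be escaped. */
static int printable_len(const char *s, size_t n)
{
    wchar_t wc;
    int cr = mbtowc(&wc, s, n);

    if (cr <= 0)                /* -1: invalid/incomplete, 0: NUL byte */
        goto nonprintable;
    if (iswprint((wint_t)wc))   /* reached only when cr > 0, so a */
        return cr;              /* `cr > 0 &&` guard adds nothing here */
nonprintable:
    if (cr < 0)
        mbtowc(NULL, NULL, 0);  /* reset the state after an error */
    return 0;
}
```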

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2001-12-10 04:38

Message:
Logged In: YES 
user_id=55188

Oops, one mistake, sorry.

stringobject.c:646

else if (_ISPRINT(c)) {
-> 
else if (cr > 0 && _ISPRINT(c)) {

(to detect whether mbtowc failed to convert)

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2001-12-10 04:26

Message:
Logged In: YES 
user_id=55188

I uploaded a second patch, which contains configure support.
Unfortunately, Citrus (the new-generation locale support for
*BSDs) hasn't implemented iswprint() yet, but the *BSDs
support wide characters via the Rune locale's isprint() function.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-07 14:21

Message:
Logged In: YES 
user_id=6380

Still, the patch as it exists is unacceptable -- it needs
configure support to decide whether to use mbtowc() and
whether to use iswprint() or isprint() (I would hope on BSD
there is also an iswprint(), to be standard-conforming).
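On the C side, the configure result would typically be consumed through a preprocessor conditional along these lines. The macro name HAVE_ISWPRINT and the wrapper wc_printable are assumptions for illustration, not names taken from the committed patch. The fallback here is deliberately conservative and only classifies single-byte values; a BSD build could instead pass wider values to isprint(), as described elsewhere in this thread.

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#ifdef HAVE_ISWPRINT            /* assumed to be set by configure */
#include <wctype.h>
#endif

/* Hypothetical wrapper: is the wide character wc printable? */
static int wc_printable(wchar_t wc)
{
#ifdef HAVE_ISWPRINT
    return iswprint((wint_t)wc) != 0;
#else
    /* ISO C only defines isprint() for unsigned char values and EOF,
       so this portable fallback rejects anything wider. */
    return (unsigned long)wc <= UCHAR_MAX && isprint((int)wc) != 0;
#endif
}
```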


----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2001-12-07 07:38

Message:
Logged In: YES 
user_id=55188

Yes, it should be changed to iswprint on Linux systems
(although isprint on BSD systems was designed for wide
characters).
As loewis said, the EUC encodings of Korea, Japan and Taiwan
don't use 0x7F-0x9F for printable characters. So I think
using mbtowc is unavoidable.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-06 16:12

Message:
Logged In: YES 
user_id=21627

You are right, the code should use iswprint instead.

The point is that multiple consecutive bytes can make up a
single printable character. Not every character above 127 is
necessarily printable (e.g. in Latin-1, only characters
above 160 are printable). Likewise, a single byte may not be
printable, but a combination will print fine. So this code
is supposed to catch only those cases where printing will
actually work.
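To make the multi-byte point concrete without depending on an installed locale, the sketch below swaps in a hand-written check for UTF-8, an assumed example encoding (the thread itself is about EUC and Latin-1). The hypothetical helper utf8_seq_len reports how many bytes make up one complete character: the pair 0xC3 0xA9 (é) is one character, while neither byte on its own is, so a test that looks at one byte at a time cannot judge it correctly.

```c
#include <stddef.h>

/* Hypothetical helper (UTF-8 chosen only as an example encoding): return
   how many bytes form one complete character at s, or 0 if the bytes at s
   do not start a complete, well-formed sequence. Overlong forms and
   surrogates are not rejected; this is a sketch, not a validator. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, k;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        len = 1;                    /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return 0;                   /* stray continuation byte or bad lead */
    if (len > n)
        return 0;                   /* sequence is truncated */
    for (k = 1; k < len; k++)
        if ((s[k] & 0xC0) != 0x80)  /* every later byte must be 10xxxxxx */
            return 0;
    return len;
}
```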

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-04 20:08

Message:
Logged In: YES 
user_id=6380

I don't understand the point of using mbtowc() here.

The code extracts a wide character, but then it uses
isprint() on it, and as far as I know, isprint() is not
defined on wide characters, only on 'unsigned char' (and on
-1).

Isn't what the author wants simply to use isprint(c) instead
of (c < ' ' || c >= 0x7f)?
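The two byte-level tests being compared can be written side by side. This is an illustrative sketch with invented function names; it also reflects the caveat about isprint(): it is only defined for values representable as unsigned char (plus EOF), so a byte taken from a plain, possibly signed, char must be converted to unsigned char before the call.

```c
#include <ctype.h>

/* The fixed ASCII range test: escape anything outside 0x20..0x7E. */
static int needs_escape_ascii(unsigned char c)
{
    return c < ' ' || c >= 0x7f;
}

/* The locale-dependent test: escape whatever isprint() rejects in the
   current LC_CTYPE locale. The unsigned char parameter keeps the
   argument inside the range isprint() is defined for. */
static int needs_escape_locale(unsigned char c)
{
    return !isprint(c);
}
```

In a single-byte locale such as ISO-8859-1, isprint(0xE9) may return nonzero, so the second test would let accented Latin-1 letters through. Neither test can recognize a character that spans several bytes; that is what mbtowc() adds.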

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-11-09 22:21

Message:
Logged In: YES 
user_id=21627

Even though I think this patch is correct in principle, I
see a few problems with it:
1. Since it doesn't fix a bug, it probably cannot go into 2.2.
2. There is no autoconf test for mbtowc. You should test
this in configure, and then conditionalize your code on
HAVE_MBTOWC.
3. There is too much code duplication. Try to find a
solution which special-cases the escape codes (\something)
only once. For example, you may implement a trivial mbtowc
redefinition if mbtowc is not available, and then use mbtowc
always.
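Point 3 can be sketched like this. The fallback is a hypothetical illustration (the names fallback_mbtowc and MBTOWC are invented; only HAVE_MBTOWC comes from the comment above): when configure finds no mbtowc(), every byte is treated as one character, so the escaping code can call the same wrapper unconditionally and keep a single code path for the \something escapes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

#ifdef HAVE_MBTOWC
#  define MBTOWC(pwc, s, n) mbtowc((pwc), (s), (n))
#else
/* Hypothetical stand-in for mbtowc(): one byte per character, with the
   byte value used directly as the wide character code. It follows the
   mbtowc() return conventions so callers need no special cases. */
static int fallback_mbtowc(wchar_t *pwc, const char *s, size_t n)
{
    if (s == NULL)
        return 0;               /* stateless encoding: nothing to reset */
    if (n == 0)
        return -1;              /* no byte available to convert */
    if (pwc != NULL)
        *pwc = (wchar_t)(unsigned char)*s;
    return *s != '\0';          /* 0 for the NUL byte, otherwise 1 */
}
#  define MBTOWC(pwc, s, n) fallback_mbtowc((pwc), (s), (n))
#endif
```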

----------------------------------------------------------------------
