[ python-Bugs-1452697 ] broken string on mbcs

SourceForge.net noreply at sourceforge.net
Sat Mar 18 05:17:22 CET 2006


Bugs item #1452697, was opened at 2006-03-18 05:07
Message generated for change (Comment added) made by ocean-city
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1452697&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: ocean-city (ocean-city)
Assigned to: M.-A. Lemburg (lemburg)
Summary: broken string on mbcs

Initial Comment:
Hello. I noticed that Unicode conversion from mbcs is
sometimes broken. This happens when I use
codecs.open("foo", "r", "mbcs") as an iterator.

# It's OK if I use "shift_jis" or "cp932".

I'll attach a script and a text file that reproduce the
problem. I'm using Windows 2000 SP4 (Japanese).

Thank you.

----------------------------------------------------------------------

>Comment By: ocean-city (ocean-city)
Date: 2006-03-18 13:17

Message:
Probably this patch will fix the problem. (for release24-maint)

Cause: MultiByteToWideChar returns a nonzero value for an
incomplete multibyte character. (Example: if the buffer ends
with a lead byte, MultiByteToWideChar returns 1, not 0, for
it.) It should return 0; otherwise the result will be broken.
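
The effect of a chunk that ends on a lead byte can be reproduced
without Windows. The following is a minimal sketch, not the attached
script: it assumes Python 3 and uses the portable cp932 codec as a
stand-in for mbcs (mbcs itself exists only on Windows). The sample
text and the split point are chosen purely for illustration.

```python
import codecs

# Three Japanese characters (U+65E5 U+672C U+8A9E); in cp932 each one
# is encoded as two bytes, so the encoded data is six bytes long.
original = "\u65e5\u672c\u8a9e"
data = original.encode("cp932")

# Split in the middle of the second character, so that the first chunk
# ends with a lead byte and the second chunk starts with a trail byte.
chunks = [data[:3], data[3:]]

# Stateless conversion: each chunk is decoded on its own, and the
# dangling lead byte is turned into a replacement character.
naive = "".join(c.decode("cp932", errors="replace") for c in chunks)

# Stateful conversion: the incremental decoder holds the dangling lead
# byte back until the next chunk supplies the trail byte.
decoder = codecs.getincrementaldecoder("cp932")()
stateful = "".join(decoder.decode(c) for c in chunks)
stateful += decoder.decode(b"", final=True)

print(naive == original)     # the naive result is damaged
print(stateful == original)  # the stateful result is intact
```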

Solution: Set the MB_ERR_INVALID_CHARS flag to avoid incorrect
handling of a trailing incomplete multibyte sequence. If an
error occurs, remove the trailing byte and try again.
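
The retry strategy can be sketched in Python as follows. This only
illustrates the idea and is not the patch itself, which targets the C
implementation of the mbcs codec. decode_chunk and decode_stream are
hypothetical helper names, a strict cp932 decode stands in for
MultiByteToWideChar with MB_ERR_INVALID_CHARS, and Python 3 is assumed.

```python
def decode_chunk(buf, encoding="cp932"):
    """Decode buf strictly and return (text, leftover_bytes).

    If the strict decode fails, assume the cause is a dangling lead
    byte at the end, drop that one byte and retry once. A second
    failure means the data is invalid elsewhere and is propagated.
    """
    try:
        return buf.decode(encoding), b""
    except UnicodeDecodeError:
        # Hypothetical recovery step: keep the last byte for later.
        return buf[:-1].decode(encoding), buf[-1:]


def decode_stream(chunks, encoding="cp932"):
    """Decode an iterable of byte chunks, carrying leftovers forward."""
    pending = b""
    parts = []
    for chunk in chunks:
        text, pending = decode_chunk(pending + chunk, encoding)
        parts.append(text)
    # A lead byte still pending at the end of input is a real error,
    # so this final strict decode is allowed to raise.
    parts.append(pending.decode(encoding))
    return "".join(parts)


original = "\u65e5\u672c\u8a9e"
data = original.encode("cp932")
# The first chunk ends with the lead byte of the second character.
print(decode_stream([data[:3], data[3:]]) == original)
```

Holding back a single byte is enough in this sketch because cp932
characters are at most two bytes long; an encoding with longer
sequences would need to hold back more.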

Caution: I have not tested this very intensively.


----------------------------------------------------------------------


