[Patches] [ python-Patches-1455898 ] patch for mbcs codecs

SourceForge.net noreply at sourceforge.net
Wed Mar 22 15:36:07 CET 2006


Patches item #1455898, was opened at 2006-03-22 16:31
Message generated for change (Comment added) made by ocean-city
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1455898&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Windows
Group: Python 2.4
Status: Open
Resolution: None
Priority: 7
Submitted By: ocean-city (ocean-city)
Assigned to: Walter Dörwald (doerwalter)
Summary: patch for mbcs codecs

Initial Comment:
Hello.

I have noticed that the mbcs codec sometimes generates a broken
string. I'm using Japanese Windows, so mbcs is mapped
to cp932 (close to shift_jis).

When I run the attached script "a.zip", the message for the
entry "Error 00007" comes out broken, as in the attached
file "b.txt".

I think this happens because the string passed to
PyUnicode_DecodeMBCS() sometimes ends with a
lead byte, and MultiByteToWideChar() counts it toward the
size of the result buffer.
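
The situation described above, a buffer that ends on a lead byte, can be
reproduced without Windows. What follows is only a minimal sketch and not the
attached patch: it assumes Python 3 and uses the portable cp932 codec as a
stand-in for mbcs, and the helper name decode_chunks is hypothetical.

```python
import codecs

ENCODING = "cp932"  # assumption: stand-in for "mbcs" on Japanese Windows


def decode_chunks(chunks, encoding=ENCODING):
    """Decode byte chunks with an incremental decoder (hypothetical helper).

    The decoder holds back a trailing lead byte until its trail byte arrives.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if bytes remain
    return "".join(parts)


text = "\u6b74\u53f2"            # two characters, each double-byte in cp932
data = text.encode(ENCODING)     # 4 bytes in total
head, tail = data[:3], data[3:]  # head ends with the lead byte of the 2nd char

# Decoding the first chunk on its own fails in strict mode,
# because its last byte is only half of a character.
try:
    head.decode(ENCODING)
    head_decodes_alone = True
except UnicodeDecodeError:
    head_decodes_alone = False

# Carrying the dangling lead byte over to the next chunk recovers the text.
roundtrip = decode_chunks([head, tail])
```

The point of the sketch is that a decoder which keeps state between calls can
defer the dangling lead byte instead of counting or converting it.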

I hope the attached patch "mbcs.patch" fixes the problem.
It would be nice if this bug were fixed in 2.4.3...
Thank you.






----------------------------------------------------------------------

>Comment By: ocean-city (ocean-city)
Date: 2006-03-22 23:36

Message:
Logged In: YES 
user_id=1200846

Sorry, I was stupid.

MSDN
(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_0o2t.asp)
says,

> IsDBCSLeadByte can only indicate a potential lead byte value. 

IsDBCSLeadByte was returning 1 for some trail bytes (e.g. "歴"[1]).
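
To illustrate why a byte-value test alone is ambiguous, here is a small
sketch. It is not the code from either patch. It assumes Python 3, assumes the
usual cp932 lead-byte ranges (0x81-0x9F and 0xE0-0xFC), and both function
names are hypothetical. The character U+6B74 encodes to two bytes in cp932,
and its trail byte happens to fall inside the lead-byte range.

```python
def looks_like_lead_byte(b: int) -> bool:
    """Value-only test, similar in spirit to IsDBCSLeadByte (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def ends_with_incomplete_char(data: bytes) -> bool:
    """Scan from the start so each byte is classified in context."""
    i = 0
    while i < len(data):
        if looks_like_lead_byte(data[i]):
            if i + 1 == len(data):
                return True   # lead byte with no trail byte after it
            i += 2            # skip the lead byte and its trail byte
        else:
            i += 1            # single-byte character
    return False


reki = "\u6b74".encode("cp932")  # one character, two bytes

# Looking only at the last byte misclassifies a complete character.
naive_says_incomplete = looks_like_lead_byte(reki[-1])

# Scanning from the beginning gives the right answer in both cases.
complete_ok = not ends_with_incomplete_char(reki)
truncated_detected = ends_with_incomplete_char(reki[:1])
```

Checking only the final byte reports a dangling lead byte even though the
character is complete, while a scan from the start of the buffer does not.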

The patch "mbcs3a.patch" worked for me, but _mbsbtype is
probably compiler-specific. Is that OK?

The patch "mbcs3b.patch" also worked for me and it only uses
the Win32 API, but I don't have much faith in this
implementation...



----------------------------------------------------------------------

Comment By: ocean-city (ocean-city)
Date: 2006-03-22 19:31

Message:
Logged In: YES 
user_id=1200846

Sorry, I found a problem when I tried a longer text file...
Please wait. I'll investigate more thoroughly.

----------------------------------------------------------------------

Comment By: ocean-city (ocean-city)
Date: 2006-03-22 19:13

Message:
Logged In: YES 
user_id=1200846

Thank you for the reply. How about this? (I'm a newbie. I hope
this is the right TeX format, but can you confirm that? I
created this patch by copying and pasting from
PyUnicode_DecodeUTF16Stateful and making some modifications.)


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2006-03-22 18:12

Message:
Logged In: YES 
user_id=38388

One more nit: the doc patch is missing. Please add a patch
for the API docs.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2006-03-22 18:11

Message:
Logged In: YES 
user_id=38388

As I understand your comment, the mbcs codec will have a
problem if the input string terminates with a lead byte.

Could you add a comment regarding this to the patch?

I can't test the patch, since I don't have a Japanese
Windows to check on, but from looking at the patch, it seems OK.


----------------------------------------------------------------------

Comment By: ocean-city (ocean-city)
Date: 2006-03-22 16:42

Message:
Logged In: YES 
user_id=1200846

I forgot to mention this. "mbcs.patch" is for
release24-maint branch.

----------------------------------------------------------------------
