[ python-Bugs-1532726 ] incorrect behaviour of PyUnicode_EncodeMBCS?
SourceForge.net
noreply at sourceforge.net
Wed Aug 2 07:31:33 CEST 2006
Bugs item #1532726, was opened at 2006-08-02 06:20
Message generated for change (Comment added) made by ocean-city
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1532726&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Jan-Willem (jwnmulder)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect behaviour of PyUnicode_EncodeMBCS?
Initial Comment:
Using python 2.4.3
This behaviour is not reproducible on a Windows or
Linux machine. I found the bug while tracking down a
problem in Python 2.4.3 ported to the Xbox.
Running the following two commands
test_string = 'encode me'
print repr(test_string.encode('mbcs'))
results on Windows in: 'encode me'
and on the Xbox in: 'encode me\x00'
The problem is that 'PyUnicode_EncodeMBCS' returns a
PyStringObject that contains the data 'encode me' but
with an object size of 10.
string_repr(test_string) therefore assumes the string
contains a 0 character and encodes it as '\x00'.
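A minimal sketch of why an object size of 10 shows up as a trailing
'\x00'. sketch_repr is a hypothetical helper written for this
illustration, not CPython's string_repr; it assumes a simplified repr
that always uses single quotes and writes every non-printable byte as
\xNN.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (NOT CPython's string_repr): writes a simplified
   Python-2-style repr of the first `size` bytes of `data` into `out`.
   Returns 0 on success, -1 if `out` is too small. */
static int sketch_repr(const char *data, size_t size, char *out, size_t outsz)
{
    size_t pos = 0;
    size_t i;

    if (outsz < 3)
        return -1;
    out[pos++] = '\'';
    for (i = 0; i < size; ++i) {
        unsigned char c = (unsigned char)data[i];
        /* Worst case for one byte is 4 chars ("\xNN"); keep room for
           the closing quote and the terminating NUL as well. */
        if (pos + 6 > outsz)
            return -1;
        if (c == '\'' || c == '\\') {
            out[pos++] = '\\';
            out[pos++] = (char)c;
        } else if (c < 0x20 || c >= 0x7f) {
            /* The byte is inside the object size, so it is escaped,
               even when it is the C string terminator. */
            pos += (size_t)sprintf(out + pos, "\\x%02x", (unsigned int)c);
        } else {
            out[pos++] = (char)c;
        }
    }
    out[pos++] = '\'';
    out[pos] = '\0';
    return 0;
}
```

Calling sketch_repr("encode me", 9, ...) produces 'encode me', while
passing 10 as the size makes the terminator part of the data and
produces 'encode me\x00', which matches the Xbox output.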
Looking at the function 'PyUnicode_EncodeMBCS(const
Py_UNICODE *p, int size, const char *errors)', there
are basically two function calls:
{
mbcssize = WideCharToMultiByte(CP_ACP, 0, p, size,
NULL, 0, NULL, NULL);
repr = PyString_FromStringAndSize(NULL, mbcssize);
}
WideCharToMultiByte returns the number of bytes
needed for the buffer; because of the string
termination this function returns 10.
PyString_FromStringAndSize assumes its second
argument to be the number of needed characters, not
bytes. So an easy fix would be
to change
repr = PyString_FromStringAndSize(NULL, mbcssize);
into
repr = PyString_FromStringAndSize(NULL, mbcssize - 1);
Just checked the 2.4.3 svn trunk and it contains the
same bug.
----------------------------------------------------------------------
Comment By: Hirokazu Yamamoto (ocean-city)
Date: 2006-08-02 14:31
Message:
Logged In: YES
user_id=1200846
I think this is not related to that patch.
On my win2000sp4, the terminating null character is not passed to
PyUnicode_EncodeMBCS.
//////////////////////////////////////////////
// patch for debug (release24-maint branch)
Index: Objects/unicodeobject.c
===================================================================
--- Objects/unicodeobject.c (revision 51033)
+++ Objects/unicodeobject.c (working copy)
@@ -2782,6 +2782,20 @@
char *s;
DWORD mbcssize;
+{ /* debug */
+
+ int i;
+
+ printf("------------> %d\n", size);
+
+ for (i = 0; i < size; ++i) {
+ printf("%d ", (int)p[i]);
+ }
+
+ printf("\n");
+
+} /* debug */
+
/* If there are no characters, bail now! */
if (size==0)
return PyString_FromString("");
//////////////////////////////////
// a.py
test_string = 'encode me'
print repr(test_string.encode('mbcs'))
//////////////////////////////////
// result
R:\>py a.py
------------> 9
101 110 99 111 100 101 32 109 101
'encode me'
[7660 refs]
And I tried this.
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
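void count(LPCWSTR w, int size)
{
    char *buf;
    int i;
    /* First call: ask how many bytes the converted string needs. */
    const int ret = ::WideCharToMultiByte(
        CP_ACP,
        0,
        w,
        size,
        NULL,
        0,
        NULL,
        NULL
    );
    if (ret == 0)
    {
        printf("error\n");
        return; /* nothing to convert or dump */
    }
    printf("%d\n", ret);
    buf = (char*)malloc(ret);
    if (buf == NULL)
    {
        printf("out of memory\n");
        return;
    }
    /* Second call: do the actual conversion into buf. */
    ::WideCharToMultiByte(
        CP_ACP,
        0,
        w,
        size,
        buf,
        ret,
        NULL,
        NULL
    );
    /* Dump every converted byte as a decimal value. */
    for (i = 0; i < ret; ++i)
    {
        printf("%d ", (int)buf[i]);
    }
    printf("\n");
    free(buf);
}

int main()
{
    count(L"encode me", 9);
    count(L"encode me", 10); /* include null character */
    return 0;
}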
/*
9
101 110 99 111 100 101 32 109 101
10
101 110 99 111 100 101 32 109 101 0
*/
As stated in
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_2bj9.asp
, WideCharToMultiByte never outputs a null character if the source
string doesn't contain one. So I think the usage of
WideCharToMultiByte is correct.
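The two cases measured above, plus the documented case where the
length argument is -1, can be modelled in portable C. This is only a
sketch: model_required_bytes is a hypothetical stand-in, not the Win32
function, and it assumes ASCII-only input so that each wide character
maps to exactly one byte.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model (NOT the Win32 API) of the size query
   WideCharToMultiByte(CP_ACP, 0, w, size, NULL, 0, NULL, NULL),
   assuming ASCII-only input (one byte per wide character). */
static int model_required_bytes(const wchar_t *w, int size)
{
    if (size == -1) {
        /* -1 means "input is NUL-terminated": the terminator is
           converted too, so it is included in the returned count. */
        return (int)wcslen(w) + 1;
    }
    /* Explicit length: exactly `size` characters are converted and
       no terminator is appended. */
    return size;
}
```

Under this model, a result of 10 for 'encode me' means the caller
either passed a size of 10 (terminator included) or passed -1. Which
of these, if either, happens on the Xbox port is not known from this
thread.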
I don't know why, but there is probably some behavior
difference between win2000 and xbox (i.e. xbox calls
PyUnicode_EncodeMBCS with size 10, or WideCharToMultiByte
on xbox outputs a null character even if the source string
doesn't contain one?).
Can you try the above C code and the debug patch on xbox?
----------------------------------------------------------------------
Comment By: Jan-Willem (jwnmulder)
Date: 2006-08-02 06:30
Message:
Logged In: YES
user_id=770969
Related to patch 1455898?
----------------------------------------------------------------------