[Python-Dev] Bug in PyLocale_strcoll

Sat Nov 20 21:42:53 CET 2004

Hello,

I think I found a bug in PyLocale_strcoll() (Python 2.3.4). When called
with two unicode strings, it converts them to wchar_t strings and passes
them to wcscoll(). The bug is that the wchar_t strings are not
0-terminated, although wcscoll() expects 0-terminated input.
Checking with

assert(ws1[len1-1] == 0 && ws2[len2-1] == 0);

right before the line

result = PyInt_FromLong(wcscoll(ws1, ws2));

confirms the bug. I'm not quite sure what the best fix is.
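
To illustrate the failure mode outside of Python, here is a minimal
sketch. It is not the CPython code: copy_chars_only() is a hypothetical
stand-in for a copy routine that writes the characters but no
terminator, and the '#' fill stands in for whatever the allocator
happened to leave in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the copy step: copies at most `size`
   characters from `src` (which holds `n` characters) and never writes
   a terminating 0. Returns the number of characters copied. */
static size_t copy_chars_only(const wchar_t *src, size_t n,
                              wchar_t *dst, size_t size)
{
    assert(src != NULL && dst != NULL);
    if (size > n)
        size = n;
    memcpy(dst, src, size * sizeof(wchar_t));
    return size;
}

/* Returns 1 if the last slot of a buffer of `len` elements holds 0. */
static int last_slot_is_zero(const wchar_t *buf, size_t len)
{
    return len > 0 && buf[len - 1] == 0;
}

/* Fills a 4-slot buffer with a non-zero sentinel, copies "abc" into
   it, and reports whether the buffer ended up 0-terminated. */
static int demo_copy_leaves_buffer_terminated(void)
{
    wchar_t buf[4];
    size_t i;

    for (i = 0; i < 4; i++)
        buf[i] = L'#';
    copy_chars_only(L"abc", 3, buf, 4);
    return last_slot_is_zero(buf, 4);
}
```

Here demo_copy_leaves_buffer_terminated() returns 0: the fourth slot
still holds the sentinel, so a function that scans for a terminating 0
would read past the end of the buffer.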

PyUnicode_AsWideChar() copies the unicode chars, but not the
terminating 0-char of the unicode string (which is not used in Python,
but it's there anyhow, if I understand the implementation
correctly). So one fix would be to change PyUnicode_AsWideChar() to copy
the terminating 0-char if there's enough space in the output
buffer. Another fix would be to terminate the strings in
PyLocale_strcoll() before using them:

----------------------------------------------------------

--- _localemodule.c~	Sat Nov 20 21:33:17 2004
+++ _localemodule.c	Sat Nov 20 21:35:04 2004
@@ -353,15 +353,19 @@
         PyErr_NoMemory();
         goto done;
     }
-    if (PyUnicode_AsWideChar((PyUnicodeObject*)os1, ws1, len1) == -1)
+    len1 = PyUnicode_AsWideChar((PyUnicodeObject*)os1, ws1, len1);
+    if (len1 == -1)
         goto done;
+    ws1[len1] = 0;
     ws2 = PyMem_MALLOC(len2 * sizeof(wchar_t));
     if (!ws2) {
         PyErr_NoMemory();
         goto done;
     }
-    if (PyUnicode_AsWideChar((PyUnicodeObject*)os2, ws2, len2) == -1)
+    len2 = PyUnicode_AsWideChar((PyUnicodeObject*)os2, ws2, len2);
+    if (len2 == -1)
         goto done;
+    ws2[len2] = 0;
     /* Collate the strings. */
     result = PyInt_FromLong(wcscoll(ws1, ws2));
   done:
----------------------------------------------------------
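
The "terminate before use" idea can also be sketched as standalone C,
independent of the Python API. This is only a sketch under assumptions:
dup_terminated() and collate_counted() are hypothetical names, and plain
malloc()/free() stand in for PyMem_MALLOC()/PyMem_FREE(). The point it
shows is that for n copied characters the buffer needs n + 1 slots and
the terminator belongs at index n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: copies the first `n` characters of `src` into a
   freshly allocated buffer of n + 1 slots and 0-terminates it.
   Returns NULL if the allocation fails; the caller frees the result. */
static wchar_t *dup_terminated(const wchar_t *src, size_t n)
{
    wchar_t *ws;

    assert(src != NULL);
    ws = malloc((n + 1) * sizeof(wchar_t));
    if (ws == NULL)
        return NULL;
    memcpy(ws, src, n * sizeof(wchar_t));
    ws[n] = 0;                  /* terminator goes right after the copy */
    return ws;
}

/* Collates the first n1 characters of s1 against the first n2
   characters of s2. Stores the wcscoll() result in *result and returns
   0, or returns -1 if an allocation fails. */
static int collate_counted(const wchar_t *s1, size_t n1,
                           const wchar_t *s2, size_t n2, int *result)
{
    wchar_t *ws1 = dup_terminated(s1, n1);
    wchar_t *ws2 = dup_terminated(s2, n2);
    int rc = -1;

    if (ws1 != NULL && ws2 != NULL) {
        *result = wcscoll(ws1, ws2);
        rc = 0;
    }
    free(ws1);                  /* free(NULL) is a no-op */
    free(ws2);
    return rc;
}
```

For example, collate_counted(L"abcXYZ", 3, L"abc", 3, &r) compares only
"abc" with "abc", because the copy of the first string is cut off and
terminated after three characters.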

cheers
Andreas