[Python-Dev] Bug in PyLocale_strcoll
M.-A. Lemburg
mal at egenix.com
Mon Nov 22 09:17:44 CET 2004
Andreas Degert wrote:
> "M.-A. Lemburg" <mal at egenix.com> writes:
>
>
>>Aahz wrote:
>>
>>>On Sat, Nov 20, 2004, Andreas Degert wrote:
>>>
>>>
>>>>I think I found a bug in PyLocale_strcoll() (Python 2.3.4). When used
>>>>with 2 unicode strings, it converts them to wchar strings and uses
>>>>wcscoll. The bug is that the wchar strings are not 0-terminated.
>>>
>>>If you're sure this is a bug, please file on SF and report back the
>>>ID.
>>>(If you're not sure, wait until you get confirmation from one of the
>>>Unicode experts and then file the bug. ;-)
>>
>>Please also check that the bug is still present in Python 2.4 and/or
>>CVS. We've corrected a bug in the PyUnicode_*WideChar*() APIs just
>>recently for Python 2.4.
>
>
> The off-by-one error fix in unicodeobject.c (2.228 -> 2.229)
> corrects a buffer overflow; it just happens to be in the same piece
> of code.
>
> I didn't find a clear statement if the unicode string should be
> 0-terminated or not.
You're right: they are always 0-terminated, just like 8-bit strings,
even though that doesn't seem to be necessary, since Python
functions always use the size field when working on
a Unicode object rather than relying on the 0-termination.
> In _PyUnicode_New it's 0-terminated, even in the
> case when it had to call unicode_resize (though there is a comment in
> unicode_resize "Ux0000 terminated -- XXX is this needed ?"). If these
> are the only places where unicode objects are created or modified, they
> seem to be always 0-terminated.
Right.
> wchar strings must be 0-terminated if they are to be used with the
> wcs* functions. So it's not a good idea to return a non-terminated
> string from PyUnicode_AsWideChar. If the unicode strings are always
> 0-terminated (the unicode buffer size is length+1), then we could just
> change
>
>     if (size > PyUnicode_GET_SIZE(unicode))
>         size = PyUnicode_GET_SIZE(unicode);
>
> to
>
>     if (size > PyUnicode_GET_SIZE(unicode)+1)
>         size = PyUnicode_GET_SIZE(unicode)+1;
>
> in PyUnicode_AsWideChar to get 0-terminated wchars.
>
> Ok... I'm still not sure if I should file a bug for PyLocale_strcoll
> or PyUnicode_AsWideChar and if the patch for the latter should assume
> that the unicode string buffer is 0-terminated...
I think it's probably wise to fix both:

Looking again, the patch we applied to PyUnicode_AsWideChar()
only fixes the 0-termination problem in the case where
HAVE_USABLE_WCHAR_T is set. This should be extended to
the memcpy() as well.

Still, if the buffer passed to PyUnicode_AsWideChar()
is not big enough, you won't get the 0-termination (due
to truncation), so PyLocale_strcoll() must be either very
careful to allocate a buffer that is always big enough
or apply 0-termination itself.
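
To illustrate the second option, here is a rough sketch in plain C of
a caller that allocates length+1 and applies the 0-termination itself
before handing the buffers to wcscoll(). It is not the actual
PyLocale_strcoll() code: copy_units() is a hypothetical stand-in for a
converter that copies at most `size` characters and writes no
terminator, and terminated_copy() and collate_units() are names made
up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a converter that copies at most `size`
   characters from src and does NOT write a terminator. Returns the
   number of characters copied. */
static size_t copy_units(const wchar_t *src, size_t srclen,
                         wchar_t *dst, size_t size)
{
    if (size > srclen)
        size = srclen;
    memcpy(dst, src, size * sizeof(wchar_t));
    return size;
}

/* Allocate room for srclen characters plus one terminator and
   terminate explicitly, so the result is safe for the wcs* functions
   regardless of what the converter does. Caller frees the result. */
static wchar_t *terminated_copy(const wchar_t *src, size_t srclen)
{
    wchar_t *buf = malloc((srclen + 1) * sizeof(wchar_t));
    size_t n;

    if (buf == NULL)
        return NULL;
    n = copy_units(src, srclen, buf, srclen);
    assert(n <= srclen);   /* the terminator slot is still free */
    buf[n] = L'\0';
    return buf;
}

/* Collate two length-counted wide strings. Returns 0 on success and
   stores the wcscoll() result in *result, or -1 if allocation fails. */
static int collate_units(const wchar_t *a, size_t alen,
                         const wchar_t *b, size_t blen, int *result)
{
    wchar_t *wa = terminated_copy(a, alen);
    wchar_t *wb = terminated_copy(b, blen);
    int ok = (wa != NULL && wb != NULL);

    if (ok)
        *result = wcscoll(wa, wb);
    free(wa);
    free(wb);
    return ok ? 0 : -1;
}
```

The point of writing the terminator in the caller is that the result
no longer depends on whether the converter terminates the buffer or
on whether the copy was truncated.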
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Nov 22 2004)