Undeterministic strxfrm?

Tuomas tuomas.vesterinen at pp.inet.fi
Tue Sep 4 21:18:05 EDT 2007


Peter Otten wrote:
> Python seems to be the culprit as there is a relatively recent
> strxfrm-related bugfix, see

Thanks Peter. Can't find it, do you have the issue number?

> http://svn.python.org/view/python/trunk/Modules/_localemodule.c?rev=54669
> 
> If I understand it correctly the error makes it likely that the resulting
> string has trailing garbage characters.

Reading rev 54669, it seems to me that the bug is not fixed. The man page says:

STRXFRM(3): ... size_t strxfrm(char *dest, const char *src, size_t n);
... The first n characters of the transformed string are placed in dest.
The transformation is based on the program’s current locale for category
LC_COLLATE.
... The strxfrm() function returns the number of bytes required to store
the transformed string in dest excluding the terminating ‘\0’ character.
If the value returned is n or more, the contents of dest are
*indeterminate*.

According to the man page, Python should know the size of the result it
expects and should not trust the size strxfrm returns. I don't completely
understand the collation algorithm, but it is supposed to offer different
levels of collation, so Python too should offer those levels as a second
parameter. However, strxfrm doesn't take any more parameters either,
although there is a separate function, strcasecmp. So Python should be
able to calculate the expected size before calling strxfrm or strcasecmp.
I don't know how that is possible. Maybe strcoll knows better, and I
should drop strxfrm and use strcoll instead. That costs converting the
search key at every step of the search.
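
To make the strcoll idea concrete, here is a rough sketch of what I have in mind. It is only an illustration, not tested against the bug above: find_strcoll is a made-up helper name, the word list is invented, and I pin the collation locale to "C" just so the result is the same everywhere (a real program would presumably use setlocale(LC_COLLATE, "")). It also relies on functools.cmp_to_key, which needs a newer Python than the one discussed in this thread.

```python
import locale
from functools import cmp_to_key

# Assumption: the "C" locale is used only so the sketch behaves the same
# on every machine; a real program would likely pass "" to pick up the
# user's locale instead.
locale.setlocale(locale.LC_COLLATE, "C")


def find_strcoll(sorted_words, target):
    """Binary search that calls strcoll at every step (hypothetical helper).

    No transformed keys are stored, so strxfrm is never called; the price
    is one strcoll call per comparison.
    """
    lo, hi = 0, len(sorted_words)
    while lo < hi:
        mid = (lo + hi) // 2
        if locale.strcoll(sorted_words[mid], target) < 0:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_words) and locale.strcoll(sorted_words[lo], target) == 0:
        return lo
    return -1


# Sort with the same comparison the search uses, so both agree on the order.
words = sorted(["pear", "apple", "fig", "banana"], key=cmp_to_key(locale.strcoll))
```

With this, find_strcoll(words, "fig") gives the index of "fig" in the sorted list, and a missing word gives -1. The list must be sorted with the same locale setting that is active during the search, otherwise the binary search can miss entries.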

Tuomas

> Peter


