Flexible Collating (feedback please)
Ron Adam
rrr at ronadam.com
Fri Oct 20 18:59:26 EDT 2006
Leo Kislov wrote:
> Ron Adam wrote:
>> Leo Kislov wrote:
>>> Ron Adam wrote:
>>>
>>>> locale.setlocale(locale.LC_ALL, '') # use current locale settings
>>> It's not current locale settings, it's user's locale settings.
>>> Application can actually use something else and you will overwrite
>>> that. You can also affect (unexpectedly to the application)
>>> time.strftime() and C extensions. So you should move this call into the
>>> _test() function and put an explanation into the documentation that the
>>> application should call locale.setlocale.
>> I'll experiment with this a bit. I was under the impression that locale.strxfrm
>> needed the locale set for it to work correctly.
>
> Actually locale.strxfrm and all other functions in the locale module work
> as designed: they work in the C locale before the first call to
> locale.setlocale. This is by design; the call to locale.setlocale should be
> made by the application, not by a 3rd party module like your collation
> module.
Yes, I've come to that conclusion also. (researching as I go) ;-)
I put an example of that in the class doc string so it could easily be found.
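
A minimal sketch of that division of labor, not the actual module: the
library function only calls locale.strxfrm, and the application decides
whether to switch locales. The names collate() and application_main() are
made up for illustration, and the sketch assumes a current Python 3
interpreter.

```python
import locale


def collate(items):
    """Return items sorted by the collation rules of the active locale.

    Deliberately does not call locale.setlocale(). Until the application
    does, LC_COLLATE is the default "C" locale.
    """
    return sorted(items, key=locale.strxfrm)


def application_main(items):
    """Hypothetical application entry point (an assumption for this sketch)."""
    # The application, not the library, opts in to the user's settings.
    try:
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        pass  # the user's locale is unavailable; stay in the "C" locale
    return collate(items)
```

Called on its own, collate() sorts by the "C" locale rules; it picks up the
user's rules only after the application has called setlocale.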
>> Maybe it would be better to have two (or more) versions? A string, unicode, and
>> locale version or maybe add an option to __init__ to choose the behavior?
>
> I don't think it should be two separate versions. Unicode support is
> only a matter of code like this:
>
> # in the constructor
> self.encoding = locale.getpreferredencoding()
>
> # class method
> def strxfrm(self, s):
>     if type(s) is unicode:
>         return locale.strxfrm(s.encode(self.encoding, 'replace'))
>     return locale.strxfrm(s)
>
> and then instead of locale.strxfrm call self.strxfrm. And similar code
> for locale.atof.
Thanks for the example.
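
For comparison, a rough sketch of the same wrapper idea on a current
Python 3 interpreter. The direction flips there: locale.strxfrm takes str
(Unicode) rather than byte strings, so it is the byte strings that need
converting. The class name CollateSketch is hypothetical, and using the
preferred encoding to decode bytes is an assumption carried over from
Leo's example.

```python
import locale


class CollateSketch(object):
    """Hypothetical wrapper; only the strxfrm part is shown."""

    def __init__(self):
        # False: look up the encoding without touching the process locale.
        self.encoding = locale.getpreferredencoding(False)

    def strxfrm(self, s):
        # Assumption: byte strings are encoded in the preferred encoding.
        if isinstance(s, bytes):
            s = s.decode(self.encoding, 'replace')
        return locale.strxfrm(s)
```

The method can then be passed as a sort key for mixed input, for example
sorted(names, key=CollateSketch().strxfrm).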
>> This was the reason for using locale.strxfrm. It should let it work with unicode
>> strings from what I could figure out from the documents.
>>
>> Am I missing something?
>
> strxfrm works only with byte strings encoded in the system encoding.
>
> -- Leo
Windows has an alternative function, wcsxfrm (wide character transform):
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_strxfrm.2c_.wcsxfrm.asp
But it's not exposed in Python. I could use ctypes to call it, but it would then
be Windows-specific and I doubt it would even work as expected.
Maybe a wcsxfrm patch would be good for Python 2.6? Python 3000 will probably
need it anyway.
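
An untested-in-anger sketch of what the ctypes route might look like,
written for a current Python 3 interpreter. wcsxfrm is also part of
standard C, so the sketch assumes the C library can be reached with
ctypes.CDLL(None) on POSIX systems and through msvcrt on Windows. On
Windows that DLL may not share locale state with the interpreter's own C
runtime, which is one reason it might not work as expected. The helper
names are made up, and there is no error handling.

```python
import ctypes
import sys


def _load_wcsxfrm():
    # Assumption: the C runtime is reachable this way on the platform.
    if sys.platform == "win32":
        libc = ctypes.cdll.msvcrt
    else:
        libc = ctypes.CDLL(None)
    func = libc.wcsxfrm
    # size_t wcsxfrm(wchar_t *dest, const wchar_t *src, size_t n);
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_size_t]
    func.restype = ctypes.c_size_t
    return func


_wcsxfrm = _load_wcsxfrm()


def wcsxfrm_key(text):
    """Return a sort key for text from the C library's wcsxfrm."""
    # First call with a NULL buffer and n == 0 asks for the needed length.
    needed = _wcsxfrm(None, text, 0)
    buf = ctypes.create_unicode_buffer(needed + 1)
    # Second call fills the buffer, including the terminating NUL.
    _wcsxfrm(buf, text, needed + 1)
    return buf.value
```

As with locale.strxfrm, the keys follow whatever LC_COLLATE setting the C
library currently has, so the application still has to call setlocale
itself.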
I've made a few additional changes and will start a new thread after some more
testing to get some additional feedback.
Cheers,
Ron