Flexible Collating (feedback please)
rrr at ronadam.com
Sat Oct 21 00:59:26 CEST 2006
Leo Kislov wrote:
> Ron Adam wrote:
>> Leo Kislov wrote:
>>> Ron Adam wrote:
>>>> locale.setlocale(locale.LC_ALL, '') # use current locale settings
>>> It's not the current locale settings, it's the user's locale settings.
>>> The application can actually use something else and you will overwrite
>>> that. You can also affect (unexpectedly to the application)
>>> time.strftime() and C extensions. So you should move this call into the
>>> _test() function and put an explanation into the documentation that the
>>> application should call locale.setlocale.
>> I'll experiment with this a bit. I was under the impression that locale.strxfrm
>> needed the locale set for it to work correctly.
> Actually locale.strxfrm and all other functions in locale module work
> as designed: they work in C locale before the first call to
> locale.setlocale. This is by design: the call to locale.setlocale should be
> done by an application, not by a 3rd-party module like your collation module.
Yes, I've come to that conclusion also (researching as I go). ;-)
I put an example of that in the class doc string so it could easily be found.
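A minimal sketch of that arrangement, using a hypothetical `Collator` class (not the actual module's API): the class never calls `locale.setlocale`, and its docstring tells the application to do so. Only `_test()`, acting as "the application," sets the user's locale, as Leo suggested.

```python
import locale


class Collator:
    """Locale-aware sort keys (hypothetical illustration).

    This class never changes the process locale. An application that
    wants the user's collation rules should call, once at startup:

        locale.setlocale(locale.LC_ALL, '')

    Until then, the locale module works in the "C" locale.
    """

    def key(self, s):
        # Transform s with the current LC_COLLATE rules, so that plain
        # comparison of the results gives locale-aware ordering.
        return locale.strxfrm(s)


def _test():
    # Only the test driver touches the process-wide locale.
    try:
        locale.setlocale(locale.LC_ALL, '')  # user's locale settings
    except locale.Error:
        pass  # user's locale unavailable; stay in the "C" locale
    words = ['pear', 'Apple', 'banana']
    print(sorted(words, key=Collator().key))


if __name__ == '__main__':
    _test()
```

Keeping the `setlocale` call out of the class means importing the module has no side effects on `time.strftime()` or C extensions.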
>> Maybe it would be better to have two (or more) versions? A string, unicode, and
>> locale version or maybe add an option to __init__ to choose the behavior?
> I don't think it should be two separate versions. Unicode support is
> only a matter of code like this:
> # in the constructor
> self.encoding = locale.getpreferredencoding()
>
> # instance method
> def strxfrm(self, s):
>     if type(s) is unicode:
>         return locale.strxfrm(s.encode(self.encoding, 'replace'))
>     return locale.strxfrm(s)
>
> and then call self.strxfrm instead of locale.strxfrm. Similar code works
> for locale.atof.
Thanks for the example.
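Leo's wrapper targets Python 2, where `locale.strxfrm` accepts only byte strings. For comparison, a sketch of the same idea in Python 3, where `locale.strxfrm` and `locale.atof` take `str`, so the conversion runs the other way: byte strings are decoded with the preferred encoding first. The class name `LocaleKeys` and its `_text` helper are hypothetical.

```python
import locale


class LocaleKeys:
    """Hypothetical helper mirroring the constructor/method split above."""

    def __init__(self):
        # Passing False asks for the encoding without calling setlocale,
        # so the helper leaves the process locale alone.
        self.encoding = locale.getpreferredencoding(False)

    def _text(self, s):
        # Python 3's locale functions take str: decode bytes first,
        # replacing undecodable bytes instead of raising.
        if isinstance(s, bytes):
            return s.decode(self.encoding, 'replace')
        return s

    def strxfrm(self, s):
        return locale.strxfrm(self._text(s))

    def atof(self, s):
        return locale.atof(self._text(s))
```

With this, mixed `str` and `bytes` items can share one sort key, e.g. `sorted(items, key=LocaleKeys().strxfrm)`.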
>> This was the reason for using locale.strxfrm. From what I could figure out from
>> the documentation, it should work with unicode strings.
>> Am I missing something?
> strxfrm works only with byte strings encoded in the system encoding.
> -- Leo
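Encoding before calling strxfrm is lossy when the system encoding cannot represent a character: with `'replace'`, every such character becomes `?`, so distinct strings can end up with identical sort keys. A short illustration, with ASCII standing in for a narrow system encoding (an assumption; the real encoding is whatever `locale.getpreferredencoding()` returns):

```python
# Two distinct words; ASCII stands in for a narrow system encoding.
w1 = "caf\u00e9"  # 'café'
w2 = "caf\u00e8"  # 'cafè'

k1 = w1.encode("ascii", "replace")
k2 = w2.encode("ascii", "replace")
print(k1, k2)    # b'caf?' b'caf?'
print(k1 == k2)  # True: the accents are gone before strxfrm sees them
```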
Windows has an alternative function, wcsxfrm (wide-character string transform),
but it's not exposed in Python. I could use ctypes to call it, but the code would
then be Windows-specific, and I doubt it would even work as expected.
Maybe a wcsxfrm patch would be good for Python 2.6? Python 3000 will probably
need it anyway.
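A minimal ctypes sketch of calling wcsxfrm directly, assuming the platform C runtime exports the standard C function `size_t wcsxfrm(wchar_t *dest, const wchar_t *src, size_t n)` (msvcrt on Windows, the C library elsewhere). The Python wrapper name `wcsxfrm` is hypothetical, and results depend on the C runtime's own collation tables.

```python
import ctypes
import ctypes.util
import sys

# Assumption: the C runtime exports the standard wcsxfrm.
if sys.platform == "win32":
    _libc = ctypes.cdll.msvcrt
else:
    _libc = ctypes.CDLL(ctypes.util.find_library("c"))

_wcsxfrm = _libc.wcsxfrm
_wcsxfrm.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_size_t]
_wcsxfrm.restype = ctypes.c_size_t


def wcsxfrm(s):
    """Return a sort key for the Unicode string s under LC_COLLATE."""
    # With dest == NULL and n == 0, wcsxfrm only reports the length
    # (excluding the terminating NUL) that the result would need.
    n = _wcsxfrm(None, s, 0)
    # Allocate room for the result plus the NUL, then transform.
    buf = ctypes.create_unicode_buffer(n + 1)
    _wcsxfrm(buf, s, n + 1)
    return buf.value
```

Used as `sorted(words, key=wcsxfrm)`, this skips the encode step entirely, so characters outside the system's byte encoding are not collapsed to `?`.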
I've made a few additional changes and will start a new thread after some more
testing to get some additional feedback.