Flexible Collating (feedback please)
Ron Adam
rrr at ronadam.com
Thu Oct 19 09:00:25 EDT 2006
Leo Kislov wrote:
> Ron Adam wrote:
>
>> locale.setlocale(locale.LC_ALL, '') # use current locale settings
>
> It's not current locale settings, it's user's locale settings.
> Application can actually use something else and you will overwrite
> that. You can also affect (unexpectedly to the application)
> time.strftime() and C extensions. So you should move this call into the
> _test() function and put explanation into the documentation that
> application should call locale.setlocale
I'll experiment with this a bit. I was under the impression that locale.strxfrm
needed the locale set for it to work correctly.
Maybe it would be better to have two (or more) versions? A string, unicode, and
locale version or maybe add an option to __init__ to choose the behavior?
Multiple versions seems to be the approach of pre-py3k. Although I was trying
to avoid that.
Sigh, of course issues like this are why it is better to have a module to do this
with. If it was as simple as just calling sort() I wouldn't have bothered. ;-)
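One way to act on the advice above, without the module changing the application's locale behind its back, is to switch the locale only for the duration of a sort and then put it back. The following is a minimal sketch written for Python 3, not the module's actual code. The names temporary_collation and collated are hypothetical, and it assumes that changing only the LC_COLLATE category is enough for locale.strxfrm, which leaves time.strftime() and the numeric settings alone.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_collation(name=''):
    """Switch LC_COLLATE to `name` for the with-block, then restore it.

    Hypothetical helper. '' means the user's default locale.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # one argument: query only
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


def collated(words, locale_name=''):
    """Return `words` sorted by the collation rules of `locale_name`."""
    with temporary_collation(locale_name):
        return sorted(words, key=locale.strxfrm)
```

Calling collated(names) sorts with the user's settings, while collated(names, 'C') gives plain code point order. Note that setlocale() affects the whole process, so this is not safe if other threads depend on the locale at the same time.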
>> self.numrex = re.compile(r'([\d\.]*|\D*)', re.LOCALE)
>
> [snip]
>
>>         if NUMERICAL in self.flags:
>>             slist = self.numrex.split(s)
>>             for i, x in enumerate(slist):
>>                 try:
>>                     slist[i] = float(x)
>>                 except:
>>                     slist[i] = locale.strxfrm(x)
>
> I think you should call locale.atof instead of float, since you call
> re.compile with re.LOCALE.
I think you are correct, but it seems locale.atof() is a *lot* slower than
float(). :(
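A rough way to check the speed difference on a given machine is a small timeit comparison. This is only a sketch for Python 3: the function name compare and the sample string are made up for illustration, and the numbers will vary with the Python version and the active locale.

```python
import locale
import timeit


def compare(number=10000):
    """Return (seconds for float(), seconds for locale.atof())."""
    sample = '1234.5'
    t_float = timeit.timeit(lambda: float(sample), number=number)
    t_atof = timeit.timeit(lambda: locale.atof(sample), number=number)
    return t_float, t_atof


if __name__ == '__main__':
    t_float, t_atof = compare()
    print('float():       %.4f s' % t_float)
    print('locale.atof(): %.4f s' % t_atof)
```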
Here's the locale.atof() code.
    def atof(string,func=float):
        "Parses a string as a float according to the locale settings."
        #First, get rid of the grouping
        ts = localeconv()['thousands_sep']
        if ts:
            string = string.replace(ts, '')
        #next, replace the decimal point with a dot
        dd = localeconv()['decimal_point']
        if dd:
            string = string.replace(dd, '.')
        #finally, parse the string
        return func(string)
I could set ts and dd in __init__ and just do the replacements in the try...
    if NUMERICAL in self.flags:
        slist = self.numrex.split(s)
        for i, x in enumerate(slist):
            if x:  # slist may contain null strings
                xx = x  # start from x so xx exists even when self.ts is empty
                if self.ts:
                    xx = xx.replace(self.ts, '')   # remove thousands sep
                if self.dd:
                    xx = xx.replace(self.dd, '.')  # replace decimal point
                try:
                    slist[i] = float(xx)
                except ValueError:  # not a number, so collate it as text
                    slist[i] = locale.strxfrm(x)
How does that look?
It needs a fast way to determine if x is a number or a string. Any suggestions?
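One possible answer is to let the regular expression do the classifying, so no try/except is needed per item. When the pattern passed to re.split() is wrapped in a capturing group, the matched text is kept in the result, so text pieces always land at even indices and numbers at odd indices. The sketch below is written for Python 3 and is only an illustration: the names numeric_key and text_key are hypothetical, and the pattern assumes '.' as the decimal point with no thousands separator, so locale-specific separators would still need the replacements shown above.

```python
import re

# The capturing group makes re.split() keep the numbers it splits on:
# text pieces end up at even indices, numbers at odd indices.
_NUMBER = re.compile(r'(\d+(?:\.\d+)?)')


def numeric_key(s, text_key=str):
    """Build a sort key that compares embedded numbers by value.

    text_key transforms the text pieces, for example locale.strxfrm.
    """
    parts = _NUMBER.split(s)
    parts[1::2] = [float(n) for n in parts[1::2]]     # odd slots: numbers
    parts[0::2] = [text_key(t) for t in parts[0::2]]  # even slots: text
    return parts
```

For example, sorted(names, key=numeric_key) puts 'file9' before 'file10'. Because each position in the key always holds the same type, two keys never end up comparing a float against a string.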
> Everything else looks fine. The biggest missing piece is support for
> unicode strings.
This was the reason for using locale.strxfrm. From what I could figure out from the
documentation, it should let this work with unicode strings.
Am I missing something?
Thanks,
Ron