Flexible Collating (feedback please)

Thu Oct 19 09:00:25 EDT 2006

Leo Kislov wrote:
> Ron Adam wrote:
> 
>> locale.setlocale(locale.LC_ALL, '')  # use current locale settings
> 
> It's not current locale settings, it's user's locale settings.
> Application can actually use something else and you will overwrite
> that. You can also affect (unexpectedly to the application)
> time.strftime() and C extensions. So you should move this call into the
> _test() function and put explanation into the documentation that
> application should call locale.setlocale

I'll experiment with this a bit.  I was under the impression that
locale.strxfrm needed the locale set for it to work correctly.
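
One way to keep the module from overwriting the application's settings
might be to change only LC_COLLATE for the duration of a sort and then put
it back.  This is a rough sketch, not what the module currently does; the
names collate_locale and locale_sorted are made up for illustration.

```python
import locale
from contextlib import contextmanager

@contextmanager
def collate_locale(name=''):
    """Temporarily switch LC_COLLATE, restoring the old value on exit."""
    # Calling setlocale() with no second argument only queries the setting.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        yield
    finally:
        # Put back whatever the application had chosen.
        locale.setlocale(locale.LC_COLLATE, saved)

def locale_sorted(items, name=''):
    """Sort strings by locale.strxfrm under the given collation locale."""
    with collate_locale(name):
        return sorted(items, key=locale.strxfrm)
```

Only LC_COLLATE is touched here, so time.strftime() and the numeric
settings should be left alone.  The setting is still process-wide while the
sort runs, so this would not be safe with threads.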

Maybe it would be better to have two (or more) versions: a string, a
unicode, and a locale version?  Or maybe add an option to __init__ to choose
the behavior?  Multiple versions seems to be the pre-py3k approach, although
I was trying to avoid that.

Sigh, of course issues like this are why it is better to have a module to do
this.  If it were as simple as just calling sort() I wouldn't have bothered. ;-)

>>          self.numrex = re.compile(r'([\d\.]*|\D*)', re.LOCALE)
> 
> [snip]
> 
>>          if NUMERICAL in self.flags:
>>              slist = self.numrex.split(s)
>>              for i, x in enumerate(slist):
>>                  try:
>>                      slist[i] = float(x)
>>                  except:
>>                      slist[i] = locale.strxfrm(x)
> 
> I think you should call locale.atof instead of float, since you call
> re.compile with re.LOCALE.

I think you are correct, but it seems locale.atof() is a *lot* slower than 
float(). :(

Here's the locale.atof() code.

def atof(string,func=float):
     "Parses a string as a float according to the locale settings."
     #First, get rid of the grouping
     ts = localeconv()['thousands_sep']

     if ts:
         string = string.replace(ts, '')
     #next, replace the decimal point with a dot
     dd = localeconv()['decimal_point']
     if dd:
         string = string.replace(dd, '.')
     #finally, parse the string
     return func(string)

I could set ts and dd in __init__ and just do the replacements in the try...

         if NUMERICAL in self.flags:
             slist = self.numrex.split(s)
             for i, x in enumerate(slist):
                 if x:              # slist may contain empty strings
                     xx = x         # always bound, even if self.ts is empty
                     if self.ts:
                         xx = xx.replace(self.ts, '')   # remove thousands sep
                     if self.dd:
                         xx = xx.replace(self.dd, '.')  # replace decimal point
                     try:
                         slist[i] = float(xx)
                     except ValueError:                 # not a number
                         slist[i] = locale.strxfrm(x)

How does that look?

It needs a fast way to determine if x is a number or a string.  Any suggestions?
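
One possibility is to let the regular expression do the classification.
When a pattern has a single capturing group that matches only numbers,
re.split() returns the captured numbers at the odd indexes and the text
between them at the even indexes, so no try/except is needed.  A minimal
sketch, assuming '.' as the decimal point and no thousands separator
(NUMREX and split_numeric are hypothetical names):

```python
import re

# One capturing group that matches digits with an optional fraction.
NUMREX = re.compile(r'(\d+(?:\.\d+)?)')

def split_numeric(s):
    """Split s into alternating text and float parts."""
    parts = NUMREX.split(s)
    # With one capturing group, odd indexes hold the matched numbers.
    for i in range(1, len(parts), 2):
        parts[i] = float(parts[i])
    return parts

print(split_numeric('abc12.5def7'))   # ['abc', 12.5, 'def', 7.0, '']
```

A locale-aware version would presumably have to build the pattern from
localeconv()['decimal_point'] instead of hard-coding the dot, and the text
parts would still need to go through locale.strxfrm.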

> Everything else looks fine. The biggest missing piece is support for
> unicode strings.

This was the reason for using locale.strxfrm.  From what I could figure out
from the documentation, it should let the module work with unicode strings.

Am I missing something?

Thanks,
   Ron