Flexible Collating (feedback please)

Wed Oct 18 22:06:03 EDT 2006

At Wednesday 18/10/2006 21:36, Ron Adam wrote:

> >>          if self.flag & CAPS_FIRST:
> >>              s = s.swapcase()
> >
> > This is just coincidental; it relies on (lowercase)<(uppercase) on the
> > locale collating sequence, and I don't see why it should be always so.
>
>The LC_COLLATE structure (in the python.exe C code, I think) controls the
>order of upper and lower case during collating.  Unfortunately, I don't know
>if there is any way to examine it.

LC_COLLATE is just a #define'd constant. I don't know how to examine 
the collating definition, either.

>If there was a way to change the LC_COLLATE structure, I wouldn't need to
>resort to tricks like s.swapcase().  But without that info, I don't know of
>another way.
>
>Maybe changing the CAPS_FIRST to REVERSE_CAPS_ORDER would do?

At least it's a more accurate name.
There is an indirect way: test locale.strcoll("A","a") and see how 
they get sorted. Then define options CAPS_FIRST, LOWER_FIRST 
accordingly. But maybe it's too much trouble...
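
Something along these lines is what I have in mind. It is only a rough 
sketch: CAPS_FIRST and LOWER_FIRST are the option names proposed above, 
their values are arbitrary, and needs_swapcase is a helper name I made up 
for illustration. The strcoll argument exists only so the logic can be 
tried without switching locales.

```python
import locale

# Hypothetical option flags; the numeric values are arbitrary.
CAPS_FIRST = 1
LOWER_FIRST = 2

def needs_swapcase(flag, strcoll=locale.strcoll):
    """Decide whether the swapcase() trick is needed for the requested order.

    Probes the active collating sequence by comparing "A" with "a" instead
    of assuming that lowercase always sorts before uppercase.
    """
    caps_first_natively = strcoll("A", "a") < 0
    if flag & CAPS_FIRST:
        return not caps_first_natively
    if flag & LOWER_FIRST:
        return caps_first_natively
    return False  # no preference requested: leave the strings alone
```

With this, swapcase() would be applied only when the locale's own order 
disagrees with the requested one.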

> > You should try to make this part a bit more generic. If you are
> > concerned about locales, do not use "comma" explicitly. In other
> > countries 10*100=1.000 - and 1,234 is a fraction between 1 and 2.
>
>See the most recent version of this I posted.  It is a bit more generic.
>
>        news://news.cox.net:119/PNxZg.6714$fl.4591@dukeread08
>
>Maybe a 'comma_is_decimal' option?

I'd prefer to use the 'decimal_point' and 'thousands_sep' values from the 
locale information. That would be more consistent with the locale usage 
throughout your module.
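
A minimal sketch of what I mean, assuming the numbers have already been 
picked out of the string. parse_number is a name I made up; the conv 
argument takes a dictionary shaped like the one locale.localeconv() 
returns, so the function can be tried without changing the locale.

```python
import locale

def parse_number(text, conv=None):
    """Convert a locale-formatted number string to a float.

    conv is a dict with 'decimal_point' and 'thousands_sep' keys; by
    default it is taken from locale.localeconv().
    """
    if conv is None:
        conv = locale.localeconv()
    thousands = conv.get("thousands_sep", "")
    decimal = conv.get("decimal_point", ".") or "."
    if thousands:
        text = text.replace(thousands, "")   # drop grouping separators
    if decimal != ".":
        text = text.replace(decimal, ".")    # normalise the decimal point
    return float(text)

# Separators as used in many European and Latin American locales.
conv = {"decimal_point": ",", "thousands_sep": "."}
print(parse_number("1.000", conv))   # 1000.0
print(parse_number("1,234", conv))   # 1.234
```

locale.atof() does much the same job using the current LC_NUMERIC 
settings, so it may be all you need.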

>Options are cheap so it's no problem to add them as long as they make
>sense. ;-)
>
>These options are what I refer to as mid-level options.  The programmer does
>still need to know something about the data they are collating.  They may
>still need to do some preprocessing even with this, but maybe not as much.
>
>In a higher level collation routine, I think you would just need to specify a
>named sort type, such as 'dictionary', 'directory', 'inventory' and it would
>set the options accordingly.  The problem with that approach is the higher
>level definitions may be different depending on locale or even the field it is
>used in.

Sure. But your module is a good starting point for building a 
higher-level procedure.

> >>      The NUMERICAL option orders leading and trailing digits as numerals.
> >>
> >>          >>> t = ['a5', 'a40', '4abc', '20abc', 'a10.2', '13.5b', 'b2']
> >>          >>> collated(t, NUMERICAL)
> >>          ['4abc', '13.5b', '20abc', 'a5', 'a10.2', 'a40', 'b2']
> >
> > From the name "NUMERICAL" I would expect this sorting: b2, 4abc, a5,
> > a10.2, 13.5b, 20abc, a40 (that is, sorting as numbers only).
> > Maybe GROUP_NUMBERS... but I don't like that too much either...
>
>How about 'VALUE_ORDERING' ?
>
>The term I've seen before is natural ordering, but that is more general
>and can include dates, roman numerals, as well as other types.

Sometimes that's the hard part: finding a name which is concise, 
descriptive, and accurately reflects what the code does. A good name 
should make obvious what it is used for (whether it is an option name, 
a class name, or a method name...) but in this case it may be difficult 
to find a good one. So users will have to read the documentation (a 
good thing, anyway!)
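
To make the two readings of NUMERICAL concrete, here is a rough sketch of 
both orderings as sort keys. The names grouped_key and numbers_only_key are 
mine, and the first key merely reproduces the output quoted above; I am not 
claiming it is how your module works. It also assumes "." as the decimal 
point.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
# Optional leading number, text in the middle, optional trailing number.
PARTS = re.compile(r"\A(%s)?(.*?)(%s)?\Z" % (NUMBER, NUMBER), re.DOTALL)

def grouped_key(s):
    """Leading numbers first (by value), then text, then trailing number."""
    lead, text, trail = PARTS.match(s).groups()
    return (float(lead) if lead is not None else float("inf"),
            text,
            float(trail) if trail is not None else -1.0)

def numbers_only_key(s):
    """Sort by the first number found, ignoring the surrounding text."""
    m = re.search(NUMBER, s)
    return float(m.group()) if m else float("inf")

t = ['a5', 'a40', '4abc', '20abc', 'a10.2', '13.5b', 'b2']
print(sorted(t, key=grouped_key))
# ['4abc', '13.5b', '20abc', 'a5', 'a10.2', 'a40', 'b2']
print(sorted(t, key=numbers_only_key))
# ['b2', '4abc', 'a5', 'a10.2', '13.5b', '20abc', 'a40']
```

Whatever name you pick, the documentation should say which of these two 
behaviours the option gives.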

-- 
Gabriel Genellina
Softlab SRL 
