Flexible Collating (feedback please)
gagsl-py at yahoo.com.ar
Thu Oct 19 04:06:03 CEST 2006
At Wednesday 18/10/2006 21:36, Ron Adam wrote:
> >> if self.flag & CAPS_FIRST:
> >>     s = s.swapcase()
> > This is just coincidental; it relies on (lowercase)<(uppercase) in the
> > locale collating sequence, and I don't see why it should always be so.
>The LC_COLLATE structure (in the python.exe C code, I think) controls
>the order of upper and lower case during collating. I don't know if there
>is any way to examine it, unfortunately.
LC_COLLATE is just a #define'd constant. I don't know how to examine
the collating definition, either.
>If there were a way to change the LC_COLLATE structure, I wouldn't need to
>resort to tricks like s.swapcase(). But without that info, I don't know of
>Maybe changing the CAPS_FIRST to REVERSE_CAPS_ORDER would do?
At least it's a more accurate name.
There is an indirect way: test locale.strcoll("A","a") and see how
they get sorted. Then define options CAPS_FIRST, LOWER_FIRST
accordingly. But maybe it's too much trouble...
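One way to act on that suggestion, sketched under the assumption that the current LC_COLLATE locale treats case the same way for every letter: probe the pair once with locale.strcoll() and apply swapcase() only when the locale's native order differs from the one requested. locale.strcoll() and locale.strxfrm() are standard-library functions; caps_sort_first and collate_caps_first are hypothetical helper names.

```python
import locale


def caps_sort_first():
    """Return True if the current LC_COLLATE locale orders "A" before "a".

    locale.strcoll() returns a negative, zero, or positive int, like a
    classic cmp() function.
    """
    return locale.strcoll("A", "a") < 0


def collate_caps_first(words, caps_first=True):
    """Sort words by locale.strxfrm(), forcing the requested case order.

    Hypothetical helper: swapcase() is applied to the sort key only when
    the locale's native case order (as probed above) does not already
    match the request; the original strings are returned unchanged.
    """
    flip = caps_first != caps_sort_first()

    def key(s):
        return locale.strxfrm(s.swapcase() if flip else s)

    return sorted(words, key=key)
```

Because the probe only looks at one pair, a locale that orders case differently for other letters would still fool it. Also note that in the "C" locale, where comparison is by code point, every uppercase letter sorts before every lowercase one, not only when two words differ just in case.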
> > You should try to make this part a bit more generic. If you are
> > concerned about locales, do not use "comma" explicitly. In other
> > countries 10*100=1.000, and 1,234 is a fraction between 1 and 2.
>See the most recent version of this I posted. It is a bit more generic.
>Maybe a 'comma_is_decimal' option?
I'd prefer to use the 'decimal_point' and 'thousands_sep' values from the
locale information. That would be more consistent with the locale usage
throughout your module.
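A sketch of the 'decimal_point'/'thousands_sep' approach, assuming the number has already been extracted as a string. locale.localeconv() is the standard-library call that reports both separators for the current LC_NUMERIC locale; parse_locale_number is a hypothetical helper, and its explicit keyword arguments exist only so the example can be checked without changing the process locale.

```python
import locale


def parse_locale_number(text, decimal_point=None, thousands_sep=None):
    """Convert a numeric string to float using locale separators.

    Hypothetical helper. By default the separators come from
    locale.localeconv(), which reflects the current LC_NUMERIC setting;
    they can also be passed explicitly.
    """
    conv = locale.localeconv()
    if decimal_point is None:
        decimal_point = conv["decimal_point"]
    if thousands_sep is None:
        thousands_sep = conv["thousands_sep"]
    # Drop grouping characters first, then normalize the decimal mark
    # to "." so float() can parse the result.
    if thousands_sep:
        text = text.replace(thousands_sep, "")
    if decimal_point != ".":
        text = text.replace(decimal_point, ".")
    return float(text)
```

With decimal_point="," and thousands_sep=".", "1.000" parses as 1000.0 and "1,234" as 1.234. For the current locale alone, the standard library's locale.atof() performs the same separator substitution.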
>Options are cheap, so it's no problem to add them as long as they
>make sense. ;-)
>These options are what I refer to as mid-level options. The programmer does
>still need to know something about the data they are collating. They may
>still need to do some preprocessing even with this, but maybe not as much.
>In a higher-level collation routine, I think you would just need to specify a
>named sort type, such as 'dictionary', 'directory', or 'inventory', and it
>would set the options accordingly. The problem with that approach is that the
>definitions may differ depending on locale or even the field it is used in.
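The "named sort type" idea could be layered on top of the mid-level options as a plain lookup table. This is a sketch only: CAPS_FIRST and NUMERICAL are option names from this thread, but IGNORE_CASE, the flag values, and the preset contents are hypothetical, chosen just to show the shape. As noted above, real definitions would likely depend on locale and field.

```python
from enum import IntFlag


class Collate(IntFlag):
    # CAPS_FIRST and NUMERICAL are option names used in this thread;
    # IGNORE_CASE and all numeric values are hypothetical.
    CAPS_FIRST = 1
    NUMERICAL = 2
    IGNORE_CASE = 4


# Hypothetical presets: which flags each named sort type turns on.
PRESETS = {
    "dictionary": Collate.IGNORE_CASE,
    "directory": Collate.IGNORE_CASE | Collate.NUMERICAL,
    "inventory": Collate.CAPS_FIRST | Collate.NUMERICAL,
}


def options_for(sort_type, extra=Collate(0)):
    """Look up a named sort type and OR in any extra mid-level flags."""
    try:
        return PRESETS[sort_type] | extra
    except KeyError:
        raise ValueError(f"unknown sort type: {sort_type!r}") from None
```

Keeping presets as data means a locale- or field-specific table could be swapped in without touching the collation code itself.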
Sure. But your module is a good starting point for building a more
> >> The NUMERICAL option orders leading and trailing digits as numerals.
> >> >>> t = ['a5', 'a40', '4abc', '20abc', 'a10.2', '13.5b', 'b2']
> >> >>> collated(t, NUMERICAL)
> >> ['4abc', '13.5b', '20abc', 'a5', 'a10.2', 'a40', 'b2']
> > From the name "NUMERICAL" I would expect this sorting: b2, 4abc, a5,
> > a10.2, 13.5b, 20abc, a40 (that is, sorting as numbers only).
> > Maybe GROUP_NUMBERS... but I don't like that too much either...
>How about 'VALUE_ORDERING'?
>The term I've seen before is natural ordering, but that is broader
>and can include dates, Roman numerals, and other types.
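Both orderings discussed above can be written as sort keys. This is a minimal sketch, not the module's actual implementation: it assumes numbers are unsigned digit runs with an optional ".fraction", compares text runs by code point rather than by locale, and treats every digit run (not only leading and trailing ones) as a number. natural_key and number_only_key are hypothetical names.

```python
import re

# Assumed number syntax: digits with an optional fractional part, e.g. "13.5".
_NUM = re.compile(r"\d+(?:\.\d+)?")


def natural_key(s):
    """Split s into number and text runs; numbers compare by value.

    Each run is tagged (0, number) or (1, text), so a float is never
    compared with a str, and a leading number sorts before leading text.
    """
    key = []
    pos = 0
    for m in _NUM.finditer(s):
        if m.start() > pos:
            key.append((1, s[pos:m.start()]))
        key.append((0, float(m.group())))
        pos = m.end()
    if pos < len(s):
        key.append((1, s[pos:]))
    return key


def number_only_key(s):
    """Sort by the first embedded number's value, then by the whole string.

    Strings without any number sort last.
    """
    m = _NUM.search(s)
    return (float(m.group()) if m else float("inf"), s)
```

With t from the example above, sorted(t, key=natural_key) reproduces the NUMERICAL output shown, and sorted(t, key=number_only_key) gives the "numbers only" order b2, 4abc, a5, a10.2, 13.5b, 20abc, a40. The 0/1 tags matter in Python 3, where comparing a float with a str raises TypeError.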
Sometimes that's the hard part: finding a name that is concise,
descriptive, and accurately reflects what the code does. A good name
should make it obvious what it is used for (whether option names,
class names, or method names), but in this case it may be difficult
to find a good one. So users will have to read the documentation (a
good thing, anyway!)