[Python-Dev] Some thoughts on the codecs...

Andy Robinson andy@robanal.demon.co.uk
Tue, 16 Nov 1999 23:53:53 -0800 (PST)


--- Mark Hammond <mhammond@skippinet.com.au> wrote:
> Actually, I was thinking even more radically - drop the codec
> registry altogether, and use modules with "well-known" names (a
> slight precedent, but Python isn't averse to well-known names in
> general)
> 
> eg:
> iso-8859-1.py:
> 
> import unicodec
> def encode(...):
>   ...
> def decode(...):
>   ...
> 
> iso-8859-2.py:
> from iso-8859-1 import *
> 
This is the simplest approach if each codec really is likely to
be implemented in a separate module.  But just look at
the data!  All the iso-8859 encodings need identical
functionality, and just have a different mapping table
with 256 elements.  It would be trivial to implement
these in one module.  And the wide variety of Japanese
encodings (mostly corporate or historical variants of
the same character set) are again best treated from
one code base with a bunch of mapping tables and
routines to generate the variants - basically one can
store the deltas.
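A rough illustration of the "one module, many tables" idea, written for current Python 3. This is a sketch, not code from this thread: the helper names `make_variant`, `decode_bytes` and `encode_str` are hypothetical. ISO-8859-15 stands in for a "variant" because it differs from ISO-8859-1 in only eight byte positions, so it can be stored as a delta against the Latin-1 table.

```python
# One shared code base for a family of single-byte encodings: each member is
# just a 256-entry decoding table, and variants are stored as deltas.

# Base table: ISO-8859-1 maps every byte value to the code point of the same number.
LATIN_1 = tuple(chr(i) for i in range(256))

# ISO-8859-15 differs from ISO-8859-1 in exactly eight positions.
LATIN_9_DELTA = {
    0xA4: "\u20ac",  # EURO SIGN
    0xA6: "\u0160",  # LATIN CAPITAL LETTER S WITH CARON
    0xA8: "\u0161",  # LATIN SMALL LETTER S WITH CARON
    0xB4: "\u017d",  # LATIN CAPITAL LETTER Z WITH CARON
    0xB8: "\u017e",  # LATIN SMALL LETTER Z WITH CARON
    0xBC: "\u0152",  # LATIN CAPITAL LIGATURE OE
    0xBD: "\u0153",  # LATIN SMALL LIGATURE OE
    0xBE: "\u0178",  # LATIN CAPITAL LETTER Y WITH DIAERESIS
}


def make_variant(base, delta):
    """Build a new decoding table by applying a {byte: char} delta to a base table."""
    table = list(base)
    for byte, char in delta.items():
        table[byte] = char
    return tuple(table)


def decode_bytes(data, table):
    """Decode bytes with a 256-entry table (one character per byte)."""
    return "".join(table[b] for b in data)


def encode_str(text, table):
    """Encode text using the inverse of a decoding table; unmapped characters raise."""
    inverse = {char: byte for byte, char in enumerate(table)}
    out = bytearray()
    for ch in text:
        if ch not in inverse:
            raise ValueError(f"cannot encode {ch!r} with this table")
        out.append(inverse[ch])
    return bytes(out)


# The variant is derived from the base table plus its delta, not stored in full.
LATIN_9 = make_variant(LATIN_1, LATIN_9_DELTA)

if __name__ == "__main__":
    # Byte 0xA4 is CURRENCY SIGN in Latin-1 but EURO SIGN in Latin-9.
    print(decode_bytes(b"\xa4", LATIN_1), decode_bytes(b"\xa4", LATIN_9))
```

The same decode/encode routines serve every member of the family; adding another variant means writing only its delta.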

So the choice is between possibly having a lot of
almost-dummy modules, or having Python modules which
generate and register a logical family of encodings.  
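The second option, a module that generates and registers a whole family, might look roughly like this in current Python 3. It is a sketch under stated assumptions: the codec names `demo_latin1` and `demo_latin9` are hypothetical (chosen so they do not collide with the built-in `iso8859_*` codecs), while `codecs.register`, `codecs.CodecInfo`, `codecs.charmap_build`, `codecs.charmap_encode` and `codecs.charmap_decode` are the standard-library pieces that Python's own table-based codecs use.

```python
import codecs

# Base decoding table: ISO-8859-1, where byte N decodes to code point N.
BASE_TABLE = "".join(chr(i) for i in range(256))

# Hypothetical family: each member is the base table plus a {byte: char} delta.
FAMILY_DELTAS = {
    "demo_latin1": {},
    "demo_latin9": {  # the eight positions where ISO-8859-15 differs
        0xA4: "\u20ac", 0xA6: "\u0160", 0xA8: "\u0161", 0xB4: "\u017d",
        0xB8: "\u017e", 0xBC: "\u0152", 0xBD: "\u0153", 0xBE: "\u0178",
    },
}


def _build_codec_info(name, delta):
    """Generate a CodecInfo for one family member from its delta."""
    table = list(BASE_TABLE)
    for byte, char in delta.items():
        table[byte] = char
    decoding_table = "".join(table)
    encoding_table = codecs.charmap_build(decoding_table)

    def encode(text, errors="strict"):
        # Returns (bytes, number of characters consumed).
        return codecs.charmap_encode(text, errors, encoding_table)

    def decode(data, errors="strict"):
        # Returns (str, number of bytes consumed).
        return codecs.charmap_decode(data, errors, decoding_table)

    return codecs.CodecInfo(encode=encode, decode=decode, name=name)


# Generate every member of the family once, at import time.
_FAMILY = {name: _build_codec_info(name, delta) for name, delta in FAMILY_DELTAS.items()}


def _search(name):
    """Search function for the codec registry; returns None for unknown names."""
    key = name.lower().replace("-", "_").replace(" ", "_")
    return _FAMILY.get(key)


codecs.register(_search)
```

After import, the generated codecs are reachable through the normal machinery, e.g. `"\u20ac".encode("demo-latin9")`. Normalizing the name inside `_search` keeps lookups working whether or not the running Python version already converts hyphens to underscores before calling search functions.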

I may have some time next week and will try to code up
a few so we can pound on something.

- Andy



=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
