[Python-Dev] Codecs and StreamCodecs

Thu, 18 Nov 1999 10:37:49 -0500

> The problem is that the encoding names are not Python identifiers,
> e.g. iso-8859-1 is allowed as an encoding identifier.

This is easily taken care of by translating each run of consecutive
non-identifier characters to a single underscore, so iso-8859-1 would
import the iso_8859_1.py module.  (I also noticed in an earlier post
that the official name for Shift_JIS has an underscore, while most
other encodings use hyphens.)
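
A minimal sketch of that translation, for illustration only.  The
helper name encoding_module_name is hypothetical, and the sketch
assumes ASCII names; it does not fold case and does not handle a name
that starts with a digit, both of which a real lookup would have to
decide on.

```python
import re

def encoding_module_name(encoding):
    """Map an encoding name to a candidate module name (hypothetical helper)."""
    # Replace each run of characters that cannot appear in a Python
    # identifier with a single underscore.
    return re.sub(r"[^A-Za-z0-9_]+", "_", encoding)

print(encoding_module_name("iso-8859-1"))  # iso_8859_1
print(encoding_module_name("Shift_JIS"))   # Shift_JIS (already an identifier)
```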

> This and
> the fact that applications may want to ship their own codecs (which
> do not get installed under the system wide encodings package)
> make the registry necessary.

But it could be enough to register a package in which to look for
encodings (in addition to the system package).

Or there could be a registry for encoding search functions.  (See the
import discussion.)
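
A registry of search functions could look roughly like the sketch
below.  All names here (register_search, lookup, app_search, the
"x-app-upper" encoding) are made up for illustration, and the codec is
a toy (encode, decode) pair rather than anything from the proposal.

```python
_search_functions = []  # hypothetical registry, consulted in order

def register_search(search):
    """Add a callable that maps an encoding name to a codec, or None."""
    _search_functions.append(search)

def lookup(encoding):
    """Return the first codec that any registered search function finds."""
    for search in _search_functions:
        codec = search(encoding)
        if codec is not None:
            return codec
    raise LookupError("unknown encoding: %r" % (encoding,))

# An application ships its own codec without touching the system package.
_app_codecs = {"x-app-upper": (str.upper, str.lower)}  # toy (encode, decode)

def app_search(encoding):
    return _app_codecs.get(encoding)

register_search(app_search)
encode, decode = lookup("x-app-upper")
print(encode("abc"))  # ABC
```

The system package would register one search function of its own, so
application codecs and system codecs go through the same lookup path.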

> I don't see a problem with the registry though -- the encodings
> package can take care of the registration process without any
> user interaction. There would only have to be an API for
> looking up an encoding published by the encodings package for
> the Unicode implementation to use. The magic behind that API
> is left to the encodings package...

I think that the collection of encodings will eventually grow large
enough to make it a requirement to avoid doing work proportional to
the number of supported encodings at startup (or even when an encoding
is referenced for the first time).  Any "lazy" mechanism (of which
module search is an example) will do.

> BTW, nothing's wrong with your idea :-) In fact, I like it
> a lot because it keeps the encoding modules out of the
> top-level scope which is good.

Yes.

> PS: we could probably even take the whole codec idea one step
> further and also allow other input/output formats to be registered,
> e.g. stream ciphers or pickle mechanisms. The step in that
> direction is not a big one: we'd only have to drop the specification
> of the Unicode object in the spec and replace it with an arbitrary
> object. Of course, this will still have to be a Unicode object
> for use by the Unicode implementation.

This is a step towards Java's architecture of stackable streams.

But I'm always in favor of tackling what we know we need before
tackling the most generalized version of the problem.

--Guido van Rossum (home page: http://www.python.org/~guido/)