[Python-bugs-list] [ python-Bugs-225476 ] Codec naming scheme and aliasing support

noreply@sourceforge.net noreply@sourceforge.net
Fri, 01 Mar 2002 14:39:15 -0800


Bugs item #225476, was opened at 2000-12-12 14:51
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=225476&group_id=5470

Category: Unicode
Group: Feature Request
Status: Open
Resolution: None
Priority: 3
Submitted By: M.-A. Lemburg (lemburg)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Codec naming scheme and aliasing support

Initial Comment:
The docs should contain a note about the codec naming scheme,
the use of codec packages and how to address them in the
encoding name and some notes about the aliasing support
which is available for codecs which are found by the standard
codec search function in the encodings package.

Here's a starter (actually a posting to python-dev, but it has all
the needed details):
"""
I just wanted to inform you of a change I plan for the standard
encodings search function to enable better support for aliasing
of encoding names.

The current implementation caches the aliases returned from the
codec's getaliases() function in the encodings lookup cache
rather than in the alias cache. As a consequence, the hyphen to
underscore mapping is not applied to the aliases. A codec would
have to return a list of all combinations of names with hyphens
and underscores in order to emulate the standard lookup 
behaviour.

I have a patch which fixes this and also ensures that aliases
cannot be overwritten by codecs which register at some later
point in time. This ensures that we won't run into situations
where a codec import suddenly overrides behaviour of previously
active codecs. [The patch was checked into CVS on 2000-12-12.]

I would also like to propose the use of a new naming scheme
for codecs which enables drop-in installation. As discussed
on the i18n-sig list, people would like to install codecs
without requiring users to call a codec registration function
or to touch site.py.

The standard search function in the encodings package has a
nice property (which I only noticed after the fact ;) which
allows using Python package names in the encoding names,
e.g. you can install a package 'japanese' and then access the
codecs in that package using 'japanese.shiftjis' without
having to bother registering a new codec search function
for the package -- the encodings package search function
will redirect the lookup to the 'japanese' package.

Using package names in the encoding name has several
advantages:
* you know where the codec comes from
* you can have multiple codecs for the same encoding
* drop-in installation without registration is possible
* the need for a non-default encoding package is visible in the
  source code
* you no longer need to drop new codecs into the Python
  standard lib

"""
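
For illustration only, the alias handling and package-qualified lookup
described in the posting could be sketched roughly as below. This is not
the code in the encodings package: the names normalize, register_aliases,
resolve and find_codec_module are hypothetical, lower-casing of names is
an assumption, and the import order (encodings package first, then
top-level) is a guess at the behaviour described.

```python
import importlib

# Hypothetical alias cache: normalized alias -> codec module name.
_alias_cache = {}


def normalize(name):
    """Map hyphens to underscores (lower-casing is an assumption)."""
    return name.lower().replace("-", "_")


def register_aliases(codec_name, aliases):
    """Store aliases in normalized form; the first registration wins."""
    for alias in aliases:
        # setdefault never overwrites an alias that is already present.
        _alias_cache.setdefault(normalize(alias), codec_name)


def resolve(name):
    """Return the codec module name for an encoding name or alias."""
    key = normalize(name)
    return _alias_cache.get(key, key)


def find_codec_module(encoding):
    """Try encodings.<name> first, then <name> as a top-level dotted path."""
    name = resolve(encoding)
    for candidate in ("encodings." + name, name):
        try:
            return importlib.import_module(candidate)
        except ImportError:
            continue
    return None


register_aliases("shift_jis", ["Shift-JIS", "sjis"])
register_aliases("some_other_codec", ["sjis"])  # ignored: alias already taken
```

Under these assumptions, a name such as 'japanese.shiftjis' would miss
in the encodings package and fall through to the second candidate, so the
'japanese' package gets imported without any registration call.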


----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2002-03-01 22:39

Message:
Logged In: YES 
user_id=31392

Is this a bug?  Or should you just write a PEP?


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-16 10:48

Message:
Logged In: YES 
user_id=38388

This should probably become a PEP...

I'll look into this after I'm back from vacation on 10 September.

----------------------------------------------------------------------
