[I18n-sig] Codecs for Big Five and GB 2312

M.-A. Lemburg mal@lemburg.com
Mon, 30 Oct 2000 11:43:34 +0100


Tamito KAJIYAMA wrote:
> 
> * Martin v. Loewis
> |
> | Installing into python2.0/encodings/{euc_jp,shift_jis,japanese}
> | doesn't look right to me - add-on packages should be capable of
> | installing into site-packages by default.
> 
> * M.-A. Lemburg
> |
> | The "right" way to install new codec packages is by placing them
> | inside a package which then registers a new search function in the
> | codec registry.
> |
> | Tamito's other package does this AFAIR.
> |
> | To be able to use the codecs, a Python script must then import
> | the codecs package (which then registers the search function).
> 
> Beta versions of the Japanese codecs have been implemented as a
> usual add-on package, so applications need to import it before
> using a Japanese codec.  I had provided a module named codecs_ja
> which registers codecs for EUC-JP and Shift_JIS at once.
> 
> The current version of the codecs has been implemented as a
> special "codecs" package that needs to be installed into
> lib/encodings as well as the standard encodings.
> 
> I think we need an agreement on how non-standard codecs should
> be installed.

They should be installed as a separate package and then register
a search function which adds the included codecs to the
codec registry.
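
A minimal sketch of that registration step, written against the codecs
module as it exists in current Python 3. The codec name "x_demo_codec"
and the helper names are made up for illustration, and the codec itself
simply delegates to the built-in Latin-1 functions; a real package would
put its own encode/decode implementation there.

```python
import codecs

# Hypothetical codec name, used only for this sketch.
DEMO_NAME = "x_demo_codec"


def demo_encode(text, errors="strict"):
    # Stand-in implementation: delegate to the built-in Latin-1 encoder.
    return codecs.latin_1_encode(text, errors)


def demo_decode(data, errors="strict"):
    # Stand-in implementation: delegate to the built-in Latin-1 decoder.
    return codecs.latin_1_decode(data, errors)


def search(name):
    """Search function: return a CodecInfo for names we know, else None."""
    if name == DEMO_NAME:
        return codecs.CodecInfo(encode=demo_encode, decode=demo_decode,
                                name=DEMO_NAME)
    return None


# Registering the search function is what importing the package would do.
codecs.register(search)

encoded = "caf\xe9".encode("x_demo_codec")
decoded = encoded.decode("x_demo_codec")
```

Nothing is written into Lib/encodings here; the codec becomes available
as soon as the module holding this code has been imported.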

Lib/encodings should in all cases be left untouched. Installing
third party software directly into the standard lib directory
is bad practice and not really needed anymore now that we have
distutils.

If you don't want to bother with importing the codec packages
in your application, you can use the sitecustomize.py module
to do the imports at startup time.
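
As a rough illustration, a sitecustomize.py along these lines would do.
The package names in CODEC_PACKAGES are placeholders, not real
distributions; the ImportError handling is my own assumption so that a
missing package does not break interpreter startup.

```python
# sitecustomize.py: imported automatically by the site module at startup.
import importlib

# Placeholder names for installed codec packages (hypothetical).
CODEC_PACKAGES = ["japanese_codecs_demo", "chinese_codecs_demo"]


def load_codec_packages(names):
    """Import each codec package; importing is what registers its codecs."""
    loaded = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Package not installed on this site: skip it quietly.
            continue
        loaded.append(name)
    return loaded


LOADED = load_codec_packages(CODEC_PACKAGES)
```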

Another possible approach would be creating a new top-level
codec package "sitecodecs" which is then used as a pool for
all site-specific codecs and also searched by the encodings search
function if present.
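
A sketch of how such a lookup could work, written here as a separate
search function rather than as a change to the encodings package. It
assumes that modules inside "sitecodecs" expose a getregentry() function
the way the standard encodings modules do; the codec name "x_site_demo"
and the in-memory stand-in package exist only to make the example
runnable.

```python
import codecs
import importlib
import sys
import types

PACKAGE = "sitecodecs"  # package name taken from the proposal above


def sitecodecs_search(name):
    """Look for a codec module named sitecodecs.<name>."""
    if not name.replace("_", "").isalnum():
        return None  # refuse anything that is not a plain module name
    try:
        module = importlib.import_module(PACKAGE + "." + name)
    except ImportError:
        return None  # no such package or module: let other searches try
    getregentry = getattr(module, "getregentry", None)
    if getregentry is None:
        return None
    return getregentry()


codecs.register(sitecodecs_search)

# Demo only: build an in-memory stand-in for an installed sitecodecs package.
pkg = types.ModuleType(PACKAGE)
pkg.__path__ = []  # an (empty) __path__ marks the module as a package
mod = types.ModuleType(PACKAGE + ".x_site_demo")
mod.getregentry = lambda: codecs.CodecInfo(
    encode=codecs.ascii_encode, decode=codecs.ascii_decode,
    name="x_site_demo")
sys.modules[PACKAGE] = pkg
sys.modules[PACKAGE + ".x_site_demo"] = mod

result = "abc".encode("x_site_demo")
```

If no sitecodecs package is installed, the import fails, the function
returns None, and the registry carries on as before.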
 
> I prefer the latter approach.  I want Python to take care of all
> encoding issues, and if possible I want to write applications
> without considering which encodings can be handled at the core
> language level.  I hope that in the near future Python will
> support all encodings that have mappings from/to Unicode.  If an
> application requires an encoding that is not supported by Python
> at that time, then a LookupError is raised; all the application
> needs to do is to catch that exception and to tell the user that
> the encoding is currently not supported.  I think this is not a
> problem, since it is automatically solved without any changes to
> the application once Python supports that encoding.
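
For illustration, the application-side handling described in the quoted
paragraph could look roughly like this. The function name and the
wording of the message are hypothetical; the only thing relied on is
that looking up an unknown encoding raises LookupError.

```python
def decode_or_report(data, encoding):
    """Decode data, or return a message if the encoding is unknown."""
    try:
        return data.decode(encoding), None
    except LookupError:
        # No registered search function knows this encoding.
        message = "The encoding %r is currently not supported." % encoding
        return None, message


text, problem = decode_or_report(b"hello", "x-no-such-encoding")
```

Once a codec for the encoding is installed and registered, the same call
returns the decoded text without any change to the application.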

The standard distribution will probably not include the
Asian codecs -- just like it doesn't include all the other
goodies which people are fond of. Instead, Python distribution
packagers like ActivePython will ship versions of Python which
include these extra packages.

At least that's the idea behind keeping the Python core rather
small and maintainable.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/