[Python-Dev] Some thoughts on the codecs...

M.-A. Lemburg mal@lemburg.com
Tue, 16 Nov 1999 14:15:19 +0100


Andy Robinson wrote:
> 
> --- "M.-A. Lemburg" <mal@lemburg.com> wrote:
> > So I can drop JIS? [I won't be able to drop the escaped unicode
> > codec because this is needed for u"" and ur"".]
> 
> Drop Japanese from the core language.

Done ... that one was easy ;-)
 
> JIS0208 is a big character set with three popular
> encodings (Shift-JIS, EUC-JP and JIS), and a host of
> slight variations; it has 6879 characters, and there
> is a range of options a user might need to set for it
> to be useful.  So let's assume for now this is a separate
> package.  There's a good chance I'll do it but it is
> not a small job.  If you start statically linking in
> tables of 7000 characters for one Asian language,
> you'll have to do the lot.
> 
> As for the single-byte Latin ones, a prototype Python
> module could be whipped up in a couple of evenings,
> and a tiny C function which does single-byte to
> double-byte mappings and vice versa could make it
> fast.  We can have an extensible, data driven solution
> in no time without having to build it into the core.

Perhaps these helper functions could be integrated into the core,
so that adding a new codec does not require compiling anything.
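As a rough illustration of the data-driven approach, here is a minimal sketch in present-day Python 3 that builds a single-byte codec purely from a 256-entry table and plugs it in through codecs.register, with no compilation step. It assumes the charmap helpers exposed by the standard codecs module (charmap_build, charmap_encode, charmap_decode, the same ones the generated 8-bit codecs such as cp1252 use); the codec name demo_latin_euro and its table are hypothetical.

```python
import codecs

# Hypothetical 8-bit mapping table: identical to Latin-1 except that byte 0xA4
# decodes to the euro sign (as ISO-8859-15 does at that position).
_latin1 = "".join(chr(b) for b in range(256))
DECODING_TABLE = _latin1[:0xA4] + "\u20ac" + _latin1[0xA5:]

# Invert the decoding table once into an encoding map.
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def encode(text, errors="strict"):
    # Returns (bytes, number_of_characters_consumed), as the codec API expects.
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def decode(data, errors="strict"):
    # Returns (str, number_of_bytes_consumed).
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def search(name):
    # Codec search function: answer only for our hypothetical codec name.
    if name == "demo_latin_euro":
        return codecs.CodecInfo(encode, decode, name="demo_latin_euro")
    return None


codecs.register(search)

if __name__ == "__main__":
    # ascii() keeps the output printable on any terminal: 'price: 5\u20ac'
    print(ascii(b"price: 5\xa4".decode("demo_latin_euro")))
```

Adding another 8-bit encoding in this style only means supplying a different table; the per-byte work stays in the C helpers.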

> The way I see it, to claim that Python has i18n, a
> serious effort is needed to ensure every major
> encoding in the world is available to Python users.
> But that's separate from the core language.  Your spec
> should only cover what is going to be hard-coded into
> Python.

Right.
 
> I'd like to see one paragraph in your spec stating
> that our architecture separates the encodings
> themselves from the core language changes, and that
> getting them sorted is a logically separate (but
> important) project.  Ideally, we could put together a
> separate proposal for the encoding library itself and
> run it by some world class experts in that field, but
> after yours is done.

I've added:
All other encodings, such as the CJK ones needed to support Asian scripts,
should be implemented in separate packages which do not get included
in the core Python distribution and are not a part of this proposal.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    45 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/