[I18n-sig] Re: [XML-SIG] Character encodings and expat

M.-A. Lemburg mal@lemburg.com
Mon, 30 Oct 2000 09:59:49 +0100

"Martin v. Loewis" wrote:
> > I don't use Tcl, Java or X11 and don't know what ICU
> > is, but I do use Python on several platforms and would
> > want to know that the encodings library worked
> > identically on all platforms - i.e. if there are bugs
> > in the codecs, they are consistent and can be fixed
> > consistently.  I think this issue was pretty much settled
> > in MAL's original i18n proposal.
> I sense a certain "reinvent the wheel" attitude here. Why do you
> assume that the codecs developed by somebody else will have bugs?
> While the "we know how character sets work" approach provides
> consistency across platforms, it doesn't provide consistency between
> applications on a single platform. I believe most users are more
> interested in that - they install some codec tables on their system,
> and then all applications recognize these codecs, whether written in C
> or Python.

I don't think that reinventing the wheel for the sake of
cross-platform compatibility is a bad thing. Besides, nothing
prevents anyone from writing Python extensions to make the
system codecs available to Python. The problem we face with
these, though, is that they won't be available everywhere.
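
As a rough illustration of how such an extension could plug in, here is a
minimal sketch using the codec registry's search-function hook
(codecs.register). The "system_" name prefix and the system_codec_search
function are hypothetical names made up for this example, and the built-in
codec is used as a stand-in for a real platform converter.

```python
import codecs

# Hypothetical prefix marking codecs that would come from the platform
# rather than from Python's own codec library (assumption for illustration).
PREFIX = "system_"

def system_codec_search(name):
    """Return a CodecInfo for names like 'system_latin_1', else None."""
    if not name.startswith(PREFIX):
        return None  # not ours; let other search functions handle it
    try:
        # Stand-in: a real extension would call the platform's converter.
        # Here we simply delegate to the built-in codec of the same name.
        builtin = codecs.lookup(name[len(PREFIX):])
    except LookupError:
        return None  # codec not available on this platform
    return codecs.CodecInfo(
        name=name,
        encode=builtin.encode,
        decode=builtin.decode,
    )

codecs.register(system_codec_search)

encoded = "L\u00f6wis".encode("system_latin_1")
decoded = encoded.decode("system_latin_1")
```

Code that asks for a "system_" name fails with LookupError wherever the
platform codec is missing, which is exactly the portability problem
described above.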

The basic design decision we made for Unicode was to have it available
everywhere -- not only on platforms where Unicode is supported.
This includes a usable set of codecs for all common encodings.
The Asian codecs were left out of the standard distribution only
because of their size.

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/