[XML-SIG] Character encodings and expat

M.-A. Lemburg mal@lemburg.com
Mon, 30 Oct 2000 09:52:58 +0100

"Frank J.S. Chen" wrote:
> > >
> > > > That's only Shift-JIS and EUC-JP, though.  Is there any concerted
> > > > effort afoot to make a more complete set?  At the very least,
> > > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
> > >
> >
> > Sure would be nice... the only problem I see is that the
> > different codecs for the Asian scripts will most probably
> > behave differently, e.g. there are many issues with private
> > code point areas in Unicode and the various Asian encodings.
> For now, all CJK Unicode characters reside in the Basic Multilingual
> Plane (Plane 0).
> It seems there is no need to consider the surrogate area or private use
> area right now.

But there is a private use area in the BMP as well... and if you
plan to write round-trip safe codecs for corporate character sets,
then you'll have to map their vendor-specific characters to those
private use code points to make the transfer safe.
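
To illustrate the round-trip point, here is a minimal sketch, not code
from this thread. It assumes a made-up single-byte "corporate" character
set: ASCII plus two vendor glyphs at 0x80 and 0x81 that have no standard
Unicode code point. The table names and the two helper functions are
hypothetical. The only fixed fact used is that the BMP private use area
spans U+E000..U+F8FF.

```python
# Hypothetical charset: ASCII plus two vendor-specific glyphs (assumption).
PUA_BASE = 0xE000  # first code point of the BMP private use area

# Byte -> character table: ASCII maps to itself, vendor glyphs go to the PUA.
DECODE_TABLE = {b: chr(b) for b in range(0x80)}
DECODE_TABLE[0x80] = chr(PUA_BASE)
DECODE_TABLE[0x81] = chr(PUA_BASE + 1)

# Inverse table; the mapping is one-to-one, so the inversion loses nothing.
ENCODE_TABLE = {ch: b for b, ch in DECODE_TABLE.items()}


def decode_corporate(data: bytes) -> str:
    """Decode bytes of the hypothetical charset into a Unicode string."""
    return "".join(DECODE_TABLE[b] for b in data)


def encode_corporate(text: str) -> bytes:
    """Encode a Unicode string back into the hypothetical charset."""
    return bytes(ENCODE_TABLE[ch] for ch in text)


original = b"abc\x80\x81"
text = decode_corporate(original)
round_tripped = encode_corporate(text)
```

Because every vendor glyph gets its own private use code point, decoding
followed by encoding gives back the original bytes. Mapping such glyphs to
a replacement character instead would lose that property.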

> What we need is indeed a transcoding interface to convert different locales
> to UTF-8/UTF-16 and then back.

I'm not sure I understand you here: there are quite a few codecs
available in the standard Python library which are readily usable, and
the locale.py module has a database of many default encodings
for the various locales.
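
As a rough sketch of what that looks like in practice (written for
current Python 3, not the interpreter of the time), the example below
transcodes Latin-1 bytes to UTF-8 and back with the built-in codecs, then
asks the locale module for its idea of a locale's full name. The
transcode() helper and the sample string are assumptions made for
illustration. codecs.lookup() and locale.normalize() are standard
library calls.

```python
import codecs
import locale


def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode bytes from one encoding to another via Unicode."""
    return data.decode(source).encode(target)


# Sample text with non-ASCII characters, stored as Latin-1 bytes.
latin1_bytes = "Gr\u00fc\u00dfe".encode("iso8859-1")

# Convert to UTF-8 and back again.
utf8_bytes = transcode(latin1_bytes, "iso8859-1", "utf-8")
back = transcode(utf8_bytes, "utf-8", "iso8859-1")

# Look up a codec by name in the codec registry.
codec_info = codecs.lookup("iso8859-1")

# Ask the locale alias database for the full name of a locale,
# which typically includes a default encoding.
locale_guess = locale.normalize("de_DE")

print(codec_info.name, locale_guess, back == latin1_bytes)
```

The same decode-then-encode pattern works for any pair of encodings that
have a registered codec, as long as the target encoding can represent
every character in the text.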

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/