[XML-SIG] Character encodings and expat
Mon, 30 Oct 2000 09:52:58 +0100
"Frank J.S. Chen" wrote:
> > >
> > > > That's only Shift-JIS and EUC-JP, though. Is there any concerted
> > > > effort afoot to make a more complete set? At the very least,
> > > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
> > >
> > Sure would be nice... the only problem I see is that the
> > different codecs for the Asian scripts will most probably
> > behave differently, e.g. there are many issues with private
> > code point areas in Unicode and the various Asian encodings.
> For now, all CJK Unicode characters reside in the Basic Multilingual
> Plane (Plane 0).
> There seems to be no need to consider the surrogate area or the private
> use area right now.
But there is a private use area in the BMP as well... and if you
plan to write round-trip safe codecs for corporate character sets,
then you'll have to use those private code points to make the transfer safe.
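
As a rough illustration of the round-trip point, here is a minimal sketch.
The BMP private use range U+E000..U+F8FF comes from the Unicode standard;
the vendor table, its byte codes and the helper names are made up for the
example and do not describe any real corporate character set.

```python
import unicodedata

# BMP private use area as defined by Unicode: U+E000 through U+F8FF.
PUA_START, PUA_END = 0xE000, 0xF8FF

# Hypothetical vendor table (illustration only): two vendor-specific
# double-byte codes that have no standard Unicode equivalent.
VENDOR_TO_PUA = {b"\xf0\x40": "\ue000", b"\xf0\x41": "\ue001"}
PUA_TO_VENDOR = {char: code for code, char in VENDOR_TO_PUA.items()}


def is_bmp_private_use(char):
    """Return True if char lies in the BMP private use area."""
    return PUA_START <= ord(char) <= PUA_END


def decode_vendor(codes):
    """Map a list of vendor byte codes to a Unicode string."""
    return "".join(VENDOR_TO_PUA[code] for code in codes)


def encode_vendor(text):
    """Map a Unicode string back to the list of vendor byte codes."""
    return [PUA_TO_VENDOR[char] for char in text]


codes = [b"\xf0\x41", b"\xf0\x40"]
text = decode_vendor(codes)

# The transfer is safe only if encoding gives back the original codes.
round_trip_ok = encode_vendor(text) == codes

# Private use code points carry the general category "Co".
all_private = all(unicodedata.category(c) == "Co" for c in text)
```

Because each vendor code gets its own private code point, nothing is
collapsed or lost on the way through Unicode.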
> What we need is indeed a transcoding interface to convert different locales
> to UTF-8/UTF-16 and then back.
I'm not sure I understand you here: there are quite a few codecs
available in the standard Python library which are readily usable and
the locale.py module has a database of many default encodings
for the various locales.
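
A small sketch of what that looks like in practice, assuming a Python 3
interpreter whose standard library ships a "shift_jis" codec. The
transcode() helper and the sample text are only illustrations, not an
existing API.

```python
import codecs
import locale


def transcode(data, source_encoding, target_encoding="utf-8"):
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(source_encoding).encode(target_encoding)


# Sample text written with escapes so the source file stays ASCII.
text = "\u65e5\u672c\u8a9e"

legacy = text.encode("shift_jis")                # Unicode -> legacy bytes
utf8 = transcode(legacy, "shift_jis")            # legacy -> UTF-8
back = transcode(utf8, "utf-8", "shift_jis")     # UTF-8 -> legacy again

# codecs.lookup() raises LookupError if no such codec is installed.
codec_info = codecs.lookup("shift_jis")

# locale.py keeps an alias table; normalize() expands a locale name and
# may append a default encoding after the dot.
normalized = locale.normalize("ja_JP")
default_encoding = normalized.partition(".")[2]  # empty if none is listed

# Encoding preferred by the current environment.
preferred = locale.getpreferredencoding(False)
```

If the bytes survive the trip unchanged (back == legacy), the codec pair is
round-trip safe for that text.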
Python Pages: http://www.lemburg.com/python/