[XML-SIG] Character encodings and expat

M.-A. Lemburg mal@lemburg.com
Fri, 27 Oct 2000 12:07:37 +0200


Lars Marius Garshol wrote:
> 
> * Martin v. Loewis
> |
> | Once xmlproc is capable of producing Unicode, it will certainly
> | understand all encodings that the Python 2.0 encoding machinery knows
> | of; that includes "latin1".
> 
> Yup.  I plan to teach xmlproc the IANA registry, so that this should
> not be a problem with xmlproc.

You might want to have a look at the code in encodings/aliases.py.
It includes the aliasing "database" which the encodings package uses
to map encoding names to codec names.

If this list is missing any IANA names, it would be a good idea
to add them.
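
As a rough sketch of how one could check a list of names against the
alias table: this assumes a current Python 3, where codecs.lookup()
returns an object with a .name attribute. The helper names
(alias_target, codec_for, missing_names) are made up for illustration,
and the key normalization in alias_target is a simplification of what
the encodings package really does.

```python
import codecs
from encodings.aliases import aliases


def alias_target(name):
    """Look a name up directly in the alias table.

    Simplified normalization (lowercase, '-' to '_'); the encodings
    package applies its own, more thorough normalization.
    """
    key = name.lower().replace("-", "_")
    return aliases.get(key)


def codec_for(name):
    """Return the codec name Python resolves `name` to, or None."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def missing_names(candidates):
    """Return the candidate names that Python cannot resolve to a codec."""
    return [name for name in candidates if codec_for(name) is None]
```

Feeding missing_names() the names from the IANA registry would give
the list of aliases that still need to be added.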
 
> However, it is a problem that Python does not support any of the Far
> East encodings yet.  Does anyone know if there are any plans to change
> that?

Tamito KAJIYAMA has written a few Asian codecs. These are not
high-performance, but they are fairly complete and also a great
example of how a codec package can be written. More about this
on the i18n-sig mailing list.
 
> | We should also strive for teaching expat to use the Python encoding
> | machinery, but that may be more difficult. Any volunteers?
> 
> I don't think it's really all that difficult.  It should be possible
> to use the Python codec system to produce utf-16, and then you feed
> this to expat and fix the encoding as "utf-16" in the call to
> ParserCreate.
> 
> The only possible stumbling block is when expat discovers an XML
> declaration that says something other than "utf-16"...
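
A minimal sketch of that approach, assuming a current Python 3 and the
xml.parsers.expat module: decode the raw bytes with a Python codec,
re-encode as UTF-16 and pass "utf-16" as the encoding override to
ParserCreate. The function name parse_via_utf16 is hypothetical, and
the caller is assumed to know the source encoding already. According
to the pyexpat documentation, the encoding argument overrides the
implicit or explicit encoding of the document, which is what should
keep a conflicting XML declaration from getting in the way.

```python
from xml.parsers import expat


def parse_via_utf16(raw, source_encoding):
    """Parse `raw` bytes by transcoding them to UTF-16 first.

    Returns the concatenated character data of the document.
    """
    # Let the Python codec machinery do the decoding.
    text = raw.decode(source_encoding)
    # Python's "utf-16" codec writes a byte order mark first.
    data = text.encode("utf-16")

    # The override takes precedence over the XML declaration.
    parser = expat.ParserCreate("utf-16")
    chunks = []
    # Character data may arrive in several pieces; collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)
```

For example, a document declared and stored as iso-8859-2 could be
passed in as parse_via_utf16(raw_bytes, "iso-8859-2").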

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/