[XML-SIG] Character encodings and expat
M.-A. Lemburg
mal@lemburg.com
Fri, 27 Oct 2000 12:07:37 +0200
Lars Marius Garshol wrote:
>
> * Martin v. Loewis
> |
> | Once xmlproc is capable of producing Unicode, it will certainly
> | understand all encodings that the Python 2.0 encoding machinery knows
> | of; that includes "latin1".
>
> Yup. I plan to teach xmlproc the IANA registry, so that this should
> not be a problem with xmlproc.
You might want to have a look at the code in encodings/aliases.py.
It includes the aliasing "database" which the encodings package uses
to map encoding names to codec names.
If not all IANA names are included in this list, it would be
a good idea to add them...
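To see which names are already covered, something along these lines could be used. It is only a sketch written against a current Python 3 standard library, not the 2.0 code discussed here; the helper name codec_name_for and the list of sample labels are made up for illustration, and the last label is deliberately bogus.

```python
import codecs
from encodings import aliases

def codec_name_for(label):
    """Return the canonical codec name Python resolves `label` to, or None."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None

# Hypothetical sample of charset labels; the last one is intentionally invalid.
labels = ["latin1", "ISO_8859-1", "Shift_JIS", "x-no-such-charset"]

for label in labels:
    print(label, "->", codec_name_for(label))

# Labels that the codec machinery could not resolve are candidates
# for new entries in the alias table.
missing = [label for label in labels if codec_name_for(label) is None]
print("unresolved:", missing)

# The alias table itself is a plain dict from normalized alias to codec module name.
print(aliases.aliases["latin1"])
```

Going through codecs.lookup() rather than reading the alias dict directly also catches names that resolve without an alias entry, because a codec module of that name exists.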
> However, it is a problem that Python does not support any of the Far
> East encodings yet. Does anyone know if there are any plans to change
> that?
Tamito KAJIYAMA has written a few Asian codecs. These are not
high-performance, but they are fairly complete and also a great
example of how a codec package can be written. More about this
on the i18n-sig mailing list.
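For a rough idea of what writing a codec involves, here is a toy sketch against a current Python 3. It is not taken from Tamito's package; the codec name "toyeuro" and its one-byte mapping are invented purely for illustration. A real Asian codec needs multi-byte tables and stateful stream handling on top of this.

```python
import codecs

# Hypothetical toy charset: 7-bit ASCII plus byte 0x80 mapped to the euro sign.
_DECODING_MAP = {i: chr(i) for i in range(128)}
_DECODING_MAP[0x80] = "\u20ac"
_ENCODING_MAP = {ord(ch): byte for byte, ch in _DECODING_MAP.items()}

def _encode(text, errors="strict"):
    # Returns a (bytes, number_of_characters_consumed) tuple.
    return codecs.charmap_encode(text, errors, _ENCODING_MAP)

def _decode(data, errors="strict"):
    # Returns a (str, number_of_bytes_consumed) tuple.
    return codecs.charmap_decode(data, errors, _DECODING_MAP)

def _search(name):
    # The codec registry calls every registered search function with the
    # normalized encoding name; return None for names we do not handle.
    if name == "toyeuro":
        return codecs.CodecInfo(name="toyeuro", encode=_encode, decode=_decode)
    return None

codecs.register(_search)

encoded = "abc\u20ac".encode("toyeuro")
decoded = encoded.decode("toyeuro")
print(encoded, decoded)
```

Once the search function is registered, the new name works everywhere an encoding name is accepted, which is what makes this approach attractive for parsers that only need to ask for Unicode.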
> | We should also strive for teaching expat to use the Python encoding
> | machinery, but that may be more difficult. Any volunteers?
>
> I don't think it's really all that difficult. It should be possible
> to use the Python codec system to produce utf-16, and then you feed
> this to expat and fix the encoding as "utf-16" in the call to
> ParserCreate.
>
> The only possible stumbling block is when expat discovers an XML
> declaration that says something other than "utf-16"...
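The approach described above might look roughly like this with the xml.parsers.expat module of a current Python 3. This is a sketch only: the function name parse_via_utf16 and the sample document are made up, and it relies on the documented behaviour that the encoding argument to ParserCreate overrides whatever the XML declaration says.

```python
from xml.parsers import expat

def parse_via_utf16(raw_bytes, source_encoding):
    """Transcode raw_bytes to UTF-16 with a Python codec, then feed expat."""
    # Step 1: let the Python codec machinery decode the original bytes.
    text = raw_bytes.decode(source_encoding)

    # Step 2: fix the encoding as "utf-16" when creating the parser.
    parser = expat.ParserCreate(encoding="utf-16")
    chunks = []
    parser.CharacterDataHandler = chunks.append

    # Step 3: re-encode as UTF-16 (Python adds a byte order mark) and parse.
    parser.Parse(text.encode("utf-16"), True)
    return "".join(chunks)

# Hypothetical document whose declaration names an encoding other than utf-16.
sample = (
    '<?xml version="1.0" encoding="iso-8859-2"?><doc>\u0142\u00f3d\u017a</doc>'
).encode("iso-8859-2")

print(parse_via_utf16(sample, "iso-8859-2"))
```

The declaration inside the document still says iso-8859-2 after transcoding, which is exactly the stumbling block mentioned above; with the override in place it should simply be ignored.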
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/