[XML-SIG] PyExpat encoding

tpassin@home.com
Fri, 2 Jun 2000 08:46:58 -0400


I think this is a good approach.  In other words, the system does what you
usually want if you do nothing to tell it otherwise; you can tell it to do
the other things you want if you need to; and you can find out the
configuration.  Perfect.  Using native Python Unicode objects also makes
sense.

The one potential downside - version skew, because we might need to use a
special version of expat - may not be too bad.  After all, you could make
sure that pyexpat always uses the copy of expat that lives in the Python
library.  And there is already a potential version issue with all the other
extensions (like tkinter), because they need to be compiled for the right
version of Python as well as the right version of their target C library.
I've been bitten by this a few times.  So why would pyexpat/expat be any
different in this regard from any other extension?
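
On the "find out the configuration" point, a minimal sketch of checking
which expat a given pyexpat was built against.  It assumes the
EXPAT_VERSION and version_info attributes that current Python releases
expose on xml.parsers.expat; the expat_build_info helper name is made up
for illustration only.

```python
import sys
import xml.parsers.expat as expat


def expat_build_info():
    """Report the Python version and the Expat that pyexpat is linked with."""
    return {
        # Major.minor of the running interpreter.
        "python": "%d.%d" % sys.version_info[:2],
        # Version string of the underlying Expat library.
        "expat_version": expat.EXPAT_VERSION,
        # The same version as a (major, minor, micro) tuple.
        "expat_version_info": expat.version_info,
    }


info = expat_build_info()
for key, value in info.items():
    print(key, "=", value)
```

Comparing that output across machines is a quick way to spot the kind of
skew described above.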

Tom Passin

Paul Prescod suggests:

> I don't see how we can in good conscience choose not to use Python's
> Unicode type. I am not averse, however, to a flag that returns 8-bit
> strings instead. We can use the Unicode object's features to do that
> easily.
>
> So how about this: we ask Expat 1.1000000001 (our new version) what
> encoding it was compiled with. We can even expose this to the Python
> programmer.
>
> parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"
>
> There is an independent flag that controls the encoding and type of the
> returned objects. You get Unicode objects by default. If you want 8-bit
> strings, you specifically ask for them.
>
> parser.requestUTF8( )
>
> 97% of programmers will never ask Expat what encoding it is using under
> the covers, nor will they change the flag to get 8-bit strings. Docs say:
> "Unless you know what you are doing, leave these methods alone. They are
> for performance freaks who know what they are doing only."
>
> A performance freak would probably write code like this:
>
> if parser.nativeEncoding() == "UTF-8":
>     parser.requestUTF8()
>
> Now managing the internationalization of the code is their problem.
>
> The Windows binaries should come with a 16-bit-returning Expat.
>
> Still and all, this is getting more complex than just bundling our
> favorite version of Expat with the compile flags set the way we want
> them!!!
>
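
To make the proposal concrete, here is a rough sketch of how the two
methods might behave, written as a wrapper around the standard
xml.parsers.expat module.  Everything about the wrapper is an assumption:
the EncodingAwareParser class is hypothetical, nativeEncoding() and
requestUTF8() are only the names proposed above and are not part of
pyexpat, and the native encoding is passed in by hand because pyexpat
offers no way to ask Expat for it.

```python
import xml.parsers.expat as expat


class EncodingAwareParser:
    """Hypothetical wrapper illustrating the proposed flag; not a pyexpat API."""

    def __init__(self, native_encoding="UTF-8"):
        # Assumption: the proposal would query Expat for this value.
        # pyexpat has no such query, so the caller supplies it here.
        self._native = native_encoding
        self._want_utf8 = False
        self.chunks = []
        self._parser = expat.ParserCreate()
        self._parser.buffer_text = True  # coalesce adjacent character data
        self._parser.CharacterDataHandler = self._on_text

    def nativeEncoding(self):
        """Return the encoding Expat is assumed to use internally."""
        return self._native

    def requestUTF8(self):
        """Ask for UTF-8 encoded 8-bit strings instead of Unicode objects."""
        self._want_utf8 = True

    def _on_text(self, data):
        # pyexpat delivers Unicode text; encode only when explicitly asked.
        self.chunks.append(data.encode("utf-8") if self._want_utf8 else data)

    def parse(self, document):
        self._parser.Parse(document, True)
        return self.chunks


# Default behaviour: Unicode objects, nothing to configure.
default_chunks = EncodingAwareParser().parse("<a>caf\u00e9</a>")

# The "performance freak" path from the proposal.
fast = EncodingAwareParser()
if fast.nativeEncoding() == "UTF-8":
    fast.requestUTF8()
utf8_chunks = fast.parse("<a>caf\u00e9</a>")
```

The point of the design is that the default path needs no configuration at
all, while the opt-in path makes the caller responsible for decoding.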