[XML-SIG] PyExpat encoding
Thu, 01 Jun 2000 22:54:59 -0500
"Andrew M. Kuchling" wrote:
> On the other hand, that means you can't use the system's copy of
> Expat, since who knows what it was compiled with? Actually, this
> seems like a bug in Expat; if I have an Expat library, I have no way
> of figuring out what it'll be outputting:
Adding this feature doesn't sound too tough. We should concentrate on
what we want because the implementation doesn't sound too brutal.
I don't see how we can in good conscience choose not to use Python's
Unicode type. I am not averse, however, to a flag that returns 8-bit
strings instead. We can use the Unicode object's features to do that.
So how about this: we ask Expat 1.1000000001 (our new version) what
encoding it was compiled with. We can even expose this to Python:
    parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"
There is an independent flag that controls the encoding and type of the
returned objects. You get Unicode objects by default. If you want 8-bit
strings, you specifically ask for them.
97% of programmers will never ask Expat what encoding it is using under
the covers, nor will they change the flag to get 8-bit strings. Docs say:
"Unless you know what you are doing, leave these methods alone. They are
only for performance freaks who know what they are doing."
A performance freak would probably write code like this:
Now managing the internationalization of the code is their problem.
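
The snippet that belonged after "code like this" is not preserved in this copy. As a rough sketch of the idea only, and not the original code, such a caller might check the native encoding once and skip the transcoding step when the parser already emits what they need. The function name to_utf8 and its parameters are hypothetical.

```python
def to_utf8(raw, native_encoding):
    """Return UTF-8 bytes for 8-bit data handed back in the native encoding.

    Hypothetical helper: `native_encoding` stands in for whatever a call
    such as parser.nativeEncoding() would report ("UTF-8" or "UTF-16").
    """
    if native_encoding == "UTF-8":
        # Fast path: the bytes are already in the form we want.
        return raw
    # Slow path: go through a Unicode string to change encodings.
    return raw.decode(native_encoding).encode("UTF-8")


# Data as a UTF-8 build and as a UTF-16 build (with byte-order mark)
# might deliver it.
from_utf8_build = to_utf8("caf\u00e9".encode("UTF-8"), "UTF-8")
from_utf16_build = to_utf8("caf\u00e9".encode("UTF-16"), "UTF-16")
```

Once code branches like this, every encoding decision downstream belongs to the caller, which is the trade-off described above.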
The Windows binaries should come with a 16-bit-returning Expat.
Still and all, this is getting more complex than just bundling our
favorite version of Expat with the compile flags set the way we want.
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it.