[XML-SIG] PyExpat encoding
Paul Prescod
paul@prescod.net
Thu, 01 Jun 2000 22:54:59 -0500
"Andrew M. Kuchling" wrote:
>
> ...
>
> On the other hand, that means you can't use the system's copy of
> Expat, since who knows what it was compiled with? Actually, this
> seems like a bug in Expat; if I have an Expat library, I have no way
> of figuring out what it'll be outputting:
Adding this feature doesn't sound too tough. We should concentrate on
what we want because the implementation doesn't sound too brutal.
I don't see how we can in good conscience choose not to use Python's
Unicode type. I am not averse, however, to a flag that returns 8-bit
strings instead. We can use the Unicode object's features to do that
easily.
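As a rough illustration of that conversion (a sketch only, written for a current Python where text strings are Unicode; the sample data and variable names are made up for the example and are not part of PyExpat):

```python
# Stand-in for a buffer that a UTF-16-compiled Expat might hand back.
# The sample text and the explicit little-endian byte order are assumptions.
raw_utf16 = "caf\u00e9".encode("utf-16-le")

# Default behaviour: decode the buffer into a Unicode object.
text = raw_utf16.decode("utf-16-le")

# Opt-in behaviour: use the Unicode object's own encode() method to
# produce an 8-bit UTF-8 string for callers who ask for one.
as_utf8 = text.encode("utf-8")
```

The point is that the 8-bit path needs no extra machinery: it is one encode() call on the Unicode object that would be built anyway.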
So how about this: we ask Expat 1.1000000001 (our new version) what
encoding it was compiled with. We can even expose this to the Python
programmer.
parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"
There is an independent flag that controls the encoding and type of the
returned objects. You get Unicode objects by default. If you want 8-bit
strings, you specifically ask for them.
parser.requestUTF8()
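A minimal sketch of how the query and the flag might fit together. Everything here is hypothetical: SketchParser and its _deliver helper are invented for illustration, and nativeEncoding/requestUTF8 are only the names proposed in this message, not an existing PyExpat API.

```python
class SketchParser:
    """Hypothetical stand-in for the proposed parser; not the real PyExpat."""

    def __init__(self, native_encoding="UTF-8"):
        # Assumption: the encoding is fixed when Expat is compiled.
        self._native_encoding = native_encoding
        self._want_utf8 = False  # default: return Unicode objects

    def nativeEncoding(self):
        # Proposed query: returns "UTF-8" or "UTF-16".
        return self._native_encoding

    def requestUTF8(self):
        # Proposed opt-in: return 8-bit UTF-8 strings instead of Unicode.
        self._want_utf8 = True

    def _deliver(self, raw):
        # raw is the byte buffer Expat produced in its native encoding.
        if self._want_utf8 and self._native_encoding == "UTF-8":
            return raw  # already UTF-8, so no conversion is needed
        codec = "utf-8" if self._native_encoding == "UTF-8" else "utf-16"
        text = raw.decode(codec)
        return text.encode("utf-8") if self._want_utf8 else text


# Default: Unicode out, whatever Expat was compiled with.
parser = SketchParser("UTF-16")
sample = "caf\u00e9".encode("utf-16")  # made-up sample buffer with a BOM
default_result = parser._deliver(sample)

# Opt-in: the same buffer comes back as an 8-bit UTF-8 string.
parser.requestUTF8()
utf8_result = parser._deliver(sample)
```

Keeping the two calls independent means the query never changes behaviour on its own; only an explicit request switches the return type.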
97% of programmers will never ask Expat what encoding it is using under
the covers, nor will they change the flag to get 8-bit strings. Docs say:
"Unless you know what you are doing, leave these methods alone. They are
for performance freaks who know what they are doing only."
A performance freak would probably write code like this:
if parser.nativeEncoding() == "UTF-8":
    parser.requestUTF8()
Now managing the internationalization of the code is their problem.
The Windows binaries should come with a 16-bit-returning Expat.
Still and all, this is getting more complex than just bundling our
favorite version of Expat with the compile flags set the way we want
them!!!
--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html