[XML-SIG] XML support in Python 1.6

Andrew M. Kuchling akuchlin@mems-exchange.org
Fri, 2 Jun 2000 10:05:26 -0400


On Thu, Jun 01, 2000 at 11:00:57PM -0500, Paul Prescod wrote:
>The PyXML distribution does this today and as far as I know it has been
>the least of our problems. Plus, we are going to have a dependency on a
>pretty new version of Expat. I just don't see what the big deal is.

GvR doesn't like including sizable chunks of outside code in the
Python distribution.  He doesn't care about what gets added to the
PyXML tree, of course.

If Expat had a call to report which encoding it was compiled to
return, we could handle all 4 cases:

	1) Expat returns UTF-8, PyExpat user wants UTF-8 regular strings
	2) Expat returns UTF-8, PyExpat user wants Unicode strings
	3) Expat returns UTF-16, PyExpat user wants UTF-8 regular strings
	4) Expat returns UTF-16, PyExpat user wants Unicode strings

In cases 1 and 4 no extra work is needed; in cases 2 and 3 the PyExpat
module would have to convert the data, taking a performance hit
whenever the system Expat library was compiled for the wrong output
encoding.  But if GvR relents and allows incorporating Expat's code
later, that copy could then be compiled any way we like.
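
To make the four cases concrete, here is a rough sketch of the
conversion step.  It is written in present-day Python, where bytes
stands in for "regular strings" and str for Unicode strings; the real
work would happen in PyExpat's C code.  The function name and
arguments are made up for illustration, and the byte-order handling
assumes a UTF-16 build of Expat hands back 16-bit units in the
machine's native order with no BOM.

```python
import sys

# Assumption: UTF-16 data from Expat is in native byte order, without a BOM.
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def convert_expat_output(raw, native_encoding, want_utf8):
    """Turn raw bytes from Expat into what the PyExpat user asked for.

    raw             -- bytes exactly as Expat produced them
    native_encoding -- "UTF-8" or "UTF-16", i.e. how Expat was compiled
    want_utf8       -- True for UTF-8 byte strings, False for Unicode
    """
    codec = "utf-8" if native_encoding == "UTF-8" else NATIVE_UTF16
    if want_utf8:
        if native_encoding == "UTF-8":
            return raw                            # case 1: pass through
        return raw.decode(codec).encode("utf-8")  # case 3: transcode
    return raw.decode(codec)                      # cases 2 and 4
```

In this illustration case 4 still goes through a decode() call; in C,
a UTF-16 buffer could be copied into a 16-bit Unicode object more or
less directly.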

So, I propose we ask James Clark to add a C function to determine how
Expat was compiled, and then follow Paul's suggested interface:

parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"
parser.requestUTF8() -> causes the parser to return UTF-8-encoded 8-bit
strings; by default Unicode strings will be returned.
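
A toy model of that interface, in present-day Python, might look like
the following.  Only nativeEncoding() and requestUTF8() come from the
proposal; the class name, the constructor argument and the deliver()
helper are invented for the sketch, and no real parsing takes place.

```python
class ParserSketch:
    """Hypothetical stand-in for a PyExpat parser object."""

    def __init__(self, native_encoding="UTF-8"):
        self._native = native_encoding  # how the Expat library was compiled
        self._want_utf8 = False         # default: hand back Unicode strings

    def nativeEncoding(self):
        """Return "UTF-8" or "UTF-16"."""
        return self._native

    def requestUTF8(self):
        """Ask for UTF-8-encoded 8-bit strings from now on."""
        self._want_utf8 = True

    def deliver(self, text):
        """Shape character data the way a handler would receive it."""
        return text.encode("utf-8") if self._want_utf8 else text


parser = ParserSketch()
as_unicode = parser.deliver("caf\u00e9")  # str, the default
parser.requestUTF8()
as_utf8 = parser.deliver("caf\u00e9")     # bytes, UTF-8 encoded
```

In this sketch the switch only flips a flag, so it takes effect on the
next piece of data delivered.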

Three questions:
	* Can you call parser.requestUTF8() at any point, even after
parsing has started?  (I see no reason to forbid this, though it
would be strange.)

	* Do we need a .requestUTF16() or .requestUnicode() method to
switch things back?  Or should it be very general, with
.requestOutputEncoding('iso-8859-1' or whatever) instead of just
.requestUTF8? 

	* What do we assume for old versions of Expat?  I guess all we
can do is assume UTF-8, and trust that the strangeness will
be apparent if it was compiled for UTF-16.
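
The fallback for old Expat versions could be as simple as the sketch
below.  The attribute name native_encoding is purely hypothetical,
standing in for whatever query function Expat might eventually grow,
and the SimpleNamespace objects are stand-ins for an old and a new
library binding.

```python
from types import SimpleNamespace


def detect_native_encoding(expat_lib):
    """Ask Expat how it was compiled; assume UTF-8 if it cannot say."""
    query = getattr(expat_lib, "native_encoding", None)  # hypothetical name
    if query is None:
        return "UTF-8"  # old Expat: no way to ask, so guess
    return query()


old_expat = SimpleNamespace()  # lacks the query call
new_expat = SimpleNamespace(native_encoding=lambda: "UTF-16")
```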

If this is approved, I'll implement it this weekend.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
There are no gryphons, no wyverns, no winged horses in the waking world,
raven. Not anymore. But we are here...
  -- The gryphon at the door, in SANDMAN #57: "The Kindly Ones:1"