[XML-SIG] PyExpat encoding (was: XML support in Python 1.6)
Andrew M. Kuchling
akuchlin@mems-exchange.org
Thu, 1 Jun 2000 16:04:16 -0400
On Thu, Jun 01, 2000 at 12:56:28PM -0700, Greg Stein wrote:
>IMO, we should have a fixed output format, which is the Expat default:
>UTF-8.
I don't know; it seems a bit odd to parse a Unicode string and then
have to convert from an 8-bit encoding back to Unicode in your
character data handlers, attributes, etc. The problem is that it's
also odd to parse a regular Python string and get back Unicode.
OTOH, if Latin1-encoded XML has something like <!ENTITY unichar
"&#1972;"> &unichar; in it, Unicode is the only thing it could possibly
return. Maybe PyExpat could attempt to convert its Unicode output
into an 8-bit string (but using what encoding?), and only return
Unicode if it has to.
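
To make the "convert when possible" idea concrete, here is a minimal
sketch written against present-day Python, where str is Unicode and
bytes is the 8-bit type. The helper name narrow_if_possible and the
default target encoding are hypothetical, chosen only for
illustration; nothing like this exists in PyExpat.

```python
def narrow_if_possible(text, encoding="latin-1"):
    """Return an 8-bit string when `text` fits in `encoding`, else `text` itself.

    Hypothetical helper: the encoding default is an assumption, which is
    exactly the "but using what encoding?" problem.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # At least one character has no 8-bit representation,
        # so the Unicode string is the only faithful result.
        return text


fits = narrow_if_possible("caf\u00e9")        # representable in Latin-1
does_not_fit = narrow_if_possible("\u07b4")   # outside Latin-1
```

The caller gets bytes in one case and str in the other, so every
handler would have to check the type of what it receives.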
Hmmm... on the third hand, XML is a Unicode-based standard, and
sometimes returning Unicode and sometimes an 8-bit string is also
strange. Maybe it's best to just always return Unicode, and leave
further conversion to the caller.
I think I'd go for the third option: always returning Unicode strings.
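
As a rough illustration of what "always Unicode" looks like from the
caller's side, the sketch below uses the xml.parsers.expat module as
it behaves in Python 3, where handlers receive str objects. The sample
document and the character reference &#1972; (U+07B4) are assumptions
picked for the example, not taken from any real input.

```python
import xml.parsers.expat

# Latin-1 encoded input whose internal entity expands to a character
# that Latin-1 cannot represent.
doc = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
    b'<!DOCTYPE doc [<!ENTITY unichar "&#1972;">]>\n'
    b'<doc>caf\xe9 &unichar;</doc>'
)

chunks = []
parser = xml.parsers.expat.ParserCreate()
# Character data may arrive in several pieces, so collect and join them.
parser.CharacterDataHandler = chunks.append
parser.Parse(doc, True)

text = "".join(chunks)

# Any further conversion is left to the caller, e.g. to UTF-8:
utf8_bytes = text.encode("utf-8")
```

Both the Latin-1 byte and the expanded entity come back in a single
Unicode string, and the caller picks the output encoding.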
--
A.M. Kuchling http://starship.python.net/crew/amk/
I was somebody else once. I... I... don't think I was a very good person.
-- The detective in THE MYSTERY PLAY