[XML-SIG] Strings or Unicode ?

Martin v. Loewis martin@v.loewis.de
Thu, 8 Nov 2001 19:33:50 +0100


> The question still remains, though: is this an acceptable approach 
> in practice ? (Converting Unicode back to strings has its cost and
> it might be worthwhile having the 8-bit string approach available
> too.)

That depends on the processing you want to do after parsing. In
general, I'd argue that not having to deal with encodings, and being
able to rely on everything being Unicode, simplifies the application.
The only known drawback is that it may be difficult to restore the
original document: you may lose track of what the input encoding was,
and of where character references had been used.
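As a rough illustration of "everything is Unicode", here is a minimal sketch using today's Python 3 standard library `xml.sax` as a stand-in (this is an assumption; it is not the PyXML API under discussion). The hypothetical handler `TextCollector` receives only `str` objects, whether the input bytes were Latin-1, UTF-8, or ASCII with a character reference:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data it is given."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser has already decoded the input; content is always str.
        self.chunks.append(content)


def text_of(data: bytes) -> str:
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    # Character data may arrive in several chunks, so join them.
    return "".join(handler.chunks)


samples = [
    b'<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>',  # Latin-1
    "<p>café</p>".encode("utf-8"),                                  # UTF-8
    b"<p>caf&#233;</p>",                                            # char ref
]

for doc in samples:
    print(repr(text_of(doc)))  # prints 'café' for each sample
```

The application code never sees the input encoding; decoding happens once, inside the parser.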
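The loss of the original form can be sketched with Python 3's `xml.etree.ElementTree`, again only as a modern stand-in for PyXML. After parsing and re-serializing, a Latin-1 document and a document using the reference `&#233;` come out identical; neither the encoding declaration nor the character reference survives:

```python
import xml.etree.ElementTree as ET

latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'
charref_doc = b"<p>caf&#233;</p>"


def roundtrip(data: bytes) -> str:
    # Parse into a tree of Unicode strings, then serialize back to text.
    root = ET.fromstring(data)
    # encoding="unicode" returns a str and writes no XML declaration.
    return ET.tostring(root, encoding="unicode")


print(roundtrip(latin1_doc))   # <p>café</p>
print(roundtrip(charref_doc))  # <p>café</p>
```

Whether this counts as lost information or as useful normalization depends on what the later processing stages need.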

I personally don't consider this a drawback: in most applications,
it is a good thing if the application "normalizes" the XML documents,
since that reduces the hassles in later processing stages.

In the early days after the release of Python 2, there was a desire to
make Unicode optional in PyXML by propagating a "wants_unicode" flag
throughout the processing chain. Even though this is still supported
in a number of places, I doubt it is used much.

Regards,
Martin