[XML-SIG] Strings or Unicode ?

Martin v. Loewis martin@v.loewis.de
Fri, 9 Nov 2001 17:55:39 +0100


> One thing I'd be curious to know is whether it's common practice
> to restrict XML tag names and attribute names to Latin-1 or even
> ASCII... ? 

This is indeed common. xmlproc in PyXML 0.6 restricts those names to
Latin-1, and I believe some of the DOM implementations may also react
"funny" when confronted with non-ASCII names. OTOH, pyexpat converts
everything to Unicode.
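
A minimal sketch of the pyexpat behaviour, assuming a current Python 3
interpreter (where `str` is the Unicode type) rather than the 2001-era
setup discussed here. The helper name `collect_element_names` and the
sample document are made up for illustration:

```python
import xml.parsers.expat

def collect_element_names(document: bytes) -> list:
    """Parse `document` with expat and return the element names it reports."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls this handler once per start tag, passing the decoded name.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)  # True marks this as the final chunk
    return names

# A Latin-1 encoded document whose element name contains a non-ASCII
# character (\xf1 is n-with-tilde).
doc = '<?xml version="1.0" encoding="iso-8859-1"?><se\xf1al/>'.encode("latin-1")
names = collect_element_names(doc)
print(type(names[0]).__name__, ascii(names[0]))
```

The name reaches the handler already decoded, regardless of the
encoding the document was stored in.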

> I've used the approach of using Latin-1 8-bit strings if possible
> and reverting to Unicode objects for cases where this doesn't
> work. I've never seen an XML file with non-ASCII tag names, so I
> suppose the Unicode case is artificial.

It probably is. Following the Python convention, I'd suggest using
byte strings only in the ASCII case, and converting non-ASCII Latin-1
names to Unicode. It will be simpler that way *if* you have Latin-1
element names, since ASCII byte strings autoconvert to Unicode,
whereas full Latin-1 ones don't.
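
The convention above could be sketched roughly as follows. This is an
illustration only: `normalize_name` is a hypothetical helper, and it is
written for a current Python 3 interpreter, where there is no implicit
byte-string-to-Unicode conversion, so the ASCII check is spelled out
with an explicit decode.

```python
def normalize_name(raw: bytes):
    """Return `raw` unchanged if it is pure ASCII, else decode it as Latin-1.

    Hypothetical helper: assumes `raw` is a Latin-1 encoded element or
    attribute name.
    """
    try:
        raw.decode("ascii")  # succeeds only when every byte is below 0x80
    except UnicodeDecodeError:
        # Non-ASCII Latin-1: hand back a Unicode string instead.
        return raw.decode("latin-1")
    return raw  # plain ASCII: the byte string is kept as-is

plain = normalize_name(b"item")         # stays a byte string
accented = normalize_name(b"se\xf1al")  # becomes a Unicode string
print(type(plain).__name__, type(accented).__name__)
```

With this split, a byte-string name is always ASCII, so it can never
hold bytes whose meaning depends on guessing an encoding.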

Regards,
Martin