elementtree XML() unicode

Tue Nov 3 22:02:24 EST 2009

On Nov 4, 1:06 pm, Kee Nethery <k... at kagi.com> wrote:
> On Nov 3, 2009, at 5:27 PM, John Machin wrote:
>
>
>
> > On Nov 4, 11:01 am, Kee Nethery <k... at kagi.com> wrote:

> >> Why is this not working and what do I need to do to use Elementtree
> >> with unicode?
>
> > What you need to do is NOT feed it unicode. You feed it a str object
> > and it gets decoded according to the encoding declaration found in the
> > first line.
>
> That it uses "the encoding declaration found in the first line" is the  
> nugget of data that is not in the documentation that has stymied me  
> for days. Thank you!

And under the "don't repeat yourself" principle, it shouldn't be in the
ElementTree docs. It's nothing special about ET; it's part of the
definition of an XML document. For universal loss-free transportability
a document must be encoded somehow, and it must state its own encoding
in the XML declaration on its first line unless it uses a default
(UTF-8, or UTF-16 with a byte order mark).
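
To make that concrete, here is a minimal sketch, assuming Python 3, where
the "str object" mentioned above corresponds to a bytes object. The two
sample documents and the variable names are made up for illustration; the
only library call used is xml.etree.ElementTree.XML().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, built as bytes the way they would arrive
# from a file or an HTTP response.
# "Z\u00fcrich" is Zurich with a u-umlaut; "\u67cf\u5e02" is the city name
# quoted in this thread.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<city>Z\u00fcrich</city>"
).encode("iso-8859-1")
utf8_doc = "<city>\u67cf\u5e02</city>".encode("utf-8")  # no declaration: UTF-8 assumed

# The parser decodes each byte string according to the document's own
# declaration (or the default); the caller passes no encoding argument.
latin1_city = ET.XML(latin1_doc).text
utf8_city = ET.XML(utf8_doc).text

# Both results are ordinary text strings, whatever the bytes looked like.
assert latin1_city == "Z\u00fcrich"
assert utf8_city == "\u67cf\u5e02"
```

The point of the sketch is that the encoding travels inside the document,
so the calling code never has to name it.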

> The other thing that has been confusing is that I've been using "dump"  
> to view what is in the elementtree instance and the non-ASCII  
> characters have been displayed as "numbered  
> entities" (<city>&#26575;&#24066;</city>) and I know that is not the  
> representation I want the data to be in. A co-worker suggested that  
> instead of "dump" that I use "et.tostring(theResponseXml,  
> encoding='utf-8')" and then print that to see the characters. That  
> process causes the non-ASCII characters to display as the glyphs I  
> know them to be.
>
> If there was a place in the official docs for me to append these  
> nuggets of information to the sections for  
> "xml.etree.ElementTree.XML(text)" and  
> "xml.etree.ElementTree.dump(elem)" I would absolutely do so.

I don't understand ... tostring() is in the same section as dump(),
about two screen-heights away. You want to include the tostring() docs
in the dump() docs? The usual idea is not to get bogged down in the
first function that looks at first glance like it might do what you
want ("look at the glyphs") but doesn't (it writes a transportable
XML stream), and instead to press on to the next plausible candidate.
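
For what it's worth, the difference between the two kinds of output
discussed above can be sketched as follows, assuming Python 3. The element
is built by hand purely for illustration, and the sketch sticks to
tostring() because what dump() prints may differ between Python versions.

```python
import xml.etree.ElementTree as ET

city = ET.Element("city")
city.text = "\u67cf\u5e02"  # the city name quoted in this thread

# Default encoding is us-ascii: anything non-ASCII becomes a numbered
# character reference, which is valid, transportable XML.
ascii_xml = ET.tostring(city)

# Asking for UTF-8 keeps the characters themselves, as UTF-8 bytes.
utf8_xml = ET.tostring(city, encoding="utf-8")

# encoding="unicode" returns a text string rather than bytes.
text_xml = ET.tostring(city, encoding="unicode")

print(ascii_xml)  # bytes containing &#26575;&#24066;
print(utf8_xml)   # bytes containing the UTF-8 encoding of the two characters
```

Printing text_xml, or utf8_xml.decode("utf-8"), shows the glyphs on a
terminal that can display them. All three results describe the same
element; only the serialization differs.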