[XML-SIG] python, xml, html tags

Tue Mar 29 18:40:32 CEST 2005

On Mon, 2005-03-28 at 21:01 +0300, Necati DEMiR wrote:
>Hi,
>I can't do something with Python and XML.
>
>i have the following file;
>
><?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <test>
>  <content> Hello </content>
>  <content> <b> Hello </b> </content>
> </test>

[...]

> Thanks. But i want the output as the following;
>
> Hello
> <b> Hello </b>

Try this (with d as your document object):

print d.getElementsByTagName("content")[0].childNodes[0].data
print d.getElementsByTagName("content")[1].childNodes[0].toxml()

The first statement just grabs the first "content" element and then grabs the 
first child node, which is a text node, and prints its value. You have to be 
a bit lucky to achieve this, for various odd reasons, and it can be a good 
idea to do this first:

d.getElementsByTagName("content")[0].normalize()

This makes sure that inside the first "content" element there is only one big 
text node rather than (potentially) more than one.

The second statement grabs the second "content" element and then grabs the 
first child node, which is the "b" element in this case, and then prints its 
contents. You also have to be lucky to achieve this since the "b" element 
might not be the first child node of the "content" element. Instead, you 
could try this:

print \
d.getElementsByTagName("content")[1].getElementsByTagName("b")[0].toxml()

This explicitly gets the first "b" element under the "content" element.

A few notes:

 * After parsing, start and end tags are not textual content. What you're
   really trying to get is the serialised form of a part of a document.

 * The toxml method, which gets the serialised form of a part of a document,
   is not really standard - minidom supports it, but the other DOM
   implementations probably don't. However, you could use a printing (or
   prettyprinting) function which understands more-or-less standard nodes to
   do the printing.

 * After a while, you'll start to appreciate things like XPath in cases like
   this. Unfortunately, convenient xpath methods have only just appeared in
   PyXML, but other DOM implementations also support them - this saves you
   from having to import xml.xpath and to call various functions, although
   it's not that hard.

Hope this helps!

Paul