[XML-SIG] python, xml, html tags
Paul Boddie
paul at boddie.org.uk
Tue Mar 29 18:40:32 CEST 2005
On Mon, 2005-03-28 at 21:01 +0300, Necati DEMiR wrote:
>Hi,
>I can't do something with Python and XML.
>
>i have the following file;
>
><?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <test>
> <content> Hello </content>
> <content> <b> Hello </b> </content>
> </test>
[...]
> Thanks. But i want the output as the following;
>
> Hello
> <b> Hello </b>
Try this (with d as your document object):
print d.getElementsByTagName("content")[0].childNodes[0].data
print d.getElementsByTagName("content")[1].childNodes[0].toxml()
The first statement just grabs the first "content" element and then grabs the
first child node, which is a text node, and prints its value. You have to be
a bit lucky to achieve this, for various odd reasons, and it can be a good
idea to do this first:
d.getElementsByTagName("content")[0].normalize()
This makes sure that inside the first "content" element there is only one big
text node rather than (potentially) more than one.
The second statement grabs the second "content" element and then grabs the
first child node, which is the "b" element in this case, and then prints its
contents. You also have to be lucky to achieve this since the "b" element
might not be the first child node of the "content" element. Instead, you
could try this:
print \
d.getElementsByTagName("content")[1].getElementsByTagName("b")[0].toxml()
This explicitly gets the first "b" element under the "content" element.
A few notes:
* After parsing, start and end tags are not textual content. What you're
really trying to get is the serialised form of a part of a document.
* The toxml method, which gets the serialised form of a part of a document,
is not really standard - minidom supports it, but the other DOM
implementations probably don't. However, you could use a printing (or
prettyprinting) function which understands more-or-less standard nodes to
do the printing.
* After a while, you'll start to appreciate things like XPath in cases like
this. Unfortunately, convenient xpath methods have only just appeared in
PyXML, but other DOM implementations also support them - this saves you
from having to import xml.xpath and to call various functions, although
it's not that hard.
Hope this helps!
Paul
More information about the XML-SIG
mailing list