[XML-SIG] python, xml, html tags

Tue Mar 29 16:45:59 CEST 2005

On Mon, 2005-03-28 at 21:01 +0300, Necati DEMiR wrote:
> Hi,
> I can't do something with Python and XML.
> 
> i have the following file;
> 
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>  <test>
>   <content> Hello </content>
>   <content> <b> Hello </b> </content>
>  </test>
> 
> Ok. it is simple :)
> 
> And i have the following python codes;
> 
> #!/usr/bin/python
> from xml.dom import minidom
> 
> file = open("test.xml","r")
> xml = minidom.parse(file)
> print xml.childNodes[0].getElementsByTagName("content")[0].firstChild.data
> print xml.childNodes[0].getElementsByTagName("content")[1].firstChild.data
> 
> Again simple one :)
> 
> But when i run these codes, i have the following output;
> Hello
> 
> How can i access the second one.

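DOM is not very good for this sort of thing.  In your second content
element, firstChild is the whitespace text node before <b>, so its data
is just a space.  You could do:

print xml.getElementsByTagName("content")[0].firstChild.data
print xml.getElementsByTagName("content")[1].getElementsByTagName("b")[0].firstChild.data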

But that's silly :-)

More useful thoughts below...

> Yes, i know it contains html tags so it 
> doesn't give me the result.

Your b element happens to have the same name as one used in HTML, but
that doesn't really make it an HTML tag.  In this case, it's clearly an
XML tag.
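
To see how a DOM parser treats that b element, here is a small sketch
(Python 3, standard library only).  It inlines the sample document as a
string, without the XML declaration, instead of reading test.xml; the
names XML, second and children are just illustrative.

```python
from xml.dom import minidom

# The sample document from the question, inlined for convenience.
XML = """<test>
  <content> Hello </content>
  <content> <b> Hello </b> </content>
</test>"""

doc = minidom.parseString(XML)
second = doc.getElementsByTagName("content")[1]

# Describe each child: text nodes carry .data, element nodes carry .tagName.
children = []
for node in second.childNodes:
    if node.nodeType == node.TEXT_NODE:
        children.append(("text", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        children.append(("element", node.tagName))

print(children)  # [('text', ' '), ('element', 'b'), ('text', ' ')]
```

The b element shows up as an ordinary element child, sitting between
two whitespace text nodes, which is why firstChild.data yields only a
space.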

> I wanna get whole of the content as data. 
> How can i do this?

Use something like the string_value function in listing 5 of the
following article:

http://www.xml.com/pub/a/2003/01/08/py-xml.html
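
As a rough idea of what such a helper can look like with minidom, here
is a sketch for Python 3.  It is not the article's listing; the name
text_of is made up for this example, and the sample document is inlined
without its XML declaration.

```python
from xml.dom import minidom

XML = """<test>
  <content> Hello </content>
  <content> <b> Hello </b> </content>
</test>"""


def text_of(node):
    """Concatenate the data of every text node under node, in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Elements recurse into their children; comments and processing
    # instructions have no children and so contribute nothing.
    return "".join(text_of(child) for child in node.childNodes)


doc = minidom.parseString(XML)
contents = doc.getElementsByTagName("content")

for element in contents:
    print(repr(text_of(element)))
```

The recursion means the helper does not care how deeply the text is
nested, so the same call works for both content elements.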

Or use something with XPath support, which makes this easy.  Using Amara
( http://www.xml.com/pub/a/2005/01/19/amara.html ), your code would be

from amara import binderytools
doc = binderytools.bind_file("test.xml")
print doc.xml_xpath(u'string(//content[1])')
print doc.xml_xpath(u'string(//content[2])')

Which prints

 Hello
  Hello

The string XPath function returns the concatenated text of all text
nodes under the selected element, including those inside child
elements.  Notice that XPath uses 1 as the first index, while Python
uses 0.
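
If Amara is not available, the same two points can be illustrated with
the standard library's ElementTree (Python 3).  This is a different
technique, not Amara's API: ElementTree supports only a subset of XPath
and has no string() function, so the sketch uses itertext() to gather
the text instead.  The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<test>
  <content> Hello </content>
  <content> <b> Hello </b> </content>
</test>"""

root = ET.fromstring(XML)

# Python list indexing is 0-based...
second_by_list = root.findall("content")[1]
# ...while the XPath-style positional predicate is 1-based.
second_by_path = root.find("content[2]")

# Both expressions select the same element.
print(second_by_list is second_by_path)

# itertext() walks all text inside the element, nested elements included.
second_text = "".join(second_by_path.itertext())
print(repr(second_text))
```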

-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com