xml parsing escape characters

Luis P. Mendes luisXX_lupe2XX at netvisaoXX.pt
Wed Jan 19 15:02:18 EST 2005

I only know a little bit of XML and I'm trying to parse an XML document
in order to save its elements in a file (as dictionaries inside a list).
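For reference, once the data is available as real elements (as in the browser output further down), turning each <Order> into a dictionary could look roughly like this. It is a sketch written for Python 3, not the poster's code; the sample document is hypothetical (Order and Customer come from the post, the OrderID field and all values are made up):

```python
from xml.dom import minidom

# Hypothetical sample: Order/Customer are from the post, OrderID and the
# values are invented for illustration only.
SAMPLE = """<DataSet>
  <Order>
    <Customer>439</Customer>
    <OrderID>1</OrderID>
  </Order>
  <Order>
    <Customer>512</Customer>
    <OrderID>2</OrderID>
  </Order>
</DataSet>"""


def element_text(element):
    """Concatenate the data of all direct text children of an element."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)


def orders_as_dicts(dataset_element):
    """Turn each <Order> into a dict mapping child tag name -> text."""
    orders = []
    for order in dataset_element.getElementsByTagName("Order"):
        record = {}
        for field in order.childNodes:
            # Skip the whitespace-only text nodes between the fields.
            if field.nodeType == field.ELEMENT_NODE:
                record[field.tagName] = element_text(field).strip()
        orders.append(record)
    return orders


doc = minidom.parseString(SAMPLE)
orders = orders_as_dicts(doc.documentElement)
print(orders)
# [{'Customer': '439', 'OrderID': '1'}, {'Customer': '512', 'OrderID': '2'}]
```

Writing the resulting list to a file (for example with the pickle module) would be a separate step.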

When I access a URL from Python 2.3.3 running on Linux with the
following lines:
	import urllib
	from xml.dom import minidom
	resposta = urllib.urlopen(url)
	xmldoc = minidom.parse(resposta)

I get the following result:

<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://www......">&lt;DataSet&gt;
  &lt;Order&gt;
    &lt;Customer&gt;439&lt;/Customer&gt;
	(... others ...)
  &lt;/Order&gt;

In the lines below, I try to get all the child nodes under <string>,
first by counting them and then by ignoring the "\n" (whitespace-only) ones:

# the dictionary has to exist before the loop fills it
todosNos = {}
stringNode = xmldoc.childNodes[0]
print stringNode.toxml()
dataSetNode = stringNode.childNodes[0]
numNos = len(dataSetNode.childNodes)
for no in range(numNos):
	todosNos[no] = dataSetNode.childNodes[no].toxml()
# build the list once, after the loop, keeping nodes longer than 4 chars
posicaoXml = [no for no in todosNos.keys() if len(todosNos[no]) > 4]
print posicaoXml

(I'm almost sure there's a simpler way to do this...)
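The "\n" children are whitespace-only text nodes that the parser keeps between elements. A simpler approach, sketched here for Python 3 with a made-up two-order document, is to keep only the children whose nodeType is ELEMENT_NODE instead of measuring the length of their toxml() output:

```python
from xml.dom import minidom

# Hypothetical document shaped like the browser output in the post.
doc = minidom.parseString(
    "<DataSet>\n"
    "  <Order><Customer>439</Customer></Order>\n"
    "  <Order><Customer>512</Customer></Order>\n"
    "</DataSet>"
)
dataset = doc.documentElement

# All children, including the whitespace text nodes around the <Order>s.
print(len(dataset.childNodes))  # 5

# Only the element children.
elements = [n for n in dataset.childNodes if n.nodeType == n.ELEMENT_NODE]
print([e.tagName for e in elements])  # ['Order', 'Order']

# The same positions the original loop collects, without the len() > 4 test.
positions = [i for i, n in enumerate(dataset.childNodes)
             if n.nodeType == n.ELEMENT_NODE]
print(positions)  # [1, 3]
```

When the tag name is known in advance, dataset.getElementsByTagName("Order") is another option; note that it searches all descendants, not only direct children.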

I don't get any elements. But if I access the same URL in a browser,
the result in the browser window is something like:

<string xmlns="http://www......">
  <DataSet>
    <Order>
      <Customer>439</Customer>
	(... others ...)
    </Order>
  </DataSet>

and the lines I posted work as intended.

I have already searched the web and I know the problem has to do with
the escape characters, but I didn't find a simple solution for it.
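What is probably happening: the service sends the inner document as the escaped text content of <string> (&lt;DataSet&gt;... rather than <DataSet>...), so the parser sees one long text node and no elements, while a browser displays that text so it looks like markup. The sketch below, for Python 3, reproduces the symptom with a hypothetical payload; the namespace URL is a placeholder because the real one is elided in the post:

```python
from xml.dom import minidom

# Hypothetical payload: the inner XML is escaped, so it is only character data.
RAW = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<string xmlns="http://example.invalid/">'
    b'&lt;DataSet&gt;&lt;Order&gt;&lt;Customer&gt;439&lt;/Customer&gt;'
    b'&lt;/Order&gt;&lt;/DataSet&gt;'
    b'</string>'
)

xmldoc = minidom.parseString(RAW)
string_node = xmldoc.documentElement

# <string> has exactly one child, and it is a text node, not an element.
child = string_node.childNodes[0]
print(len(string_node.childNodes), child.nodeType == child.TEXT_NODE)  # 1 True

# The parser has already turned &lt; and &gt; back into < and >,
# so .data holds the markup as a plain string.
print(child.data)
# prints: <DataSet><Order><Customer>439</Customer></Order></DataSet>
```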

I tried LL2XML.py, and an unescape function built on a simple replace:
text = text.replace("&lt;", "<")
but that meant converting the XML document to a string, and then I
couldn't (or don't know how to) convert it back into an XML object.
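Because the parser already unescapes the text, a replace() step may not be needed at all: one can take the text of the <string> element and hand it to minidom.parseString(), which returns a second, proper DOM. A minimal sketch, assuming Python 3 and a hypothetical payload (placeholder namespace URL, made-up content); the helper name inner_document is also hypothetical. The minidom calls used (parseString, documentElement, getElementsByTagName, firstChild) are also present in older minidom versions; the b'' literals and print() are Python 3 spellings:

```python
from xml.dom import minidom

# Hypothetical service response; the real namespace URL is elided in the post.
RAW = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<string xmlns="http://example.invalid/">'
    b'&lt;DataSet&gt;\n'
    b'  &lt;Order&gt;&lt;Customer&gt;439&lt;/Customer&gt;&lt;/Order&gt;\n'
    b'&lt;/DataSet&gt;'
    b'</string>'
)


def inner_document(outer_dom):
    """Parse the escaped XML carried as text inside the outer root element."""
    root = outer_dom.documentElement
    # Join all text/CDATA children; &lt; etc. are already decoded by the parser.
    text = "".join([node.data for node in root.childNodes
                    if node.nodeType in (node.TEXT_NODE,
                                         node.CDATA_SECTION_NODE)])
    return minidom.parseString(text.strip())


outer = minidom.parseString(RAW)
inner = inner_document(outer)

customers = [c.firstChild.data for c in inner.getElementsByTagName("Customer")]
print(customers)  # ['439']
```

From here the inner DOM can be walked like the browser version of the document, since <DataSet> and <Order> are now real elements.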

How can I solve this? Please explain it keeping in mind that I'm just
beginning with XML and I'm not very experienced in Python either.
