[Tutor] trying to parse an xml file

Steven D'Aprano steve at pearwood.info
Sat Dec 14 23:22:09 CET 2013


On Sat, Dec 14, 2013 at 09:29:00AM -0500, bruce wrote:
> Hi.
> 
> Looking at a file -->>
> http://www.marquette.edu/mucentral/registrar/snapshot/fall13/xml/BIOL_bysubject.xml
> 
> The file is generated via online/web url, and appears to be XML.
> 
> However, when I use elementtree:
>   document = ElementTree.parse( '/apps/parseapp2/testxml.xml' )
> 
> I get an invalid error : not well-formed (invalid token):

I cannot reproduce that error. Perhaps you have inadvertently corrupted 
the file when downloading it? What did you use to download the file?
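One way to find out whether (and where) a downloaded file is damaged is to ask the parser itself. Here is a minimal sketch, assuming the standard library's xml.etree.ElementTree (Python 2.7 or 3). The helper name check_well_formed is made up for illustration. It returns the (line, column) at which the parser gave up, so you can open the file at that spot and look.

```python
import io
import xml.etree.ElementTree as ET

def check_well_formed(source):
    """Try to parse `source` (a filename or a binary file-like object).

    Returns None if the XML is well-formed, otherwise the
    (line, column) position where the parser stopped.
    """
    try:
        ET.parse(source)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple; the column is 0-based
        return err.position
    return None

# In-memory stand-ins for a good and a damaged download.
good = io.BytesIO(b"<ROOT><RECORD/></ROOT>")
bad = io.BytesIO(b"<ROOT>\n<RECORD/>\n<RECORD>a < b</RECORD>\n</ROOT>")

print(check_well_formed(good))  # None
print(check_well_formed(bad))   # a (line, column) tuple on line 3
```

For a real file you would pass the filename instead, e.g. check_well_formed('BIOL_bysubject.xml').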

I used the wget command under Linux:

wget http://www.marquette.edu/mucentral/registrar/snapshot/fall13/xml/BIOL_bysubject.xml

Then I tried parsing it with ElementTree in two different ways, and both 
succeeded with no errors:

py> import xml.etree.cElementTree as ET
py> tree = ET.ElementTree(file='BIOL_bysubject.xml')
py> root = tree.getroot()
py> for node in root:
...     print node.tag, node.attrib
...
STAMP {}
RECORD {}
RECORD {}
RECORD {}
[... snip lots more output for brevity ...]
py> tree = ET.parse('BIOL_bysubject.xml')
py> for node in tree.iter():
...     print node.tag, node.attrib
... 
[... snip even more output ...]
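The two loops above don't print the same thing: iterating over the root element gives only its direct children, while tree.iter() walks every element in the document, depth first, starting with the root itself. (ET.parse(source) returns an ElementTree just as ET.ElementTree(file=source) does.) Here is a small sketch on a made-up document; STAMP and RECORD come from the output above, but the ROOT and TITLE tags are hypothetical, invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document: ROOT and TITLE are invented tag names.
xml_text = b"""<ROOT>
<STAMP/>
<RECORD><TITLE>Intro</TITLE></RECORD>
<RECORD><TITLE>Genetics</TITLE></RECORD>
</ROOT>"""

root = ET.fromstring(xml_text)
tree = ET.ElementTree(root)

# Iterating over an element yields only its direct children.
children = [node.tag for node in root]

# tree.iter() visits every element, depth first, including the root.
everything = [node.tag for node in tree.iter()]

print(children)    # ['STAMP', 'RECORD', 'RECORD']
print(everything)  # ['ROOT', 'STAMP', 'RECORD', 'TITLE', 'RECORD', 'TITLE']
```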


Both worked fine and gave no errors. I'm using Python 2.7. If you need 
additional help, I'm afraid that you're going to have to give more 
detail on what you actually did. Please show how you downloaded the 
file, what code you used to parse it, and the full error you receive. 
Copy and paste the entire traceback.



-- 
Steven

