"Not clear from your question whether your goal is to learn to parse XML in python<br>
or to solve a particular problem. If your goal is to learn python XML processing,<br>
then go right ahead -- however, it looks like you are using SAX below, and the sort<br>
of thing you describe might be done better using a DOM parser ( or maybe etree )" - It's a bit of both: I'm learning XML parsing by solving a problem. I started with SAX because that's how the book I have does it.<br>
<br>I have looked into ElementTree, and it looks like a much easier and more elegant solution to my problem.<br><br>"Not that it can't be done in SAX -- it's just that, as you discovered, low level<br>
SAX parsing requires that you keep track of the containment hierarchy yourself,<br>
which is a lot of work to solve a simple problem." - I see now that I was doing a lot more work than I needed to in order to accomplish my goal.<br><br>Thanks a lot, Steve, for the in-depth (from my perspective) explanation of all the solutions available to me. I appreciate the help.<br>
<br>Bryan<br><br><div class="gmail_quote">On Sat, Feb 7, 2009 at 2:18 AM, Steve Majewski <span dir="ltr"><<a href="mailto:sdm7g@mac.com">sdm7g@mac.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
Not clear from your question whether your goal is to learn to parse XML in python<br>
or to solve a particular problem. If your goal is to learn python XML processing,<br>
then go right ahead -- however, it looks like you are using SAX below, and the sort<br>
of thing you describe might be done better using a DOM parser ( or maybe etree )<br>
<br>
If what you want is not just to select some info from the xml file, but to get it<br>
into a Python object so that you can then manipulate it further, then DOM or etree<br>
is also probably a better model. It will parse the XML ( likely using SAX underneath )<br>
and give you an object that encodes the whole file.<br>
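For example, here is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the file is laid out the way the stylesheet further down expects (topalbums/album elements with a rank attribute, and name, playcount and artist/name children). The SAMPLE string is a hypothetical two-album excerpt built from values in the xsltproc output below, and load_albums is an illustrative name. For the real file you would call ET.parse("topalbums.xml").getroot() instead of ET.fromstring.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of topalbums.xml; the layout is assumed from the paths
# the XSLT stylesheet uses (topalbums/album[@rank], name, playcount, artist/name).
SAMPLE = """<topalbums>
  <album rank="1">
    <name>Vheissu</name>
    <playcount>332</playcount>
    <artist><name>Thrice</name></artist>
  </album>
  <album rank="2">
    <name>The Artist in the Ambulance</name>
    <playcount>289</playcount>
    <artist><name>Thrice</name></artist>
  </album>
</topalbums>"""


def load_albums(xml_text):
    """Parse the whole document into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    albums = []
    for album in root.findall("album"):
        albums.append({
            "rank": int(album.get("rank")),
            # "name" selects the album's own direct <name> child ...
            "album": album.findtext("name"),
            # ... while "artist/name" selects the <name> nested in <artist>,
            # so the two identically named tags never get mixed up.
            "artist": album.findtext("artist/name"),
            "playcount": int(album.findtext("playcount")),
        })
    return albums


if __name__ == "__main__":
    for a in load_albums(SAMPLE):
        print("%(album)s by %(artist)s (%(playcount)d plays)" % a)
```

Because the whole tree is in memory, the question "which name is this?" is answered by the path, not by flags.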
<br>
[ Not that it can't be done in SAX -- it's just that, as you discovered, low level<br>
SAX parsing requires that you keep track of the containment hierarchy yourself,<br>
which is a lot of work to solve a simple problem. ]<br>
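Keeping track of the hierarchy in SAX can be as simple as a stack of open element names. The sketch below is one way to do it, not the only one; the handler name, the path strings and the sample XML are assumptions based on the layout the stylesheet below selects from.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical excerpt; layout assumed from the stylesheet's paths.
SAMPLE = """<topalbums>
  <album rank="1"><name>Vheissu</name><playcount>332</playcount>
    <artist><name>Thrice</name></artist></album>
  <album rank="2"><name>The Artist in the Ambulance</name><playcount>289</playcount>
    <artist><name>Thrice</name></artist></album>
</topalbums>"""


class TopAlbumsHandler(ContentHandler):
    """SAX handler that tracks the containment hierarchy with a stack."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.stack = []    # names of the currently open elements
        self.text = []     # character data seen since the last start tag
        self.albums = []   # one dict per <album>

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.text = []
        if name == "album":
            self.albums.append({"rank": int(attrs.get("rank", "0"))})

    def characters(self, content):
        # SAX may deliver text in several chunks, so collect and join later.
        self.text.append(content)

    def endElement(self, name):
        # The full path tells the two kinds of <name> apart.
        path = "/".join(self.stack)
        value = "".join(self.text).strip()
        if path == "topalbums/album/name":
            self.albums[-1]["album"] = value
        elif path == "topalbums/album/artist/name":
            self.albums[-1]["artist"] = value
        elif path == "topalbums/album/playcount":
            self.albums[-1]["playcount"] = int(value)
        self.stack.pop()
        self.text = []
```

Usage: create a handler, call xml.sax.parseString(SAMPLE.encode("utf-8"), handler) (or xml.sax.parse("topalbums.xml", handler) for a file), then read handler.albums.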
<br>
<br>
If you're just trying to work with XML, most folks don't write XML parsers for<br>
that sort of thing but use higher-level tools: XSLT, XPath and/or XQuery.<br>
<br>
The Mac has xsltproc as a built-in XSLT 1.0 processor.<br>
There is an xpath program written in Perl in Leopard/10.5 (/usr/bin/xpath).<br>
And Saxon is easily downloaded and does XSLT 2.0 and XQuery 1.0.<br>
<br>
<br>
The following XSLT 1.0 stylesheet:<br>
<br>
<?xml version="1.0" encoding="UTF-8"?><br>
<xsl:stylesheet xmlns:xsl="<a href="http://www.w3.org/1999/XSL/Transform" target="_blank">http://www.w3.org/1999/XSL/Transform</a>" version="1.0"><br>
<xsl:output method="text"/><br>
<br>
<xsl:template match="/"><br>
<xsl:apply-templates select="/topalbums/album[@rank &lt; 6]"/><br>
<!-- just select the top 5 albums --><br>
</xsl:template><br>
<br>
<xsl:template match="/topalbums/album" ><br>
album: <xsl:value-of select="name"/><br>
artist: <xsl:value-of select="artist/name"/><br>
count=<xsl:value-of select="playcount"/><br>
<xsl:text><br>
</xsl:text> <!-- this is here to insert the blank line break --><br>
</xsl:template><br>
<br>
</xsl:stylesheet><br>
<br>
<br>
Will, when run on that file, produce this output:<br>
~$ xsltproc Untitled1.xsl topalbums.xml<br>
<br>
album: Vheissu<br>
artist: Thrice<br>
count=332<br>
<br>
album: The Artist in the Ambulance<br>
artist: Thrice<br>
count=289<br>
<br>
album: Appeal To Reason<br>
artist: Rise Against<br>
count=286<br>
<br>
album: Favourite Worst Nightmare<br>
artist: Arctic Monkeys<br>
count=210<br>
<br>
album: The Sufferer & The Witness<br>
artist: Rise Against<br>
count=206<br>
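For comparison, here is a rough Python equivalent of that stylesheet using xml.etree.ElementTree. The function name and sample data are hypothetical: the first two albums come from the output above, and the rank-6 entry is invented only so the filter has something to drop. The text produced is close to, but not necessarily byte-for-byte identical with, the xsltproc output.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the rank-6 album is made up to exercise the filter.
SAMPLE = """<topalbums>
  <album rank="1"><name>Vheissu</name><playcount>332</playcount>
    <artist><name>Thrice</name></artist></album>
  <album rank="2"><name>The Artist in the Ambulance</name><playcount>289</playcount>
    <artist><name>Thrice</name></artist></album>
  <album rank="6"><name>Hypothetical Album</name><playcount>1</playcount>
    <artist><name>Hypothetical Artist</name></artist></album>
</topalbums>"""


def top_albums_report(root):
    """Mimic the stylesheet: format each album whose rank is below 6."""
    blocks = []
    for album in root.findall("album"):
        # ElementTree's XPath subset has no "<" comparison in predicates,
        # so the [@rank < 6] test from the stylesheet is done in Python.
        if int(album.get("rank")) < 6:
            blocks.append("album: %s\nartist: %s\ncount=%s\n" % (
                album.findtext("name"),
                album.findtext("artist/name"),
                album.findtext("playcount"),
            ))
    # Separate albums with a blank line, as in the xsltproc output.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(top_albums_report(ET.fromstring(SAMPLE)))
```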
<br>
[ Not sure if that's anything like what you want. ]<br>
<br>
<br>
I'm sure that the whole thing would reduce to an even more concise XQuery request.<br>
<br>
I was trying to do the whole thing as an XPath one-liner, but it didn't like<br>
my attempts to include alternates in parentheses. I think this is an XPath 1.0<br>
vs. XPath 2.0 issue. Of the tools mentioned above, Saxon is the only one that supports 2.0; the Perl, Python<br>
and Java libraries only support XPath 1.0.<br>
<br>
This sort of expression did work using XPath 2.0 (in the oXygen editor):<br>
<br>
//album[@rank < 6]/(name|playcount|artist/name)<br>
<br>
But I couldn't figure out a 1.0 syntax that would grab all three fields.<br>
<br>
(And the Perl xpath seems to have a bug that interprets '@rank < 6' as less-than-or-equal!)<br>
<br>
<br>
-- Steve Majewski<div><div></div><div class="Wj3C7c"><br>
<br>
<br>
<br>
On Feb 6, 2009, at 11:00 PM, Bryan Smith wrote:<br>
<br>
</div></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="Wj3C7c">
Hi everyone,<br>
<br>
I have another question I'm hoping someone will be kind enough to answer. I am new to parsing XML (not to mention much of Python itself), and I am trying to parse this XML file: <a href="http://ws.audioscrobbler.com/2.0/user/bryansmith/topalbums.xml" target="_blank">http://ws.audioscrobbler.com/2.0/user/bryansmith/topalbums.xml</a>.<br>
<br>
So far, I have written a class that parses this file, in an attempt to present the user with a list of the top albums on their <a href="http://last.fm" target="_blank">last.fm</a> profile. As you can see, the artist name and the album name are both marked by the <name> tag, which makes my job harder. If the tag names were different, I wouldn't have a problem. Listed below is the class I have written to parse the file. My question is this: is there a way I can say something like "if tag_name == album name tag then....elif tag_name == artist name tag...."? I hope this is clear.<br>
<br>
As it stands, if I parse this file and print the results in the form album (playcount), this is (understandably) what I get: Vheissu (332), Thrice (289), The Artist in the Ambulance (286), Thrice (210) and so on. Thrice is the artist name. I want to be able to differentiate between the "artist" name tag and the "album" name tag.<br>
<br>
<br>
Class as it stands right now:<br>
<br>
from xml.sax.handler import ContentHandler<br>
<br>
class GetTopAlbums(ContentHandler):<br>
<br>
    in_album_tag = False<br>
    in_playcount_tag = False<br>
<br>
    def __init__(self, album, playcount):<br>
        ContentHandler.__init__(self)<br>
        self.album = album<br>
        self.playcount = playcount<br>
        self.data = []<br>
<br>
    def startElement(self, tag_name, attr):<br>
        # "name" matches both the album name and the nested artist name<br>
        if tag_name == "name":<br>
            self.in_album_tag = True<br>
        elif tag_name == "playcount":<br>
            self.in_playcount_tag = True<br>
<br>
    def endElement(self, tag_name):<br>
        if tag_name == "name":<br>
            content = "".join(self.data)<br>
            self.data = []<br>
            self.album.append(content)<br>
            self.in_album_tag = False<br>
        elif tag_name == "playcount":<br>
            content = "".join(self.data)<br>
            self.data = []<br>
            self.playcount.append(content)<br>
            self.in_playcount_tag = False<br>
<br>
    def characters(self, content):<br>
        # Collect text only while inside a name or playcount element<br>
        if self.in_album_tag or self.in_playcount_tag:<br>
            self.data.append(content)<br>
<br>
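One minimal way to answer this question while staying with SAX is to remember whether the parser is currently inside an artist element and route each name accordingly. This sketch assumes that name only appears directly under album or under artist; the extra artist list argument and the SAMPLE data are additions for illustration, not part of the original class.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical excerpt; layout assumed (album/name, album/artist/name).
SAMPLE = """<topalbums>
  <album rank="1"><name>Vheissu</name><playcount>332</playcount>
    <artist><name>Thrice</name></artist></album>
  <album rank="2"><name>The Artist in the Ambulance</name><playcount>289</playcount>
    <artist><name>Thrice</name></artist></album>
</topalbums>"""


class GetTopAlbums(ContentHandler):
    """The original handler plus one flag: are we inside <artist>?"""

    def __init__(self, album, artist, playcount):
        ContentHandler.__init__(self)
        self.album = album
        self.artist = artist        # hypothetical extra output list
        self.playcount = playcount
        self.data = []
        self.collecting = False     # True while inside <name> or <playcount>
        self.in_artist = False      # True between <artist> and </artist>

    def startElement(self, tag_name, attr):
        if tag_name == "artist":
            self.in_artist = True
        elif tag_name in ("name", "playcount"):
            self.collecting = True
            self.data = []

    def endElement(self, tag_name):
        if tag_name == "artist":
            self.in_artist = False
        elif tag_name == "name":
            # Same tag, different meaning depending on the enclosing element.
            target = self.artist if self.in_artist else self.album
            target.append("".join(self.data))
            self.collecting = False
        elif tag_name == "playcount":
            self.playcount.append("".join(self.data))
            self.collecting = False

    def characters(self, content):
        if self.collecting:
            self.data.append(content)
```

Usage: albums, artists, counts = [], [], []; then xml.sax.parseString(SAMPLE.encode("utf-8"), GetTopAlbums(albums, artists, counts)) fills the three lists in step. A single flag works here because the nesting is shallow; for deeper documents a stack of open element names scales better.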
Thanks in advance!<br>
Bryan<br></div></div>
_______________________________________________<br>
Pythonmac-SIG maillist - <a href="mailto:Pythonmac-SIG@python.org" target="_blank">Pythonmac-SIG@python.org</a><br>
<a href="http://mail.python.org/mailman/listinfo/pythonmac-sig" target="_blank">http://mail.python.org/mailman/listinfo/pythonmac-sig</a><br>
</blockquote>
<br>
</blockquote></div><br>