&quot;Not clear from your question whether your goal is to learn to parse XML in python<br>

or to solve a particular problem. If your goal is to learn python XML processing,<br>

then go right ahead -- however, it looks like you are using SAX below, and the sort<br>

of thing you describe might be done better using a DOM parser ( or maybe etree )&quot; - It&#39;s a bit of both - learning XML parsing through solving a problem. I started with SAX because that&#39;s how the book I have does it.<br>

<br>I have looked up ElementTree and this looks like a much easier and much more elegant solution to my problem.<br><br>&quot;Not that it can&#39;t be done in SAX -- it&#39;s just that, as you discovered, low level<br>

 &nbsp;SAX parsing requires that you keep track of the containment hierarchy yourself,<br>

 &nbsp;which is a lot of work to solve a simple problem.&quot; - I see now that I was doing far more work than I needed to accomplish my goal.<br><br>Thanks a lot, Steve, for the in-depth (from my perspective) explanation of all the solutions available to me. I appreciate the help.<br>

<br>Bryan<br><br><div class="gmail_quote">On Sat, Feb 7, 2009 at 2:18 AM, Steve Majewski <span dir="ltr">&lt;<a href="mailto:sdm7g@mac.com">sdm7g@mac.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Not clear from your question whether your goal is to learn to parse XML in python<br>

or to solve a particular problem. If your goal is to learn python XML processing,<br>

then go right ahead -- however, it looks like you are using SAX below, and the sort<br>

of thing you describe might be done better using a DOM parser ( or maybe etree )<br>

<br>

If what you want is not just to select some info from the xml file, but to get it<br>

into a Python object so that you can then manipulate it further, then DOM or etree<br>

is also probably a better model. It will parse the XML ( likely using SAX underneath )<br>

and give you an object that encodes the whole file.<br>
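<br>
As a rough illustration of the etree approach described above, here is a minimal sketch. It assumes the album layout that the XSLT stylesheet in this message relies on (topalbums/album with a rank attribute and name, playcount and artist/name children); SAMPLE and top_albums are hypothetical names made up for the example, not part of any library.<br>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the structure is an assumption based on the paths
# used in the stylesheet (album/@rank, name, playcount, artist/name).
SAMPLE = """<topalbums>
  <album rank="1">
    <name>Vheissu</name>
    <playcount>332</playcount>
    <artist><name>Thrice</name></artist>
  </album>
  <album rank="2">
    <name>The Artist in the Ambulance</name>
    <playcount>289</playcount>
    <artist><name>Thrice</name></artist>
  </album>
</topalbums>"""


def top_albums(xml_text, limit=5):
    """Return a list of dicts for albums whose rank is at most `limit`."""
    root = ET.fromstring(xml_text)
    albums = []
    for album in root.findall("album"):
        if int(album.get("rank")) > limit:
            continue
        albums.append({
            "album": album.findtext("name"),          # direct child of album
            "artist": album.findtext("artist/name"),  # name nested under artist
            "playcount": int(album.findtext("playcount")),
        })
    return albums


for entry in top_albums(SAMPLE):
    print("%(album)s by %(artist)s (%(playcount)d)" % entry)
```

Because the tree keeps the containment hierarchy, the path "name" and the path "artist/name" are distinct lookups, so the two name tags no longer collide.<br>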

<br>

[ Not that it can&#39;t be done in SAX -- it&#39;s just that, as you discovered, low level<br>

 &nbsp;SAX parsing requires that you keep track of the containment hierarchy yourself,<br>

 &nbsp;which is a lot of work to solve a simple problem. ]<br>
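<br>
For comparison, a sketch of what tracking the containment hierarchy by hand might look like in SAX. It keeps a stack of open element names and checks the parent of each name element. AlbumHandler and SAMPLE are hypothetical names for this example, and the sample document structure is an assumption based on the stylesheet paths in this message.<br>

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical one-album sample; the structure is an assumption.
SAMPLE = """<topalbums>
  <album rank="1">
    <name>Vheissu</name>
    <playcount>332</playcount>
    <artist><name>Thrice</name></artist>
  </album>
</topalbums>"""


class AlbumHandler(ContentHandler):
    """Collect album records by tracking the stack of open elements."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.stack = []     # names of the currently open elements
        self.text = []      # character chunks of the current element
        self.current = {}   # fields of the album being read
        self.albums = []    # finished album records

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.text = []
        if name == "album":
            self.current = {}

    def characters(self, content):
        # SAX may deliver a single text node in several chunks.
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        parent = self.stack[-2] if len(self.stack) > 1 else None
        if name == "name" and parent == "album":
            self.current["album"] = value
        elif name == "name" and parent == "artist":
            self.current["artist"] = value
        elif name == "playcount":
            self.current["playcount"] = value
        elif name == "album":
            self.albums.append(self.current)
        self.stack.pop()
        self.text = []


handler = AlbumHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.albums)
```

The parent check is what separates the album name from the artist name; everything else is bookkeeping that a tree-based parser would do for you.<br>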

<br>

<br>

If you&#39;re just trying to work with XML, then most folks don&#39;t write their own parsing code for<br>

that sort of thing, but use higher-level tools: XSLT, XPath, and/or XQuery.<br>

<br>

The Mac has xsltproc as a built-in XSLT (1.0) processor.<br>

There is an xpath program written in Perl in Leopard/10.5 ( /usr/bin/xpath ).<br>

And Saxon is easily downloaded and does XSLT 2.0 and XQuery 1.0.<br>

<br>

<br>

The following XSLT 1.0 stylesheet:<br>

<br>

&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;<br>

&lt;xsl:stylesheet xmlns:xsl=&quot;<a href="http://www.w3.org/1999/XSL/Transform" target="_blank">http://www.w3.org/1999/XSL/Transform</a>&quot; version=&quot;1.0&quot;&gt;<br>

&lt;xsl:output method=&quot;text&quot;/&gt;<br>

<br>

&lt;xsl:template match=&quot;/&quot;&gt;<br>

 &nbsp; &nbsp;&lt;xsl:apply-templates select=&quot;/topalbums/album[@rank &amp;lt; 6]&quot;/&gt;<br>

 &nbsp; &nbsp;&lt;!-- just select the top 5 albums --&gt;<br>

&lt;/xsl:template&gt;<br>

<br>

&lt;xsl:template match=&quot;/topalbums/album&quot; &gt;<br>

 &nbsp;album: &lt;xsl:value-of select=&quot;name&quot;/&gt;<br>

 &nbsp;artist: &lt;xsl:value-of select=&quot;artist/name&quot;/&gt;<br>

 &nbsp;count=&lt;xsl:value-of select=&quot;playcount&quot;/&gt;<br>

 &nbsp;&lt;xsl:text&gt;<br>

 &nbsp;&lt;/xsl:text&gt; &lt;!-- this is here to insert the blank line break --&gt;<br>

&lt;/xsl:template&gt;<br>

<br>

&lt;/xsl:stylesheet&gt;<br>

<br>

<br>

Will, when run on that file, produce this output:<br>

~$ xsltproc Untitled1.xsl &nbsp;topalbums.xml<br>

<br>

 &nbsp;album: Vheissu<br>

 &nbsp;artist: Thrice<br>

 &nbsp;count=332<br>

<br>

 &nbsp;album: The Artist in the Ambulance<br>

 &nbsp;artist: Thrice<br>

 &nbsp;count=289<br>

<br>

 &nbsp;album: Appeal To Reason<br>

 &nbsp;artist: Rise Against<br>

 &nbsp;count=286<br>

<br>

 &nbsp;album: Favourite Worst Nightmare<br>

 &nbsp;artist: Arctic Monkeys<br>

 &nbsp;count=210<br>

<br>

 &nbsp;album: The Sufferer &amp; The Witness<br>

 &nbsp;artist: Rise Against<br>

 &nbsp;count=206<br>

<br>

[ Not sure if that&#39;s anything like what you want. ]<br>

<br>

<br>

I&#39;m sure that the whole thing would reduce to an even more concise XQuery request.<br>

<br>

I was trying to do the whole thing as an XPath one-liner, but it didn&#39;t like<br>

my attempts to include alternatives in parentheses. I think this is an XPath 1.0<br>

vs. XPath 2.0 issue. As far as I know, Saxon is the only thing that supports 2.0; the Perl, Python<br>

and Java libraries I know of only support XPath 1.0.<br>

<br>

This sort of expression did work using xpath 2.0 (in oxygen editor):<br>

<br>

 &nbsp; &nbsp; &nbsp; &nbsp;//album[@rank &lt; 6]/(name|playcount|artist/name)<br>

<br>

But I couldn&#39;t figure out a 1.0 syntax that would grab all three fields.<br>

<br>

( and the perl xpath seems to have a bug that interprets &#39;@rank &lt; 6&#39; as less-than-or-equal! )<br>

<br>

<br>

-- Steve Majewski<div><div></div><div class="Wj3C7c"><br>

<br>

<br>

<br>

On Feb 6, 2009, at 11:00 PM, Bryan Smith wrote:<br>

<br>

</div></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="Wj3C7c">

Hi everyone,<br>

<br>

I have another question I&#39;m hoping someone would be kind enough to answer. I am new to parsing XML (not to mention much of Python itself) and I am trying to parse an XML file. The file I am trying to parse is this one: <a href="http://ws.audioscrobbler.com/2.0/user/bryansmith/topalbums.xml" target="_blank">http://ws.audioscrobbler.com/2.0/user/bryansmith/topalbums.xml</a>.<br>


<br>

So far, I have written a class for parsing this file, in an attempt to present the user with a list of the top albums on their <a href="http://last.fm" target="_blank">last.fm</a> profile. Note that the artist name and the album name are both marked by the &lt;name&gt; tag, which makes my job harder. If the tag names were different, I wouldn&#39;t have a problem. Listed below is the class I have written to parse the file. My question then is this: is there a way I can say something like &quot;if tag_name == album name tag then....elif tag_name == artist name tag....&quot;? I hope this is clear.<br>


<br>

As it stands right now, if I parse this file and print the results in the form album (playcount), this is (understandably) what I get: Vheissu (332), Thrice (289), The Artist in the Ambulance (286), Thrice (210) and so on. Thrice is the artist name, not an album. I want to be able to differentiate between the &quot;artist&quot; name tag and the &quot;album&quot; name tag.<br>


<br>

<br>

Class as it stands right now:<br>

<br>

class GetTopAlbums(ContentHandler):<br>

<br>

 &nbsp; &nbsp;in_album_tag = False<br>

 &nbsp; &nbsp;in_playcount_tag = False<br>

<br>

 &nbsp; &nbsp;def __init__(self, album, playcount):<br>

 &nbsp; &nbsp; &nbsp; &nbsp;ContentHandler.__init__(self)<br>

 &nbsp; &nbsp; &nbsp; &nbsp;self.album = album<br>

 &nbsp; &nbsp; &nbsp; &nbsp;self.playcount = playcount<br>

 &nbsp; &nbsp; &nbsp; &nbsp;self.data = []<br>

<br>

 &nbsp; &nbsp;def startElement(self, tag_name, attr):<br>

 &nbsp; &nbsp; &nbsp; &nbsp;if tag_name == &quot;name&quot;:<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.in_album_tag = True<br>

 &nbsp; &nbsp; &nbsp; &nbsp;elif tag_name == &quot;playcount&quot;:<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.in_playcount_tag = True<br>

<br>

 &nbsp; &nbsp;def endElement(self, tag_name):<br>

 &nbsp; &nbsp; &nbsp; &nbsp;if tag_name == &quot;name&quot;:<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;content = &quot;&quot;.join(self.data)<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.data = []<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.album.append(content)<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.in_album_tag = False<br>

 &nbsp; &nbsp; &nbsp; &nbsp;elif tag_name == &quot;playcount&quot;:<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;content = &quot;&quot;.join(self.data)<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.data = []<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.playcount.append(content)<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.in_playcount_tag = False<br>

<br>

 &nbsp; &nbsp;def characters(self, string):<br>

 &nbsp; &nbsp; &nbsp; &nbsp;if self.in_album_tag == True:<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.data.append(string)<br>

 &nbsp; &nbsp; &nbsp; &nbsp;elif self.in_playcount_tag == True:<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self.data.append(string)<br>

<br>

Thanks in advance!<br>

Bryan<br></div></div>

_______________________________________________<br>

Pythonmac-SIG maillist &nbsp;- &nbsp;<a href="mailto:Pythonmac-SIG@python.org" target="_blank">Pythonmac-SIG@python.org</a><br>

<a href="http://mail.python.org/mailman/listinfo/pythonmac-sig" target="_blank">http://mail.python.org/mailman/listinfo/pythonmac-sig</a><br>

</blockquote>

<br>

</blockquote></div><br>