<br><br><div><span class="gmail_quote">On 3/3/08, <b class="gmail_sendername">Mike D</b> <<a href="mailto:42flicks@gmail.com">42flicks@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="margin-top: 0; margin-right: 0; margin-bottom: 0; margin-left: 0; margin-left: 0.80ex; border-left-color: #cccccc; border-left-width: 1px; border-left-style: solid; padding-left: 1ex">
Hello,<br><br>I'm using XML Reader (xml.sax.xmlreader.XMLReader) to create an rss reader.<br><br>I can parse the file but am unsure how to extract the elements I require. For example: For each <item> element I want the title and description.<br>
<br>I have some stub code; I want to create a list of objects which include a title and description.<br><br>I have the following code (a bit hacked up):<br><br>import sys<br>from xml.sax import make_parser<br>from xml.sax import handler<br>
<br>class rssObject(object):<br> objectList=[]<br> def addObject(self,object):<br> rssObject.objectList.append(object)<br><br>class rssObjectDetail(object): <br> title = ""<br> content = ""<br>
<br><br>class SimpleHandler(handler.ContentHandler):<br> def startElement(self,name,attrs):<br> print name <br><br> def endElement(self,name):<br> print name<br><br> def characters(self,data):<br>
print data<br> <br> <br>class SimpleDTDHandler(handler.DTDHandler):<br> def notationDecl(self,name,publicid,systemid):<br> print "Notation: " , name, publicid, systemid<br><br>
 def unparsedEntityDecl(self,name,publicid,systemid,ndata):<br> print "UnparsedEntity: " , name, publicid, systemid, ndata<br><br>p= make_parser()<br>c = SimpleHandler()<br>p.setContentHandler(c)<br>p.setDTDHandler(SimpleDTDHandler())<br>
p.parse('topstories.xml')<br><br>And am using this xml file:<br><br><?xml version="1.0"?><br><rss version="2.0"><br> <channel><br> <title>Stuff.co.nz - Top Stories</title><br>
 <link>http://www.stuff.co.nz</link><br> <description>Top Stories from Stuff.co.nz. New Zealand, world, sport, business & entertainment news on Stuff.co.nz. </description><br>
<language>en-nz</language><br> <copyright>Fairfax New Zealand Ltd.</copyright><br> <ttl>30</ttl><br> <image><br> <url>/static/images/logo.gif</url><br>
 <title>Stuff News</title><br> <link>http://www.stuff.co.nz</link><br> </image><br>
<br><item id="4423924" count="1"><br> <title>Prince Harry 'wants to live in Africa'</title><br><link>http://www.stuff.co.nz/4423924a10.html?source=RSStopstories_20080303</link><br>
<description>For Prince Harry it must be the ultimate dark irony: to be in such a privileged position and have so much opportunity, and yet be unable to fulfil a dream of fighting for the motherland.</description><br>
<author>EDMUND TADROS</author><br><guid isPermaLink="false">stuff.co.nz/4423924</guid><br>
<pubDate>Mon, 03 Mar 2008 00:44:00 GMT</pubDate><br> </item><br><br> </channel><br></rss><br><br>Is there something I'm missing? I can't figure out how to correctly interpret the document using the SAX parser. I'm sure I'm missing something obvious :)<br>
<br>Any tips or advice would be appreciated! Also advice on correctly implementing what I want to achieve would be appreciated as using objectList=[] in the ContentHandler seems like a hack.<br><br>Thanks!<br></blockquote>
</div><br>My mistake: the provided example is a SAX object, which can be parsed with DOM manipulation. I'll be able to do it now :)<br><br>Oh, I also posted a hacked-up implementation; I understand my classes look awful! <br>
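For anyone who finds this thread later, here is a minimal sketch of one way to pull the title and description out of each <item> with a plain SAX ContentHandler. It is only an illustration, not the approach settled on above. The names ItemHandler, parse_items and SAMPLE are made up for this example, the sample feed is a cut-down stand-in for topstories.xml, and the code is written to run on Python 3 (no print statements). The items live on the handler instance, which avoids the class-level objectList=[].

```python
import io
import xml.sax
from xml.sax import handler


class ItemHandler(handler.ContentHandler):
    """Collect the title and description of each <item> element."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.items = []        # one dict per <item>, kept on the instance
        self._current = None   # dict being filled, or None outside <item>
        self._field = None     # "title" or "description" while inside one
        self._chunks = []      # characters() may deliver text in pieces

    def startElement(self, name, attrs):
        if name == "item":
            self._current = {"title": "", "description": ""}
        elif self._current is not None and name in ("title", "description"):
            # Only track these tags inside <item>; the channel and image
            # titles are skipped because _current is None there.
            self._field = name
            self._chunks = []

    def characters(self, data):
        if self._field is not None:
            self._chunks.append(data)

    def endElement(self, name):
        if name == "item":
            self.items.append(self._current)
            self._current = None
        elif self._field is not None and name == self._field:
            self._current[name] = "".join(self._chunks).strip()
            self._field = None


def parse_items(source):
    """Parse a filename or file-like object and return the list of items."""
    h = ItemHandler()
    xml.sax.parse(source, h)
    return h.items


# Hypothetical cut-down feed, standing in for topstories.xml.
SAMPLE = b"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Channel title (ignored)</title>
    <item id="1">
      <title>First story</title>
      <description>Some text</description>
    </item>
  </channel>
</rss>"""

items = parse_items(io.BytesIO(SAMPLE))
```

Calling parse_items('topstories.xml') should work the same way on the real file, assuming it is well-formed XML. The text is buffered in _chunks because the parser is allowed to call characters() more than once for a single element.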