Trouble using XML Reader
Mike D
42flicks at gmail.com
Tue Mar 4 04:14:38 EST 2008
On 3/3/08, Mike D <42flicks at gmail.com> wrote:
>
> Hello,
>
> I'm using XML Reader (xml.sax.xmlreader.XMLReader) to create an RSS
> reader.
>
> I can parse the file but am unsure how to extract the elements I require.
> For example: For each <item> element I want the title and description.
>
> I have some stub code; I want to create a list of objects which include a
> title and description.
>
> I have the following code (a bit hacked up):
>
> import sys
> from xml.sax import make_parser
> from xml.sax import handler
>
> class rssObject(object):
>     objectList = []  # class attribute, shared by every instance
>     def addObject(self, item):
>         rssObject.objectList.append(item)
>
> class rssObjectDetail(object):
>     title = ""
>     content = ""
>
>
> class SimpleHandler(handler.ContentHandler):
>     def startElement(self, name, attrs):
>         print(name)
>
>     def endElement(self, name):
>         print(name)
>
>     def characters(self, data):
>         print(data)
>
>
> class SimpleDTDHandler(handler.DTDHandler):
>     def notationDecl(self, name, publicid, systemid):
>         print("Notation: ", name, publicid, systemid)
>
>     # ndata must be a parameter; it was used but never declared
>     def unparsedEntityDecl(self, name, publicid, systemid, ndata):
>         print("UnparsedEntity: ", name, publicid, systemid, ndata)
>
> p = make_parser()
> c = SimpleHandler()
> p.setContentHandler(c)
> p.setDTDHandler(SimpleDTDHandler())
> p.parse('topstories.xml')
>
> And am using this xml file:
>
> <?xml version="1.0"?>
> <rss version="2.0">
>   <channel>
>     <title>Stuff.co.nz - Top Stories</title>
>     <link>http://www.stuff.co.nz</link>
>     <description>Top Stories from Stuff.co.nz. New Zealand, world, sport,
> business &amp; entertainment news on Stuff.co.nz. </description>
>     <language>en-nz</language>
>     <copyright>Fairfax New Zealand Ltd.</copyright>
>     <ttl>30</ttl>
>     <image>
>       <url>/static/images/logo.gif</url>
>       <title>Stuff News</title>
>       <link>http://www.stuff.co.nz</link>
>     </image>
>
>     <item id="4423924" count="1">
>       <title>Prince Harry 'wants to live in Africa'</title>
>       <link>http://www.stuff.co.nz/4423924a10.html?source=RSStopstories_20080303</link>
>       <description>For Prince Harry it must be the ultimate dark irony: to be in
> such a privileged position and have so much opportunity, and yet be unable
> to fulfil a dream of fighting for the motherland.</description>
>       <author>EDMUND TADROS</author>
>       <guid isPermaLink="false">stuff.co.nz/4423924</guid>
>       <pubDate>Mon, 03 Mar 2008 00:44:00 GMT</pubDate>
>     </item>
>
>   </channel>
> </rss>
>
> Is there something I'm missing? I can't figure out how to correctly
> interpret the document using the SAX parser. I'm sure I'm missing something
> obvious :)
>
> Any tips or advice would be appreciated! Advice on correctly implementing
> what I want to achieve would also be welcome, as using objectList=[] in
> the ContentHandler seems like a hack.
>
> Thanks!
>
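For the goal described in the quoted message (one title and description per <item>), a common SAX pattern is to keep a little state in the handler: note when an <item> opens, buffer the text of the child elements you care about, and store the result when each element closes. The following is only a minimal sketch under some assumptions: it targets Python 3, the names ItemHandler, parse_items and SAMPLE are hypothetical, and SAMPLE is an abridged copy of the quoted feed rather than the real topstories.xml.

```python
import io
import xml.sax
from xml.sax import handler


class ItemHandler(handler.ContentHandler):
    """Collect the <title> and <description> text of every <item>."""

    WANTED = ("title", "description")

    def __init__(self):
        super().__init__()
        self.items = []        # one dict per <item>, kept on the instance
        self._current = None   # dict being filled, or None outside an <item>
        self._field = None     # wanted child element we are currently inside
        self._chunks = []      # characters() may be called several times per element

    def startElement(self, name, attrs):
        if name == "item":
            self._current = {}
        elif self._current is not None and name in self.WANTED:
            self._field = name
            self._chunks = []

    def characters(self, content):
        if self._field is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append(self._current)
            self._current = None
        elif name == self._field:
            self._current[name] = "".join(self._chunks).strip()
            self._field = None


def parse_items(source):
    """Parse a filename or file-like object and return the collected items."""
    h = ItemHandler()
    xml.sax.parse(source, h)
    return h.items


# Abridged sample feed (assumption: shortened from the one quoted above).
SAMPLE = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Stuff.co.nz - Top Stories</title>
<item id="4423924" count="1">
<title>Prince Harry 'wants to live in Africa'</title>
<description>For Prince Harry it must be the ultimate dark irony.</description>
</item>
</channel></rss>"""

items = parse_items(io.BytesIO(SAMPLE))
for item in items:
    print(item["title"], "-", item["description"])
```

Keeping the list on the handler instance avoids the shared class-level objectList, and the channel-level <title> is skipped because it appears outside any <item>.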
My mistake: the provided example is a SAX object, which can be parsed with
DOM manipulation. I'll be able to do it now :)

Oh, and I also posted a hacked-up implementation; I understand my classes
look awful!
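
For reference, the DOM route mentioned here might look roughly like the sketch below. It assumes xml.dom.minidom from the standard library (the post does not say which DOM API was actually used) and Python 3; text_of, read_items and FEED are hypothetical names, and FEED is an abridged copy of the quoted feed.

```python
from xml.dom import Node, minidom


def text_of(parent, tag):
    """Return the text of the first <tag> descendant of parent, or ''."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return ""
    return "".join(
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    ).strip()


def read_items(xml_bytes):
    """Return a (title, description) tuple for every <item> in the feed."""
    doc = minidom.parseString(xml_bytes)
    return [
        (text_of(item, "title"), text_of(item, "description"))
        for item in doc.getElementsByTagName("item")
    ]


# Abridged sample feed (assumption: shortened from the one quoted above).
FEED = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Stuff.co.nz - Top Stories</title>
<item id="4423924" count="1">
<title>Prince Harry 'wants to live in Africa'</title>
<description>For Prince Harry it must be the ultimate dark irony.</description>
</item>
</channel></rss>"""

for title, description in read_items(FEED):
    print(title, "-", description)
```

Unlike SAX, this loads the whole document into memory first, which is usually fine for a feed of this size.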