Splitting SAX results
IamIan
iansan at gmail.com
Tue Jun 12 15:16:45 EDT 2007
I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1': 'Description',
'Title2': 'Description'}.
The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

tracker = [] # Option 1
tracker = {} # Option 2

class reportHandler(ContentHandler):
    def __init__(self):
        self.isReport = 0

    def startElement(self, name, attrs):
        if name == 'title':
            self.isReport = 1
            self.reportText = ''

    def characters(self, ch):
        if self.isReport:
            self.reportText += ch
            tracker.append(ch) # Option 1
            key, value = ch.split(':') # Option 2
            tracker[key] = value

    def endElement(self, name):
        if name == 'title':
            self.isReport = 0
            print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')
print tracker
Option 1 returns a list with the markup included, looking like:
[u'Title1:', u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]
Option 2 fails with the traceback:
File "C:\test.py", line 21, in characters
key, value = ch.split(':')
ValueError: need more than 1 value to unpack
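
Both symptoms are consistent with characters() being called several times for a single title, which SAX parsers are allowed to do: some of the chunks are bare whitespace with no ':' in them, so the two-value unpack fails. A minimal sketch of one way around this, not a definitive fix, is to buffer the chunks and split only once in endElement. The sample data, the wrapping <rss> element, the line break inside the first title, and the names TitleHandler, in_title, chunks and titles are all assumptions made for illustration.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Hypothetical sample standing in for the real feed; the <rss> wrapper and
# the line break inside the first title are assumptions.
SAMPLE = b"""<rss>
<item>
<title>Title1:
Description
</title>
<link>Link</link>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
</item>
</rss>"""


class TitleHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.in_title = False
        self.chunks = []   # text pieces of the title currently being read
        self.titles = {}   # finished results, keyed by the text before ':'

    def startElement(self, name, attrs):
        if name == 'title':
            self.in_title = True
            self.chunks = []

    def characters(self, ch):
        # May be called more than once per element, so only buffer here.
        if self.in_title:
            self.chunks.append(ch)

    def endElement(self, name):
        if name == 'title':
            self.in_title = False
            text = ''.join(self.chunks)
            # partition() splits at the first ':' and does not raise when
            # there is none (the value is then an empty string).
            key, sep, value = text.partition(':')
            self.titles[key.strip()] = value.strip()


handler = TitleHandler()
parseString(SAMPLE, handler)
print(handler.titles)
```

Keeping the dictionary on the handler rather than in a module-level tracker is only a design preference here; the same buffering would work with the global.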
Thank you for the help!