How to read between xml tags?

Miki Tebeka miki.tebeka at zoran.com
Wed Mar 10 03:12:16 EST 2004


Hello Anthony,

> 1. The read operation must either read a full tag or
> ignore the tag.
> 
> 2. If the read operation reads between <P> and </P>,
> then it must read the whole thing between those 2
> tags all at once.
> 
> How can I achieve this please?
I think the xml.sax module is what you're looking for.
A small, only briefly tested sketch might be:
---
#!/usr/bin/env python

from sys import argv
from xml.sax import parse
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    """Collect character data, skipping the content of ignored tags."""

    def __init__(self, *ignore):
        ContentHandler.__init__(self)
        self.data = ""  # Data buffer
        self.skip = 0   # Nesting depth inside ignored tags (0 = collect)
        # Lower-cased names of the tags to ignore
        self.ignore = set(i.lower() for i in ignore)

    def startElement(self, name, attrs):
        # Count every element opened inside an ignored tag, so that a
        # nested child cannot switch collection back on too early
        if self.skip or name.lower() in self.ignore:
            self.skip += 1

    def endElement(self, name):
        if self.skip:
            self.skip -= 1

    def characters(self, content):
        if not self.skip:
            self.data += content


handler = ArticleHandler()
parse(argv[1], handler)
print(handler.data)  # Will print full data

handler = ArticleHandler("headline")
parse(argv[1], handler)
print(handler.data)  # Will print data without headlines
---
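
For point 2 of the question (getting everything between <P> and </P> in
one piece), a variant of the same idea is to buffer the text per element
and hand it over only when the closing tag arrives. The following is a
minimal sketch under that assumption, not a tested solution:
ParagraphHandler, read_paragraphs and the sample document are made-up
names for illustration and are not part of xml.sax.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class ParagraphHandler(ContentHandler):
    """Collect the text of each <P> element as one complete string."""

    def __init__(self, tag="p"):
        ContentHandler.__init__(self)
        self.tag = tag.lower()  # Element to collect (assumed default: "p")
        self.depth = 0          # Nesting depth inside the collected element
        self.parts = []         # Text chunks of the paragraph being read
        self.paragraphs = []    # Finished paragraphs, one string each

    def startElement(self, name, attrs):
        # Track nesting so that inline children stay part of the paragraph
        if self.depth or name.lower() == self.tag:
            self.depth += 1

    def endElement(self, name):
        if not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # The matching closing tag was reached: the paragraph is complete
            self.paragraphs.append("".join(self.parts))
            self.parts = []

    def characters(self, content):
        # SAX may deliver one run of text in several chunks, so buffer them
        if self.depth:
            self.parts.append(content)


def read_paragraphs(xml_bytes, tag="p"):
    """Return the text of every <tag> element found in xml_bytes."""
    handler = ParagraphHandler(tag)
    parseString(xml_bytes, handler)
    return handler.paragraphs


# Hypothetical sample document, only for demonstration
sample = (b"<article><headline>News</headline>"
          b"<P>First <b>bold</b> text.</P><P>Second.</P></article>")
for paragraph in read_paragraphs(sample):
    print(paragraph)
```

Because characters() can be called several times for a single run of
text, the chunks are joined only in endElement, which is what makes each
paragraph arrive "all at once". Text outside the chosen tag (the headline
in the sample) is simply never buffered.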

HTH.
Miki



