How to read between xml tags?

Miki Tebeka miki.tebeka at
Wed Mar 10 09:12:16 CET 2004

Hello Anthony,

> 1. The read operation must either read a full tag or
> ignore the tag.
> 2. If the read operation reads between <P> and </P>,
> then it must read the whole thing between those two
> tags all at once.
> How can I achieve this please?

I think the xml.sax module is what you're looking for.
A small, only briefly tested sketch might be:

#!/usr/bin/env python

from sys import argv
from xml.sax import parse
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    def __init__(self, *ignore):
        ContentHandler.__init__(self)
        self.data = "" # Data buffer
        self.get = 0 # Get flag
        # Ignore hash (tag names are compared in lower case)
        self.ignore = {}.fromkeys([i.lower() for i in ignore])

    def startElement(self, name, attrs):
        # Collect text unless this tag is in the ignore list
        if name.lower() in self.ignore:
            self.get = 0
        else:
            self.get = 1

    def endElement(self, name):
        # Stop collecting until the next start tag
        self.get = 0

    def characters(self, content):
        # May be called several times for the text of one element
        if self.get:
            self.data += content

handler = ArticleHandler()
parse(argv[1], handler)
print(handler.data) # Will print full data

handler = ArticleHandler("headline")
parse(argv[1], handler)
print(handler.data) # Will print data without headlines
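
For your point 2 specifically, here is a rough sketch (an addition, not part of the script above) that keeps the body of each <P>...</P> as one complete string. ParagraphHandler, its tag argument and the SAMPLE document are made-up names for illustration only. It relies on nothing beyond xml.sax.parseString and ContentHandler, and it assumes that text inside nested markup (such as <b>) should count as part of the paragraph.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class ParagraphHandler(ContentHandler):
    """Collect the complete text of every element with a given name."""

    def __init__(self, tag="p"):
        ContentHandler.__init__(self)
        self.tag = tag.lower()  # Tag to collect, compared in lower case
        self.depth = 0          # Greater than 0 while inside a matching tag
        self.parts = []         # Text chunks of the paragraph being read
        self.paragraphs = []    # One finished string per paragraph

    def startElement(self, name, attrs):
        if name.lower() == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.parts = []  # Start a fresh buffer for this paragraph

    def endElement(self, name):
        if name.lower() == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # The closing tag arrived, so the paragraph is complete
                self.paragraphs.append("".join(self.parts))

    def characters(self, content):
        if self.depth:
            self.parts.append(content)


# Hypothetical sample document, only for demonstration
SAMPLE = (b"<article><headline>News</headline>"
          b"<P>First <b>bold</b> paragraph.</P><P>Second.</P></article>")

handler = ParagraphHandler("P")
parseString(SAMPLE, handler)
print(handler.paragraphs)  # ['First bold paragraph.', 'Second.']
```

Since the parser is allowed to deliver the text of one element in several characters() calls, the chunks are joined only when the closing tag arrives. That way each paragraph is either read in full or not at all.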

