How to read between xml tags?
Miki Tebeka
miki.tebeka at zoran.com
Wed Mar 10 03:12:16 EST 2004
Hello Anthony,
> 1. The read operation must either read a full tag or
> ignore the tag.
>
> 2. If the read operation reads between <P> and </P>,
> then it must read the whole thing between those 2
> tags all at once.
>
> How can I achieve this please?
I think the xml.sax module is what you're looking for.
A small, lightly tested example might be:
---
#!/usr/bin/env python
from sys import argv
from xml.sax import parse
from xml.sax.handler import ContentHandler


class ArticleHandler(ContentHandler):
    """Collect character data, skipping the elements named in *ignore."""

    def __init__(self, *ignore):
        ContentHandler.__init__(self)
        self.data = ""  # Data buffer
        self.get = 0  # Get flag: 1 while text should be collected
        # Lower-cased names of the elements whose text is skipped
        self.ignore = set(i.lower() for i in ignore)

    def startElement(self, name, attrs):
        if name.lower() in self.ignore:
            self.get = 0
        else:
            self.get = 1

    def endElement(self, name):
        self.get = 0

    def characters(self, content):
        if self.get:
            self.data += content


handler = ArticleHandler()
parse(argv[1], handler)
print(handler.data)  # Will print full data

handler = ArticleHandler("headline")
parse(argv[1], handler)
print(handler.data)  # Will print data without headlines
---
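
For your second requirement (getting everything between <P> and </P>
in one piece), a rough sketch along the same lines could keep a depth
counter and join the text chunks when the closing tag arrives. The
ParagraphHandler class, its "tag" parameter and the sample document
below are made up for illustration; only ContentHandler and
parseString come from xml.sax.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class ParagraphHandler(ContentHandler):
    """Hypothetical helper: gather the text of each <P> as one string."""

    def __init__(self, tag="p"):
        ContentHandler.__init__(self)
        self.tag = tag.lower()
        self.depth = 0        # > 0 while inside a matching element
        self.parts = []       # text chunks of the paragraph being read
        self.paragraphs = []  # one finished string per paragraph

    def startElement(self, name, attrs):
        if name.lower() == self.tag:
            self.depth += 1

    def endElement(self, name):
        if name.lower() == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Closing tag reached: emit the whole paragraph at once
                self.paragraphs.append("".join(self.parts))
                self.parts = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so buffer them
        if self.depth:
            self.parts.append(content)


# Made-up sample input, not taken from the original question
sample = (b"<article><headline>News</headline>"
          b"<P>Hello <b>big</b> world</P><P>Bye</P></article>")
handler = ParagraphHandler()
parseString(sample, handler)
print(handler.paragraphs)
```

Text outside <P> (the headline here) is dropped, and text inside
nested tags such as <b> stays part of its paragraph, so the list ends
up with one complete string per paragraph.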
HTH.
Miki