How to read between xml tags?
Anthony Liu
antonyliu2002 at yahoo.com
Wed Mar 10 17:06:41 EST 2004
Yes, Miki, your code works great to strip the XML tags
and return a clean text file.
But the thing is, I want to process the text between
each pair of tags as soon as it is read in.
For example if I have a tagged XML doc like so:
<tag1>Something here</tag1>
<tag2>something else here</tag2>
I want to get "Something here" in one read operation
and process it before I move on to get "something else
here".
Is there any way to go about this?
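
One way this could be approached, as a minimal sketch rather than a
tested solution: keep the xml.sax approach, but hand each element's
text to a callback when its closing tag is seen, instead of building
one big buffer. The names PerElementHandler and callback are
hypothetical, and the <doc> root element is an assumption added so
that the two-tag sample is well-formed XML.

```python
import io
from xml.sax import parse
from xml.sax.handler import ContentHandler


class PerElementHandler(ContentHandler):
    """Pass each element's text to a callback when the element closes."""

    def __init__(self, callback):
        ContentHandler.__init__(self)
        self.callback = callback  # hypothetical hook, called as callback(name, text)
        self.stack = []           # one list of text chunks per open element

    def startElement(self, name, attrs):
        self.stack.append([])

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so collect them until the closing tag arrives.
        if self.stack:
            self.stack[-1].append(content)

    def endElement(self, name):
        text = "".join(self.stack.pop()).strip()
        if text:
            self.callback(name, text)


# Assumed sample: the two tags from the question wrapped in a <doc> root.
SAMPLE = b"<doc><tag1>Something here</tag1>\n<tag2>something else here</tag2></doc>"

seen = []
handler = PerElementHandler(lambda name, text: seen.append((name, text)))
parse(io.BytesIO(SAMPLE), handler)

for name, text in seen:
    print("%s -> %s" % (name, text))
```

Here the callback only records what it receives; any per-element
processing could go there instead, and it would run before the parser
moves on to the next tag. Text that sits directly inside a parent
element alongside child elements is reported when the parent closes,
so this fits documents like the sample better than mixed content.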
--- Miki Tebeka <miki.tebeka at zoran.com> wrote:
> Hello Anthony,
>
> > 1. The read operation must either read a full tag or
> > ignore the tag.
> >
> > 2. If the read operation reads between <P> and </P>,
> > then it must read the whole thing between those 2
> > tags all at once.
> >
> > How can I achieve this please?
> I think the xml.sax module is what you're looking for.
> A small, briefly tested something might be:
> ---
> #!/usr/bin/env python
>
> from xml.sax.handler import ContentHandler
> from xml.sax import parse
>
> class ArticleHandler(ContentHandler):
>     def __init__(self, *ignore):
>         ContentHandler.__init__(self)
>         self.data = ""  # Data buffer
>         self.get = 0    # Get flag
>         # Ignore hash (lower-cased tag names to skip)
>         self.ignore = {}.fromkeys([i.lower() for i in ignore])
>
>     def startElement(self, name, attrs):
>         if name.lower() in self.ignore:
>             self.get = 0
>         else:
>             self.get = 1
>
>     def endElement(self, name):
>         self.get = 0
>
>     def characters(self, content):
>         if self.get:
>             self.data += content
>
>
> from sys import argv
> handler = ArticleHandler()
> parse(argv[1], handler)
> print(handler.data)  # Will print full data
>
> handler = ArticleHandler("headline")
> parse(argv[1], handler)
> print(handler.data)  # Will print data without headlines
> ---
>
> HTH.
> Miki
> --
> http://mail.python.org/mailman/listinfo/python-list