how to get text between HTML tags with URLLIB??
Paolo G. Cantore
paolo.cantore at freenet.de
Sun Aug 20 11:30:44 EDT 2000
Roy Katz wrote:
>
>     def start_a( self, attrs ):
>         self.in_link = 1
>         self.linkbuf = link_type( attrs )
>
>     def end_a( self ):
>         self.in_link = 0
>         self.linkbuf.name = self.strbuf
>         self.links.append( self.linkbuf )
>         self.strbuf = ''
>
>     def handle_data( self, data ):
>         if self.in_link == 1:
>             self.strbuf = self.strbuf + data
>
> This approach works fairly well; furthermore, the 'in_link' flag
> ensures that strbuf will contain *only* the text between the <a href> and
> </a> tags. There is a problem with this approach, however. I meant for
> link_type.links to be a list of strings corresponding to the placement
> of the link within the bookmark hierarchy; however, given Netscape's
> bookmark format, I see that it will take me a lot more code than I
> thought. I *am* building an in-memory model. So why re-invent the
> wheel? You're right, I'll look at DOM, I just need a few examples of how
> to use it effectively.
>
> Roey
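The bookkeeping you do with in_link and strbuf is already provided by two
parser methods, save_bgn() and save_end(). If your class derives from
htmllib.HTMLParser, the inherited handle_data() collects the text between
those two calls, so you can drop your own handle_data() as well. Your code
would look like: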
    def start_a(self, attrs):
        self.save_bgn()
        self.linkbuf = link_type(attrs)

    def end_a(self):
        self.linkbuf.name = self.save_end()
        self.links.append(self.linkbuf)
That's all.
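
For anyone who wants to try the idea without htmllib (it is not available
in Python 3), here is a minimal sketch of the same save_bgn()/save_end()
pattern built on html.parser.HTMLParser. The save_bgn(), save_end() and
savedata names are re-implemented by hand here to mirror the htmllib ones;
LinkType is only a hypothetical stand-in for your link_type class, and
extract_links() is a made-up helper for the demonstration.

```python
from html.parser import HTMLParser


class LinkType:
    """Hypothetical stand-in for the poster's link_type class."""

    def __init__(self, attrs):
        # attrs is a list of (name, value) pairs
        self.href = dict(attrs).get("href")
        self.name = ""


class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.savedata = None  # None means "not currently saving text"
        self.linkbuf = None
        self.links = []

    # Hand-written equivalents of the htmllib helpers (an assumption,
    # html.parser itself does not provide them).
    def save_bgn(self):
        self.savedata = ""

    def save_end(self):
        data, self.savedata = self.savedata, None
        # Collapse runs of whitespace into single spaces
        return " ".join((data or "").split())

    def handle_data(self, data):
        # Only text seen between save_bgn() and save_end() is kept
        if self.savedata is not None:
            self.savedata += data

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.save_bgn()
            self.linkbuf = LinkType(attrs)

    def handle_endtag(self, tag):
        if tag == "a" and self.linkbuf is not None:
            self.linkbuf.name = self.save_end()
            self.links.append(self.linkbuf)
            self.linkbuf = None


def extract_links(html):
    """Return a list of (href, link text) pairs found in html."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return [(link.href, link.name) for link in parser.links]
```

Calling extract_links() on a bookmark file's contents should give one
(href, text) pair per <a>...</a> element; text inside nested tags such as
<b> is included because only the <a> tags start and stop the saving.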