[Tutor] Clarified: Best way to alter sections of a string which match dictionary keys?
Karl Pflästerer
sigurd at 12move.de
Sat Jan 3 10:21:29 EST 2004
On 3 Jan 2004, SSokolow <- from_python_tutor at SSokolow.com wrote:
> It's much faster and I would have never thought of it. I'm still
> thinking mostly in intermediate Perl and that means I never
> considered the possibility that the replacement expression could be
> anything more than a plain vanilla string.
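The idea referred to here is that re.sub() accepts a function as the replacement: it is called once per match and its return value is inserted. A minimal sketch of that technique, not necessarily the exact code from the earlier message; the `marks` table, its URLs and the helper name `replace_keys` are made up for illustration:

```python
import re

def replace_keys(text, table):
    """Prefix every occurrence of a key of `table` with its value."""
    if not table:
        return text  # an empty alternation would match everywhere
    # Longest keys first, so a key that is a prefix of another key
    # cannot win the alternation too early.
    keys = sorted(table, key=len, reverse=True)
    # re.escape makes '.', '?' and other metacharacters literal.
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def repl(match):
        # Called once per match; its return value replaces the match.
        key = match.group(0)
        return table[key] + key

    return pattern.sub(repl, text)

# Hypothetical table: URL -> marker to put in front of it.
marks = {"http://example.com/a": "[A]", "http://example.com/b?x=1": "[B]"}
print(replace_keys("see http://example.com/a and http://example.com/b?x=1", marks))
# see [A]http://example.com/a and [B]http://example.com/b?x=1
```

Building one alternation from all keys means the text is scanned once, however many keys the dictionary holds.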
I thought of another solution (I found the problem interesting). It may
be a bit slower, since it processes the string more often, but it's much
safer, because HTMLParser finds the link tags instead of a hand-written
regexp:
********************************************************************
from html.parser import HTMLParser
import re

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.val = []
        self.hsh = {}  # insert your hash here or use a function which
                       # returns the hash table

    def regexp_quote(self, s):
        # re.escape backslash-escapes every regex metacharacter
        return re.escape(s)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value in self.hsh:
                    data = self.get_starttag_text()
                    pair = ('<img src="http://_proxy/checkmark.png">' + data,
                            self.regexp_quote(data))
                    # one re.sub below rewrites every copy of this tag,
                    # so record each distinct tag only once
                    if pair not in self.val:
                        self.val.append(pair)
                    break

    def change(self, stream):
        self.reset()
        self.val = []  # fresh list for this page; handle_starttag appends to it
        self.feed(stream)
        self.close()
        for exp, reg in self.val:
            # a function as replacement inserts exp verbatim, so
            # backslashes in the tag text are not read as group refs
            stream = re.sub(reg, lambda m: exp, stream)
        return stream
********************************************************************
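Something like regexp_quote does exist for Python (in XEmacs it's a
builtin, regexp-quote): re.escape() from the standard re module
backslash-escapes every regular-expression metacharacter, including
( ) [ ] { } | and \, not just . ^ $ * + ?. So regexp_quote can simply
return re.escape(s).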
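A quick check of why escaping only . ^ $ * + ? is not enough. The tag below, with parentheses in its file name, is made up for illustration, and `partial_quote` is a hypothetical helper that quotes just those six characters:

```python
import re

# Quotes only . ^ $ * + ?  (a hand-written quoting pattern)
SOME_META = re.compile(r'(\.|\^|\$|\*|\+|\?)')

def partial_quote(s):
    return SOME_META.sub(r'\\\1', s)

tag = '<a href="page(1).html">'   # made-up tag with parentheses

# re.escape quotes every metacharacter, so the tag matches itself.
print(re.search(re.escape(tag), tag) is not None)      # True

# '(' and ')' stay unescaped and become a group, so the pattern now
# looks for 'page1.html' and misses the literal tag.
print(re.search(partial_quote(tag), tag) is not None)  # False
```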
With the code above you only have to instantiate a MyHTMLParser object
once and can then use it to process all pages, since change() resets
the parser before each run.
It's used like this:
>>> parser = MyHTMLParser()
>>> string = open('index.html').read()
>>> string2 = open('kontakt.html').read()
>>> string2 = parser.change(string2)
>>> string = parser.change(string)
etc.
As I said, it will be a bit slower, but since it uses HTMLParser you can
expect fewer false matches: only the href of genuine <a> start tags is
looked up in the hash, whatever the case or quoting of the tag.
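To make "fewer false matches" concrete, here is a small comparison on a made-up page; the URL, the `visited` set and the helper names are hypothetical. `naive_mark` stands for the kind of plain regexp one might write first, while `parser_mark` collects tags with HTMLParser as described above but uses `str.replace` for the final step, since the tag text is literal anyway:

```python
import re
from html.parser import HTMLParser

CHECK = '<img src="http://_proxy/checkmark.png">'
visited = {"http://example.com/"}          # hypothetical hash table

page = ('<map name="nav"><area href="http://example.com/" alt="home"></map>\n'
        "<A HREF='http://example.com/'>home</A>\n")

def naive_mark(html):
    """Plain regexp: mark any '<a...href="..."' tag whose URL is visited."""
    def repl(m):
        return CHECK + m.group(0) if m.group(1) in visited else m.group(0)
    return re.sub(r'<a[^>]*href="([^"]*)"[^>]*>', repl, html)

class LinkCollector(HTMLParser):
    """Records the raw text of <a> start tags whose href is visited."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased, values unquoted.
        if tag == "a" and any(n == "href" and v in visited for n, v in attrs):
            text = self.get_starttag_text()
            if text not in self.tags:
                self.tags.append(text)

def parser_mark(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    for text in collector.tags:
        # The tag text is literal, so str.replace is enough here.
        html = html.replace(text, CHECK + text)
    return html

print(naive_mark(page))   # marks <area ...>, misses <A HREF='...'>
print(parser_mark(page))  # marks only the <A HREF='...'> link
```

The parser version still does its final replacement on the raw text, so an identical <a> tag sitting inside an HTML comment would be marked as well: fewer false matches, not none.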
Karl
--
Please do *not* send copies of replies to me.
I read the list