[Tutor] Clarified: Best way to alter sections of a string which match dictionary keys?

Sat Jan 3 10:21:29 EST 2004

On  3 Jan 2004, SSokolow <- from_python_tutor at SSokolow.com wrote:

>  It's much faster and I would have never thought of it. I'm still
>  thinking mostly in intermediate Perl and that means I never
>  considered the possibility that the replacement expression could be
>  anything more than a plain vanilla string.

I thought of another solution (I found the problem interesting). It may
be a bit slower, since it makes several passes over the string, but it's
much safer:

********************************************************************
from HTMLParser import HTMLParser
import sre

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.val = []
        self.va = self.val.append
        self.hsh = {} # insert your hash here or use a function which
                      # returns the hash table
        self.reg = sre.compile(r'(\.|\^|\$|\*|\+|\?)')
    def regexp_quote(self, s):
        return self.reg.sub(r'\\\1', s)
    def handle_starttag(self, tag, attrs):
        # Only <a> tags whose href is a key of self.hsh get a checkmark.
        if tag == "a":
            for name, value in attrs:
                if name == 'href' and value in self.hsh:
                    data = self.get_starttag_text()
                    # Store (replacement, escaped pattern) for change().
                    self.va(('<img src="http://_proxy/checkmark.png">' + data,
                             self.regexp_quote(data)))
                    break
    def change(self, stream):
        self.reset()
        self.val = []
        self.feed(stream)
        for exp, reg in self.val:
            stream = sre.sub(reg, exp, stream)
        return stream
********************************************************************

I'm sure something like the function regexp_quote already exists
somewhere in Python (in XEmacs it's a builtin function).  As written it
only escapes . ^ $ * + and ?, so you may have to augment it a bit.
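
The standard library does have such a function: re.escape.  A minimal
sketch of how it could stand in for the hand-written version, shown
here as a plain function rather than a method (the sample tag is made
up for illustration):

```python
import re

def regexp_quote(s):
    """Escape every regex metacharacter in s so that it matches literally."""
    return re.escape(s)

# A start tag containing characters the hand-written version would miss.
tag = '<a href="page.html?id=(1)|[2]">'
pattern = regexp_quote(tag)

# The escaped pattern matches the tag text itself and nothing looser.
found = re.search(pattern, 'before ' + tag + ' after')
```

Since re.escape covers brackets, parentheses, braces, the pipe and the
backslash as well, there should be nothing left to augment.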

With the code above you only have to instantiate a MyHTMLParser object
once.  You can then use it to process all pages.

It's used like this:

>>> parser = MyHTMLParser()
>>> string = open('index.html').read()
>>> string2 = open('kontakt.html').read()
>>> string2 = parser.change(string2)
>>> string = parser.change(string)

etc.
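
For anyone on Python 3, where HTMLParser lives in html.parser and sre is
spelled re, the same idea might look roughly like the sketch below.  The
names CheckmarkParser, MARK and the visited argument are assumptions
made up for this example, not part of the code above.  It also differs
in two small ways: each distinct tag is substituted only once, and the
replacement is a function, so backslashes in a tag are not reinterpreted.

```python
import re
from html.parser import HTMLParser

MARK = '<img src="http://_proxy/checkmark.png">'

class CheckmarkParser(HTMLParser):
    def __init__(self, visited):
        super().__init__()
        self.visited = visited   # hypothetical: dict (or set) of known hrefs
        self.tags = []           # raw text of every start tag to be marked

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value in self.visited:
                self.tags.append(self.get_starttag_text())
                break

    def change(self, stream):
        self.reset()
        self.tags = []
        self.feed(stream)
        self.close()
        # dict.fromkeys drops duplicates but keeps first-seen order, so a
        # tag that occurs twice does not collect two checkmarks.
        for text in dict.fromkeys(self.tags):
            stream = re.sub(re.escape(text),
                            lambda m: MARK + m.group(0),
                            stream)
        return stream

page = ('<p><a href="seen.html?x=1">one</a> '
        '<a href="new.html">two</a> '
        '<a href="seen.html?x=1">again</a></p>')
out = CheckmarkParser({"seen.html?x=1": True}).change(page)
```

As before, one parser object can be reused for every page, because
change() resets it on each call.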

As I said, it will be a bit slower, but since it uses HTMLParser you can
hope for fewer false matches.

   Karl
-- 
Please do *not* send copies of replies to me.
I read the list