[Tutor] Clarified: Best way to alter sections of a string which match dictionary keys?

SSokolow from_python_tutor at SSokolow.com
Fri Jan 2 17:16:33 EST 2004


OK. Here's my attempt at fixing my question:
I am making a modified version of Amit's proxy 4 
(http://theory.stanford.edu/~amitp/proxy.html) to provide some 
client-side enhancements to the Anime Addventure 
(http://addventure.bast-enterprises.de/), such as non-expiring page history.

It currently works, and here is the tested, working code that puts a 
checkmark beside each visited link:

        for url_key in episodesViewed.keys():
            string = re.sub(
                r'(?i)(<a.*?href="' + url_key[1:] + r'".*?)',
                r'<img src="http://_proxy/checkmark.png">\1',
                string)

Each key in episodesViewed is a URL such as "/10523.html". The 
variable name string is not my choice; it is a standard convention for 
all transport-level decoding modules in proxy 4. (I haven't figured out 
how to hook this code in at the content level, so I'm improvising.)

This will take too long because the regular expression shown runs once 
for every item in the dictionary. That will be a scalability problem, 
since the dictionary will grow to contain over 15,000 URLs. I haven't 
been able to time it, but I do know that running dictionary.has_key() 
for no more than 10 substrings is a lot faster than running that 
regular expression 15,000 times.

The result is that a hyperlink such as <a href="10523.html">Episode 
10523</a> will become <img src="http://_proxy/checkmark.png"><a 
href="10523.html">Episode 10523</a> but only if /10523.html is a key in 
the episodesViewed dictionary.

Currently, the code runs the regular expression for each item in the 
dictionary (which, as I said, will grow to be over 15,000 keys). What I 
want is some code that loops through each link in the page (the string 
variable holds the contents of an HTML file) and uses 
episodesViewed.has_key() to figure out whether it should put <img 
src="http://_proxy/checkmark.png"> beside the link.
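
For illustration, here is a minimal sketch of the single-pass approach 
described above. It is not tested inside proxy 4. It assumes every link 
uses a double-quoted href and that each href is a relative URL whose 
dictionary key is the same text with a leading slash. The names 
LINK_RE, CHECKMARK, mark_visited and add_checkmarks are made up for 
this example, and the `in` test stands in for has_key().

```python
import re

# Hypothetical stand-in for the real dictionary of visited URLs.
episodesViewed = {"/10523.html": True}

CHECKMARK = '<img src="http://_proxy/checkmark.png">'

# Matches the start of an <a ...> tag and captures the double-quoted
# href value. [^>]* keeps the match inside a single tag.
LINK_RE = re.compile(r'<a\b[^>]*?href="([^"]*)"', re.IGNORECASE)


def mark_visited(match):
    """Return the matched link text, prefixed with a checkmark if visited."""
    # Assumption: keys carry a leading slash that the page's hrefs lack.
    if "/" + match.group(1) in episodesViewed:
        return CHECKMARK + match.group(0)
    return match.group(0)


def add_checkmarks(string):
    """Scan the page once, doing one dictionary lookup per link found."""
    return LINK_RE.sub(mark_visited, string)
```

Because re.sub() accepts a function as the replacement, the cost should 
depend on the number of links in the page rather than on the number of 
keys in the dictionary.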

Hope this is a little more understandable.

Stephan Sokolow


