[Tutor] Clarified: Best way to alter sections of a string which
match dictionary keys?
SSokolow
from_python_tutor at SSokolow.com
Fri Jan 2 17:16:33 EST 2004
OK. Here's my attempt at fixing my question:
I am making a modified version of amit's proxy 4
(http://theory.stanford.edu/~amitp/proxy.html) in order to provide some
client-side enhancements to the Anime Addventure
(http://addventure.bast-enterprises.de/) such as non-expiring page history.
It currently works, and here is the tested, working code that puts a
checkmark beside each visited link:
    for url_key in episodesViewed.keys():
        string = re.sub(r'(?i)(<a.*?href="' + url_key[1:] + r'".*?)',
                        r'<img src="http://_proxy/checkmark.png">\1', string)
Each key in episodesViewed is a URL such as "/10523.html". The variable
name string is not my choice; it is the standard convention for all
transport-level decoding modules in proxy 4. (I haven't figured out how
to hook this code in at the content level, so I'm improvising.)
This will take too long with the regular expression shown because it
runs once for every item in the dictionary. That is a scalability
problem, since the dictionary will grow to contain over 15,000 URLs. I
haven't been able to time it, but I do know that running
dictionary.has_key() on no more than 10 substrings is a lot faster than
running that regular expression 15,000 times.
The result is that a hyperlink such as <a href="10523.html">Episode
10523</a> will become <img src="http://_proxy/checkmark.png"><a
href="10523.html">Episode 10523</a> but only if /10523.html is a key in
the episodesViewed dictionary.
Currently, the code runs the regular expression once for each item in
the dictionary (which, as I said, will grow to over 15,000 keys). What I
want instead is code that loops through each link in the page (the
string variable holds the contents of an HTML file) and uses
episodesViewed.has_key() to decide whether it should put <img
src="http://_proxy/checkmark.png"> beside the link.
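To make the goal concrete, here is a rough sketch of the shape I have in mind: one pass over the page, with one dictionary lookup per link. It is only an illustration, not tested inside proxy 4. The names mark_visited, LINK_RE, CHECKMARK and add_checkmark are made up, it is written for a current Python (so it uses "in" rather than has_key()), and it assumes every href is double-quoted and relative, like the example above.

```python
import re

# Matches an opening <a ...> tag and captures the value of its href.
# Assumption: href values are always wrapped in double quotes.
LINK_RE = re.compile(r'(?i)<a\b[^>]*?\bhref="([^"]*)"[^>]*>')

CHECKMARK = '<img src="http://_proxy/checkmark.png">'

def mark_visited(string, episodesViewed):
    """Return the page with a checkmark image in front of each visited link."""
    def add_checkmark(match):
        # Dictionary keys carry a leading slash ("/10523.html"),
        # while the hrefs in the page do not, so normalise before the lookup.
        key = '/' + match.group(1).lstrip('/')
        if key in episodesViewed:  # one lookup per link (has_key() on old Pythons)
            return CHECKMARK + match.group(0)
        return match.group(0)  # unvisited links are left untouched

    # The callback runs once per link found, not once per dictionary key.
    return LINK_RE.sub(add_checkmark, string)
```

With this shape the cost depends on the number of links in the page rather than on the size of episodesViewed, which is the property I am after.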
Hope this is a little more understandable.
Stephan Sokolow