[Tutor] Clarified: Best way to alter sections of a string which
match dictionary keys?
SSokolow
from_python_tutor at SSokolow.com
Sat Jan 3 16:54:20 EST 2004
Karl Pflästerer wrote:
>On 3 Jan 2004, Karl Pflästerer <- sigurd at 12move.de wrote:
>
>>On 3 Jan 2004, SSokolow <- from_python_tutor at SSokolow.com wrote:
>>
>>> It's much faster and I would have never thought of it. I'm still
>>> thinking mostly in intermediate Perl and that means I never
>>> considered the possibility that the replacement expression could be
>>> anything more than a plain vanilla string.
>>
>>I thought of another solution (I found the problem interesting); it may
>>be a bit slower (since it processes the string more often) but it's much
>>safer.
>
>And it had a little bug. Here is a better (and more nicely formatted)
>solution.
>
>[snip: Code Sample]
>
>Now multiple occurrences of the same link which are furthermore written
>exactly the same won't be entered into the list of links to replace.
>
>I simply used a list here for the found links, since I think there
>won't be a lot of them in one document. So the overhead of building a
>hash table might make things slower compared to the slow operation
>of finding a matching value in a list. But that's guessed, not measured.
>
> Karl
>
Your reply is confusing me, but as I understand it, there are three
problems with this:
1. I do want all occurrences of a recognized link to be changed. This is
supposed to be a non-expiring version of the browser history that is
only active while the user is browsing
http://addventure.bast-enterprises.de/ (the more advanced features will
build upon this).
2. What do you mean by safer? The situation may not apply to this specific
site, and speed is of the utmost importance in this case. I forgot that
this also indexes the comment pages (the dictionary will eventually
hold well over 40,000 entries) and will often be used to view the index
pages (the largest index page has about 15,000 links to be checked
against the dict and is over 1 MB of HTML).
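For the record, the trick Karl pointed out (that the replacement can be
more than a plain vanilla string) boils down to passing a callable to
re.sub, so the chunk is scanned once no matter how big the dict gets. A
minimal sketch with a made-up two-entry dictionary standing in for the
real 40,000+-entry one:

```python
import re

# Hypothetical subset of the visited-link dictionary; the real one
# would eventually hold well over 40,000 entries.
visited = {
    "/1/100.html": "/1/100.html?seen",
    "/1/205.html": "/1/205.html?seen",
}

# Build one alternation pattern from the keys so each chunk of HTML
# is scanned in a single pass.
pattern = re.compile("|".join(re.escape(key) for key in visited))

def mark_visited(match):
    """Replacement callable: look the matched link up in the dict."""
    return visited[match.group(0)]

html = '<a href="/1/100.html">Go on</a> <a href="/9/9.html">New</a>'
print(pattern.sub(mark_visited, html))
```

For a dict this large, precompiling the pattern once (rather than per
chunk) is what keeps the per-chunk cost down.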
3. I also forgot to mention that the variable string does not hold the
entire file. This code is run for each chunk of data as it's received from
the server. (I don't know how to build content-layer filtering into the
proxy code I'm extending, so I hooked it in at the chunk level. Testing
has shown that some links lie across chunk boundaries like this:

[continued from previous chunk]is some link text</a>
...
<a href="whatever">This is th[continued in next chunk]

and I don't know whether the HTML parser might stumble on an unclosed
<a> tag pair.)
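One workaround I've been considering for the boundary problem is to hold
back everything after the last complete </a> and prepend it to the next
chunk. A rough sketch (split_safe and its heuristic are only an
illustration, not the actual proxy code, and it assumes links are
ordinary <a ...>...</a> pairs):

```python
def split_safe(buffer):
    """Split buffered HTML into a part that is safe to filter now and a
    tail that might contain a link cut off at the chunk boundary.

    Everything up to and including the last complete </a> is safe; the
    remainder is carried over and prepended to the next chunk."""
    cut = buffer.rfind("</a>")
    if cut == -1:
        return "", buffer          # no complete link yet; hold everything
    cut += len("</a>")
    return buffer[:cut], buffer[cut:]

# Simulated chunks with a link split across the boundary.
carry = ""
for chunk in ['<a href="/x.html">This is th', 'e link</a> trailing text']:
    safe, carry = split_safe(carry + chunk)
    # run the link-rewriting filter on `safe` here
```

The trailing text after the final </a> would still need to be flushed
when the connection closes, of course.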
I already have a pair of regular-expression substitutions converting all
links that stay on the server into root-relative links
(/where/whatever.html as opposed to whatever.html if it's in the same
directory), and I was planning to see if they could be safely and easily
integrated into your first submission for extra speed.
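Those substitutions look roughly like this; make_root_relative, its
heuristics, and the hard-coded host name are only an illustration of the
idea, not my actual code:

```python
import re
import posixpath

def make_root_relative(html, page_dir,
                       host="addventure.bast-enterprises.de"):
    """Rewrite href values so every same-server link starts at the site
    root, e.g. 'whatever.html' on a page in /where/ becomes
    '/where/whatever.html'. page_dir is the directory of the page
    being filtered."""
    def fix(match):
        url = match.group(1)
        # Collapse an absolute URL that stays on the same host.
        absolute = re.match(r"https?://%s(/.*)" % re.escape(host), url)
        if absolute:
            return 'href="%s"' % absolute.group(1)
        # Leave other hosts and already-root-relative links alone.
        if url.startswith("/") or "://" in url:
            return match.group(0)
        # Resolve a directory-relative link against the page's directory.
        return 'href="%s"' % posixpath.normpath(posixpath.join(page_dir, url))
    return re.sub(r'href="([^"]*)"', fix, html)
```

Normalizing everything to one canonical form first is what makes the
single dictionary lookup above reliable.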
I will also keep this code, but I have been testing through normal use at
a rate that could be considered a stress test, and I have yet to see a
problem. Since the site is a large choose-your-own-adventure book to
which anybody can add, the proxy has to run this code almost instantly,
since I (and anyone else who would use this) wouldn't have the patience
to wait a long time for a block of text that's 50 KB at most (with the
exception of the indices, which also cannot be noticeably slower to load).
In fact, any wait time greater than one or two seconds becomes irritating.
Thanks, and I'll keep trying to find the problem with the regular
expression you sent (as soon as I've done some more reading, err... testing).
Stephan Sokolow