[Tutor] Clarified: Best way to alter sections of a string which match dictionary keys?

SSokolow from_python_tutor at SSokolow.com
Sat Jan 3 16:54:20 EST 2004


Karl Pflästerer wrote:

>On  3 Jan 2004, Karl Pflästerer <- sigurd at 12move.de wrote:
>
>>On  3 Jan 2004, SSokolow <- from_python_tutor at SSokolow.com wrote:
>>
>>> It's much faster and I would have never thought of it. I'm still
>>> thinking mostly in intermediate Perl, and that means I never
>>> considered the possibility that the replacement expression could be
>>> anything more than a plain vanilla string.
>>
>>I thought of another solution (I found the problem interesting); it may
>>be a bit slower (since it processes the string more often) but it's much
>>safer.
>
>And it had a little bug.  Here is a better (and more nicely formatted)
>solution.
>
>[snip: Code Sample]
>
>Now multiple occurrences of the same link which are, furthermore,
>written exactly alike won't be entered in the list of links to replace.
>
>I simply used a list here for the found links, since I think there
>won't be a lot of them in one document.  So the overhead of building a
>hash table might make things slower compared to the slow operation of
>finding a matching value in a list.  But that's guessed, not measured.
>
>   Karl
Your reply is confusing me a bit, but as I understand it, there are a few
problems with this:

1. I do want all occurrences of a recognized link to be changed. This is
supposed to be a non-expiring version of the browser history that runs
only while the user is browsing
http://addventure.bast-enterprises.de/ (the more advanced features will
build upon this).
2. What do you mean by safer? The situation may not apply to this
specific site, and speed is of the utmost importance in this case. I
forgot to mention that this also indexes the comment pages (the
dictionary will eventually hold well over 40,000 entries) and that it
will often be used to view the index pages (the largest index page has
about 15,000 links to be checked against the dict and is over 1MB of
HTML). A sketch of the single-pass substitution under discussion follows
just below.
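
For concreteness, here's a minimal sketch of the kind of single-pass,
callable-replacement substitution we've been discussing. The names and
the visited-link markup below are made up for illustration; this isn't
the actual code from the proxy:

import re

def make_replacer(mapping):
    """Replace every occurrence of a key of `mapping` with its value,
    in a single pass over the string."""
    # Longest keys first, so a short key can't pre-empt a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join([re.escape(k) for k in keys]))
    # The replacement argument to sub() may be a callable: it receives
    # each match object and returns the substitute text.
    def replace(text):
        return pattern.sub(lambda m: mapping[m.group(0)], text)
    return replace

# Hypothetical usage: restyle links the user has already visited.
mark_visited = make_replacer({
    '<a href="/101/9.html">': '<a href="/101/9.html" class="seen">',
})
print(mark_visited('<a href="/101/9.html">Go north</a>'))

Compiling the pattern once and doing a constant-time dict lookup per
match is, I assume, what makes the single pass fast; I haven't measured
it against a 40,000-key pattern, though.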

I also forgot to mention that the string variable does not hold the
entire file. This is run on each chunk of data as it's received from
the server. (I don't know how to build content-layer filtering into the
proxy code I'm extending, so I hooked it in at the chunk level. Testing
has shown that some links lie across chunk boundaries like this:

[continued from previous chunk]is some link text</a>
.
.
.
<a href="whatever">This is th[continued in next chunk]

and I don't know if the HTML parser might stumble on an unclosed <a>
tag pair.)
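
One way I could cope with links spanning chunk boundaries, sketched
under my own assumptions (the class and method names are made up, and
cutting at the last complete </a> is just one possible heuristic):

class ChunkRewriter:
    """Holds back a possibly-incomplete trailing link at the end of
    each chunk so a match can't be split across chunk boundaries."""

    def __init__(self, rewrite):
        self.rewrite = rewrite  # e.g. the mark_visited function above
        self.carry = ""

    def feed(self, chunk):
        data = self.carry + chunk
        # Everything up to and including the last complete </a> is
        # safe to rewrite; defer the rest until the next chunk.
        cut = data.rfind("</a>")
        if cut == -1:
            self.carry = data
            return ""
        cut += len("</a>")
        self.carry = data[cut:]
        return self.rewrite(data[:cut])

    def close(self):
        # Flush whatever is left once the document is complete.
        out, self.carry = self.rewrite(self.carry), ""
        return out

The price is a little added latency for the deferred tail of each
chunk, which should be negligible next to the network transfer itself.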

I already have a pair of regular expression substitutions converting all
links that stay on the server into root-relative links
(/where/whatever.html as opposed to whatever.html when it's in the same
directory), and I was planning to see if they could be safely and easily
integrated into your first submission for extra speed.
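
Roughly the kind of substitutions I mean; the host-name handling and
the base_dir parameter below are illustrative assumptions, not my
actual expressions:

import re

HOST = "addventure.bast-enterprises.de"  # assumed for the example

def make_root_relative(html, base_dir):
    """base_dir is the directory of the current page, e.g. "/101/"."""
    # 1. Strip the scheme and host from absolute links to this server:
    #    http://HOST/x/y.html  ->  /x/y.html
    html = re.sub(r'href="http://%s(/[^"]*)"' % re.escape(HOST),
                  r'href="\1"', html)
    # 2. Anchor plain relative links to the current page's directory:
    #    whatever.html  ->  /101/whatever.html
    html = re.sub(r'href="(?!https?://|/|#)([^"]+)"',
                  r'href="%s\1"' % base_dir, html)
    return html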

I will also keep this code, but I have been testing through normal use
at a rate that could be considered a stress test and I have yet to see a
problem. Since the site is a large choose-your-own-adventure book to
which anybody can add, the proxy has to run this code almost instantly;
neither I nor anyone else who would use this would have the patience to
wait long for a block of text that's 50K at most (with the exception of
the indices, which also cannot be noticeably slower to load). In fact,
any wait greater than one or two seconds becomes irritating.

Thanks, and I'll keep trying to find the problem with the regular
expression you sent (as soon as I've done some more reading,
err... testing).

Stephan Sokolow