Regular Expressions and Threads?

Brian Wisti wbrian2 at uswest.net
Wed Jun 14 14:46:15 EDT 2000


Hi all,

I was wondering if anybody could help me out with using regular
expressions within a thread.  I am extremely new to threads, so please be
patient with me.


My current project involves grabbing a large number of HTML files from
different servers around the Web (sort of a meta-search thing). These
files are then parsed into a single file. Since speed of response was an
issue, I thought it would be best to use a separate
threading.Thread-derived object to grab and parse each page.  The code
to parse the HTML is kept in separate files which get compiled and
exec()'d inside the thread. (I plan on adjusting that so the code gets
compiled at startup & exec()'d in the thread once I iron this oddness
out)
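
A minimal sketch of the arrangement described above, written for a current
Python 3 interpreter. Everything named here (PARSER_SOURCE, PageWorker,
parse_all, the title pattern) is hypothetical and stands in for the real
code; the page text is passed in directly instead of being fetched from a
server. It shows the parser code compiled once and then exec()'d inside each
thread with its own fresh namespace dict.

```python
import threading

# Hypothetical parser source. In the setup described above this text would be
# read from a separate file; it expects a variable named `html` to exist and
# leaves its output in a variable named `results`.
PARSER_SOURCE = '''
import re
title_re = re.compile(r"<title>(.*?)</title>", re.I | re.S)
results = title_re.findall(html)
'''

# Compile once at startup; the resulting code object can be exec()'d many times.
PARSER_CODE = compile(PARSER_SOURCE, "<parser>", "exec")


class PageWorker(threading.Thread):
    """Parses one page in its own thread (fetching is omitted in this sketch)."""

    def __init__(self, html):
        super().__init__()
        self.html = html      # stand-in for a page grabbed from a server
        self.results = []

    def run(self):
        # A fresh dict per run, so names created by the exec()'d code
        # are not shared with other threads or later requests.
        namespace = {"html": self.html}
        exec(PARSER_CODE, namespace)
        self.results = namespace["results"]


def parse_all(pages):
    """Start one worker per page, wait for all of them, collect the results."""
    workers = [PageWorker(page) for page in pages]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return [worker.results for worker in workers]
```

Calling parse_all() a second time with the same pages should give the same
list as the first call, since nothing in this sketch outlives a single run.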

On the first run, it works great.  Everything is grabbed, everything is
parsed, and it comes out looking all nice and pretty for the user.

Subsequent runs are another story, though.  Everything is still grabbed
and parsed and made pretty, but the results from the previous request
are sent as well.  After three separate requests, you are looking at a
very long page indeed!  It ends up looking sort of like this:

--request 1--
<results from request 1>

--request 2--
<results from request 1>
<results from request 2>

--request 3--
<results from request 1>
<results from request 2>
<results from request 3>

I've been beating my head against the monitor for a few days on this,
and I _think_ I've narrowed it down to the regular expression objects
which are compiled within the exec'd code (in its own thread,
remember).  If I just dump the raw HTML rather than parsing it, there's
no repetition.
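
One way to test that suspicion, sketched with hypothetical names (link_re,
shared_results, and both handler functions are made up for illustration and
this is not a diagnosis of the code in question): a compiled pattern hands
back a new list on every findall() call, whereas a list that lives outside
the request keeps growing and reproduces the request 1 / request 2 pattern
shown earlier.

```python
import re

# Hypothetical pattern, compiled once and reused for every request.
link_re = re.compile(r'href="([^"]+)"')

# Hypothetical stand-in for state that outlives a single request.
shared_results = []


def handle_request_shared(html):
    """Appends to a long-lived list, so earlier results come back too."""
    shared_results.extend(link_re.findall(html))
    return list(shared_results)


def handle_request_fresh(html):
    """Builds a new list per request; the reused pattern adds nothing extra."""
    results = []
    results.extend(link_re.findall(html))
    return results
```

If the fresh version behaves and the shared one repeats old results, the
pattern object is probably not the culprit; whatever container collects the
matches is.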

I thought I'd be clever and del the regular expression object at the
end of the code to be exec()'d, but that has unpredictable results:
sometimes it works, sometimes it doesn't.

No luck finding anything helpful via searching python.org, or any of the
search engines I frequent.  Mostly what I've found is 're is
automatically threadsafe.'

So ... umm... what am I probably missing about threads and / or the
Python re module?  Anything would be helpful right now, even which Fine
Manual I should read.

Thanks,
Brian Wisti
 
----------------------------------------------------------------------
| "All you need is ignorance and confidence; then success is sure."  |
|                                                    - Mark Twain    |
----------------------------------------------------------------------
| Brian Wisti   |  wbrian2 at uswest.net  | http://www.COOLNAMEHERE.com |
----------------------------------------------------------------------



