Threads + Re(gex) Baffled!?!?!?

Cliff Daniel cdaniel at level3.net
Wed Sep 15 18:43:27 EDT 1999


On Wed, 15 Sep 1999 09:10:25 -0400, "Gordon McMillan"
<gmcm at hypernet.com> wrote:

>The problem could be re, or a platform problem not exposed 
>until you use multiple CPUs, or a combination of these.

   I'm not sure if I mentioned this either, but running with 1 thread
doesn't seem to cause the problem.  I can't think of any reason why
1 thread versus 10 should bring out this behaviour, since the LWP
associated with a thread can be scheduled on any given CPU at any
given time.  Something's not adding up.

>A solution might be to dedicate one thread to running the 
>regexes, and use Queue to pass information to him from the 
>socket-threads. Assuming that your regex exhibits decent 
>behavior, the bottleneck is probably in the network, anyway.

I ran three different tests implementing 3 different solutions.  The
results are pretty crazy.

In test #1 I implemented a thread that is dedicated to handling regex
requests.  All WorkerThreads() consult the RegexThread (via a
Queue) and wait for its result.  This somewhat serializes things, but
it's not that bad.  I'm somewhat confused as to how the 5 thread test
takes longer than the 5 thread test on #2.  Possibly it's my queueing
methods, since this was my first stab at one.

1. Dedicated RegexThread()
=======================
              CPU      Completed      Kernel CPU
5 Threads:    3.0%     118 seconds    ~5% kernel
10 Threads:   7.5%     62 seconds     ~7% kernel
15 Threads:   9.5%     47 seconds     ~9% kernel
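
For illustration, the dedicated-thread arrangement from test #1 might
look roughly like the sketch below.  This is not the code used for the
timings above: it is written for present-day Python 3, and the pattern
and the names regex_thread, worker and run are hypothetical.

```python
import queue
import re
import threading

# Hypothetical pattern; the real expressions are not shown in this post.
PATTERN = re.compile(r"(\d+) bytes")


def regex_thread(requests):
    """Serve regex requests one at a time until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        text, reply = item
        match = PATTERN.search(text)
        # Hand the result back on the caller's private reply queue.
        reply.put(match.group(1) if match else None)


def worker(requests, text, results, index):
    """Ask the regex thread to do the match, then wait for its answer."""
    reply = queue.Queue(maxsize=1)
    requests.put((text, reply))
    results[index] = reply.get()


def run(texts):
    """Start one regex thread plus one worker per text; return the results."""
    requests = queue.Queue()
    server = threading.Thread(target=regex_thread, args=(requests,))
    server.start()
    results = [None] * len(texts)
    workers = [
        threading.Thread(target=worker, args=(requests, text, results, i))
        for i, text in enumerate(texts)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    requests.put(None)  # tell the regex thread to exit
    server.join()
    return results
```

Only regex_thread ever touches 're' here; each worker blocks on its own
reply queue, which is where the serialization comes from.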

In test #2 I went without the RegexThread and let each Worker enter
the 're' routines after obtaining a global mutex lock.  This prevents
more than 1 thread at a time from calling the 're' routines.
Amazingly, the 10 and 15 thread tests seem to support the theory that
the problem is a function of the number of threads making calls into
the 're' library.

2. One Mutex Lock around regex calls (No Dedicated RegexThread)
======================================================
              CPU      Completed      Kernel CPU
5 Threads:    6.0%     85 seconds     ~10% kernel
10 Threads:   18.0%    56 seconds     ~13% kernel
15 Threads:   27.0%    59 seconds     ~22% kernel
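
The single-lock approach from test #2 could be sketched as follows.
Again, this is an assumption-laden illustration in present-day
Python 3, not the code that produced the numbers above; RE_LOCK,
locked_search and the pattern are hypothetical names.

```python
import re
import threading

# One global lock shared by every worker (hypothetical name).
RE_LOCK = threading.Lock()
# Hypothetical pattern; the real expressions are not shown in this post.
PATTERN = re.compile(r"(\d+) bytes")


def locked_search(text):
    """Call into 're' only while holding the global lock."""
    with RE_LOCK:
        match = PATTERN.search(text)
        return match.group(1) if match else None


def worker(lines, out, index):
    """Run every line of one batch through the locked regex call."""
    out[index] = [locked_search(line) for line in lines]


def run(batches):
    """Start one worker thread per batch and collect the results in order."""
    out = [None] * len(batches)
    threads = [
        threading.Thread(target=worker, args=(batch, out, i))
        for i, batch in enumerate(batches)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Compared with a dedicated thread, each worker does its own matching and
only the call into 're' is serialized, so there is no queue round trip.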

In test #3 there are no locks and no dedicated RegexThread.  This
obviously would be the ideal solution for any program, but
unfortunately the side effects in 're' and/or the platform prohibit me
from doing it this way.  These stats are pretty horrible :-)

3. No locking around regex calls (shouldn't have to, No RegexThread)
=======================================================
              CPU      Completed      Kernel CPU
5 Threads:    15%      99 seconds     ~15% kernel
10 Threads:   35.5%    184 seconds    ~30% kernel
15 Threads:   50.5%    261 seconds    ~40% kernel
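
For anyone wanting to try a comparison like this, a small harness
along the following lines could time the unlocked case at different
thread counts.  It is only a sketch in present-day Python 3 with a
made-up workload and hypothetical names (timed_run, worker); it
measures wall-clock time only, not the CPU percentages in the tables,
and under modern CPython the interpreter lock keeps regex matching
from running in parallel, so the numbers will not resemble the ones
above.

```python
import re
import threading
import time

# Hypothetical pattern and workload, for illustration only.
PATTERN = re.compile(r"(\d+) bytes")


def worker(lines, counts, index):
    """Count matching lines with no lock around the 're' calls."""
    counts[index] = sum(1 for line in lines if PATTERN.search(line))


def timed_run(n_threads, lines):
    """Split lines across n_threads workers; return (seconds, matches)."""
    counts = [0] * n_threads
    threads = [
        # Each worker gets every n_threads-th line, so total work is fixed.
        threading.Thread(target=worker, args=(lines[i::n_threads], counts, i))
        for i in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, sum(counts)


if __name__ == "__main__":
    sample = ["sent 10 bytes", "idle"] * 50000
    for n in (5, 10, 15):
        elapsed, matches = timed_run(n, sample)
        print(f"{n:2d} threads: {elapsed:.2f} s, {matches} matches")
```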


I'm not sure why I went through all this trouble testing these because
it confuses the hell out of me.  I just hope that someone can make
heads or tails out of what's going on.  Meanwhile I'll just have to
stick to the RegexThread() work-around.

Regards,

Cliff




