Threads + Re(gex) Baffled!?!?!?

Cliff Daniel cdaniel at xcom.net
Wed Sep 15 02:58:57 EDT 1999


I'm totally baffled right now, wrt threads.  I have built a test
program using sockets to pull down just a few lines of text,
parse them, and junk the result.

For practical purposes I have spawned 15 worker threads, each of which
connects to a different host (480 hosts total).  Each thread then
executes code similar to this:

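---------------------------------------
Sock.send('some-command')
# read the reply up to the '> ' prompt (or until Timeout)
Reply = ReadUntil('> ', Sock, Timeout)

if Reply == '':
    return -1
else:
    # one entry per line of the reply
    tmp = re.split('\r\n', Reply)
    for line in tmp:
        m = pat.match(line)
        # ... etc ...
---------------------------------------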

Upon running the script you see nice and speedy results at first, with
very little CPU usage.  However, the CPU usage grows continuously until
the program ends.  I have noticed that when the bloating occurs, a lot
of the CPU time is spent in the kernel, which leads me to believe this
is some sort of locking issue hidden within the 're' module.  When I
put a 'continue' in front of the pat.match(), everything works just
fine.

With re:
CPU states: 12.5% idle, 18.8% user, 46.4% kernel
26.54% test.py

Without:
CPU states: 19.1% idle, 35.8% user, 10.6% kernel
0.76% test.py

The pattern I'm matching is as follows:
pat = re.compile(r'\s+(\w+)\s+\(.+?\):\s+(\d+)\s+(\d+)')
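
To make the pattern concrete, here is a minimal sketch of what it
captures.  The sample line is hypothetical (the real replies from the
hosts are not shown in this post), and the snippet is written for a
current Python rather than 1.5.2:

```python
import re

# Same pattern as above, written as a raw string.
pat = re.compile(r'\s+(\w+)\s+\(.+?\):\s+(\d+)\s+(\d+)')

# Hypothetical reply line; the real host output is not shown in this post.
sample = '  hme0 (100 Mbit):  1234  5678'

m = pat.match(sample)
groups = m.groups() if m else None
print(groups)

# match() anchors at the start of the string and the pattern begins
# with \s+, so a line without leading whitespace does not match.
no_match = pat.match('hme0 (100 Mbit):  1234  5678')
print(no_match)
```

The three groups are the leading word and the two trailing numbers; the
text inside the parentheses is matched but not captured.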

Are there any known issues with locking and 're'?

The system is a UE5000 with 8 CPUs, Solaris 2.6, Python 1.5.2

If anyone has any ideas about this I desperately need some help.  I
hate to have to dump this entire project for performance reasons.
I've stripped out EVERYTHING in the program just to reproduce this and
rule out my code.
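
For anyone who wants to try this without the 480 hosts, a stripped-down
harness might look roughly like the sketch below.  It is not the
original test.py: the sockets are replaced by a canned reply, and the
names REPLY, worker and run, the reply contents and the round counts are
all assumptions made up for illustration.  It is written for a current
Python rather than 1.5.2, and it only measures wall-clock time, so the
user/kernel split would still have to be watched with top.

```python
import re
import threading
import time

PAT = re.compile(r'\s+(\w+)\s+\(.+?\):\s+(\d+)\s+(\d+)')

# Hypothetical canned reply standing in for what ReadUntil() returns.
REPLY = '\r\n'.join(
    '  port%d (up):  %d  %d' % (i, i * 10, i * 20) for i in range(50)
)

def worker(use_re, rounds, counts, idx):
    """Parse the canned reply `rounds` times, optionally skipping the regex."""
    matched = 0
    for _ in range(rounds):
        for line in re.split('\r\n', REPLY):
            if not use_re:
                continue  # same trick as the 'continue' in front of pat.match()
            if PAT.match(line):
                matched += 1
    counts[idx] = matched

def run(use_re, nthreads=15, rounds=200):
    """Run `nthreads` workers; return (elapsed seconds, total lines matched)."""
    counts = [0] * nthreads
    threads = [
        threading.Thread(target=worker, args=(use_re, rounds, counts, i))
        for i in range(nthreads)
    ]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start, sum(counts)

if __name__ == '__main__':
    for use_re in (True, False):
        elapsed, total = run(use_re)
        print('use_re=%s  matched=%d  elapsed=%.3fs' % (use_re, total, elapsed))
```

If the regex-enabled run slows down out of proportion as the thread
count goes up, that would point at the matching itself rather than at
the socket code.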

Regards,
Cliff




