Semaphore Techniques

David Bolen db3l.net at gmail.com
Tue Jul 28 19:30:21 EDT 2009


John D Giotta <jdgiotta at gmail.com> writes:

> I'm looking to run a process with a limit of 3 instances, but each
> execution is over a crontab interval. I've been investigating the
> threading module and using daemons to limit active thread objects, but
> I'm not very successful at grasping the documentation.
>
> Is it possible to do what I'm trying to do and if so anyone know of a
> useful example to get started?

Does it have to be built into the tool, or are you open to handling the
restriction right in the crontab entry?

For example, a crontab entry like:

  * * * * * test `pidof -x script.py | wc -w` -ge 3 || <path>/script.py

should attempt to run script.py every minute (adjust the period as
required) unless three instances are already running, which keeps you
at the requested limit of 3.  And if pidof isn't precise enough you can
put anything in there that accurately counts your processes (grep a ps
listing or whatever).

This works because when the test expression is true it exits with
status 0, which short-circuits the logical or (||) and skips the script.

There may be some variations based on cron implementation (the above
was tested against Vixie cron), but some similar mechanism should be
available.

If you wanted to build it into the tool, it can be tricky in terms of
managing shared state (the count) amongst purely sibling/cooperative
processes.  It's much easier to ensure no overlap (1 instance), but
once you want 'n' instances you need an accurate counter shared across
processes.  I'm not positive, but I don't think Python's built-in
semaphores or shared memory objects are cross-process.  (Maybe
something in the multiprocessing module in recent Python versions
would work, though the sharing processes may all need to have been
started from a common parent script.)
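
To illustrate that caveat, here's a minimal sketch - a multiprocessing
semaphore is shared by inheritance, so it only limits workers started
from the one parent script below; it wouldn't help independent
processes each launched separately by cron:

  import multiprocessing
  import time

  def worker(sem, n):
      # Block until one of the 3 slots is free, then hold it while working.
      with sem:
          print("worker %d running" % n)
          time.sleep(2)

  if __name__ == "__main__":
      sem = multiprocessing.BoundedSemaphore(3)
      procs = [multiprocessing.Process(target=worker, args=(sem, n))
               for n in range(10)]
      for p in procs:
          p.start()
      for p in procs:
          p.join()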

I do believe there are some third party interfaces (posix_ipc,
shm/shm_wrapper) that would provide access to POSIX cross-process IPC
objects (shared memory and named semaphores).  A semaphore alone may
still not work as I'm not sure you can obtain the current count.  But
you could probably do something with a shared memory counter in
conjunction with a mutex of some sort, as long as you were careful to
clean it up on exit.
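
Something along these lines, say - an untested sketch assuming the
third party posix_ipc module, where the names "/myapp_lock" and
"/myapp_count" and the limit of 3 are just examples:

  import mmap
  import os
  import struct
  import sys

  import posix_ipc

  LIMIT = 3

  # Named semaphore used purely as a mutex around the shared counter.
  lock = posix_ipc.Semaphore("/myapp_lock", posix_ipc.O_CREAT,
                             initial_value=1)
  shm = posix_ipc.SharedMemory("/myapp_count", posix_ipc.O_CREAT, size=4)
  mem = mmap.mmap(shm.fd, shm.size)
  os.close(shm.fd)

  def change_count(delta):
      # Adjust the shared counter under the mutex; return the new value.
      lock.acquire()
      try:
          (count,) = struct.unpack("=i", mem[:4])
          count += delta
          mem[:4] = struct.pack("=i", count)
          return count
      finally:
          lock.release()

  if change_count(+1) > LIMIT:
      change_count(-1)
      sys.exit("too many instances already running")

  try:
      pass  # ... real work goes here ...
  finally:
      # Cleanup is essential - a crash that skips this leaks a slot.
      change_count(-1)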

Or, you could stick PIDs into the shared memory and count them at
startup, double-checking against the actually running processes to
protect against processes that died without cleaning up (the
filesystem sketch below does the same sort of PID check).

You could also use the filesystem - have a shared directory where each
process drops a file named for its PID, after first counting how many
other PIDs are already present and exiting if there are too many.
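
Roughly like this untested sketch (the directory and the limit of 3
are just examples, and note there is still a small race between
counting and dropping our own PID file):

  import atexit
  import os
  import sys

  PID_DIR = "/var/run/myapp"   # any shared, writable directory
  LIMIT = 3

  def live_pids():
      # Collect PID files, discarding any whose process no longer
      # exists (protects against runs that died without cleanup).
      pids = []
      for name in os.listdir(PID_DIR):
          try:
              pid = int(name)
              os.kill(pid, 0)          # signal 0 just tests existence
          except (ValueError, OSError):
              try:
                  os.remove(os.path.join(PID_DIR, name))
              except OSError:
                  pass
              continue
          pids.append(pid)
      return pids

  if not os.path.isdir(PID_DIR):
      os.makedirs(PID_DIR)
  if len(live_pids()) >= LIMIT:
      sys.exit(0)                      # enough instances running already

  my_file = os.path.join(PID_DIR, str(os.getpid()))
  open(my_file, "w").close()
  atexit.register(lambda: os.path.exists(my_file) and os.remove(my_file))

  # ... real work goes here ...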

Of course all of these (even with a PID check) are risky in the
presence of unexpected failures.  It would be worse with something
like C code, but it should be reasonably easy to ensure that your
script has cleanup code even on an unexpected termination, and it's
not that likely the Python interpreter itself would crash.  Then
again, something external could kill the process.  Ensuring accuracy
and cleanup of shared state can be non-trivial.

You don't mention whether you can support a single master daemon, but
if you can, things get a little easier, since the daemon can maintain
and protect access to the state.  Each worker process would hold open
a socket connection to the master for its lifetime; the master detects
terminations (for the count) when a connection drops, and simply
rejects connections from new processes when too many are already
running.  Of course, if the master daemon goes away then nobody runs,
which may or may not be an acceptable failure mode.
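
A very rough, untested sketch of that idea using a Unix-domain socket
(the socket path and the two-byte OK/NO handshake are invented for
illustration):

  # Master daemon: allow at most 3 workers to hold connections at once.
  import os
  import select
  import socket

  SOCK_PATH = "/tmp/myapp.sock"
  LIMIT = 3

  if os.path.exists(SOCK_PATH):
      os.remove(SOCK_PATH)
  server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
  server.bind(SOCK_PATH)
  server.listen(5)

  workers = []                         # one open socket per running worker

  while True:
      readable, _, _ = select.select([server] + workers, [], [])
      for sock in readable:
          if sock is server:
              conn, _ = server.accept()
              if len(workers) >= LIMIT:
                  conn.sendall(b"NO")  # too many already - refuse
                  conn.close()
              else:
                  conn.sendall(b"OK")
                  workers.append(conn)
          elif not sock.recv(1):
              # Empty read means the worker exited; free its slot.
              workers.remove(sock)
              sock.close()

and the matching worker side:

  # Worker: ask the master for a slot and hold the connection open so
  # the master notices when this process exits.
  import socket
  import sys

  conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
  conn.connect("/tmp/myapp.sock")
  if conn.recv(2) != b"OK":
      sys.exit(0)

  # ... real work goes here; keep conn open until the process exits ...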

All in all, unless you need the scripts to enforce this behavior even
in the presence of arbitrary use, I'd just use an appropriate crontab
entry and move on to other problems :-)

-- David


