[Mailman-Developers] race condition in locking ?

Thomas Wouters thomas@xs4all.net
Wed, 2 Feb 2000 22:53:46 +0100


On Wed, Feb 02, 2000 at 02:00:42PM +0100, Thomas Wouters wrote:

> This all led me to believe there was something wrong with locking, and I
> spent the morning's train trip from home to work (1.5 hours) reading
> LockFile.py and thinking about the locking. And I think the logic of
> LockFile.lock() is flawed :P

And today I figured out what exactly is flawed: the locking mechanism is
backwards! Maybe it's intended that way. The instructions for this type of
locking seem to have been removed from 'the' Linux open() manpage (my RH6
boxes no longer say anything about locking), so I can't double-check, but I
doubt this is what it describes.

We use a very similar locking scheme in our slightly adjusted mail.local
(originally sendmail's) which, if memory serves, was suggested by some
manpage or other, though not Linux's open() manpage. In any case, we use it
the other way round from Mailman, and with great success.

Mailman currently locks by creating a hard link from the lockfile to a
private lockfile, and checking that the link count is 2. If it isn't, it
reads the file to see who owns the lock, to check on their health, etc. The
problem is that if a lot of processes are doing the same thing, there will
very often be more than 2 links to the file. Stress testing with the default
timeout values, I saw as many as 25 concurrent links to the file, and
because every Mailman process uses exactly the same sleep time, the chances
of any one of them breaking out of the loop are smaller than they could be.
The end result is a lot of processes trying to lock a file that is unlocked
the whole time, because the processes trying to grab it keep headbutting
each other.
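A minimal Python sketch of a single attempt in the link-count scheme as
described above, assuming the shared lockfile already exists. The function
name and the `private` parameter are hypothetical; this is not Mailman's
actual LockFile.py code.

```python
import os

def try_lock_shared_inode(lockfile, private):
    """One attempt at the link-count scheme described above (a sketch).

    Assumes `lockfile` already exists. Each contender adds a hard link to it
    under its own `private` name and then counts the links.
    """
    os.link(lockfile, private)
    if os.stat(private).st_nlink == 2:
        # Only the shared name and our private name exist: we hold the lock.
        return True
    # Someone else has linked too (link count > 2): undo ours and retry later.
    os.unlink(private)
    return False
```

If two contenders each add their link before either calls stat(), both see
a link count of 3 and both back off, even though nobody holds the lock. With
identical sleep times they are likely to collide again on the next round.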

The way we do locking in mail.local (and this works, at least in C, on 4
different operating systems, over NFS ;) is that every process that tries to
obtain the lock creates a private lockfile, and then links from the private
lockfile to the intended one. Each process can then very easily see whether
it has the lock or not by stat()ing its _own_ lockfile and looking at the
link count.
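A minimal Python sketch of the mail.local-style scheme described above. The
names `private_lockname()` and `try_lock()` and the host-plus-PID naming are
hypothetical choices for illustration; only the link() from the private file
and the link-count check on it come from the description.

```python
import os
import socket

def private_lockname(lockfile):
    # Hypothetical naming: unique per host and process, and in the same
    # directory as the shared lockfile (hard links can't cross filesystems).
    return "%s.%s.%d" % (lockfile, socket.gethostname(), os.getpid())

def try_lock(lockfile, private, info):
    """One attempt at the private-file-first scheme (a sketch)."""
    # 1. Create our private lockfile and write the lock info into it first.
    with open(private, "w") as f:
        f.write(info)
    # 2. Link our private file to the shared name. link() never replaces an
    #    existing name, so at most one contender can create it.
    try:
        os.link(private, lockfile)
    except OSError:
        # Either someone else holds the lock, or (over NFS) link() reported
        # an error although the link was made. Don't trust it; check below.
        pass
    # 3. stat() our *own* file: a link count of 2 means the shared name now
    #    points at our inode, i.e. we hold the lock.
    if os.stat(private).st_nlink == 2:
        return True
    os.unlink(private)
    return False
```

Checking the link count on our own file, rather than relying on link()'s
result alone, is what covers NFS, where link() can report failure even though
the link was actually created.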

Because link() fails (with EEXIST) when the new name already exists, it is
impossible to overwrite someone else's lock. And because the private
lockfile is created in advance, the lock info Mailman uses can also be
written into it _in_advance_, removing another race condition in the
locking mechanism (namely, the code that checks whether the contents are
valid and, if not, removes the lockfile).
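Two consequences of writing the lock info in advance, sketched in Python: a
rival can read the holder's info through the shared name as soon as the lock
exists, and releasing the lock is just two unlink() calls.
`lock_holder_info()` and `unlock()` are hypothetical, illustrative names, not
Mailman's API.

```python
import os

def lock_holder_info(lockfile):
    """Return the info written by the current holder, or None if unlocked.

    The holder wrote this into its private file *before* linking it to
    `lockfile`, so a reader never sees an empty or half-written lock.
    """
    try:
        with open(lockfile) as f:
            return f.read()
    except FileNotFoundError:
        return None  # nobody holds the lock (or it was just released)

def unlock(lockfile, private):
    """Release a lock; assumes the caller really is the current holder."""
    # Drop the shared name first so others can link() to it again,
    # then remove our private file.
    os.unlink(lockfile)
    os.unlink(private)
```

A rival's os.link() to the same shared name raises FileExistsError (EEXIST),
so the holder's file and its contents stay intact until unlock().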

I'm busy rewriting LockFile.py to use the new locking (actually, I've
rewritten it, I just haven't tested it yet ;) and I should be able to post it
tomorrow... unless anyone has objections to this locking method. Did I miss
anything? Is there anyone I should talk to before changing things, like the
original authors?

-- 
Thomas Wouters <thomas@xs4all.net>