[Mailman-Users] LockFile.py problems + patch.
Brian Greenberg
grnbrg at gmail.com
Tue Oct 12 21:31:44 CEST 2004
I've been getting periodic entries in .../mailman/logs/locks that show:
Oct 08 08:33:50 2004 (6969) listname.lock unexpected linkcount: -1
Oct 08 08:33:50 2004 (6969) listname.lock lifetime has expired, breaking
Lots of error messages, but no apparent problems with list delivery.
I probably would not have noticed but for an "oops" that tried to
gateway 30,000+ news messages into a test list. This flooded the log
nicely, and caught my attention....
My final analysis is that while waiting for a lock to be freed, a waiting
process enters a race condition when the holding process releases the
lock. The result is that the now nonexistent lock file is checked for
its link count (__linkcount() returns -1, hence "unexpected linkcount: -1"),
and then has its lifetime checked (__releasetime() returns -1, so
time.time() > -1 + CLOCK_SLOP is always true and the lifetime looks expired).
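To see why a vanished lock trips both log messages, here is a minimal
Python 3 sketch of the two checks. The names linkcount, releasetime and
lifetime_expired are hypothetical stand-ins for the private LockFile
methods, CLOCK_SLOP = 10 is an assumed value, and the sketch assumes the
lock's release time is stored as the lock file's mtime.

```python
import errno
import os
import time

CLOCK_SLOP = 10  # seconds of allowed clock skew (assumed value)


def linkcount(path):
    """Hard-link count of path, or -1 if it does not exist
    (hypothetical stand-in for LockFile.__linkcount)."""
    try:
        return os.stat(path).st_nlink
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return -1


def releasetime(path):
    """Release time stored as the file's mtime, or -1 if the file is gone
    (hypothetical stand-in for LockFile.__releasetime)."""
    try:
        return os.stat(path).st_mtime
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return -1


def lifetime_expired(path):
    """The unpatched test: once the lock file has vanished, releasetime()
    is -1, so the comparison is true and the lock looks expired."""
    return time.time() > releasetime(path) + CLOCK_SLOP
```

Calling linkcount() and lifetime_expired() on a lock path that the holder
has just unlinked gives -1 and True, which matches the two log lines above.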
The patch:
-------------------------------------------------
*** mailman-2.1.5/Mailman/LockFile.py Mon Mar 31 22:28:16 2003
--- LockFile.py Tue Oct 12 14:05:21 2004
***************
*** 264,269 ****
--- 264,271 ----
# The link failed for some reason, possibly because someone
# else already has the lock (i.e. we got an EEXIST), or for
# some other bizarre reason.
+ self.__writelog ('Link attempt failed. OSError is %s' %
+ os.strerror(e.errno))
if e.errno == errno.ENOENT:
# TBD: in some Linux environments, it is possible to get
# an ENOENT, which is truly strange, because this means
***************
*** 283,290 ****
elif self.__linkcount() <> 2:
# Somebody's messin' with us! Log this, and try again
# later. TBD: should we raise an exception?
self.__writelog('unexpected linkcount: %d' %
! self.__linkcount(), important=True)
elif self.__read() == self.__tmpfname:
# It was us that already had the link.
self.__writelog('already locked')
--- 285,297 ----
elif self.__linkcount() <> 2:
# Somebody's messin' with us! Log this, and try again
# later. TBD: should we raise an exception?
+ links = self.__linkcount()
+ if links == -1: # The lock was cleared already!
+ self.__writelog(
+ 'No lockfile after a lockfile exists error?')
+ continue
self.__writelog('unexpected linkcount: %d' %
! links, important=True)
elif self.__read() == self.__tmpfname:
# It was us that already had the link.
self.__writelog('already locked')
***************
*** 299,305 ****
raise TimeOutError
# Okay, we haven't timed out, but we didn't get the lock. Let's
# find if the lock lifetime has expired.
! if time.time() > self.__releasetime() + CLOCK_SLOP:
# Yes, so break the lock.
self.__break()
self.__writelog('lifetime has expired, breaking',
--- 306,317 ----
raise TimeOutError
# Okay, we haven't timed out, but we didn't get the lock. Let's
# find if the lock lifetime has expired.
! rel_time = self.__releasetime()
! if (rel_time == -1): # Lock does not exist anymore?
! self.__writelog(
! 'Checked the release time of a nonexistent lock.')
! continue
! elif time.time() > rel_time + CLOCK_SLOP:
# Yes, so break the lock.
self.__break()
self.__writelog('lifetime has expired, breaking',
--------------------------------------------------
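For anyone who wants to poke at the logic outside Mailman, here is a rough
Python 3 sketch of a hard-link lock loop with the same two guards the patch
adds. It is not Mailman's LockFile: acquire, release and their parameters
are hypothetical, the "already locked" check is omitted, and it assumes the
release time lives in the lock file's mtime.

```python
import os
import time

CLOCK_SLOP = 10  # seconds of allowed clock skew (assumed value)


def _stat_or_none(path):
    """os.stat(path), or None if the lock file has already been removed."""
    try:
        return os.stat(path)
    except FileNotFoundError:
        return None


def acquire(lockfile, tmpfname, lifetime=15, timeout=5, log=print):
    """Hypothetical hard-link lock: returns True once we own lockfile,
    False if timeout seconds pass first."""
    # Our claim file; its mtime records when the lock should be released.
    with open(tmpfname, "w") as f:
        f.write(tmpfname)
    release_at = time.time() + lifetime
    os.utime(tmpfname, (release_at, release_at))
    deadline = time.time() + timeout
    while True:
        try:
            os.link(tmpfname, lockfile)  # atomic: fails if lockfile exists
            return True
        except FileExistsError:
            st = _stat_or_none(lockfile)
            if st is None:
                # Holder released between link() and stat(): retry now
                # instead of logging a bogus link count of -1.
                log("no lockfile after a lockfile exists error")
                continue
            if st.st_nlink != 2:
                log("unexpected linkcount: %d" % st.st_nlink)
        if time.time() > deadline:
            os.unlink(tmpfname)
            return False
        st = _stat_or_none(lockfile)
        if st is None:
            # Same race on the lifetime check: retry rather than break.
            log("checked the release time of a nonexistent lock")
            continue
        if time.time() > st.st_mtime + CLOCK_SLOP:
            log("lifetime has expired, breaking")
            try:
                os.unlink(lockfile)  # break the stale lock
            except FileNotFoundError:
                pass
            continue
        time.sleep(0.1)


def release(lockfile, tmpfname):
    """Remove the lock and our claim file, ignoring ones already gone."""
    for path in (lockfile, tmpfname):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
```

The point is the same as the patch's: every stat() of the lock can race
with the holder's release, so each one needs its own "file is gone" branch
rather than treating a missing file as a link count or release time of -1.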
Brian.
--
Brian Greenberg
grnbrg at gmail.com