A number of people have experienced corruption of config.db files when the disk gets full. While I was away on vacation, we experienced the same thing on python.org. The ugly thing is that when the disk gets full, enough of config.db can be written so that it is a valid marshal of an empty dictionary. This means that lists with corrupt config.db files will still load, but will start having very cryptic errors such as attributes missing on the MailList objects, etc.
The fix in those cases is to free up space on the disk and then copy config.db.last to config.db for all the corrupted lists. I think this is suboptimal. I'd rather have config.db /only/ be changed if we can guarantee (as best as possible) that the marshal was written completely and successfully.
Attached is a patch to do this. I am not checking it into the CVS tree at the moment because it's such a critical section of code. I want to stress it a little more myself, but I would greatly appreciate comments, or testing by y'all. You might want to use a non-production, or well backed up installation. I feel pretty confident about the change, but still, get this wrong and your lists are toast.
The idea is that the marshal is first written to a temp file called config.db.tmp.<mypid>. If this succeeds completely, then
1) config.db.last is unlinked
2) a hard link config.db.last -> config.db is created
3) a hard link config.db -> config.db.tmp.<mypid> is created
4) config.db.tmp.<mypid> is unlinked
Comments? -Barry
Index: MailList.py
RCS file: /projects/cvsroot/mailman/Mailman/MailList.py,v retrieving revision 1.126 diff -c -r1.126 MailList.py *** MailList.py 1999/06/15 07:50:54 1.126 --- MailList.py 1999/06/30 15:52:16
*** 743,765 ****
# pretty hosed. That's a good reason to make this a daemon not a
# program.
self.IsListInitialized()
! fname = os.path.join(self._full_path, 'config.db')
! fname_last = fname + ".last"
! file = aside_new(fname, fname_last, reopen=1)
! dict = {}
for key, value in self.__dict__.items():
if key[0] <> '_':
dict[key] = value
try:
! marshal.dump(dict, file)
! file.close()
! except IOError, status:
! # Darn - try to resurrect the old config.db.
! file = aside_new(fname_last, fname, reopen=0)
! self.LogMsg("error",
! "Failed config file write '%s',"
! " old config resurrected." % status.args
)
! Utils.reraise()
self.CheckHTMLArchiveDir()
def Load(self, check_version = 1):
--- 743,780 ---- # pretty hosed. That's a good reason to make this a daemon not a # program. self.IsListInitialized() ! # copy all public attributes to marshalable dictionary ! dict = {} for key, value in self.__dict__.items(): if key[0] <> '_': dict[key] = value
# we want to write this dict in a marshal, but be especially paranoid
# about how we write the config.db, so we'll be as robust as possible
# in the face of, e.g. disk full errors. The idea is that we want to
# guarantee that config.db is always valid. The old way had the bad
# habit of writing an incomplete file, which happened to be a valid
# (but bogus) marshal.
fname = os.path.join(self._full_path, 'config.db')
fname_tmp = fname + '.tmp.' + `os.getpid()`
fname_last = fname + ".last"
omask = os.umask(007) try:
! try: ! fp = open(fname_tmp, 'w') ! marshal.dump(dict, fp) ! fp.close() ! except IOError, status: ! os.unlink(fname_tmp) ! self.LogMsg('error', ! 'Failed config.db file write, retaining old state' ! '\n %s' %
status.args
) ! Utils.reraise() ! # now move config.db -> config.db.last ! # then move config.db.tmp.xxx -> config.db ! aside_new(fname, fname_last) ! aside_new(fname_tmp, fname) ! finally: ! os.umask(omask) self.CheckHTMLArchiveDir()def Load(self, check_version = 1):
participants (1)
-
Barry A. Warsaw