[Mailman-Developers] patch candidate for MailList.Save()

Barry A. Warsaw bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Wed, 30 Jun 1999 12:56:14 -0400 (EDT)


A number of people have experienced corruption of config.db files when
the disk gets full.  While I was away on vacation, we experienced the
same thing on python.org.  The ugly thing is that when the disk gets
full, enough of config.db can be written so that it is a valid marshal
of an empty dictionary.  This means that lists with corrupt config.db
files will still load, but will start having very cryptic errors such
as attributes missing on the MailList objects, etc.

The fix in those cases is to free up space on the disk and then copy
config.db.last to config.db for all the corrupted lists.  I think this
is suboptimal.  I'd rather have config.db /only/ be changed if we can
guarantee (as best as possible) that the marshal was written
completely and successfully.

Attached is a patch to do this.  I am not checking it into the CVS
tree at the moment because it's such a critical section of code.  I
want to stress it a little more myself, but I would greatly appreciate
comments, or testing by y'all.  You might want to use a
non-production, or well backed up installation.  I feel pretty
confident about the change, but still, get this wrong and your lists
are toast.

The idea is that the marshal is first written to a temp file called
config.db.tmp.<mypid>.  If this succeeds completely, then

    1) config.db.last is unlinked
    2) a hard link config.db.last -> config.db is created
    3) a hard link config.db -> config.db.tmp.<mypid> is created
    4) config.db.tmp.<mypid> is unlinked

Comments?
-Barry

Index: MailList.py
===================================================================
RCS file: /projects/cvsroot/mailman/Mailman/MailList.py,v
retrieving revision 1.126
diff -c -r1.126 MailList.py
*** MailList.py	1999/06/15 07:50:54	1.126
--- MailList.py	1999/06/30 15:52:16
***************
*** 743,765 ****
  	# pretty hosed.  That's a good reason to make this a daemon not a
  	# program.
  	self.IsListInitialized()
!         fname = os.path.join(self._full_path, 'config.db')
!         fname_last = fname + ".last"
!         file = aside_new(fname, fname_last, reopen=1)
! 	dict = {}
  	for key, value in self.__dict__.items():
  	    if key[0] <> '_':
  		dict[key] = value
          try:
!             marshal.dump(dict, file)
!             file.close()
!         except IOError, status:
!             # Darn - try to resurrect the old config.db.
!             file = aside_new(fname_last, fname, reopen=0)
!             self.LogMsg("error",
!                         "Failed config file write '%s',"
!                         " old config resurrected." % `status.args`)
!             Utils.reraise()
          self.CheckHTMLArchiveDir()
  
      def Load(self, check_version = 1):
--- 743,780 ----
  	# pretty hosed.  That's a good reason to make this a daemon not a
  	# program.
  	self.IsListInitialized()
!         # copy all public attributes to marshalable dictionary
!         dict = {}
  	for key, value in self.__dict__.items():
  	    if key[0] <> '_':
  		dict[key] = value
+         # we want to write this dict in a marshal, but be especially paranoid
+         # about how we write the config.db, so we'll be as robust as possible
+         # in the face of, e.g. disk full errors.  The idea is that we want to
+         # guarantee that config.db is always valid.  The old way had the bad
+         # habit of writing an incomplete file, which happened to be a valid
+         # (but bogus) marshal.
+         fname = os.path.join(self._full_path, 'config.db')
+         fname_tmp = fname + '.tmp.' + `os.getpid()`
+         fname_last = fname + ".last"
+         omask = os.umask(007)
          try:
!             try:
!                 fp = open(fname_tmp, 'w')
!                 marshal.dump(dict, fp)
!                 fp.close()
!             except IOError, status:
!                 os.unlink(fname_tmp)
!                 self.LogMsg('error',
!                             'Failed config.db file write, retaining old state'
!                             '\n %s' % `status.args`)
!                 Utils.reraise()
!             # now move config.db -> config.db.last
!             # then move config.db.tmp.xxx -> config.db
!             aside_new(fname, fname_last)
!             aside_new(fname_tmp, fname)
!         finally:
!             os.umask(omask)
          self.CheckHTMLArchiveDir()
  
      def Load(self, check_version = 1):