Uncaught exceptions on corrupt files (mailman 2.1.4)

Hi Guys,
Because we run our Mailman server on dodgy old hardware with a geriatric version of XFS, we quite often see corrupt .db and .pck files. This is Mailman 2.1.4 on Python 2.3.3.
This usually results in something like:
Jan 06 16:18:11 2004 (1134) Uncaught runner exception: invalid load key, '^@'.
Jan 06 16:18:11 2004 (1134) Traceback (most recent call last):
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 105, in _oneloop
    self._onefile(msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 130, in _onefile
    mlist = self._open_list(listname)
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 175, in _open_list
    mlist = MailList.MailList(listname, lock=0)
  File "/usr/local/mailman/Mailman/MailList.py", line 124, in __init__
    self.Load()
  File "/usr/local/mailman/Mailman/MailList.py", line 583, in Load
    dict, e = self.__load(file)
  File "/usr/local/mailman/Mailman/MailList.py", line 556, in __load
    dict = loadfunc(fp)
UnpicklingError: invalid load key, '^@'.
So looking at MailList.py I see:
    try:
        try:
            dict = loadfunc(fp)
            if type(dict) <> DictType:
                return None, 'Load() expected to return a dictionary'
        except (EOFError, ValueError, TypeError, MemoryError,
                cPickle.PicklingError), e:
            return None, e
    finally:
        fp.close()
    # Update timestamp
    self.__timestamp = mtime
    return dict, None
in the __load method of the MailList class.
Shouldn't that except statement include cPickle.UnpicklingError? I think the error handling in Runner.py will then print the name of the list it died on, although it would be nice if MailList.py logged the filename it choked on.
e.g.:
    try:
        dict = loadfunc(fp)
        if type(dict) <> DictType:
            return None, 'Load() expected to return a dictionary'
    except (EOFError, ValueError, TypeError, MemoryError,
            cPickle.PicklingError, cPickle.UnpicklingError), e:
        syslog('error', 'problem reading file: %s', dbfile)
        return None, e
Does that make sense, or have I misunderstood something?
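For anyone who wants to try the idea outside Mailman, here is a minimal standalone sketch of the same pattern. It is written for Python 3 (so `pickle` rather than `cPickle`, and `except ... as e`), and `load_pickled_dict` and its `log` parameter are hypothetical names for illustration, not anything in Mailman itself.

```python
import pickle


def load_pickled_dict(path, log=print):
    """Return (dict, None) on success, or (None, error) if the file is unusable.

    Hypothetical helper: it mirrors the shape of the except clause
    discussed above, but is not Mailman's actual code.
    """
    try:
        with open(path, 'rb') as fp:
            data = pickle.load(fp)
    except (EOFError, ValueError, TypeError, MemoryError,
            pickle.UnpicklingError) as e:
        # Name the offending file so a corrupt database is easy to track down.
        log('problem reading file: %s (%s)' % (path, e))
        return None, e
    if not isinstance(data, dict):
        return None, 'load expected to return a dictionary'
    return data, None
```

A file that starts with a NUL byte, like the one in the traceback, should come back as `(None, <UnpicklingError>)` with the path logged, rather than propagating up as an uncaught exception.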
We see the same sort of thing with corrupt archive pickles.
Feb 25 21:16:39 2004 (4537) Uncaught runner exception: bad marshal data
Feb 25 21:16:39 2004 (4537) Traceback (most recent call last):
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 110, in _oneloop
    self._onefile(msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 160, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/ArchRunner.py", line 73, in _dispose
    mlist.ArchiveMail(msg)
  File "/usr/local/mailman/Mailman/Archiver/Archiver.py", line 215, in ArchiveMail
    h.processUnixMailbox(f)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 569, in processUnixMailbox
    self.add_article(a)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 615, in add_article
    article.parentID = parentID = self.get_parent_info(arch, article)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 647, in get_parent_info
    article.subject)
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 311, in getOldestArticle
    self.__openIndices(archive)
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 251, in __openIndices
    t = DumbBTree(os.path.join(arcdir, archive + '-' + i))
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 65, in __init__
    self.load()
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 170, in load
    self.dict = marshal.load(fp)
ValueError: bad marshal data
So in HyperDatabase.py, the load method of DumbBTree is:
    def load(self):
        try:
            fp = open(self.path)
            try:
                self.dict = marshal.load(fp)
            finally:
                fp.close()
        except IOError, e:
            if e.errno <> errno.ENOENT: raise
            pass
        except EOFError:
            pass
        else:
            self.__sort(dirty=1)
There is nothing there to catch ValueError.
I have to say that I don't understand HyperDatabase.py because it doesn't appear to do anything with exceptions.
Can someone enlighten me?
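To make the ValueError point concrete, here is a rough standalone sketch of a marshal loader that tolerates damaged files. It is Python 3, `load_index` is a hypothetical name, and treating a corrupt index as empty is only an assumption about what would be acceptable here; the real DumbBTree may well need to rebuild or report instead.

```python
import marshal


def load_index(path):
    """Load a marshalled dict, treating a missing or corrupt file as empty.

    Hypothetical helper, not the DumbBTree.load() from HyperDatabase.py.
    """
    try:
        with open(path, 'rb') as fp:
            data = marshal.load(fp)
    except FileNotFoundError:
        # No index written yet: start with an empty one.
        return {}
    except (EOFError, ValueError, TypeError):
        # marshal.load raises one of these on truncated or damaged data.
        return {}
    # Guard against a file that unmarshals cleanly but holds the wrong type.
    return data if isinstance(data, dict) else {}
```

Silently returning `{}` throws away whatever was in the index, so in practice you would probably want to log the path as well, the same way as for the list database.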
I'm going to try the change to MailList.py on a test server at some point over the next week.
TIA, Huw
--
| Huw Lynes               | The Moving Picture Company |
| System Administrator    | 127 Wardour Street         |
|.........................| London, W1F 0NL            |

On Thu, 2004-02-26 at 08:09, Huw Lynes wrote:
that except statement should include cPickle.UnpicklingError?
Yep, I added that in cvs. Note that the Load() method should already log the file that couldn't be loaded, which you should see once that exception is also caught.
Side note: You might want to turn on SYNC_AFTER_WRITE.
I haven't checked this one in, but you should try adding that exception clause and see if it fixes your problem. But you're right that HyperDatabase.py might need other fixes, which I don't have time for right now.
-Barry
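
On the SYNC_AFTER_WRITE side note above: as far as I understand it, the setting makes Mailman flush and fsync its files after writing them, which narrows the window in which a crash can leave a half-written file on disk. A minimal Python 3 sketch of that general technique follows; `save_pickled_dict`, its `sync` flag, and the `.tmp` suffix are hypothetical and are not taken from Mailman's source.

```python
import os
import pickle


def save_pickled_dict(path, data, sync=True):
    """Write data to a temporary file, optionally fsync it, then rename it.

    Hypothetical helper illustrating flush + fsync before rename.
    """
    tmp = path + '.tmp'
    with open(tmp, 'wb') as fp:
        pickle.dump(data, fp, protocol=pickle.HIGHEST_PROTOCOL)
        if sync:
            # Push Python's buffer to the OS, then ask the OS to hit the disk.
            fp.flush()
            os.fsync(fp.fileno())
    # Replace the old file in one step so readers never see a partial write.
    os.replace(tmp, path)
```

The fsync costs some write performance, which is presumably why it is optional, but on flaky hardware it seems a reasonable trade.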
