disaster recovery help

My filesystem recently crashed, breaking some aspects of my Mailman installation.
The email portion of the list itself is still functioning (for non-digest subscribers), and all the archives are intact. But I seem to be missing some config pickles, and their absence is preventing digest delivery, emergency moderation, and probably other features that I haven't stumbled upon yet.
All of the broken items generate the following message in the logs:
Apr 21 09:00:01 2011 (29813) couldn't load config file /var/lib/mailman/lists/.keep/config.pck
Apr 21 09:00:01 2011 (29813) couldn't load config file /var/lib/mailman/lists/.keep/config.pck.last [Errno 2] No such file or directory: '/var/lib/mailman/lists/.keep/config.pck.last'
Apr 21 09:00:01 2011 (29813) couldn't load config file /var/lib/mailman/lists/.keep/config.db [Errno 2] No such file or directory: '/var/lib/mailman/lists/.keep/config.db'
Apr 21 09:00:01 2011 (29813) couldn't load config file /var/lib/mailman/lists/.keep/config.db.last EOF read where object expected
Apr 21 09:00:01 2011 (29813) All .keep fallbacks were corrupt, giving up
Indeed, these files were lost in the filesystem crash, and I do not have backups of them.
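When reading log entries like these, it can help to check which list directory the failures actually name. The following is a minimal sketch in Python 3, not part of Mailman; the function name `failing_lists` and the default `lists_dir` are assumptions chosen for illustration, and the regular expression only covers the "couldn't load config file" wording quoted above.

```python
import re
from collections import Counter

# Matches the "couldn't load config file <path>" entries quoted above.
LOAD_FAILURE = re.compile(r"couldn't load config file (\S+)")


def failing_lists(log_text, lists_dir="/var/lib/mailman/lists"):
    """Count load failures per list directory named in the log text."""
    counts = Counter()
    prefix = lists_dir.rstrip("/") + "/"
    for path in LOAD_FAILURE.findall(log_text):
        if path.startswith(prefix):
            # The first path component below lists_dir is the list name.
            list_name = path[len(prefix):].split("/", 1)[0]
            counts[list_name] += 1
    return counts


sample = (
    "Apr 21 09:00:01 2011 (29813) couldn't load config file "
    "/var/lib/mailman/lists/.keep/config.pck\n"
    "Apr 21 09:00:01 2011 (29813) couldn't load config file "
    "/var/lib/mailman/lists/.keep/config.pck.last\n"
)
print(failing_lists(sample))
```

If the only name reported is one you never created as a list, the broken entry is probably not the list you are worried about.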
What are my options here? Can I do something like:
- export list of users
- move the broken list out of the way
- create a new list with the same name
- resubscribe members
- copy the old archives back into the new list
What gotchas am I going to run across trying to do something like the above?
Any other suggestions?
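For reference, the rebuild plan listed above could be written down as a sequence of commands before running anything. This is only a sketch: it prints the commands and executes nothing. The command names (`list_members`, `newlist`, `add_members`) and their flags are given as I recall them from Mailman 2.1 and should be checked against each script's `--help`; the function name `rebuild_plan`, the `backup_dir` and the member-file paths are hypothetical. Exporting members only works if the list's config.pck can still be loaded.

```python
def rebuild_plan(listname,
                 lists_dir="/var/lib/mailman/lists",
                 backup_dir="/root/mailman-broken",
                 work_dir="/tmp"):
    """Return the rebuild steps as argument lists; nothing is executed.

    All paths are illustrative assumptions. Archives are stored outside
    lists_dir, so moving the list directory leaves them where they are.
    """
    regular = "%s/%s-regular.txt" % (work_dir, listname)
    digest = "%s/%s-digest.txt" % (work_dir, listname)
    return [
        # 1. Export regular and digest members to separate files.
        ["list_members", "--regular", "--output", regular, listname],
        ["list_members", "--digest", "--output", digest, listname],
        # 2. Move the broken list out of the way.
        ["mv", "%s/%s" % (lists_dir, listname), backup_dir],
        # 3. Create a new list with the same name (prompts for details).
        ["newlist", listname],
        # 4. Resubscribe the exported members.
        ["add_members", "-r", regular, "-d", digest, listname],
    ]


for step in rebuild_plan("camp"):
    print(" ".join(step))
```

Writing the steps out first makes it easier to review them, and to keep the member exports, before touching a damaged installation.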
Appreciate any guidance ...
Cheers,
-C-

Chris Haumesser wrote:
If there are or were any config.db* files, they were left after migration from Mailman 2.0.x to 2.1.x and contained old data from before the migration.
Normally it is good to remove them because if they exist and are useable, in a situation such as this Mailman may fall back to using one which is not what you want.
The above seems to indicate that there is a /var/lib/mailman/lists/.keep/config.pck, but it can't be unpickled for some reason. Is that the case?
Given the above, I am amazed that the .keep list works at all, or is it some other list?
If the list you are talking about is the list named .keep, I don't think you will even be able to export a list of users. If it is some other list, I suggest you move the /var/lib/mailman/lists/.keep/ directory somewhere else (out of the /var/lib/mailman/lists/ directory), or, if .keep was not one of your lists, maybe just remove the /var/lib/mailman/lists/.keep/ directory and its contents.
That in itself may be sufficient to fix the digests problem with other lists (because cron/senddigests is dying on the .keep list and doesn't get to the others). I'm not sure about emergency moderation, but at least (re)move that .keep/ directory, and then see what problems remain and what error log messages might be associated with them.
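To find stray directories like this one, a short script can report the state of config.pck in every subdirectory of the lists directory. This is a minimal sketch in Python 3 and not a Mailman tool; the function name `classify_list_dirs` and the three status labels are assumptions made for illustration, and it only inspects file presence and size, not whether the pickle can be loaded.

```python
import os


def classify_list_dirs(lists_dir):
    """Map each subdirectory of lists_dir to the state of its config.pck.

    Returns a dict of name -> "ok", "empty" (0 bytes) or "missing".
    """
    result = {}
    for name in sorted(os.listdir(lists_dir)):
        path = os.path.join(lists_dir, name)
        if not os.path.isdir(path):
            continue  # plain files in lists_dir are not list directories
        pck = os.path.join(path, "config.pck")
        if not os.path.isfile(pck):
            result[name] = "missing"
        elif os.path.getsize(pck) == 0:
            result[name] = "empty"
        else:
            result[name] = "ok"
    return result
```

Calling `classify_list_dirs("/var/lib/mailman/lists")` and looking for anything other than "ok" gives a quick list of directories to move aside before re-running the cron jobs.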
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On 4/22/11 7:52 AM, Mark Sapiro wrote:
There are no config.db files in /var/lib/mailman, they are only mentioned in the log.
Veritably so. It is an empty file (0 bytes).
Given the above, I am amazed that the .keep list works at all, or is it some other list?
I never created a list called .keep, and indeed the problems are with a different list called 'camp', which is not mentioned anywhere in the error log.
I was assuming the .keep folder had something to do with mailman internals. After reading your message, I infer that is not the case, and now suspect it is perhaps a remnant from running fsck on the filesystem.
Interestingly, none of my actual lists seem to be missing their config.pck file.
Gotta love filesystem corruption. (New backup plan is now in place ... )
Seems reasonable, I'll give it a shot.
That could certainly explain the digests problem, anyway.
Thanks for your help!
-C-

Chris Haumesser writes:
I was assuming the .keep folder had something to do with mailman internals.
That looks like a distro device to make sure that the data directories don't get deleted if you delete the package.
Possibly what is happening is that the distro's version is patched to ignore distro housekeeping. Or perhaps the disk corruption flipped the "I am a directory" bit on that, and the recovery process (fsck) actually populated it. Then mailman decided it was a list, and created a config.pck (empty) for it.

As Mark suggested, I removed /var/lib/mailman/lists/.keep, and this resolved the immediate errors.
However, when I tried to visit the moderation page for another list, I still got an error in my logs, now a different one, about insecure permissions on a pickle. Using dumpdb, I found that pending.pck and request.pck for the list were corrupt, but config.pck was intact.
Luckily I don't think there was much, if anything, in pending.pck or request.pck. So I just removed them, and my list now seems to be back up and running. Hooray!
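A check along the same lines as the dumpdb test can be scripted when several pickles need inspecting. The sketch below is plain Python 3 and does not use Mailman's dumpdb; the function name `pickle_status` and its status labels are assumptions. It passes `encoding="latin-1"` because Mailman 2.1 writes its pickles from Python 2, and a pickle that refers to classes not importable in the checking environment will also be reported as corrupt. Unpickling runs code from the file, so only use it on your own files.

```python
import os
import pickle


def pickle_status(path):
    """Return "missing", "empty", "corrupt" or "ok" for a pickle file."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with open(path, "rb") as fp:
            # latin-1 lets Python 3 read byte strings from Python 2 pickles.
            pickle.load(fp, encoding="latin-1")
    except Exception:
        # Any failure to load counts as corrupt for triage purposes.
        return "corrupt"
    return "ok"
```

Running it over config.pck, pending.pck and request.pck in a list's directory shows which files are safe to keep and which are candidates for removal, as was done here.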
Thanks for your help, guys.
Cheers,
-C-

participants (3): Chris Haumesser, Mark Sapiro, Stephen J. Turnbull