(Note PS at bottom!)
Hi. I'm prepping to migrate a bunch of lists (one at a time, due to the huge number of lists and the huge size of the archives) from one server to another, and I've hit a snag with the first list I'm trying. After migrating the list (as described below), I can go to the list's admindb page on the new server and get the list of pending requests that was on the old server, but the request database immediately gets truncated. (It stays a valid pickle file; it just has all the requests emptied out of it, so the file itself is much shorter but still of non-zero length.)
This happens *when I load* the admindb page, not when I submit the form. That seems really weird to me, since I'd expect a page load would only read the pickle, not write it.
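One way to confirm that a plain page load rewrites the file is to fingerprint it before and after loading the admindb page. The following is a minimal sketch only; the helper names `snapshot` and `changed` are made up for illustration, and the path in the usage note is assumed from the directory layout described in this message.

```python
import hashlib
import os


def snapshot(path):
    """Return (size, mtime, sha256 hex digest) for the file at path."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    st = os.stat(path)
    return (st.st_size, st.st_mtime, digest)


def changed(before, after):
    """True if the size or the content differs between two snapshots."""
    return before[0] != after[0] or before[2] != after[2]
```

Usage would be along the lines of: call `snapshot("/var/lib/mailman/lists/LISTNAME/request.pck")` (LISTNAME being a placeholder), load the admindb page in the browser without submitting the form, call `snapshot` again, and pass both results to `changed`.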
The old server is running Debian and Mailman 2.1.13 (from the Debian package). The new server is running Ubuntu and Mailman 2.1.16 (from the Ubuntu Trusty package; we need to run Trusty for now for complex and uninteresting reasons; I'd rather run 2.1.18, and may look into running that on Trusty once I get the basic migration issues resolved).
Relevant UIDs and GIDs (www-data:www-data and list:list) are the same on both systems.
Short version: I rsync -aSHov /var/lib/mailman/lists/$listname/ new-server:/var/lib/mailman/lists/$listname and similarly copy the public and private archives (preserving symlinks as needed). check_perms on both systems reports similar errors which look cosmetic (things like rotated logs, temporary directories where I've copied things, and the like), but I haven't yet let it run to completion because of the volume of our archives. Then I change host_name via the web interface, set m.web_page_url interactively with withlist (fix_url doesn't seem to work when changing http: to https:), and call m.Save().
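The interactive withlist step could also be written as a small callable for withlist's -r option, which passes the list object as the first argument and any extra command-line arguments after it. This is a sketch under assumptions: the module name `set_url` and the function name are hypothetical, the URL in the usage note is a placeholder, and the module has to be somewhere withlist can import it.

```python
def set_web_page_url(mlist, url):
    """Point the list at a new base URL and persist the change.

    mlist is the MailList object that withlist hands to the callable;
    it must be locked (withlist -l) for Save() to be allowed.
    """
    mlist.web_page_url = url
    mlist.Save()
```

If saved as set_url.py, the invocation would look something like: withlist -l -r set_url.set_web_page_url LISTNAME https://new-server.example/cgi-bin/mailman/ (both LISTNAME and the URL are placeholders, not values from this setup).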
One *possibly* relevant detail is that the new host doesn't currently have a valid certificate. (It's using the old host's cert, and I manually allow the exception in my web browser for testing.) But for Mailman 2, the only http{,s} traffic should be sent from my browser, right?
This kind of has the feel of a permissions problem, but clearly the CGI scripts can read from and write to the request.pck database. (And changes to the list config data in config.pck seem to be working normally.) As I said, check_perms hasn't run to completion yet because it's plowing through the (already pre-rsync'ed) archives, but it got through the things in /var/lib/mailman/lists and didn't find anything wrong with this list.
There's nothing interesting in the Mailman logs (which Debian/Ubuntu put in /var/log/mailman), and the only thing in the Apache error logs is a warning that the cert it has configured doesn't match its hostname.
Anybody have any ideas?
Jay
PS -- I composed this all last night. Today, the behavior has changed: This morning, a new message was received by the list (forwarded from the old list server to the new list server, and added to request.pck on the new server by the new Mailman installation). Now, when I load the admindb page, the old requests (which were in the request.pck copied from the old server) are all immediately thrown away (although displayed in the admindb form) but the new request which came in this morning remains. So it kind of looks like something about the old requests causes the list to think they're invalid and discard them when it loads them. I initially saw this behavior with "require_explicit_destination" on and "acceptable_aliases" empty, but turning off "require_explicit_destination" and putting just the local part of the list address in "acceptable_aliases" doesn't make any difference.
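To see exactly which requests survive a page load, the request IDs in the pickle can be listed before and after. The sketch below assumes that request.pck unpickles to a dict whose pending requests are stored under integer keys, next to a 'version' entry; that layout is an assumption about Mailman 2.1, not something established in this message. It is meant to be run with the same Python that Mailman uses. Under Python 3, a pickle written by Python 2 may additionally need `encoding="latin1"` passed to `pickle.load`.

```python
import pickle


def pending_request_ids(path):
    """Return the sorted integer keys of the pickled dict at path.

    Non-integer keys (such as an assumed 'version' entry) are skipped.
    """
    with open(path, "rb") as fh:
        db = pickle.load(fh)
    return sorted(k for k in db if isinstance(k, int))
```

Comparing the result for the copy taken from the old server against the result after loading the admindb page shows which IDs were dropped and which were kept.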
--
Jay Sekora
Linux system administrator and postmaster,
The Infrastructure Group
MIT Computer Science and Artificial Intelligence Laboratory