[Bug 1184376] [NEW] REST server crashes on "reopen"
Public bug reported: Mailman's REST server crashes on the "mailman reopen" command. Here are the only lines I get in the logs: May 26 12:29:01 2013 (30830) Master started May 26 12:29:05 2013 (30844) RESTRunner runner started. May 26 12:29:05 2013 (30844) Starting REST server May 26 12:29:05 2013 (30837) DigestRunner runner started. May 26 12:29:05 2013 (30843) RetryRunner runner started. May 26 12:29:05 2013 (30848) CommandRunner runner started. May 26 12:29:05 2013 (30845) BounceRunner runner started. May 26 12:29:05 2013 (30846) LMTPRunner runner started. May 26 12:29:05 2013 (30850) NNTPRunner runner started. May 26 12:29:05 2013 (30838) PipelineRunner runner started. May 26 12:29:06 2013 (30839) ArchiveRunner runner started. May 26 12:29:06 2013 (30849) VirginRunner runner started. May 26 12:29:06 2013 (30836) IncomingRunner runner started. May 26 12:29:06 2013 (30840) OutgoingRunner runner started. May 26 12:29:33 2013 (30830) Master watcher caught SIGHUP. Re-opening log files. May 26 12:29:33 2013 (30848) CommandRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30843) RetryRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30838) PipelineRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30837) DigestRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30839) ArchiveRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30836) IncomingRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30846) LMTPRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30849) VirginRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30845) BounceRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30850) NNTPRunner runner caught SIGHUP. Reopening logs. May 26 12:29:33 2013 (30840) OutgoingRunner runner caught SIGHUP. Reopening logs. As you can see in the "caught SIGHUP" lines, there is nothing about the REST server, and indeed there is no "runner --runner=rest:0:1" process running anymore. I have to restart mailman to get the REST server back online. ** Affects: mailman Importance: Undecided Status: New ** Tags: mailman3 rest -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
After looking at the source code, it seems to come from the fact that the REST runner has the intercept_signals attribute set to False. Any reason for that ? -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
To be honest, I don't remember why I added that. Note that the REST server is the only one that doesn't intercept signals. I tried commenting out that line (since the default is True) and cli testing seemed just fine. The test suite passes too. I wonder if it was due to a problem in Python 2.6 which perhaps is fixed now in 2.7? Try commenting out line 43 in rest.py and see if things work better for you. If so, then perhaps we can just remove this mis-feature. I sure wish I remembered why I added it though. -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
I've been running without this line for a few days already, and it seems to work fine. -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
Okay, I've commented out the line in RESTRunner, but not removed the basic functionality to override interception of signals. That way, if I remember why I added that in the first place, it will be easier to re- enable. OTOH, if nothing bad happens, then we should eventually remove the flag. ** Changed in: mailman Milestone: None => 3.0.0b4 ** Changed in: mailman Assignee: (unassigned) => Barry Warsaw (barry) ** Changed in: mailman Importance: Undecided => High ** Changed in: mailman Status: New => Fix Committed -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
OK, now I know why you added this flag. Removing it does not work, the REST server still exits on reopen (I don't know how I missed this, it used to work). But it's an exit, not a crash. In the logs I get "REST server exiting", because the server gets an interrupted system call error. This is very probably linked to http://bugs.python.org/issue7978. In python upstream they suggested to simply restart the select call, so I'll prepare a patch to restart the serve_forever() call. Can you think of a reason where we would really want the REST server to exit on an interrupted system call ? ** Bug watch added: Python Roundup #7978 http://bugs.python.org/issue7978 ** Changed in: mailman Status: Fix Committed => In Progress -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
Here's the patch to restart the REST server on an interrupted system call. I tried to make use of the existing runner infrastructure so it's a short patch, but it may not be a good idea (I don't want to hijack something not designed for this purpose). ** Patch added: "mailman-1184376.patch" https://bugs.launchpad.net/mailman/+bug/1184376/+attachment/3693446/+files/m... -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
I generally like that you're using the runner infrastructure for this. Is there any chance you can come up with a unittest case for this? (I'm still working on one but haven't gotten it working yet.) -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
Here's a testcase that I finally managed to get working. Download the attached file to src/mailman/runners/tests/test_rest.py. I'm not sure it's the best way to test for such a case, but at least it works (it fails without my patch and passes with it). ** Attachment added: "test_rest.py" https://bugs.launchpad.net/mailman/+bug/1184376/+attachment/3699803/+files/t... -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
This turned out to be a more extensive branch than anticipated because of weirdness in three competing constraints. To properly shutdown the WSGI server, the .shutdown() method must be called in a separate thread from the server. But because of SQLite constraints, the server must run in the main thread, and because of Python signal handling semantics, the signal handlers must run in the main thread. I solved this, by running a subthread that just waits on an Event and then calls the .shutdown(). The signal handler fires the Event. I also greatly simplified the bin/runner implementation and removed some command line options that weren't really being used. Thanks for the test code! I did a little bit of refactoring, but it was very helpful. ** Changed in: mailman Status: In Progress => Fix Committed -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
** Changed in: mailman Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1184376 Title: REST server crashes on "reopen" To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1184376/+subscriptions
participants (2)
-
Aurélien Bompard -
Barry Warsaw