data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
I have encountered a strange issue on both lists.mailman3.org and mail.python.org. Namely, Mailman3 is not started upon reboot.
If I reboot the server Mailman 3 is shut down normally but is never started when the server comes up. A typical mailman.log excerpt is
(the '%s' should be fixed by https://gitlab.com/mailman/mailman/merge_requests/311). The entries from 10:12:07 are the normal shutdown from the reboot. The very next thing is at 10:21:24 when I manually did 'service mailman start'. The /etc/init.d/mailman3 script is attached as mailman.txt. This has not changed recently. The only thing that has changed is the /etc/init.d/qcluster script (attached as qcluster.txt). This existed before, but was changed to use start-stop-daemon and to add a 'stop' function.
I also did 'update-rc.d qcluster defaults' to create the rc*.d entries for qcluster which I had neglected to do before. The entries for mailman3 are there (unchanged in over a year).
Does anyone have an idea why Mailman 3 is not starting on reboot?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/02/2017 08:50 AM, Mark Sapiro wrote:
The issue seems unrelated to qcluster. As a test, I stopped qcluster, removed /etc/init.d/qcluster and the /etc/rc*.d entries and rebooted the server and mailman still wasn't started on reboot.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/500b6/500b6db67c37c4615bc60a35e5ade42e0af5ac6f" alt=""
It would be interesting to know if the problem is in the init script or Mailman. You may have to crank up logging of both to find out. Also, I wonder if the same problem occurs if you use systemd?
Other than that I don’t have any ideas.
-Barry
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/02/2017 12:30 PM, Barry Warsaw wrote:
It would be interesting to know if the problem is in the init script or Mailman. You may have to crank up logging of both to find out. Also, I wonder if the same problem occurs if you use systemd?
It appears to be in Mailman. I added some logging to the init.d/mailman script as follows: --- /etc/init.d/mailman.bak 2017-09-03 00:55:54.720770063 +0000 +++ /etc/init.d/mailman 2017-09-03 00:41:49.569395653 +0000 @@ -15,6 +15,8 @@ USER=mailman GROUP=mailman +date >> /etc/mm_script +echo $@ >> /etc/mm_script # Exit if the package is not installed [ -x "$DAEMON" ] || exit 0 @@ -28,9 +30,11 @@ case "$1" in start) + echo 'starting' >> /etc/mm_script [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME" # use --force to remove a stale lock. sudo -u $USER $DAEMON start --force + echo 'started' >> /etc/mm_script ;; stop) [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME" Then I rebooted and got this in /etc/mm_script Sun Sep 3 00:42:25 UTC 2017 stop Sun Sep 3 00:43:26 UTC 2017 start starting started but Mailman wasn't running so I did 'sudo service mailman start' and got an additional Sun Sep 3 00:44:38 UTC 2017 start starting started in /etc/mm_script, after which Mailman was running. mailman.log has ... Sep 03 00:42:33 2017 (2636) nntp runner caught SIGTERM. Stopping. Sep 03 00:42:33 2017 (2636) nntp runner exiting. Sep 03 00:42:33 2017 (2639) rest runner caught SIGTERM. Stopping. Sep 03 00:42:34 2017 (2639) rest runner exiting. Sep 03 00:42:34 2017 (2616) Master stopped Sep 03 00:44:44 2017 (2572) Master started Sep 03 00:44:50 2017 (2590) in runner started. Sep 03 00:44:50 2017 (2596) retry runner started. Sep 03 00:44:51 2017 (2595) rest runner started. Sep 03 00:44:51 2017 (2588) bounces runner started. ... note that the 'stop' from the reboot produced messages ending with Sep 03 00:42:34 2017 (2616) Master stopped and the next message in the log is Sep 03 00:44:44 2017 (2572) Master started from the manual start. This issue began after I installed changes including commit ae0042a90220119414f61aeb20c6b58bfacb8af2 affecting mailman/bin/mailman.py and mailman/bin/master.py. I haven't gone through those changes in detail, but I suspect they have something to do with it. -- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/02/2017 06:27 PM, Mark Sapiro wrote:
This get's stranger and stranger.
I removed the logging from the init.d script which invokes /opt/mailman/mailman-bundler/bin/mailman with the appropriate start/stop option. I then added logging to it. Here it is:
#!/bin/bash unset PYTHONSTARTUP
echo date
At bundler/bin/mailman >> /opt/mailman/temp_log
exec /opt/mailman/mailman-bundler/venv-3.4/bin/mailman
-C "/opt/mailman/mailman-bundler/deployment/mailman.cfg"
$@
so when invoked it prints a timestamp and 'At bundler/bin/mailman'.
I also added logging to /opt/mailman/mailman-bundler/venv-3.4/bin/mailman as follows:
#!/usr/local/bin/python3 # EASY-INSTALL-ENTRY-SCRIPT: 'mailman==3.2.0a1','console_scripts','mailman' __requires__ = 'mailman==3.2.0a1' import re import sys import time from pkg_resources import load_entry_point
if __name__ == '__main__': with open('/opt/mailman/temp_log', 'a') as fp: print('{} At venv bin/mailman __main__'.format(time.asctime()), file=fp) sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0]) sys.exit( load_entry_point('mailman==3.2.0a1', 'console_scripts', 'mailman')() )
so it prints a time stamp and 'At venv bin/mailman __main__' in the if __name__ == '__main__': block.
Finally, I added a bit to the main() function in mailman/bin/mailman.py so it reads (a bunch of decorators omitted):
def main(ctx, config_file):
# XXX https://github.com/pallets/click/issues/303
"""
The GNU Mailman mailing list management system
Copyright 1998-2017 by the Free Software Foundation, Inc.
http://www.list.org
"""
# click handles dispatching to the subcommand via the Subcommands class.
import time
with open('/opt/mailman/temp_log', 'a') as fp:
print('{} At mailman.main'.format(time.asctime()), file=fp)
Then I rebooted and subsequently manually ran sudo service mailman start. I got this in temp_log
Tue Sep 5 03:10:57 UTC 2017 At bundler/bin/mailman Tue Sep 5 03:10:58 2017 At venv bin/mailman __main__ Tue Sep 5 03:11:57 UTC 2017 At bundler/bin/mailman Tue Sep 5 03:11:58 2017 At venv bin/mailman __main__ Tue Sep 5 03:15:17 UTC 2017 At bundler/bin/mailman Tue Sep 5 03:15:18 2017 At venv bin/mailman __main__ Tue Sep 5 03:15:22 2017 At mailman.main
The 03:10:.. entries are from the 'stop' from the reboot. The absence of 'At mailman.main' may be significant. All the normal shutdown messages are in mailman.log about 6-7 seconds later. But, if I do 'sudo service mailman stop' there is an 'At mailman.main' message logged.
Then the 3:11:.. entries are from init running the init.d/mailman script and indicate that /opt/mailman/mailman-bundler/bin/mailman and /opt/mailman/mailman-bundler/venv-3.4/bin/mailman were both invoked, but there is nothing in mailman.log and no 'At mailman.main' message. Then the 03:15:.. entries are from when I manually started Mailman.
It appears that
load_entry_point('mailman==3.2.0a1', 'console_scripts', 'mailman')()
is being executed by /opt/mailman/mailman-bundler/venv-3.4/bin/mailman, but either somehow the 'start' argument is lost or something after that fails. I don't know how the 'start' argument could be getting lost, but I don't know what else could be failing either.
Any hints?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
I'm beginning to develop some understanding of this. I think it is not related to the changes for plugins, but rather is related to the use of click and possibly something to do with run levels.
I have put logging in /opt/mailman/mailman-bundler/venv-3.4/bin/mailman to report the command that invoked it before it in turn invokes mailman.bin.mailman.main(), and I put logging there to report the fact that control got there.
The log looks like
Tue Sep 5 15:42:57 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'stop'] Tue Sep 5 15:42:58 2017 mailman.py - main() Tue Sep 5 15:43:08 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'start', '--force'] Tue Sep 5 15:43:09 2017 mailman.py - main() Tue Sep 5 15:47:16 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'stop'] Tue Sep 5 15:48:13 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'start', '--force'] Tue Sep 5 15:58:39 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'start', '--force'] Tue Sep 5 15:58:47 2017 mailman.py - main()
The 15:42:.. entries are from a manual stop. The 15:43:.. entries are from a manual start. The 15:47:16 entry is the 'stop' from a reboot - note no 'mailman.py - main()' here, and the 15:48:13 entry is the 'start' from the reboot - again no 'mailman.py - main()' here. Finally, the 15:58:.., entries are from a manual start.
There are several 'click' decorators on the mailman.bin.mailman.main() function, and I suspect something there is detecting the run level and just quitting if a shutdown or startup is in progress.
I would like to try systemd instead, but both lists.mailman3.org and mail.python.org are on Ubuntu 14.04 which doesn't support systemd.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/4ee39/4ee39736a7f329639040405ca61a25f11cd96358" alt=""
Hi Mark,
I did just a quick search and read about issues related to Click and init scripts. See below.
On Tue, Sep 5, 2017, at 09:19 AM, Mark Sapiro wrote:
I *looks* like your observation about the click *might* be correct, I'd have to test more to find out which can't for next few days. Here is an excerpt from the click documentation1:
Click in Python 3 does all the Unicode handling in the standard library and is subject to its behavior. In Python 2, Click does all the Unicode handling itself, which means there are differences in error behavior.
The most glaring difference is that in Python 2, Unicode will “just work”, while in Python 3, it requires extra care. The reason for this is that in Python 3, the encoding detection is done in the interpreter, and on Linux and certain other operating systems, its encoding handling is problematic.
The biggest source of frustration is that Click scripts invoked by init systems (sysvinit, upstart, systemd, etc.), deployment tools (salt, puppet), or cron jobs (cron) will refuse to work unless a Unicode locale is exported.
If Click encounters such an environment it will prevent further execution to force you to set a locale. This is done because Click cannot know about the state of the system once it’s invoked and restore the values before Python’s Unicode handling kicked in.
It does mention that an error trace will show up when there are locale problems with Python 3, but I guess you don't see any of that. Maybe it is suppressed somewhere?
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/04/2017 02:46 PM, Simon Hanna wrote:
What are you running? I think the issue is caused by
commit ae0042a90220119414f61aeb20c6b58bfacb8af2
are you running with that?
That isn't the issue here.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/02/2017 08:50 AM, Mark Sapiro wrote:
The issue seems unrelated to qcluster. As a test, I stopped qcluster, removed /etc/init.d/qcluster and the /etc/rc*.d entries and rebooted the server and mailman still wasn't started on reboot.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/500b6/500b6db67c37c4615bc60a35e5ade42e0af5ac6f" alt=""
It would be interesting to know if the problem is in the init script or Mailman. You may have to crank up logging of both to find out. Also, I wonder if the same problem occurs if you use systemd?
Other than that I don’t have any ideas.
-Barry
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/02/2017 12:30 PM, Barry Warsaw wrote:
It would be interesting to know if the problem is in the init script or Mailman. You may have to crank up logging of both to find out. Also, I wonder if the same problem occurs if you use systemd?
It appears to be in Mailman. I added some logging to the init.d/mailman script as follows: --- /etc/init.d/mailman.bak 2017-09-03 00:55:54.720770063 +0000 +++ /etc/init.d/mailman 2017-09-03 00:41:49.569395653 +0000 @@ -15,6 +15,8 @@ USER=mailman GROUP=mailman +date >> /etc/mm_script +echo $@ >> /etc/mm_script # Exit if the package is not installed [ -x "$DAEMON" ] || exit 0 @@ -28,9 +30,11 @@ case "$1" in start) + echo 'starting' >> /etc/mm_script [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME" # use --force to remove a stale lock. sudo -u $USER $DAEMON start --force + echo 'started' >> /etc/mm_script ;; stop) [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME" Then I rebooted and got this in /etc/mm_script Sun Sep 3 00:42:25 UTC 2017 stop Sun Sep 3 00:43:26 UTC 2017 start starting started but Mailman wasn't running so I did 'sudo service mailman start' and got an additional Sun Sep 3 00:44:38 UTC 2017 start starting started in /etc/mm_script, after which Mailman was running. mailman.log has ... Sep 03 00:42:33 2017 (2636) nntp runner caught SIGTERM. Stopping. Sep 03 00:42:33 2017 (2636) nntp runner exiting. Sep 03 00:42:33 2017 (2639) rest runner caught SIGTERM. Stopping. Sep 03 00:42:34 2017 (2639) rest runner exiting. Sep 03 00:42:34 2017 (2616) Master stopped Sep 03 00:44:44 2017 (2572) Master started Sep 03 00:44:50 2017 (2590) in runner started. Sep 03 00:44:50 2017 (2596) retry runner started. Sep 03 00:44:51 2017 (2595) rest runner started. Sep 03 00:44:51 2017 (2588) bounces runner started. ... note that the 'stop' from the reboot produced messages ending with Sep 03 00:42:34 2017 (2616) Master stopped and the next message in the log is Sep 03 00:44:44 2017 (2572) Master started from the manual start. This issue began after I installed changes including commit ae0042a90220119414f61aeb20c6b58bfacb8af2 affecting mailman/bin/mailman.py and mailman/bin/master.py. I haven't gone through those changes in detail, but I suspect they have something to do with it. -- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/02/2017 06:27 PM, Mark Sapiro wrote:
This get's stranger and stranger.
I removed the logging from the init.d script which invokes /opt/mailman/mailman-bundler/bin/mailman with the appropriate start/stop option. I then added logging to it. Here it is:
#!/bin/bash unset PYTHONSTARTUP
echo date
At bundler/bin/mailman >> /opt/mailman/temp_log
exec /opt/mailman/mailman-bundler/venv-3.4/bin/mailman
-C "/opt/mailman/mailman-bundler/deployment/mailman.cfg"
$@
so when invoked it prints a timestamp and 'At bundler/bin/mailman'.
I also added logging to /opt/mailman/mailman-bundler/venv-3.4/bin/mailman as follows:
#!/usr/local/bin/python3 # EASY-INSTALL-ENTRY-SCRIPT: 'mailman==3.2.0a1','console_scripts','mailman' __requires__ = 'mailman==3.2.0a1' import re import sys import time from pkg_resources import load_entry_point
if __name__ == '__main__': with open('/opt/mailman/temp_log', 'a') as fp: print('{} At venv bin/mailman __main__'.format(time.asctime()), file=fp) sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0]) sys.exit( load_entry_point('mailman==3.2.0a1', 'console_scripts', 'mailman')() )
so it prints a time stamp and 'At venv bin/mailman __main__' in the if __name__ == '__main__': block.
Finally, I added a bit to the main() function in mailman/bin/mailman.py so it reads (a bunch of decorators omitted):
def main(ctx, config_file):
# XXX https://github.com/pallets/click/issues/303
"""
The GNU Mailman mailing list management system
Copyright 1998-2017 by the Free Software Foundation, Inc.
http://www.list.org
"""
# click handles dispatching to the subcommand via the Subcommands class.
import time
with open('/opt/mailman/temp_log', 'a') as fp:
print('{} At mailman.main'.format(time.asctime()), file=fp)
Then I rebooted and subsequently manually ran sudo service mailman start. I got this in temp_log
Tue Sep 5 03:10:57 UTC 2017 At bundler/bin/mailman Tue Sep 5 03:10:58 2017 At venv bin/mailman __main__ Tue Sep 5 03:11:57 UTC 2017 At bundler/bin/mailman Tue Sep 5 03:11:58 2017 At venv bin/mailman __main__ Tue Sep 5 03:15:17 UTC 2017 At bundler/bin/mailman Tue Sep 5 03:15:18 2017 At venv bin/mailman __main__ Tue Sep 5 03:15:22 2017 At mailman.main
The 03:10:.. entries are from the 'stop' from the reboot. The absence of 'At mailman.main' may be significant. All the normal shutdown messages are in mailman.log about 6-7 seconds later. But, if I do 'sudo service mailman stop' there is an 'At mailman.main' message logged.
Then the 3:11:.. entries are from init running the init.d/mailman script and indicate that /opt/mailman/mailman-bundler/bin/mailman and /opt/mailman/mailman-bundler/venv-3.4/bin/mailman were both invoked, but there is nothing in mailman.log and no 'At mailman.main' message. Then the 03:15:.. entries are from when I manually started Mailman.
It appears that
load_entry_point('mailman==3.2.0a1', 'console_scripts', 'mailman')()
is being executed by /opt/mailman/mailman-bundler/venv-3.4/bin/mailman, but either somehow the 'start' argument is lost or something after that fails. I don't know how the 'start' argument could be getting lost, but I don't know what else could be failing either.
Any hints?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
I'm beginning to develop some understanding of this. I think it is not related to the changes for plugins, but rather is related to the use of click and possibly something to do with run levels.
I have put logging in /opt/mailman/mailman-bundler/venv-3.4/bin/mailman to report the command that invoked it before it in turn invokes mailman.bin.mailman.main(), and I put logging there to report the fact that control got there.
The log looks like
Tue Sep 5 15:42:57 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'stop'] Tue Sep 5 15:42:58 2017 mailman.py - main() Tue Sep 5 15:43:08 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'start', '--force'] Tue Sep 5 15:43:09 2017 mailman.py - main() Tue Sep 5 15:47:16 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'stop'] Tue Sep 5 15:48:13 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'start', '--force'] Tue Sep 5 15:58:39 2017 ['/opt/mailman/mailman-bundler/venv-3.4/bin/mailman', '-C', '/opt/mailman/mailman-bundler/deployment/mailman.cfg', 'start', '--force'] Tue Sep 5 15:58:47 2017 mailman.py - main()
The 15:42:.. entries are from a manual stop. The 15:43:.. entries are from a manual start. The 15:47:16 entry is the 'stop' from a reboot - note no 'mailman.py - main()' here, and the 15:48:13 entry is the 'start' from the reboot - again no 'mailman.py - main()' here. Finally, the 15:58:.., entries are from a manual start.
There are several 'click' decorators on the mailman.bin.mailman.main() function, and I suspect something there is detecting the run level and just quitting if a shutdown or startup is in progress.
I would like to try systemd instead, but both lists.mailman3.org and mail.python.org are on Ubuntu 14.04 which doesn't support systemd.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
data:image/s3,"s3://crabby-images/4ee39/4ee39736a7f329639040405ca61a25f11cd96358" alt=""
Hi Mark,
I did just a quick search and read about issues related to Click and init scripts. See below.
On Tue, Sep 5, 2017, at 09:19 AM, Mark Sapiro wrote:
I *looks* like your observation about the click *might* be correct, I'd have to test more to find out which can't for next few days. Here is an excerpt from the click documentation1:
Click in Python 3 does all the Unicode handling in the standard library and is subject to its behavior. In Python 2, Click does all the Unicode handling itself, which means there are differences in error behavior.
The most glaring difference is that in Python 2, Unicode will “just work”, while in Python 3, it requires extra care. The reason for this is that in Python 3, the encoding detection is done in the interpreter, and on Linux and certain other operating systems, its encoding handling is problematic.
The biggest source of frustration is that Click scripts invoked by init systems (sysvinit, upstart, systemd, etc.), deployment tools (salt, puppet), or cron jobs (cron) will refuse to work unless a Unicode locale is exported.
If Click encounters such an environment it will prevent further execution to force you to set a locale. This is done because Click cannot know about the state of the system once it’s invoked and restore the values before Python’s Unicode handling kicked in.
It does mention that an error trace will show up when there are locale problems with Python 3, but I guess you don't see any of that. Maybe it is suppressed somewhere?
data:image/s3,"s3://crabby-images/56955/56955022e6aae170f66577e20fb3ce4d8949255c" alt=""
On 09/04/2017 02:46 PM, Simon Hanna wrote:
What are you running? I think the issue is caused by
commit ae0042a90220119414f61aeb20c6b58bfacb8af2
are you running with that?
That isn't the issue here.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (4)
-
Abhilash Raj
-
Barry Warsaw
-
Mark Sapiro
-
Simon Hanna