Hi all,
I have this in my error log:
Nov 27 09:53:19 2008 post(18252): Traceback (most recent call last):
post(18252): File "/usr/lib/mailman/scripts/post", line 69, in ?
post(18252): main()
post(18252): File "/usr/lib/mailman/scripts/post", line 64, in main
post(18252): tolist=1, _plaintext=1)
post(18252): File "/usr/lib/mailman/Mailman/Queue/Switchboard.py", line 137, in enqueue
post(18252): os.fsync(fp.fileno())
post(18252): OSError : [Errno 5] Input/output error
post(19319): File "/usr/lib/mailman/scripts/post", line 64, in main
post(19319): tolist=1, _plaintext=1)
post(19319): File "/usr/lib/mailman/Mailman/Queue/Switchboard.py", line 137, in enqueue
post(19319): os.fsync(fp.fileno())
post(19319): OSError : [Errno 5] Input/output error
Nov 28 11:49:40 2008 post(927): Traceback (most recent call last):
post(927): File "/usr/lib/mailman/scripts/post", line 69, in ?
post(927): main()
post(927): File "/usr/lib/mailman/scripts/post", line 64, in main
post(927): tolist=1, _plaintext=1)
post(927): File "/usr/lib/mailman/Mailman/Queue/Switchboard.py", line 137, in enqueue
post(927): os.fsync(fp.fileno())
post(927): OSError : [Errno 5] Input/output error
And my in/ queue directory looks like:
./in:
total 3.4M
-rw-rw---- 1 root mailman 456K Nov 27 09:48 1227775710.928695+a44f4a2c5af5927c529c6c4a5589ab71759a1f86.pck.tmp
-rw-rw---- 1 root mailman 120K Nov 27 14:38 1227793170.26085+afbc8b7ceee95547b83dc3e69f055d67e53acb33.pck.tmp
-rw-rw---- 1 root mailman 16M Nov 28 11:43 1227861580.003396+d73b39bd22372e715d2d80786654ef362e65fae0.pck.tmp
Things I checked are:
- free disk space: more than enough
- free inodes: more than enough
- no SELinux or other security/restriction framework installed
- bin/check_perms does not list any errors
- other email is delivered
Any other ideas what to check? Upgrading Mailman is an option, but I would like to avoid that, if possible.
Thanks for all feedback, Richard
Further info:
Restarting mailman gives me
# /etc/init.d/mailman restart
Shutting down mailman done
rm: cannot remove `/var/lib/mailman/locks/*': No such file or directory
Starting mailman
rm: cannot remove `/var/lib/mailman/locks/*': No such file or directory
# ls -l /var/lib/mailman/locks/
total 8
-rw-rw-r-- 2 mailman mailman 49 Dec 2 2008 master-qrunner
-rw-rw-r-- 2 mailman mailman 49 Dec 2 2008 master-qrunner.plesk1.6444
#
I tried stopping mailman, moving the messages to shunt/, starting Mailman again and then running bin/unshunt, but that does not work, either. Neither does this produce any new log output.
Any and all ideas appreciated.. Richard
Even more info (should have stated that in the first email):
This is a NFS share. So it might 'just' be a crappy net connection.
Any thoughts about this patch which I plan to apply locally? Beware evil GMail linebreaks..
--- /usr/lib/mailman/Mailman/Queue/Switchboard.py.orig 2008-12-01 11:46:31.524425955 +0100
+++ /usr/lib/mailman/Mailman/Queue/Switchboard.py 2008-12-01 11:48:39.676765175 +0100
@@ -134,7 +134,19 @@
                 fp.write(msgsave)
                 cPickle.dump(data, fp, protocol)
                 fp.flush()
-                os.fsync(fp.fileno())
+                # os.fsync(fp.fileno())
+                # Sometimes, the sync to our NFS share fails. This retries
+                # nine times and then gives up -- RichiH 081201
+                for trial in xrange(10):
+                    try:
+                        os.fsync(fp.fileno())
+                    except OSError, e:
+                        if trial == 9 or e.errno != errno.EIO:
+                            raise
+                        time.sleep(1)
+                        continue
+                    else:
+                        break
             finally:
                 fp.close()
         finally:
Thanks, Richard
Richard Hartmann wrote:
Even more info (should have stated that in the first email):
This is a NFS share. So it might 'just' be a crappy net connection.
Any thoughts about this patch which I plan to apply locally? Beware evil GMail linebreaks..
--- /usr/lib/mailman/Mailman/Queue/Switchboard.py.orig 2008-12-01 11:46:31.524425955 +0100
+++ /usr/lib/mailman/Mailman/Queue/Switchboard.py 2008-12-01 11:48:39.676765175 +0100
@@ -134,7 +134,19 @@
                 fp.write(msgsave)
                 cPickle.dump(data, fp, protocol)
                 fp.flush()
-                os.fsync(fp.fileno())
+                # os.fsync(fp.fileno())
+                # Sometimes, the sync to our NFS share fails. This retries
+                # nine times and then gives up -- RichiH 081201
+                for trial in xrange(10):
+                    try:
+                        os.fsync(fp.fileno())
+                    except OSError, e:
+                        if trial == 9 or e.errno != errno.EIO:
+                            raise
+                        time.sleep(1)
+                        continue
+                    else:
+                        break
             finally:
                 fp.close()
         finally:
The patch looks OK. Let us know how it works.
--
Mark Sapiro
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
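For readers who want to try the same idea outside Mailman, the retry logic of the patch can be sketched as a standalone function in current Python. This is only an illustration, not the code that went into Switchboard.py: the name fsync_with_retry and its attempts, delay, fsync and sleep parameters are made up for this sketch, and the last two exist only so the behaviour can be exercised without a failing disk.

```python
import errno
import os
import time


def fsync_with_retry(fd, attempts=10, delay=1.0, fsync=os.fsync, sleep=time.sleep):
    """Sync fd to disk, retrying only when the error is EIO.

    Returns the number of fsync calls that were made. Any other OSError,
    or EIO on the final attempt, is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for trial in range(attempts):
        try:
            fsync(fd)
        except OSError as e:
            # Give up on the last trial, or on anything that is not EIO.
            if trial == attempts - 1 or e.errno != errno.EIO:
                raise
            sleep(delay)
        else:
            return trial + 1
```

As in the patch, only EIO is treated as possibly transient; errors such as a full disk propagate immediately so they are not hidden behind ten seconds of retries.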
On Mon, 1 Dec 2008, Richard Hartmann wrote:
Restarting mailman gives me
# /etc/init.d/mailman restart
Shutting down mailman done
rm: cannot remove `/var/lib/mailman/locks/*': No such file or directory
Starting mailman
rm: cannot remove `/var/lib/mailman/locks/*': No such file or directory
try this:
./mailmanctl stop
./mailmanctl --stale-lock-cleanup start
-- Yours, J.A. Terranson sysadmin_at_mfn.org 0xpgp_key_mgmt_is_broken-dont_bother
"Never belong to any party, always oppose privileged classes and public plunderers, never lack sympathy with the poor, always remain devoted to the public welfare, never be satisfied with merely printing news, always be drastically independent, never be afraid to attack wrong, whether by predatory plutocracy or predatory poverty."
Joseph Pulitzer 1907 Speech
On Mon, Dec 1, 2008 at 15:14, J.A. Terranson measl@mfn.org wrote:
try this:
./mailmanctl stop
./mailmanctl --stale-lock-cleanup start
Nope, the warnings are still there. I don't really care about those, though. My main concern is to make Mailman reliably relay list mail.
Thanks though! Richard
Richard Hartmann wrote:
Further info:
Restarting mailman gives me
# /etc/init.d/mailman restart
Shutting down mailman done
rm: cannot remove `/var/lib/mailman/locks/*': No such file or directory
Starting mailman
rm: cannot remove `/var/lib/mailman/locks/*': No such file or directory
# ls -l /var/lib/mailman/locks/
total 8
-rw-rw-r-- 2 mailman mailman 49 Dec 2 2008 master-qrunner
-rw-rw-r-- 2 mailman mailman 49 Dec 2 2008 master-qrunner.plesk1.6444
#
First of all, your /etc/init.d/mailman script comes from Plesk or some other packager. Our suggested script doesn't attempt to remove lock files.
Second, I'm guessing, but if the rm /var/lib/mailman/locks/* comes after "mailmanctl stop" it is normal that there are no locks, although your ls seems to show a master lock which may be stale. Is there a pid 6444 running?
Still guessing, but perhaps your script does something like
mailmanctl stop
rm /var/lib/mailman/locks/*
...
rm /var/lib/mailman/locks/*
mailmanctl start
in order to remove all locks after stopping and before starting. Then the absence of locks would be normal, and the locks you see from 'ls' are the ones just created when Mailman started.
Finally, if "rm /var/lib/mailman/locks/*" says "cannot remove `/var/lib/mailman/locks/*': No such file or directory" when there are files, this is not a Mailman question.
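The question of whether pid 6444 is still running can also be answered from a script. The sketch below assumes, based only on the file name in the listing above, that the pid is the last dot-separated field of the lock file name; pid_from_lock_name and pid_is_running are hypothetical helper names, and the signal-0 probe works on POSIX systems only.

```python
import os


def pid_from_lock_name(name):
    """Return the trailing pid of a name like 'master-qrunner.plesk1.6444'.

    Returns None when the last dot-separated field is not a number.
    """
    tail = name.rsplit(".", 1)[-1]
    if tail.isascii() and tail.isdigit():
        return int(tail)
    return None


def pid_is_running(pid):
    """Check for a process with signal 0, which delivers nothing (POSIX)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # no such process: the lock is probably stale
    except PermissionError:
        return True    # the process exists but belongs to another user
    return True
```

If pid_is_running returns False for the pid in the lock name, the lock was most likely left behind by a process that is gone.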
I tried stopping mailman, moving the messages to shunt/, starting Mailman again and then running bin/unshunt, but that does not work, either. Neither does this produce any new log output.
You indicated there are files named *.pck.tmp in /in/.
These won't be processed by unshunt. If you want to reprocess them, just leave them in the /in/ queue and remove the .tmp from the name. Then IncomingRunner will process them. However, since the write to flush and sync the file failed, the file may be incomplete. You should dump the file with "bin/dumpdb -p" to make sure it contains both a message and a metadata object before renaming it.
Do not unshunt files that weren't shunted to begin with.
--
Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
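The check-then-rename procedure described above can be sketched in Python. This is a rough illustration under several assumptions, not a replacement for bin/dumpdb: it assumes a queue file is two pickles back to back (message first, metadata dict second), the helper names are invented here, and real Mailman 2.1 queue files are Python 2 pickles that may reference Mailman classes, so they should be inspected with Mailman's own tools. Unpickling can execute code, so only run something like this on files from your own queue.

```python
import os
import pickle

# Errors that a truncated or damaged pickle stream commonly raises.
_LOAD_ERRORS = (pickle.UnpicklingError, EOFError, ValueError, TypeError,
                AttributeError, ImportError, IndexError, KeyError)


def has_message_and_metadata(path):
    """Return True if path holds two loadable pickles and the second is a dict."""
    try:
        with open(path, "rb") as fp:
            pickle.load(fp)              # object 1: the message
            metadata = pickle.load(fp)   # object 2: the metadata
    except _LOAD_ERRORS:
        return False
    return isinstance(metadata, dict)


def requeue_if_intact(tmp_path):
    """Strip the .tmp suffix only when the file passes the check above.

    Returns the new path, or None if the file was left untouched.
    """
    if not tmp_path.endswith(".pck.tmp"):
        raise ValueError("expected a .pck.tmp file: %r" % tmp_path)
    if not has_message_and_metadata(tmp_path):
        return None
    final_path = tmp_path[:-len(".tmp")]
    os.rename(tmp_path, final_path)
    return final_path
```

The rename happens only after both objects load, which mirrors the advice to dump the file first and rename second.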
On Mon, Dec 1, 2008 at 18:01, Mark Sapiro mark@msapiro.net wrote:
First of all, your /etc/init.d/mailman script come from plesk or some other packager. Our suggested script doesn't attempt to remove lock files.
Yes, it's Plesk (unfortunately).
You should dump the file with "bin/dumpdb -p" to make sure it contains both a message and a metadata object before renaming it.
# /usr/lib/mailman/bin/dumpdb -p /var/lib/mailman/qfiles/shunt/1227775710.928695+a44f4a2c5af5927c529c6c4a5589ab71759a1f86.pck.tmp
[----- start pickle file -----]
Traceback (most recent call last):
File "/usr/lib/mailman/bin/dumpdb", line 159, in <module>
msg = main()
File "/usr/lib/mailman/bin/dumpdb", line 139, in main
obj = load(fp)
ValueError: insecure string pickle
# /usr/lib/mailman/bin/dumpdb -p /var/lib/mailman/qfiles/shunt/1227793170.26085+afbc8b7ceee95547b83dc3e69f055d67e53acb33.pck.tmp
[----- start pickle file -----]
Traceback (most recent call last):
File "/usr/lib/mailman/bin/dumpdb", line 159, in <module>
msg = main()
File "/usr/lib/mailman/bin/dumpdb", line 139, in main
obj = load(fp)
ValueError: insecure string pickle
# /usr/lib/mailman/bin/dumpdb -p /var/lib/mailman/qfiles/shunt/1227861580.003396+d73b39bd22372e715d2d80786654ef362e65fae0.pck.tmp
[----- start pickle file -----]
Traceback (most recent call last):
File "/usr/lib/mailman/bin/dumpdb", line 159, in <module>
msg = main()
File "/usr/lib/mailman/bin/dumpdb", line 139, in main
obj = load(fp)
ValueError: insecure string pickle
#
Do not unshunt files that weren't shunted to begin with.
OK, thanks. I added your hint to my personal wiki. Very useful!
Richard
Richard Hartmann wrote:
You should dump the file with "bin/dumpdb -p" to make sure it contains both a message and a metadata object before renaming it.
# /usr/lib/mailman/bin/dumpdb -p /var/lib/mailman/qfiles/shunt/1227775710.928695+a44f4a2c5af5927c529c6c4a5589ab71759a1f86.pck.tmp
[----- start pickle file -----]
Traceback (most recent call last):
File "/usr/lib/mailman/bin/dumpdb", line 159, in <module>
msg = main()
File "/usr/lib/mailman/bin/dumpdb", line 139, in main
obj = load(fp)
ValueError: insecure string pickle
It looks like the files are corrupt. You may be able to see something of the contents with 'strings', but Mailman won't be able to process them as they are.
If strings allows you to actually recover a message from the file, you can re-post that message with bin/inject.
--
Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
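Where the strings utility is not at hand, the same kind of salvage can be approximated in a few lines of Python. This is a minimal sketch of the general technique (pull runs of printable ASCII out of binary data), not a reimplementation of strings; printable_runs and min_len are names chosen for this example.

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes (or tabs) as text."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Reading a damaged queue file in binary mode and passing its bytes to printable_runs gives a list of text fragments in file order. Whether those fragments add up to a complete message still has to be judged by eye before re-posting anything.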
On Mon, Dec 1, 2008 at 18:24, Mark Sapiro mark@msapiro.net wrote:
It looks like the files are corrupt. You may be able to see something of the contents with 'strings', but Mailman won't be able to process them as they are.
If strings allows you to actually recover a message from the file, you can re-post that message with bin/inject.
Thank you very much! For reference, what should the output of dumpdb -p look like if the files were OK?
Richard
Richard Hartmann wrote:
For reference, what should the output of dumpdb -p look like if the files were OK?
Something like the following:
[----- start pickle file -----]
<----- start object 1 ----->
(this section has the raw message text)
<----- start object 2 ----->
{ '_parsemsg': True,
  'listname': 'list1',
  'received_time': 1199323852.640625,
  'tolist': 1,
  'version': 3}
[----- end pickle file -----]
--
Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
On Mon, Dec 1, 2008 at 18:39, Mark Sapiro mark@msapiro.net wrote:
Something like the following:
Thanks yet again!
Richard
participants (3)
- J.A. Terranson
- Mark Sapiro
- Richard Hartmann