Not getting aliases, owner notification on newlist

Hi all, hope someone can help...
I've been using Mailman 2.1.6 for some time now, happily compiled
and installed from source on my own server. I have now had to move
to another server, and at first I took the opportunity to upgrade to
2.1.9. But the OutgoingRunner NEVER worked. Logging is not very
verbose for any of this, but when I tried running it manually
(qrunner -v -r Outgoing) I got "segmentation fault".
I'm running a pretty standard Debian Sarge setup, so I thought I'd
try the Debian package, which is 2.1.5. Now at least the Outgoing
qrunner starts and stays up, but when I created my first list, and
any subsequent ones, I did not get the list of four or five aliases
before the prompt saying "hit return to notify the list owner". Nor,
when I do hit return, do I get notified of my new list.
What can I tell you... I'm using Postfix as my MTA, with virtual
domains (though this problem seems to occur whether I use the default
domain, as for the site list, or any of the virtual domains, so I
haven't got as far as editing the mailman-virtuals database yet).
I did follow through FAQ entry 3.14. check_perms did find problems
(of course I hadn't actually run it, as I would have with an install
I had compiled myself). Naturally, I installed the package using
aptitude, running as root, and all the symlinks in /var/lib/mailman
pointing at /usr/lib/mailman/* had group root instead of group list.
check_perms -f was not able to fix those, so I did so manually.
I then deleted and recreated the site-list and still got the same
symptoms.
I notice in the logs/error I seem to get:
Mar 27 08:19:14 2007 (22935) Master qrunner detected subprocess exit
(pid: 26312, sig: 11, sts: None, class: OutgoingRunner, slice: 1/1)
[restarting]
Mar 27 08:19:14 2007 (26757) OutgoingRunner qrunner started.
This happens every time it tries to send a message; that one
coincided with me deleting and recreating the site list.
Can I get more verbose output than that?
There's nothing left over in the qfiles directories, though both in
and out are shown as modified at the same time as that OutgoingRunner
error above.
I watch my Postfix logs while I am doing all this and see absolutely
nothing. If I send a message either to the site list or to
mailman-owner, it shows up in the Postfix logs being passed on to the
post command and deleted from the Postfix queue, after which it just
seems to get lost: it neither comes back out to the 'real'
mailman-owner nor gets siphoned off into shunt or anything like that.
It just quietly dies.
Any thoughts?
TIA, Jock
-- Jock Coats Warden's Flat 1e, J Block Morrell Hall, OXFORD, OX3 0FF w: +44 (0)1865 483353 h: +44 (0)1865 485019 m: +44 (0)7769 695767 e: jock.coats@jcsolutions.co.uk www: http://jockcoats.blogspot.com/

Jock Coats wrote:
I'd go back to 2.1.9 because you have the same problem with the Debian package.
<snip>
This looks like it's from the qrunner log, not the error log, but it says OutgoingRunner was killed with signal 11 - SIGSEGV segmentation violation.
Since your outgoing runner dies in the same way with both versions, perhaps this is a Python problem, not a mailman problem.
See <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.073.htp> for some debugging techniques.
With 2.1.9, there will be a .bak file left in the 'out' queue, but it will be reprocessed when outgoing runner restarts which will repeat the failure in a loop until the retry limit for qrunners is reached.
Because Mailman is unable to send any outgoing mail. It tries, OutgoingRunner is killed because of the segmentation violation and the out queue entry is lost.
-- Mark Sapiro <msapiro@value.net>, San Francisco Bay Area, California
"The highway is for gamblers, better use your sense" - B. Dylan
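
The "sig: 11" reading described above can be checked mechanically. The following is a minimal sketch, assuming the master qrunner always writes its exit notice in the shape of the line quoted earlier in the thread; the helper name describe_exit and the regular expression are illustrative and are not part of Mailman.

```python
import re
import signal

# The log entry quoted in the original message, joined back onto one line.
LOG_LINE = (
    "Mar 27 08:19:14 2007 (22935) Master qrunner detected subprocess exit "
    "(pid: 26312, sig: 11, sts: None, class: OutgoingRunner, slice: 1/1) "
    "[restarting]"
)

# Assumed layout of the exit notice, inferred only from the quoted line.
EXIT_RE = re.compile(
    r"subprocess exit\s+\(pid: (?P<pid>\d+), sig: (?P<sig>\w+), "
    r"sts: (?P<sts>\w+), class: (?P<cls>\w+), slice: (?P<slice>\d+/\d+)\)"
)


def describe_exit(line):
    """Return pid, runner class and signal name for an exit notice, else None."""
    match = EXIT_RE.search(line)
    if match is None:
        return None
    sig_name = None
    if match.group("sig").isdigit():
        try:
            # Map the numeric signal to its symbolic name on this platform.
            sig_name = signal.Signals(int(match.group("sig"))).name
        except ValueError:
            sig_name = "unknown"
    return {
        "pid": int(match.group("pid")),
        "runner": match.group("cls"),
        "signal": sig_name,
    }
```

Calling describe_exit(LOG_LINE) reports the runner class together with the symbolic name of signal 11, which makes repeated crashes of one runner easy to spot when scanning a long logs/qrunner file.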

On 27 Mar 2007, at 16:45, Mark Sapiro wrote:
I'd go back to 2.1.9 because you have the same problem with the Debian package.
Okay - I've done that. In fact I also installed Python 2.5 from
source rather than relying on Sarge's 2.4 port just in case it was that.
I appear to have similar problems but...
Since your outgoing runner dies in the same way with both versions, perhaps this is a Python problem, not a mailman problem.
I no longer suspect Python, but Postfix itself.
Hacking the SMTPDirect.py handler to increase the debugging
information produced the following line in logs/error (for sure this
time; you were right, it was logs/qrunner last time I quoted!):
Mar 27 20:04:00 2007 qrunner(21277): connect: ('localhost', 25)
...ten times for each attempt to deliver anything, then the
OutgoingRunner process is terminated. But when I deleted whatever was
in qfiles (I think the "virgin" message from setting up the site list
was sitting there) and tried bin/qrunner -v -r Outgoing, it worked
and apparently just sat there in the foreground. That lasted until,
in another shell, I removed and recreated the site list, prompting it
to try to send an outgoing message to the list owner. As soon as I
ran bin/mailmanctl start after that, Outgoing once again collapsed
with those same messages in logs/error.
So I tried telnetting to port 25 on localhost, and it drops the
connection immediately. I tried telnetting to port 25 on the
hostname, and it worked. So, for some reason that I cannot fathom,
Postfix is dropping connections from localhost to localhost. Postfix
logs "warning: process /usr/lib/postfix/smtpd pid 21392 killed by
signal 11" at this point, so I think the answer is not in this forum
but something to do with Postfix. I guess I could change the outbound
mailhost somewhere in Mailman's config to see if that gets round it
for now?
Thanks again,
Jock
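
The telnet comparison described above (port 25 on localhost versus port 25 on the hostname) can also be scripted. This is a rough sketch using only the standard socket module; the function names smtp_banner and compare_hosts are made up for illustration, and the three-way result (banner text, empty string, None) is a simplification of what telnet shows.

```python
import socket


def smtp_banner(host, port=25, timeout=5.0):
    """Connect and return the server's first reply.

    Returns the greeting text, '' if the peer closed the connection
    without sending anything, or None if connecting or reading failed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = conn.recv(512)
    except OSError:
        # Covers refused connections, resets and timeouts alike.
        return None
    return data.decode("ascii", errors="replace").strip()


def compare_hosts(hosts, port=25):
    """Probe each host name in turn and collect the results in a dict."""
    return {host: smtp_banner(host, port) for host in hosts}
```

For the situation in this thread one would call compare_hosts(["localhost", socket.getfqdn()]). A greeting for the hostname but an empty string or None for localhost would match the behaviour seen with telnet.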

On 27 Mar 2007, at 20:47, Jock Coats wrote:
Setting SMTPHOST = 'FQDN' (even though it is actually localhost) does
appear to have worked. Until I have more leisure to investigate it
I shall assume that postfix has a jolly good reason for refusing a
request for localhost, but I can't work out what it might be from my
main.cf.
Thanks for the pointers though - it would have taken me ages to start
looking at Postfix as part of the problem!!
Jock
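
For reference, the workaround described above would look roughly like this in Mailman 2.1's site configuration file, mm_cfg.py, which is itself Python. The hostname is a placeholder, not the poster's real server name; SMTPHOST is the setting named in the message.

```python
# Mailman/mm_cfg.py fragment (site overrides; Defaults.py stays untouched).
# 'lists.example.org' is a hypothetical name: substitute the fully
# qualified hostname on which the MTA does accept connections.
SMTPHOST = 'lists.example.org'
```

The qrunners read the configuration at startup, so they need to be restarted (bin/mailmanctl restart) before the new value takes effect.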

At 8:47 PM +0100 3/27/07, Jock Coats wrote:
Okay - I've done that. In fact I also installed Python 2.5 from source rather than relying on Sarge's 2.4 port just in case it was that.
No, you really want Python 2.4.4, not 2.3.anything or 2.5.anything. The problem is that 2.3.anything is too old (and breaks things in the Mailman code), and 2.5.anything is too new (and breaks things in the Mailman code).
Trust me, you really want the latest release from the 2.4 tree, for use with Mailman 2.1.9.
There's the problem. Postfix probably isn't configured to listen on the localhost IP address. Try adding that to the list of interfaces for postfix to listen on.
That could also work.
-- Brad Knowles <brad@shub-internet.org>, Consultant & Author LinkedIn Profile: <http://tinyurl.com/y8kpxu> Slides from Invited Talks: <http://tinyurl.com/tj6q4>
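
To follow up on the suggestion about listening interfaces, here is a rough sketch of how one might test the value of Postfix's inet_interfaces setting (for example as printed by postconf) for loopback coverage. The helper name listens_on_loopback is illustrative, and the check is deliberately simplified: it does not expand $variables or resolve host names, and it treats only the literal names listed below as loopback.

```python
import re

# Literal spellings treated as loopback in this simplified check.
LOOPBACK_NAMES = {"localhost", "127.0.0.1", "::1", "[::1]"}


def listens_on_loopback(inet_interfaces):
    """Return True if the given inet_interfaces value appears to cover loopback."""
    # Values are separated by commas and/or whitespace.
    items = [item for item in re.split(r"[,\s]+", inet_interfaces.strip()) if item]
    if not items:
        return False
    # 'all' and 'loopback-only' both include the loopback interface.
    if "all" in items or "loopback-only" in items:
        return True
    return any(item.lower() in LOOPBACK_NAMES for item in items)
```

A value such as "$myhostname" alone would return False here, while "$myhostname, localhost" would return True. Note that in this thread smtpd was also logged as being killed by signal 11, so a clean result from this check would not by itself rule out a Postfix problem.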
