[Mailman-Developers] Re: some Python announcements are lost (LONG)

Barry A. Warsaw barry@zope.com
Thu, 11 Oct 2001 14:24:33 -0400


Background:

Mailman 2.0.6 has a problem in its mail->news gateway related to the
strictness of the NNTP server being posted to.  This bites python.org
for the two mailing lists that are gatewayed to/from netnews.
python-list@python.org is a simple bi-directional gateway, such that
any message originally posted to comp.lang.python is gatewayed to
python-list, and vice versa.

The comp.lang.python.announce newsgroup is gated to
python-announce-list@python.org, but the route is a bit more
circuitous because c.l.py.a is a moderated newsgroup.  On python.org,
we run MM2.0.6 patched with SF patch #101270, which implements a hack
for moderated newsgroup gatewaying.  I plan on making this patch more
official for MM2.1, but we need to first deal with the following
problem.

Problem:

In the logs/error file on python.org we occasionally see rejections of
messages posted to the news server via the ToUsenet.py module.  Here
are some examples of the reject log entries:

Dec 18 07:31:00 2000 (15053) (ToUsenet) NNTP error for list "clpa-moderators": 441 Can't set system "NNTP-Posting-Host" header
Dec 20 17:46:04 2000 (10666) (ToUsenet) NNTP error for list "clpa-moderators": 441 Can't set system "X-Complaints-To" header
Jan 03 17:21:02 2001 (17750) (ToUsenet) NNTP error for list "clpa-moderators": 441 Duplicate "To" header
Jan 05 12:52:06 2001 (32547) (ToUsenet) NNTP error for list "python-list": 441 Duplicate "Cc" header
Jan 06 14:43:07 2001 (31278) (ToUsenet) NNTP error for list "python-list": 441 437 Binary in non-binary group
Jan 22 07:22:30 2001 (2822) (ToUsenet) NNTP error for list "python-list": 441 Line 201 too long
Jan 24 12:45:01 2001 (5285) (ToUsenet) NNTP error for list "clpa-moderators": 441 Can't set system "X-Trace" header
Jan 24 15:56:29 2001 (17406) (ToUsenet) NNTP error for list "python-list": 441 No valid newsgroups in "['comp.object.corba', 'comp.object.C++', 'comp.object.C++.moderated', 'comp...."
Feb 17 08:17:24 2001 (13293) (ToUsenet) NNTP error for list "python-list": 441 Can't parse "Date" header
Feb 23 05:01:02 2001 (25246) (ToUsenet) NNTP error for list "python-list": 441 Article posted in the future
Mar 19 00:20:34 2001 (15766) (ToUsenet) NNTP error for list "python-list": 441 From: address not in Internet syntax
May 10 15:52:16 2001 (17585) (ToUsenet) NNTP error for list "python-list": 441 Duplicate "Mime-Version" header
Jun 12 09:24:01 2001 (772) (ToUsenet) NNTP error for list "python-list": 441 437 Too old -- "Fri, 05 Jan 1996 16:48:56 -0600"

When MM2.0.x gets such an exception, it simply drops the message on
the floor.  It doesn't save it or bounce it, and this has definitely
led to lost messages.  It appears that lossage is more prevalent in
python-announce-list than in python-list, or perhaps it's just more
obvious there, because patch #101270 sets things up such that a
message is never seen by anybody unless it flows first through
Usenet.  On python-list, a rejected message simply fails to cross the
mail->news boundary: it is still seen by all list members, and
messages posted to the newsgroup are still seen by all newsgroup
readers and all list members.

FTR: our posting host is news.baymountain.com, a hosting service that
Zope Corporation uses and which gives us a newsfeed for the gatewayed
groups on mail.python.org.

% telnet news.baymountain.com nntp
Trying 63.102.48.11...
Connected to news.baymountain.com.
Escape character is '^]'.
201 news.baymountain.net InterNetNews NNRP server INN 2.2.2 13-Dec-1999 ready (no posting).

I'm looking for a principled way of handling such exceptions, both for
MM2.0.x and for MM2.1.  There are two sides to this: what can we do to
avoid the rejections in the first place, and what to do when we can't
avoid rejections.

First, it seems like exactly what the news server will reject is not
completely guessable, although there are some hints available.
AFAICT, there is no standard, or even good documentation available on
this subject except the INN source code, and an IETF Internet-Draft
called draft-ietf-usefor-article-05.

The latter says that an injecting agent (i.e. the nntpd Mailman is
posting to) should remove some headers and must remove others.  Since
it's only a draft, though, we can't rely on it.  Besides, the
injecting agent still has the option of not stripping some of the
offending headers and rejecting the message instead, so we've still
got to deal with rejections.

From a post on news.software.nntp, I found a list of "forbidden"
headers that a normally configured INN will reject:

  "NNTP-Posting-Host"
  "X-Trace"
  "X-Complaints-To"
  "NNTP-Posting-Date"
  "Xref"
  "Date-Received"
  "Received"
  "Posted"
  "Posting-Version"
  "Relay-Version"

Okay, so that's a start.  However, INN can apparently be configured
to reject or accept other combinations of headers, so we can't rely
on this list.

That leads me to think that at the least, we want a Mailman
configuration variable that lists the headers ToUsenet should strip
before attempting to post the message.
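
As a rough sketch of what such a variable and the stripping step
might look like: the name USENET_STRIP_HEADERS and the helper
strip_headers() are made up for illustration, and the code uses the
current standard-library email package rather than anything that
exists in MM2.0.x.  The default contents are just the "forbidden"
list quoted above.

```python
from email import message_from_string

# Hypothetical configuration variable (the name is an assumption).
# A site would edit this list to match what its news server rejects.
USENET_STRIP_HEADERS = [
    "NNTP-Posting-Host",
    "X-Trace",
    "X-Complaints-To",
    "NNTP-Posting-Date",
    "Xref",
    "Date-Received",
    "Received",
    "Posted",
    "Posting-Version",
    "Relay-Version",
]


def strip_headers(msg, names=USENET_STRIP_HEADERS):
    """Delete every occurrence of each listed header, in place."""
    for name in names:
        # Deletion is case-insensitive, removes all occurrences, and is
        # a no-op when the header is absent.
        del msg[name]
    return msg


raw = (
    "From: someone@example.com\n"
    "Received: from a by b\n"
    "Received: from b by c\n"
    "X-Trace: abc\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
msg = strip_headers(message_from_string(raw))
```

Keeping the list in configuration rather than in ToUsenet itself
means a site can adjust it without touching the code.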

Now we come to the issue of illegal duplicate headers, like To: and
Cc:.  Well, I don't think we want to strip these, and I'm not
comfortable with folding them (i.e. folding multiple Cc: headers into
one big long one), so that leads me to think that we want another
configuration variable that will transform duplicates to X-* headers.
It's a bit distasteful that you'd have a message posted through that
will have one Cc: header and a bunch of X-Original-Cc: headers, and it
seems stupid that INN rejects multiple Cc: or To: headers, but it
seems we have no choice here.
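
Here is a minimal sketch of that transformation, again using the
current standard-library email package.  The variable name
USENET_REWRITE_DUPLICATES and the helper rewrite_duplicates() are
hypothetical; only the X-Original-Cc: naming comes from the idea
above.

```python
from email import message_from_string

# Hypothetical configuration variable (the name is an assumption):
# pairs of (header, header to use for the second and later copies).
USENET_REWRITE_DUPLICATES = [
    ("To", "X-Original-To"),
    ("Cc", "X-Original-Cc"),
]


def rewrite_duplicates(msg, rules=USENET_REWRITE_DUPLICATES):
    """Keep the first copy of each header, rename the rest, in place."""
    for name, alternate in rules:
        values = msg.get_all(name, [])
        if len(values) < 2:
            continue
        # Remove all copies, then re-add them.  Note that this moves
        # the headers to the end of the header block.
        del msg[name]
        msg[name] = values[0]
        for extra in values[1:]:
            msg[alternate] = extra
    return msg


raw = (
    "From: someone@example.com\n"
    "Cc: a@example.com\n"
    "Cc: b@example.com\n"
    "Cc: c@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
msg = rewrite_duplicates(message_from_string(raw))
```

Nothing is folded or thrown away here, so the original recipients
can still be recovered from the posted article.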

One thing I want to try to avoid, because it seems error prone and too
complicated, is to try to grep out the exact problem in the 441
rejection message, then massage the posting and attempt to repost
until we're out of options.

Now, given the above, we can get rid of probably 90% of the
rejections, but we'll still be left with some that are inconvenient to
handle programmatically, like the "No valid newsgroups" or "Binary in
non-binary group" rejections.

So here comes the second part.  In MM2.1 we can take any rejected
message, encapsulate it into a MIME document, and send it and the
rejection notice to the list moderators.  The list moderators can then
apply some wetware algorithms to the problem and resend it to the
list.  We'll have to invent some mechanism that a moderator could use
to tell Mailman that this is an approved/munged Usenet post and that
it ought not to go to the list membership, but that should be easy
(perhaps another list-robot address or a special header for
list-request to look at).
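
The encapsulation step could be sketched roughly as follows, using
the current standard-library email package.  The function name
build_rejection_notice(), the subject line, and the addresses in the
example are all invented for illustration; this is not MM2.1 code.

```python
from email import message_from_string
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_rejection_notice(original, nntp_error, moderators, sender):
    """Wrap a rejected message and the server's complaint for moderators."""
    notice = MIMEMultipart()
    notice["Subject"] = "Usenet posting rejected"
    notice["From"] = sender
    notice["To"] = moderators
    # First part: a human-readable explanation with the NNTP response.
    notice.attach(MIMEText(
        "The news server rejected the attached message:\n\n"
        "    %s\n" % nntp_error))
    # Second part: the untouched original as message/rfc822.
    notice.attach(MIMEMessage(original))
    return notice


original = message_from_string(
    "From: someone@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
notice = build_rejection_notice(
    original,
    '441 Duplicate "Cc" header',
    "list-owner@example.com",      # hypothetical moderator address
    "mailman-owner@example.com",   # hypothetical sender address
)
```

Attaching the original as message/rfc822 keeps its headers intact,
which matters because the headers are usually what the server
objected to.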

In MM2.0.x it's harder to craft that message and get it sent to the
list moderators correctly, so I propose to write a simple patch that
just saves the message on disk some place and then sends a
notification to the list moderators.  They'll have to coordinate with
the site moderators to get the message posted, but at least it won't
get lost as it currently does.
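
A minimal sketch of the save-and-notify idea, written against the
current Python standard library rather than as an actual MM2.0.x
patch.  The helper names save_rejected() and moderator_notice(), the
file naming, and the wording of the notice are all assumptions.

```python
import os
import tempfile


def save_rejected(msg_text, directory):
    """Write the rejected message to a unique file and return its path."""
    os.makedirs(directory, exist_ok=True)
    # mkstemp creates the file atomically under a unique name, so two
    # posting processes cannot overwrite each other's saved message.
    fd, path = tempfile.mkstemp(
        prefix="rejected-", suffix=".msg", dir=directory)
    with os.fdopen(fd, "w") as fp:
        fp.write(msg_text)
    return path


def moderator_notice(listname, nntp_error, path):
    """Return the plain-text body of the notification to the moderators."""
    return (
        'The news server rejected a posting for list "%s":\n\n'
        "    %s\n\n"
        "The message was saved on the server as:\n\n"
        "    %s\n" % (listname, nntp_error, path)
    )
```

The notice carries only the server's response and the file location,
so it stays trivial to generate even where building a proper MIME
message is awkward.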

Hmm, here's another thought: what if the rejected message were held in
the pending database?  I'm not sure if that'll be possible (the main
problem being that all the actual posting stuff happens in a child
proc that can't lock the list), but if it were, then the recently
posted patch to allow editing of held messages thru-the-web could be
used to edit the message and re-approve it for posting.  That's an
approach that might work in MM2.1 if the edit-held-message patch is
accepted.

Hopefully someone out there will have some better ideas.  I'd also
love any other pointers to standards or documentation on this matter.

-Barry