[Mailman-Developers] Require 2.4? No thanks [was: Maybe it's time to release 2.1.6]

Stephen J. Turnbull stephen at xemacs.org
Fri Dec 3 05:59:14 CET 2004

>>>>> "BAW" == Barry Warsaw <barry at python.org> writes:

    BAW> On Tue, 2004-11-30 at 20:07, Tokio Kikuchi wrote:

    >> Maybe we can go forward to requirement of Python 2.4 because
    >> CJK codecs are integrated there.

    BAW> I have no problems requiring Python 2.4 for Mailman 2.2,
    BAW> although I would like to get some feedback from the community
    BAW> before we decide for sure.

I'm not in favor of bumping the Python requirement for Mailman to
greater than 2.3 any time soon.  If you really need something in 2.4
to robustly fix a bug or provide new features, that's one thing.  And
if you could get rid of crocky homebrew infrastructure in favor of
shiny new well-defined Python APIs, that's another.  (Eg, I think the
new Mailman3 architecture, based on Twisted, is just wickedly cool.)

But at one point my Debian system _insisted_ on _four_ different
versions of Python (1.5, 2.1, 2.2, and 2.3) being installed (1.5 is
now gone, and I think 2.1 can finally be dispensed with), because of
various prerequisites stated by different packages.  Besides that, at
various points at least two versions of Perl, and two versions of Ruby
(and I didn't know I had any Ruby-requiring applications installed!),
two versions of autoconf, three versions of automake, ... you get the
drift.  Pretty yucky.  And it's shameful to be outdone by Perl!!

In the case in point, the reason explicitly advanced for moving to 2.4
is that it includes the CJK codecs.  That is, there's a bug that
someone doesn't feel like addressing properly, so you're going to
cover it up.  This just ain't right.

IMO, this particular bug goes a lot deeper than the tracebacks in
#974290 and #926034, inter alia.  I'm currently tracing another bug
where Cc contents are being trashed, probably in AvoidDuplicates, but
there's similar, redundant (?) code in CookHeaders.  This whole notion
of cooking headers is pretty bogus, in my opinion, and the reason that
unknown coding systems crash the runners is also bogus: Mailman in
many places assumes that the headers will be readable and proceeds to
try to read them (ie, translate to Unicode).  In some places it even
assumes that the header will be in ASCII!  My host's error log is full
of entries like

Dec 02 17:25:57 2004 (17826) SHUNTING: 1102026356.809988+4f78036cea165d4e72f6ed3
Dec 02 17:33:36 2004 (17826) Uncaught runner exception: 'ascii' codec can't
decode byte 0xa4 in position 0: ordinal not in range(128)
Dec 02 17:33:36 2004 (17826) Traceback (most recent call last):
[OMITTED, but FYI it's make_headers() -> h.append() again]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0:
ordinal not in range(128)

NB: this one can't be worked around by adding the CJK codecs.  :-/
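
For illustration only, here is a minimal sketch of how that error arises and why treating the header as opaque bytes sidesteps it.  It is written for present-day Python 3 rather than the Python 2.3 under discussion, and the function names are made up for this example; none of it is Mailman code.

```python
# Hypothetical helpers, not Mailman code: they only contrast "decode the
# header" with "leave the header alone".

def to_text_strict(raw: bytes) -> str:
    """Assume the header is ASCII and decode it (the risky approach)."""
    return raw.decode("ascii")


def keep_opaque(raw: bytes) -> bytes:
    """Pass the header through untouched; no codec is involved at all."""
    return raw


raw_header = b"\xa4\xb3"  # starts with the 0xa4 byte seen in the log above

try:
    to_text_strict(raw_header)
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)  # same wording as the log entry

passed_through = keep_opaque(raw_header)
```

No additional codec changes this outcome, because the failure is in the assumption that the bytes are ASCII, not in a missing codec.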

What to do?  Well, let's get radical and actually think about what's
going on here from the beginning.  Mailman is really a special-purpose
MTA.  So ...

Offhand, I can't think of any reason why I would _ever_ want Mailman
to _change_ an _existing_ header.  I understand that there are a large
number of uneducated perverts with asthmatic MUAs who want Reply-To
munging out there, and that there are broken MUAs that produce broken
(ie, redundant) Cc headers, and the like.  I can almost sympathize
with Subject munging (although the List-ID header makes list names
redundant).  But I don't need any of that, and it's really annoying
that Mailman does some of those things behind my back, especially when
the implementation is buggy.

Similarly, the only reason I can think of why _I_ would want Mailman
to _read_ any headers is to parse out an address (eg, for dupe
suppression), to handle the Approved header, and to handle admin
requests.  Well, if RFC 2822 parsing (which is hard enough as it is)
isn't good enough, that's not Mailman's problem---it's going to baffle
the snot out of Sendmail, too.  So I see no _need_ for conversion of
any headers to Unicode by default!  (I think of ToArchive as calling a
separate application, not as part of Mailman.)
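
As a hedged illustration of that point, the standard library's email.utils.getaddresses pulls addresses out of a header value without decoding any RFC 2047 encoded-words.  The sample header value below is invented, and this is present-day Python 3, not the email module shipped at the time.

```python
from email.utils import getaddresses

# Invented Cc value: the display name is an RFC 2047 encoded-word.
cc_value = "=?utf-8?b?44GC?= <a@example.org>, b@example.org"

# getaddresses() returns (display name, address) pairs; the display name
# is handed back as found, still encoded.
pairs = getaddresses([cc_value])
addresses = [addr for _name, addr in pairs]
```

Dupe suppression only needs `addresses`, so the encoded display name never has to be turned into Unicode.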

I conclude that the default pipeline should Just Say No to I18N in the
headers.  "We don' need no steenkin' Unicode here."  (I18N for the UI
and the message headers/footers is a completely different issue, of
course.)

Optional Handlers?  Of course.

Do you want CC coalescing?  Add a Handler to the pipeline.  (Note that
this can be implemented simply by s/^cc:/, /i and appending the result
to the previous Cc header, then folding as necessary; no need even to
parse out the content, let alone translate it to Unicode.)

Do you want Reply-To munging?  Ditto.

Do you want to sanity check addresses as a service to posters?  Add a
Handler.  Still no need for I18N, though.

Do you want Subject munging?  OK, ya got me.  Here we do want I18N.
But you can still do it with a separate Handler, and you can make it
plain that Here Be Dragons by calling it MungeSubject_I18N or even
MungeSubject_TimeBomb or MungeSubject_MUA (where the last is intended
to indicate that Mailman has gone well beyond its MTA Buddha-nature
and become entangled in dealing with the illusions of Samsara).

The only real implementation problems I can see are that the email
module may go out of its way to parse headers rather than simply
storing contents until they are requested, and that the Mailman admin
UI currently doesn't provide for munging the list pipeline.  Dealing
with the former could be hard, but the latter could be handled for
2.1.6/2.2 by making Handlers that simply return successfully without
doing any processing if they are disabled.  Enabling and disabling
would be done via list-specific and mm_cfg variables.
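
A rough sketch of such a Handler, under stated assumptions: the process(mlist, msg, msgdata) entry point follows the existing Handler convention, but SITE_DEFAULTS (standing in for mm_cfg), the per-list handler_enabled attribute, and the X-Cooked header are all invented for this example.

```python
from types import SimpleNamespace

# Stand-in for site-wide mm_cfg settings (hypothetical variable).
SITE_DEFAULTS = {"CookHeaders": True, "AvoidDuplicates": False}


def handler_enabled(mlist, name):
    """A list-specific setting wins; otherwise use the site default."""
    overrides = getattr(mlist, "handler_enabled", {})
    return overrides.get(name, SITE_DEFAULTS.get(name, True))


def process(mlist, msg, msgdata):
    """Return successfully without doing anything when disabled."""
    if not handler_enabled(mlist, "CookHeaders"):
        return
    msg["X-Cooked"] = "yes"  # placeholder for the real processing


quiet_list = SimpleNamespace(handler_enabled={"CookHeaders": False})
default_list = SimpleNamespace()

untouched, cooked = {}, {}
process(quiet_list, untouched, {})
process(default_list, cooked, {})
```

The pipeline itself stays unchanged; only the per-list or site-wide flag decides whether a Handler does any work.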

For the future (Mailman3?), one could use a precedence naming scheme
(to make sorting the Handlers directory easy) and a checkbox list for
basic configuration:

Mailman Handler Pipeline:

[ ] 00_SpamDetect
[ ] 01_Approve
[ ] 05_Replybot
[ ] 10_Moderate
[ ] 15_Hold
[ ] 20_MimeDel
[ ] 25_Emergency
[ ] 30_Tagger
[ ] 45_CalcRecips
[ ] 60_AvoidDuplicates
[ ] 70_Cleanse
[ ] 75_CookHeaders
[ ] 80_ToDigest
[ ] 85_ToArchive
[ ] 90_ToUsenet
[ ] 95_AfterDelivery
[ ] 98_Acknowledge
[ ] 99_ToOutgoing

(Of course one would suppress the numbering and include docstrings in
the web UI; I'm just using this to indicate the idea.)  Alternatively,
there could be a pipeline_default_order list containing _all_ known
(standard) Handlers; pipeline_default would then be a copy of that
list with the Handlers that are disabled by default deleted.
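
Both ideas can be sketched in a few lines.  The names pipeline_default_order and pipeline_default come from the paragraph above; the helper function, the shortened name list, and the choice of which Handlers are disabled by default are assumptions made only for illustration.

```python
# A few prefixed names from the checkbox list, deliberately out of order.
HANDLER_NAMES = [
    "75_CookHeaders",
    "00_SpamDetect",
    "99_ToOutgoing",
    "60_AvoidDuplicates",
    "10_Moderate",
]

# Hypothetical choice of Handlers that ship disabled.
DISABLED_BY_DEFAULT = {"CookHeaders", "AvoidDuplicates"}


def ordered_handlers(names):
    """Sort by numeric prefix, then strip the prefix for display."""
    def precedence(name):
        return int(name.split("_", 1)[0])
    return [name.split("_", 1)[1] for name in sorted(names, key=precedence)]


# All known Handlers, in order.
pipeline_default_order = ordered_handlers(HANDLER_NAMES)

# A copy with the disabled-by-default Handlers deleted.
pipeline_default = [
    name for name in pipeline_default_order
    if name not in DISABLED_BY_DEFAULT
]
```

Leaving gaps in the numbering, as in the list above, lets a site slot in a local Handler without renaming the standard ones.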

You'd still have to use bin/withlist to change the order, but since
order is often significant, that's probably a Feature, not a Bug.

I'd volunteer to implement a prototype, at least, but I can't do it on
the schedule proposed for 2.1.6, sorry.  (Note that I don't even know
how broken the email module is in this respect yet. :( )

I will get around to submitting bug reports on the issues mentioned
above, maybe top of next week?

Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
