"BAW" == Barry Warsaw <barry@python.org> writes:
BAW> On Tue, 2004-11-30 at 20:07, Tokio Kikuchi wrote:
>> Maybe we can go forward to requiring Python 2.4 because
>> CJK codecs are integrated there.
BAW> I have no problems requiring Python 2.4 for Mailman 2.2,
BAW> although I would like to get some feedback from the community
BAW> before we decide for sure.
I'm not in favor of bumping the Python requirement for Mailman to greater than 2.3 any time soon. If you really need something in 2.4 to robustly fix a bug or provide new features, that's one thing. And if you could get rid of crocky homebrew infrastructure in favor of shiny new well-defined Python APIs, that's another. (Eg, I think the new Mailman3 architecture, based on Twisted, is just wickedly cool.)
But at one point my Debian system _insisted_ on having _four_ different versions of Python (1.5, 2.1, 2.2, and 2.3) installed (1.5 is now gone, and I think 2.1 can finally be dispensed with), because of prerequisites declared by different packages. On top of that, at various points I've had at least two versions of Perl, two versions of Ruby (and I didn't know I had any Ruby-requiring applications installed!), two versions of autoconf, three versions of automake, ... you get the drift. Pretty yucky. And it's shameful to be outdone by Perl!!
In the case in point, the reason explicitly advanced for moving to 2.4 is that it includes the CJK codecs. That is, there's a bug that someone doesn't feel like addressing properly, so you're going to cover it up. This just ain't right.
IMO, this particular bug goes a lot deeper than the tracebacks in #974290 and #926034, inter alia. I'm currently tracing another bug where Cc contents are being trashed, probably in AvoidDuplicates, but there's similar, redundant (?) code in CookHeaders. This whole notion of cooking headers is pretty bogus, in my opinion, and the reason that unknown coding systems crash the runners is also bogus: Mailman in many places assumes that the headers will be readable and proceeds to try to read them (ie, translate to Unicode). In some places it even assumes that the header will be in ASCII! My host's error log is full of entries like
Dec 02 17:25:57 2004 (17826) SHUNTING: 1102026356.809988+4f78036cea165d4e72f6ed35943a75961cc7da67
Dec 02 17:33:36 2004 (17826) Uncaught runner exception: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)
Dec 02 17:33:36 2004 (17826) Traceback (most recent call last):
[OMITTED, but FYI it's make_headers() -> h.append() again]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)
NB: this one can't be worked around by adding the CJK codecs. :-/
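For illustration, here is a minimal sketch of that failure mode, written for a current Python 3 interpreter rather than the 2.3/2.4 under discussion. The byte string is made up (only the leading 0xa4 comes from the log above), and try_ascii() is a hypothetical helper, not anything in Mailman. The point is that the decode is hard-wired to ASCII, so installing more codecs changes nothing, while passing the bytes through untouched cannot fail.

```python
# Illustration only: raw header bytes that are not ASCII.
# 0xa4 is the byte from the log; the remaining bytes are invented.
RAW_HEADER = b"\xa4\xb3\xa4\xf3"


def try_ascii(raw):
    """Return the error message if raw is not ASCII, else None.

    Hypothetical helper: it mimics code that assumes a header is ASCII.
    """
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def pass_through(raw):
    """What an MTA-minded handler would do: leave the bytes alone."""
    return raw


message = try_ascii(RAW_HEADER)
print(message)
```

The same bytes survive pass_through() unchanged, which is all a default pipeline would need.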
What to do? Well, let's get radical and actually think about what's going on here from the beginning. Mailman is really a special-purpose MTA. So ...
Offhand, I can't think of any reason why I would _ever_ want Mailman to _change_ an _existing_ header. I understand that there are a large number of uneducated perverts with asthmatic MUAs who want Reply-To munging out there, and that there are broken MUAs that produce broken (ie, redundant) Cc headers, and the like. I can almost sympathize with Subject munging (although the List-ID header makes list names redundant). But I don't need any of that, and it's really annoying that Mailman does some of those things behind my back, especially when the implementation is buggy.
Similarly, the only reason I can think of why _I_ would want Mailman to _read_ any headers is to parse out an address (eg, for dupe suppression), to handle the Approved header, and to handle admin requests. Well, if RFC 2822 parsing (which is hard enough as it is) isn't good enough, that's not Mailman's problem---it's going to baffle the snot out of Sendmail, too. So I see no _need_ for conversion of any headers to Unicode by default! (I think of ToArchive as calling a separate application, not as part of Mailman.)
I conclude that the default pipeline should Just Say No to I18N in the headers. "We don' need no steenkin' Unicode here." (I18N for the UI and the message headers/footers is a completely different issue, of course.)
Optional Handlers? Of course.
Do you want CC coalescing? Add a Handler to the pipeline. (Note that this can be implemented simply by s/^cc:/, /i and appending the result to the previous Cc header, then folding as necessary; no need even to parse out the content, let alone translate it to Unicode.)
Do you want Reply-To munging? Ditto.
Do you want to sanity check addresses as a service to posters? Add a Handler. Still no need for I18N, though.
Do you want Subject munging? OK, ya got me. Here we do want I18N. But you can still do it with a separate Handler, and you can make it plain that Here Be Dragons by calling it MungeSubject_I18N or even MungeSubject_TimeBomb or MungeSubject_MUA (where the last is intended to indicate that Mailman has gone well beyond its MTA Buddha-nature and become entangled in dealing with the illusions of Samsara).
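As a rough sketch of the Cc coalescing idea above (the s/^cc:/, /i trick), assuming the headers are available as a list of already-unfolded lines: the function name coalesce_cc and that input shape are assumptions for illustration, not Mailman's or the email module's API. Refolding long lines is left out. Note that the header contents are never parsed or decoded.

```python
import re

# Matches a leading "Cc:" in any case, plus following whitespace.
CC_PREFIX = re.compile(r"^cc:\s*", re.IGNORECASE)


def coalesce_cc(header_lines):
    """Merge repeated Cc headers into the first one.

    header_lines: list of unfolded header lines, one header per string.
    The address content is treated as opaque text throughout.
    """
    out = []
    first_cc = None  # index in `out` of the first Cc header seen
    for line in header_lines:
        if CC_PREFIX.match(line):
            if first_cc is None:
                first_cc = len(out)
                out.append(line)
            else:
                # Equivalent of s/^cc:/, /i, appended to the earlier header.
                out[first_cc] += ", " + CC_PREFIX.sub("", line, count=1)
        else:
            out.append(line)
    return out


headers = [
    "From: a@example.com",
    "Cc: b@example.com",
    "Subject: hi",
    "cc: c@example.com, d@example.com",
]
merged = coalesce_cc(headers)
```

Here merged contains a single Cc line listing all three addresses, with the other headers left in their original order.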
The only real implementation problems I can see are if the email module goes out of its way to parse headers rather than simply storing contents until they are requested, and that the Mailman admin UI currently doesn't provide for munging the list pipeline. Dealing with the former could be hard, but the latter could be handled for 2.1.6/2.2 by making Handlers that simply return successfully without doing any processing if they are disabled. Enabling/disabling to be done via list-specific and mm_cfg variables.
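A disabled-by-default Handler might look roughly like the sketch below. The process(mlist, msg, msgdata) signature is the one Mailman 2.1 Handlers use; everything else is hypothetical: the munge_subject attribute, the DEFAULT_MUNGE_SUBJECT constant standing in for an mm_cfg variable, and the FakeList class used only so the example runs on its own.

```python
# Stand-in for a site-wide default in mm_cfg (hypothetical name).
DEFAULT_MUNGE_SUBJECT = False


class FakeList:
    """Minimal stand-in for a MailList object, for demonstration only."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def handler_enabled(mlist, flag, site_default):
    """List-specific setting wins; otherwise fall back to the site default."""
    return getattr(mlist, flag, site_default)


def process(mlist, msg, msgdata):
    """Handler entry point: do nothing at all unless explicitly enabled."""
    if not handler_enabled(mlist, "munge_subject", DEFAULT_MUNGE_SUBJECT):
        return  # disabled: succeed without touching the message
    msgdata["subject_munged"] = True  # placeholder for the real work


untouched = {}
process(FakeList(), {}, untouched)

munged = {}
process(FakeList(munge_subject=True), {}, munged)
```

With this shape the pipeline itself never changes; only the flags do, which is why no admin UI for reordering is needed in the short term.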
For the future (Mailman3?), one could use a precedence naming scheme (to make sorting the Handlers directory easy) and a checkbox list for basic configuration:
Mailman Handler Pipeline:
  [ ] 00_SpamDetect
  [ ] 01_Approve
  [ ] 05_Replybot
  [ ] 10_Moderate
  [ ] 15_Hold
  [ ] 20_MimeDel
  [ ] 25_Emergency
  [ ] 30_Tagger
  [ ] 45_CalcRecips
  [ ] 60_AvoidDuplicates
  [ ] 70_Cleanse
  [ ] 75_CookHeaders
  [ ] 80_ToDigest
  [ ] 85_ToArchive
  [ ] 90_ToUsenet
  [ ] 95_AfterDelivery
  [ ] 98_Acknowledge
  [ ] 99_ToOutgoing
(Of course one would suppress the numbering and include docstrings in the web UI; I'm just using this to indicate the idea.) Alternatively, there could be a pipeline_default_order list containing _all_ known (standard) Handlers; pipeline_default would then be a copy of that list with the Handlers that are disabled by default deleted.
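Both variants can be sketched in a few lines. The names pipeline_default_order and pipeline_default come from the paragraph above; the helper functions, the sample subset of Handler names, and the choice of CookHeaders as "disabled by default" are assumptions made up for the example.

```python
import re

# A shuffled subset of the numbered Handler names, as they might come
# back from listing a Handlers directory.
HANDLER_FILES = [
    "75_CookHeaders",
    "00_SpamDetect",
    "60_AvoidDuplicates",
    "99_ToOutgoing",
    "01_Approve",
]

# Hypothetical choice of what is off by default.
DISABLED_BY_DEFAULT = {"CookHeaders"}


def default_order(names):
    """Sort by the zero-padded precedence prefix, then strip it."""
    return [re.sub(r"^\d+_", "", name) for name in sorted(names)]


def default_pipeline(order, disabled):
    """Copy the full order, dropping Handlers that are disabled."""
    return [handler for handler in order if handler not in disabled]


pipeline_default_order = default_order(HANDLER_FILES)
pipeline_default = default_pipeline(pipeline_default_order, DISABLED_BY_DEFAULT)
```

Plain string sorting is enough only because the prefixes are all two digits; a scheme with mixed widths would need a numeric sort key.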
You'd still have to use bin/withlist to change the order, but since order is often significant, that's probably a Feature, not a Bug.
I'd volunteer to implement a prototype, at least, but I can't do it on the schedule proposed for 2.1.6, sorry. (Note that I don't even know how broken the email module is in this respect yet. :( )
I will get around to submitting bug reports on the issues mentioned above, maybe top of next week?
-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.