Google Summer of Code - Spam Defense

I would like to participate in Google Summer of Code this year. One possible project for me is to implement some spam defense in Mailman. I think development for Mailman should be possible through the Python Software Foundation. Am I right about this?
I administer a Mailman installation with about 100 lists and thousands of users, and I moderate most of the lists. I think the biggest problem of Mailman is its lack of spam defense. Discarding messages from nonmembers is not an option on most lists.
Some time ago I began modifying Mailman, but I never finished. The first step is to integrate support for SpamAssassin in Mailman. For that I wrote a Python class, spamc, which connects to spamd. This makes it possible to scan all incoming mail. Further ideas for spam defense are:
- Add the possibility to scan all messages from nonmembers again half an hour later, before marking them as held. This is because most of the mails that are not recognized as spam are too new: the sending servers are not yet on any blacklist when the mail comes in.
- Train the Bayes filter from Mailman. Forward all accepted messages to SpamAssassin to learn them as ham. The autolearn feature of SA doesn't work for me. It learns too many false negatives.
These are my ideas so far. Is this welcome in Mailman and is it enough for a GSoC project? Where would it be best? 2.1.11? 2.2.0? 3.0.0?
Best Wishes Timo
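The spamc class mentioned in this message was not posted to the list. As a rough illustration of the idea only, the following sketch shows what a minimal client for the spamd protocol might look like. The function names, the default host, and the port (783 is the usual spamd port) are assumptions made for the example, not Timo's code.

```python
import re
import socket


def build_check_request(message: bytes) -> bytes:
    """Build a spamd CHECK request for one raw message."""
    header = b"CHECK SPAMC/1.2\r\nContent-length: %d\r\n\r\n" % len(message)
    return header + message


# Matches the verdict line of a spamd reply, e.g. "Spam: True ; 15.0 / 5.0".
_SPAM_LINE = re.compile(
    rb"^Spam:\s*(True|False|Yes|No)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_check_response(response: bytes):
    """Return (is_spam, score, threshold), or None if no verdict line is found."""
    match = _SPAM_LINE.search(response)
    if match is None:
        return None
    verdict, score, threshold = (part.decode("ascii") for part in match.groups())
    return verdict.lower() in ("true", "yes"), float(score), float(threshold)


def check_message(message: bytes, host="localhost", port=783, timeout=30.0):
    """Send one message to a running spamd and return the parsed verdict."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_check_request(message))
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_check_response(b"".join(chunks))
```

Only `check_message` needs a live spamd; the request builder and the response parser can be exercised on their own.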

Timo Wingender wrote:
Discarding messages from nonmembers is not an option on most lists.
AFAIK it is an option, but not the default and defaults are rarely changed in my experience.
Entirely my opinion, but I suspect that this will be harder to do and less portable than having a mailman front-end that recognizes the headers inserted by the major spam gateways.
For example, the sites I am aware of run amavisd+SA in front of mailman. They aren't going to disable amavis to have mailman run SA directly. Nor are sites with Barracudas likely to do so, etc. My opinion entirely, but I think it would be better to make mailman aware of the headers inserted by these solutions.
Again, just my opinion.
-- Jo Rhett Net Consonance ... net philanthropy, open source and other randomness
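To make the "recognize the headers" suggestion concrete, here is a minimal sketch of a check that looks for tags an upstream scanner has already added. The marker table is an assumption for illustration: it lists only the two SpamAssassin-style headers named elsewhere in this thread, and a site using another gateway would have to add that product's own headers.

```python
from email import message_from_string

# Hypothetical table: header name -> lowercase value prefix that marks spam.
# Only SpamAssassin-style headers are listed; other gateways use other names.
SPAM_MARKERS = {
    "X-Spam-Flag": "yes",
    "X-Spam-Status": "yes",
}


def tagged_as_spam(raw_message: str) -> bool:
    """Return True if any known upstream spam header marks the message."""
    msg = message_from_string(raw_message)
    for header, prefix in SPAM_MARKERS.items():
        for value in msg.get_all(header, []):
            if value.strip().lower().startswith(prefix):
                return True
    return False
```

The point of the design is that Mailman does no scanning itself here; it only reads the verdict that amavisd, SpamAssassin, or a similar front end has already attached.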

On Mar 27, 2008, at 12:29 PM, Jo Rhett wrote:
One thing that I've been thinking about in regards to this... Is this
job the responsibility of mailman? To scan for spam and other such
things?
I do all of my scanning at the front end, all e-mail gets run through
ASSP to scan blacklists, grey list, etc. etc. then it goes to the MTA,
which in my case just verifies the account exists (Or an alias for it
does) and then is delivered to mailman for delivery...
Why not run something like that instead of checking for all that stuff
just before the delivery?
--
Jason Pruim Raoset Inc. Technology Manager MQC Specialist 3251 132nd ave Holland, MI, 49424-9337 www.raoset.com japruim@raoset.com

On 27-Mar-08, at 12:35 PM, Jason Pruim wrote:
Even if it's not Mailman's responsibility to do the scanning, it can
be incredibly helpful to make the mailman interface aware of and able
to interact with scanning technologies, i.e. using messages
discarded as spam for retraining the system, letting list admins
customize the behaviour for some lists (or -owner addresses) based on
lower or higher threshold values than the server itself uses, maybe
having a way to "discard this spam for all lists" or "flag this as
spam for other list admins" in the case of messages being sent to
multiple lists at once...
Terri

Terri Oda wrote:
I would suggest to be more specific: which functions do we wish to have, what is necessary to implement them, and is the effort worth it?
- run SpamAssassin (or another classifier) on all messages: IMHO this is the MTA's job, so let's assume it already happens
- hold or discard messages marked as spam: Set up spam filter rules with "X-Spam-Flag: YES", "X-Spam-Level: *******", or whatever. It is not the most user-friendly interface, but certainly the most configurable and flexible one.
- give feedback to train a classifier: The admindb interface already has a checkbox to save spam. IMHO it should be given a better label (cf. http://sourceforge.net/tracker/index.php?func=detail&aid=1910552&group_id=103&atid=300103) but it exists and the site administrator only has to train on the saved messages regularly. (On my site I deliver them into a shared IMAP spam folder for review and training.)
- reclassify mails in the hold queue: This sounds quite promising (I know some people do this successfully with IMAP inboxes). But as already mentioned this requires additional effort from the site admins, who probably will not change a working Amavis setup for this.
P.S.: I hope this mail did not become too negative. I am always in favour of better and more user friendly spam filters. But there are quite a lot of spam-related patches already and any new approach should be clearly better than the already existing hooks and functions. Otherwise it is just a waste of time which should be used for other problems (MM3 comes to mind).
-- Martin
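The rule-based filtering described above can be sketched roughly as follows. This is an illustration of regexp rules matched against header lines, not Mailman's own implementation; the rule list, the action names, and the function name are made up for the example.

```python
import re
from email import message_from_string

# Hypothetical (pattern, action) pairs in the spirit of header filter rules.
# The first rule that matches any header line decides the action.
RULES = [
    (r"^X-Spam-Level: \*{7,}", "discard"),
    (r"^X-Spam-Flag: YES", "hold"),
]


def first_matching_action(raw_message, rules=RULES):
    """Return the action of the first rule matching a header line, else None."""
    msg = message_from_string(raw_message)
    header_lines = ["%s: %s" % (name, value) for name, value in msg.items()]
    for pattern, action in rules:
        regex = re.compile(pattern, re.IGNORECASE)
        if any(regex.search(line) for line in header_lines):
            return action
    return None
```

Ordering the rules from strictest to mildest lets a message with seven or more stars be discarded while a merely flagged one is held.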

On 27-Mar-08, at 2:37 PM, Martin Schütte wrote:
Choosing the best from existing spam-related packages and
integrating them into the main development tree actually sounds about
perfect for a Google summer of code project, IMO. As does taking all
the existing methods for handling things (as you described) and
putting a good interface on them that makes it easier for people to
use and realise that these things can be done.
It's easy for *us* to say "well, of course, you can already do that!"
but many list admins and even site admins are not aware that these
things can be done, and perhaps wouldn't even think to do them. If
there was a big "spam management" section in the admin interface,
it'd make a lot of people happy. Even just a simple "discard as
spam" option which (a) discarded the message (b) saved it somewhere,
ran a trainer on it, then deleted it (or not, but these things pile
up) would be a useful change to many people.
I don't think it's a waste of time at all to make it easier for
people to use existing hooks and functions. And I think a lot of the
reason we haven't integrated the existing spam-related patches is
just that no one's had the time to look through them all and figure
out which ones are the best and/or what elements we want from each of
them. Again, I think doing so would be quite the worthwhile endeavour.
Not all development has to be innovative research! :)
Terri
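A "discard as spam" action of the kind described above (save the message, run a trainer on it, then delete it) might look roughly like this. It is a sketch under assumptions: the function name is invented, the trainer is taken to be SpamAssassin's `sa-learn` command with its `--spam` and `--ham` options, and the `run` parameter exists only so the trainer call can be replaced in tests.

```python
import os
import subprocess
import tempfile


def train_and_discard(raw_message: bytes, is_spam=True, run=subprocess.run):
    """Save a message to a temporary file, train on it, then delete the file.

    Returns True if the trainer exited successfully.
    """
    flag = "--spam" if is_spam else "--ham"
    fd, path = tempfile.mkstemp(suffix=".eml")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw_message)
        # sa-learn updates SpamAssassin's Bayes database from the file.
        result = run(["sa-learn", flag, path])
        return result.returncode == 0
    finally:
        # The saved copy is removed so that discarded spam does not pile up.
        os.remove(path)
```

Calling it with `is_spam=False` covers the other direction raised earlier in the thread, learning accepted posts as ham.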

On Thu, Mar 27, 2008 at 07:37:29PM +0100, Martin Schütte wrote:
Back in January I told our 500+ list admins that they could do this:
http://lists.ibiblio.org/pipermail/ibiblio-announce/2008-January/000210.html
And as of yesterday (27 of March) fewer than 20 had done anything. Yesterday I ran a script that imposed that filtering on all lists because we have been blacklisted by spamcop yet again. The message we were blacklisted for had been tagged as spam by SA on the list server, but still got bounced out to an innocent 3rd party (who then reported us).
Anything that makes spam filtering smarter and better-integrated in mailman is a Good Thing (tm). Providing the *option* to have good and sane filtering integration is definitely not enough. Even with hand-holding and encouragement, the vast majority of our list admins are not going to do nearly as good a job as mailman can do if it accepts this GSOC project. I'll be happy to provide whatever feedback or support I can in making that happen.
For the curious, the script I ran was based closely on what's in the wiki for changing generic_nonmember_action:
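#!/bin/bash
# Impose the header filter rule on every list named in the input file.
cd /usr/local/mailman/bin || exit 1
f=$(mktemp)
# Action 3 is "discard" in Mailman 2.1.
echo "header_filter_rules = [('^X-Spam-Status: Yes', 3, 0)]" > "$f"
for list in $(cat /root/no-filter-rules.txt)
do ./config_list -i "$f" "$list"
done
rm "$f"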
where "/root/no-filter-rules.txt" is a list of lists that had not heeded my advice.
Cheers,
Cristóbal Palmer ibiblio.org systems administrator

Cristóbal Palmer writes:
Why not just enforce this in SpamAssassin?
If you've got lists that require special treatment and have trustworthy admins, get a plan from them and then add a rule that gives them a -5 or -10 bonus so that only really egregious spam gets automatically discarded, and the rest gets handled by the list-specific mechanism.
You could also enable the per-address configs in SA itself.
I don't see anything in this story that couldn't be done just as well with central control via SA at the MTA.

On Sat, Mar 29, 2008 at 01:08:14PM +0900, Stephen J. Turnbull wrote:
I don't see anything in this story that couldn't be done just as well with central control via SA at the MTA.
Part of this involves the backstory. 500+ lists that have never been in any way filtered, and many vocal list administrators concerned that having something imposed on them that they can't control will break things.
Personally, I think it's the MTA's job to reject malformed (eg. bad HELO) mail, it's SA's job to *tag* mail, and whatever the MTA hands off to should make the decision about whether to drop, quarantine, or deliver. That's a philosophical stance, and if it's impractical and I shouldn't think that way, then so be it. I'd like to hear some arguments before I change that view, though. My current solution has the advantage that for any complaining list admin, I can point that administrator to her/his own admin panel and say, "Play with these settings."
So basically what I'm saying is that my selfish POV makes me want a mailman that has nice anti-spam policies out of the box. If it requires an admin making decisions about which addresses to protect or whether to do it from within SA, mailman, or something else, then there's a problem.
I'm still scratching my head on how this bounced its way into my inbox, for example:
http://garp.metalab.unc.edu/backscatter-example.txt
How/where do I stop that?
Cheers,
Cristóbal Palmer ibiblio.org systems administrator

Cristóbal Palmer wrote:
It doesn't look to me like backscatter at all. It looks like spam sent to cc-co-owner@lists.ibiblio.org which went to MailScanner on "malecky" which replaced the original message with a message consisting of the "notice" with the original attached. That message then continued through the delivery chain to cc-co-owner@lists.ibiblio.org which was redirected to postmaster@lists.ibiblio.org and then to admin@ibiblio.org by lists.ibiblio.org. It was then relayed to metalab.unc.edu (a bit of a puzzle as the MX for ibiblio.org is mail.metalab.unc.edu, but perhaps these are really the same machine) which redirected admin@ibiblio.org to cmpalmer@ibiblio.org which ultimately got delivered to cmpalmer@garp.metalab.unc.edu.
It also appears that the cmpalmer@ibiblio.org to cmpalmer@garp.metalab.unc.edu step involved a resend which rewrote the envelope sender to cmpalmer@metalab.unc.edu.
I don't know what there is to stop here. I may be completely wrong, but it looks like this was just mail sent to cc-co-owner@lists.ibiblio.org delivered through the chain that would apply to all such mail.
OK, I've just seen your reply to Robby Griffin's off-list message, so the question is why mail to cc-co-owner@lists.ibiblio.org went to postmaster@lists.ibiblio.org.
You say "What I'm missing here is the step where the mail went from going to one of the three list admins (again, all at gmail) to going to me. Where was the forgery? How did mailman (or was it postfix?) get duped?"
There is no evidence in the Received: chain that this copy was sent to any of the three list admins. What does
/usr/local/mailman/bin/list_owners -m cc-co
show you? Assuming that doesn't list postmaster, what is in the MTA logs on lists.ibiblio.org regarding this message, and what's in Mailman's smtp log regarding this message? There's actually no indication that this ever went to Mailman. How is list mail delivered to Mailman on this machine? Is it possible that cc-co-owner@lists.ibiblio.org is mis-interpreted as trying to deliver to the 'co-owner' address of the cc list and this mis-delivery goes to postmaster?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Cristóbal Palmer writes:
"Beggars can't be choosers."
That is the stance I take personally. It's also the one described here:
http://mayfirst.org/?q=node/180
(this URL is from Ian Eiloart, but I don't know if he endorses the stance himself).
I know no other list-admins (not Mailman site admins or postmasters, list-admins) who take that stance. They simply want as little in their moderation queues as possible, and many ignore those until somebody complains. Don't you get the "I have 1000 spams in my queue and need to find one held message that's a real post, but lists.ibiblio.org times out and the page never gets done" FAQ from some of your admins? I haven't heard it from other list admins at my site, but I know two have 500 and 1200 pending in the mod queue! They certainly won't care if my site goes to a "shoot first, moderate later" policy.
There is a technical problem with our stance, which is that there is a difference between an SMTP reject (permanent failure status) and a bounce message. The SMTP reject *will* be heard by the spammer, and it is in his interest to prune such addresses from the list, at least the one he uses personally. (Not all are smart enough to recognize that, of course.)
Bounce messages, if sent, will almost certainly go to a forged address as backscatter :-(, and will not be heard by the spammer. In fact, since the spam was accepted, he is likely to consider the address to have been validated, whether you try to send a bounce or not.
For this reason I am looking forward to a way to issue SMTP rejects based on content. E.g., for sendmail and postfix, this could be implemented via a Mailman-provided milter.
Unfortunately, tuning list settings that have to do with filtering is not and never really was something that you want people who have never even set up an MTA to do. Understanding what happens is quite complex.
I don't see why that would be needed. If you have list-specific tweaks, then either all the SAs are feeding Mailman, or the ones that aren't won't care. I do this all the time.

On Mar 29, 2008, at 11:17 PM, Stephen J. Turnbull wrote:
What about the Mailman 3 LMTP server? I plan on backporting this to
2.2.
The solution in Mailman 3 will be to allow for defining named styles.
A style is simply a collection of some subset of all the configuration
variables on a mailing list. So you could imagine a site that cans a
few common styles for spam filtering and lets their list admins choose
which they want. If they really want to let them have full control,
they could do that through an 'advanced' tab.
-Barry
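The style idea above, a named collection covering some subset of a list's configuration variables, can be sketched as plain dictionaries. Everything here is hypothetical: the style names, the setting names, and the assumption that styles applied later override earlier ones are illustrative and not taken from Mailman 3 code.

```python
# Hypothetical style definitions: each covers only a subset of list settings.
STYLES = {
    "default": {
        "generic_nonmember_action": "hold",
        "spam_hold_score": 5.0,
    },
    "strict-spam": {
        "generic_nonmember_action": "discard",
        "spam_discard_score": 15.0,
    },
}


def apply_styles(style_names, styles=STYLES):
    """Merge the named styles in order; later styles override earlier ones."""
    settings = {}
    for name in style_names:
        settings.update(styles[name])
    return settings
```

A site could then ship a few such canned styles for spam filtering and let list admins pick one, which is the usage described in the message.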

Barry Warsaw writes:
I don't see how that works. Do SMTP MTAs typically initiate an LMTP session before accepting mail from remote hosts? If they don't, then you don't get any extra leverage on the backscatter problem.
That's half the solution. The other half is providing means and encouragement for good ones to end up in a contrib directory. :-)
A style is simply a collection of some subset of all the configuration
variables on a mailing list.
And these cascade, right?

--On 3 April 2008 14:28:55 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
Exim could do that. It has Access Controls that can be implemented at any stage of the SMTP process. They're capable of running arbitrary Perl code. That's not trivial.
What would be easy and useful though, would be LMTP call forwards, which are run after each RCPT TO command. We already use these to determine quota status on our Cyrus mailstores. Mailman would need to reject mail after RCPT TO if the sender isn't permitted to post to the list, or if the recipient address doesn't refer to a list.
It would be nice if RFC1893/2034 enhanced error codes were used. These seem relevant:
X.1.1 Bad destination mailbox address
X.2.4 Mailing list expansion problem
X.5.3 Too many recipients
X.7.2 Mailing list expansion prohibited: the sender is not authorized to send a message to the intended mailing list.
X is 4 for a temporary error, 5 for a permanent error.
-- Ian Eiloart IT Services, University of Sussex x3148
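The per-recipient check described above could be sketched as a function that produces the reply to a single RCPT TO command. This is an illustration only: the function name and the shape of the `lists` mapping are assumptions, and a real LMTP server would look the data up in Mailman rather than in a dictionary.

```python
def rcpt_reply(recipient, sender, lists):
    """Return the reply line for one RCPT TO command.

    `lists` maps a list posting address to the set of addresses allowed
    to post to it (a stand-in for a lookup in Mailman's own data).
    """
    allowed = lists.get(recipient.lower())
    if allowed is None:
        # No such list: permanent failure, bad destination mailbox.
        return "550 5.1.1 Bad destination mailbox address"
    if sender.lower() not in allowed:
        # The list exists but this sender may not post to it.
        return "550 5.7.2 Mailing list expansion prohibited"
    return "250 2.1.5 Destination address valid"
```

Rejecting at this point means the sending MTA hears the failure during the transaction, so no bounce message has to be generated afterwards. A list that holds nonmember posts for moderation would accept the recipient instead of returning the 5.7.2 reply.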

On Apr 3, 2008, at 6:19 AM, Ian Eiloart wrote:
Thanks, wiki updated.
-Barry

On Apr 3, 2008, at 1:28 AM, Stephen J. Turnbull wrote:
Or the Cheeseshop. At least the /intent/ is that third party packages/
eggs could be installed that provide entry points for the plugins that
Mailman defines. If that doesn't work in practice then we need to bug
the distutils folks or find another solution. I really want to
encourage a rich ecosystem of add-ons that don't need to be officially
supported or even acknowledged by the core team. The core should be
concentrating on the core functionality and providing a robust
framework with useful plug points.
They can, other than that I haven't written the code yet. ;)
-Barry

Martin Schütte wrote:
http://sourceforge.net/tracker/index.php?func=detail&aid=1910552&group_id=103&atid=300103
Yes, most of this can already be done with mailman in some way. But to do so you have to know a lot about mailman and the mail system. It requires changes to the system and sometimes patching mailman. I think it's a good idea to have these features in mailman, so that everyone can simply use them, even if they don't have access to the system.

Jason Pruim wrote:
Of course this is not the primary job of mailman. But it can help to reduce the effort to moderate lists.
If you have a moderator for each list, it's no problem if a few spam messages per day are held. But if you don't, it takes a lot of effort to sort out all the spam.
Not every site runs ASSP and I am not a fan of greylisting. It delays mail from legitimate users, and once most sites use it spammers will get around it.

On Mar 27, 2008, at 1:18 PM, Timo Wingender wrote:
I see what you're getting at.. I've never looked at it from that stand
point before so it didn't make sense to me. I use mailman primarily
for internal communication for my company, the ability to send a file
to art@mydomain.com and it goes to whoever is in the art department
that day has been great!
I completely agree... Greylisting isn't the best way to use it. I
should say though that I might be in a different configuration than
you... Most of the people who e-mail my domain are our customers and
have their own domain. So I can white list all of our customers since
the chance of spam is very low from their company mail servers.
Greylisting also helped me get my feet wet with spam related issues
since I wasn't ready to start blocking based on content.
But now we get too much spam making it around the greylisting so I
need to get back into the fight against spam! :)
--
Jason Pruim Raoset Inc. Technology Manager MQC Specialist 3251 132nd ave Holland, MI, 49424-9337 www.raoset.com japruim@raoset.com

Jo Rhett wrote:
I think that Timo is saying that for his purposes on his lists it is not viable for him to automatically discard messages from nonmembers, not that the option doesn't exist in Mailman.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Mark Sapiro wrote:
I know how to discard messages automatically. But on most lists the default is to hold them. The question is how mailman is used. If the lists are there only for communication among list members, then discarding messages is an option. But if you use a list as an official contact you cannot simply discard all messages from nonmembers.
Sometimes people send mails to a list from an alternative address, or someone not on the list wants to contact the list. If this is an intended use of a list, then discarding by default is not an option.

On Mar 27, 2008, at 12:29 PM, Jo Rhett wrote:
There should definitely be a handler to do this recognition. It's
easy to do and as you say, if the site is already running SA in the
MTA, this would be a useful addition to Mailman.
However, not all sites run things this way, and I think it would be
helpful if people could run SA scanning in Mailman, though we should
not recommend it or enable it by default.
One other place to scan, which an MTA-embedded SA doesn't cover, is
gating messages from NNTP. I know that it's not a common use case
these days, but many sites still pull messages from NNTP to their
mailing list, and in this case, there is currently /no/ scanning for
spam. It's a pretty common vector that would be useful to close off.
-Barry

Timo Wingender writes:
These are my ideas so far. Is this welcome in Mailman and is it enough for a GSoC project? Where would it be best? 2.1.11? 2.2.0? 3.0.0?
I don't speak for the core developers, but to summarize what several others have said and add a couple of points:
If you are going to reject or discard a post, you really want to reject it in the SMTP transaction that submits it to your external MX, not in Mailman. This implies:
- Since SpamAssassin (SpamBayes, etc) are easily integrated into MTAs, it's of secondary value to have them run from Mailman. There are also already such patches, so I don't think those would really qualify for GSoC by themselves.
- The big hole in the current architecture is that there is no way for spam filters in the MTA to get information from Mailman's member lists. That seems to be the crucial defect at present.
You should get in touch with Brad Knowles who currently isn't subscribed to this list AFAIK, but is the resident anti-spam guru on Mailman-Users. He might be a good GSoC mentor if he's willing, although he's not a code jockey AFAIK.
Peripherally related, but also very important, is work on the backscatter problem. See the ongoing "before next release: disable backscatter in default installation" thread on this list. However, Jo Rhett has sketched out what basically is needed. It's not big enough to qualify for GSoC ;-) The remaining work, however, is substantial, but may not really be on-topic for GSoC: updating the templates, working with the translators to get the new templates translated, and testing the result.
(I do not speak for Barry or Mark, but FWIW) As I read Barry's statements, this kind of thing would not be appropriate for 2.1.
It is definitely appropriate for 2.2 IMO. That would need Mark's cooperation, of course.
Barry has already started work on 3.0 with the intent of realizing some of the ideas summarized above by allowing callbacks into Mailman to get subscriber info for the use of the MTA, including spam fighting. Probably the architectures of a 2.2 implementation and a 3.0 implementation would be quite different.
HTH

Stephen J. Turnbull wrote:
If you don't discard all messages from nonmembers then this is not possible. Of course everything which is obviously spam or produces an error in mailman should be rejected in the SMTP transaction. But the rest must be discarded in mailman.
That's a good point for lists which discard all messages from nonmembers. But I think it doesn't affect lists which hold messages from nonmembers by default.
I read parts of this thread. It's a very long discussion. Adding configuration options to mailman to disable some of the aliases, and to answer requests only if they seem to contain a command, could be a part of my application.
I thought of 2.2 too. 2.1 would be nicer because 2.2 and 3.0 seem too far away.
I wrote to this list to get some information about which spam features are acceptable to the mailman developers and to get some more specific ideas.

Timo Wingender wrote:
This would be useful. I *might* even be able to mentor this (or even a larger) part of the project, but I'm not able to commit to that at this time.
As far as I'm concerned, nothing should be done in 2.1 at this point. All development of anything other than bug and security related fixes going forward should be on the 2.2 or 3.0 branches (I know some consider backscatter to be a security issue, but that is not the sense I mean when I say 'security related').
I would like 2.2.0 to be the next release after 2.1.10. I have some small changes to go into it now, and I think addressing backscatter should be high on the list. We also thought initially that getting some GUI improvements would be the primary focus of 2.2, but if that work (which wasn't being done by me) is not forthcoming soon, I would consider making 2.2 relatively short lived and deferring the GUI improvements to a potential 2.3 release.
I wrote to this list to get some information about which spam features are acceptable to the mailman developers and to get some more specific ideas.
My view on spam is that as much as possible, it should be dealt with in the MTA and Mailman should never see it. I know that this is not possible in all situations because criteria for non-acceptance or discard of a message may be different for Mailman lists than for other mail.
I personally have recently started using MailScanner on the server that I admin. It is actually very flexible in allowing different criteria and rules based on sender or recipient or whatever. The one thing I don't like is that it interfaces with Postfix (and other MTAs too) in a way that doesn't allow rejecting the mail at incoming SMTP time. Of course I don't accept mail for recipients I don't know, and I greylist everything else, but after that, I have to just store or discard the mail I don't want to deliver as it's too late to not accept it.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Mar 28, 2008, at 4:51 AM, Stephen J. Turnbull wrote:
This ties into the work that we sprinted on at Pycon, which would
expose a REST interface to Mailman's internal data structures. I
don't know if Maki, Andrew and Richard have made much progress on this
since then though. The intent is to make this available in both MM2.2
and 3.0.
-Barry

On Mar 27, 2008, at 11:26 AM, Timo Wingender wrote:
Hi Timo,
I think you could do this through either the PSF or the GNU project.
It's a good project to pursue, and either organization seems
appropriate.
A couple of thoughts, and then I'm going to try to respond to other
messages in this thread. While I agree that it's generally much
better to do spam detection upstream of Mailman, i.e. in the MTA, I
think there is still some benefit to developing several hooks in
Mailman to something like SpamAssassin. One of course would be a
fairly simple handler to recognize SA headers and do the appropriate
thing. Your idea of having a call out to SA to scan the message is
valid too though, because I don't think everybody is able to hook it
into their MTA, for whatever reason. This wouldn't be on by default,
but it should be an option.
Several years ago, myself and a few others worked on some code to hook
Mailman's approval mechanism into Spambayes training. It worked
moderately well, but not good enough to ever add to Mailman proper.
I'm sure the patches are still on SourceForge and might even still
apply to MM 2.1. It's an interesting idea that you might like to dust
off and see if you can get working for SA.
These are my ideas so far. Is this welcome in Mailman and is it enough for a GSoC project? Where would it be best? 2.1.11? 2.2.0? 3.0.0?
I wouldn't do it for 2.1 since I'd like to be very strict about "no
new features" for 2.1. It would probably be most useful for people in
2.2, but I hope that you'll also consider looking at 3.0 because I
think the architecture will be more amenable to these ideas. E.g.
you'd be able to reject spammy messages during the LMTP phase.
I'm planning on releasing Mailman 3 alpha 1 in the next day or so.
It's basically ready, but I have to fix an annoying setuptools problem.
Cheers,
-Barry

Barry Warsaw writes:
Why have a Handler when you can already use header scanning in the privacy filters? I.e., isn't this a documentation or UI problem rather than a Handler problem?
Even if you can't hook it into the MTA, it's certainly possible to do this via procmail, etc. There are so many ways to do this already. Why doesn't TOOWTDI apply here?

Stephen J. Turnbull wrote:
I agree, it's a UI problem. What I'd like is for the privacy/spam administration page to have two checkboxes for "Filter obvious spam" and "Filter probable spam" (or something like that). These would be handled internally like spam filter regexps; the only difference would be that those regexps would be configured in mm_cfg.py. And there shouldn't be Reject as a possible action for these rules.
In mm_cfg.py the site admin could enable/disable/hide these and set default actions for these rules.
That way list admins could easily e.g. discard obvious spam messages and hold probable ones.
-- Eino Tuominen

Eino Tuominen writes:
Do you mean "Filter SpamAssassin score >15" and "Filter SpamAssassin >5", or equivalent for other filter applications?
The problem is that on "watch-lovers@rich-folks.net" discussion of Rolex watches is likely not spam, but on "xemacs-commits@xemacs.org" it sure is! So "obvious" is not a good word to use.
I think if that's what you mean, a numerical entry documented with
For most lists, >5 is an aggressive setting that will permit very
little spam but may throw out legitimate posts. >15 is a very
relaxed setting that will almost never throw out a legitimate post
but will let in a fair amount of spam. Adjust the setting
according to your preferences and the results.
I think I would do the UI this way:
SpamAssassin filters:
If your site uses SpamAssassin to tag messages, then you can set the following filters based on the SpamAssassin score of each post. If SpamAssassin is not installed, or not configured to tag the message with a score[1], these settings will have no effect.
On/Off  Action                          Score
(o)     HOLD posts with score over      [  5.0]
( )     DISCARD posts with score over   [ 15.0]
The settings in the prototype will NEVER discard a post, and will hold posts with a score over 5.0. I prefer this style because we can provide visible defaults for the scores.
Footnotes: [1] This should mention the exact SA header we're looking for, of course.
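The two-threshold proposal above can be sketched as follows. This is a hedged illustration, not a Mailman patch: it assumes the score is carried in an X-Spam-Status header in the usual SpamAssassin form ("score=" or the older "hits="), and the function and parameter names are invented. The defaults mirror the prototype: hold over 5.0, never discard.

```python
import re
from email import message_from_string

# SpamAssassin writes e.g. "Yes, score=17.3 required=5.0 tests=...".
_SCORE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message):
    """Return the tagged score as a float, or None if the message is untagged."""
    msg = message_from_string(raw_message)
    match = _SCORE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def action_for(raw_message, hold_over=5.0, discard_over=None):
    """Map the tagged score to 'discard', 'hold' or 'accept'."""
    score = spam_score(raw_message)
    if score is None:
        return "accept"  # untagged mail: the settings have no effect
    if discard_over is not None and score > discard_over:
        return "discard"
    if hold_over is not None and score > hold_over:
        return "hold"
    return "accept"
```

Setting a threshold to `None` corresponds to leaving its On/Off button unselected in the proposed UI.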

On Apr 2, 2008, at 1:07 PM, Eino Tuominen wrote:
When I translate that to Mailman 3-speak, I agree. :)
-Barry

On Apr 2, 2008, at 1:02 PM, Stephen J. Turnbull wrote:
I'm really thinking about Mailman 3's rule architecture. I think it's
going to be much better to have narrower rules for specific purposes,
as they can be packaged in plugins, enabled separately etc. A generic
header regexp matcher could be used, but would probably be a bit more
difficult to use.
Because I think you're talking about different sites having different
administrative domains. In some, it will make sense to do it as far
upstream as possible, but in others, it's going to be left to the list
admin, so they should at least have the ability to enable such filters
if nobody else will.
-Barry

As the deadline was on March 31st, I submitted an application to the Python Software Foundation. A copy of my application can be found here:
http://kybs.de/~wingender/gsoc2008/mailman/application.html
It's not very good because I had too little time. But luckily Google extended the deadline until April 7th, so I am able to improve my application and correct some errors.

On Apr 2, 2008, at 9:50 AM, Timo Wingender wrote:
Does the extension mean you could file with the FSF as well? I don't
know what the procedure is for the organizations to get in touch with
the projects, but I'll try to look into this.
-Barry

Timo Wingender wrote:
Discard messages from nonmembers is no option on most lists.
AFAIK it is an option, but not the default and defaults are rarely changed in my experience.
Entirely my opinion, but I suspect that this will be harder to do and less portable than having a mailman front-end that recognizes the headers inserted by the major spam gateways.
For example, the sites I am aware of run amavisd+SA in front of mailman. They aren't going to disable amavis to have mailman run SA directly. Nor are sites with Barracudas likely to do so, etc. My opinion entirely, but I think it would be better to make mailman aware of the headers inserted by these solutions.
Again, just my opinion.
-- Jo Rhett Net Consonance ... net philanthropy, open source and other randomness

On Mar 27, 2008, at 12:29 PM, Jo Rhett wrote:
One thing that I've been thinking about in regards to this... Is this
job the responsibility of mailman? To scan for spam and other such
things?
I do all of my scanning at the front end: all e-mail gets run through
ASSP to check blacklists, greylisting, etc., then it goes to the MTA,
which in my case just verifies that the account (or an alias for it)
exists, and then it is handed to mailman for delivery...
Why not run something like that instead of checking for all that stuff
just before the delivery?
--
Jason Pruim Raoset Inc. Technology Manager MQC Specialist 3251 132nd ave Holland, MI, 49424-9337 www.raoset.com japruim@raoset.com

On 27-Mar-08, at 12:35 PM, Jason Pruim wrote:
Even if it's not Mailman's responsibility to do the scanning, it can
be incredibly helpful to make the mailman interface aware of and able
to interact with scanning technologies, e.g., using messages
discarded as spam for retraining the system, letting list admins
customize the behaviour for some lists (or -owner addresses) based on
lower or higher threshold values than the server itself uses, maybe
having a way to "discard this spam for all lists" or "flag this as
spam for other list admins" in the case of messages being sent to
multiple lists at once...
Terri

Terri Oda wrote:
I would suggest being more specific: which functions do we wish to have, what is necessary to implement them, and is the effort worth it?
- run SpamAssassin (or another classifier) on all messages: IMHO this is the MTA's job, so let's assume it already happens
- hold or discard messages marked as spam: Set up Spam-Filter rules with "X-Spam-Flag: YES", "X-Spam-Level: *******", or whatever. It is not the most user-friendly interface, but certainly the most configurable and flexible one.
- give feedback to train a classifier: The admindb interface already has a checkbox to save spam. IMHO it should be given a better label (cf. http://sourceforge.net/tracker/index.php?func=detail&aid=1910552&group_id=103&atid=300103) but it exists and the site administrator only has to train on the saved messages regularly. (On my site I deliver them into a shared IMAP spam-folder for review and training.)
- reclassify mails in the hold-queue: This sounds quite promising (I know some people do this successfully with IMAP inboxes). But as already mentioned this requires additional effort from the site admins, who probably will not change a working Amavis setup for this.
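
The header-based rules described above can be sketched in Python roughly as follows. This is an illustration only, not how Mailman evaluates its spam filter rules; `RULES` and `header_action` are hypothetical names, and which headers actually appear depends on how the upstream filter is configured.

```python
import re
from email import message_from_string

# Hypothetical rule table: (regexp, action), checked in order.
RULES = [
    (r"^X-Spam-Level: \*{7,}", "discard"),  # seven or more stars
    (r"^X-Spam-Flag: YES", "hold"),
]


def header_action(msg, rules=RULES):
    """Return the action of the first rule that matches a header line."""
    # Rebuild "Name: value" lines so that ^ anchors at each header name.
    lines = "\n".join(f"{name}: {value}" for name, value in msg.items())
    for pattern, action in rules:
        if re.search(pattern, lines, re.IGNORECASE | re.MULTILINE):
            return action
    return "accept"


tagged = message_from_string(
    "From: someone@example.com\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Level: *****\n"
    "\n"
    "body\n"
)
print(header_action(tagged))  # flagged, but only five stars -> hold
```

Ordering the rules from most to least severe lets a single pass pick the strongest applicable action.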
P.S.: I hope this mail did not become too negative. I am always in favour of better and more user friendly spam filters. But there are quite a lot of spam-related patches already and any new approach should be clearly better than the already existing hooks and functions. Otherwise it is just a waste of time which should be used for other problems (MM3 comes to mind).
-- Martin

On 27-Mar-08, at 2:37 PM, Martin Schütte wrote:
Choosing the best from existing spam-related packages and
integrating them into the main development tree actually sounds about
perfect for a Google summer of code project, IMO. As does taking all
the existing methods for handling things (as you described) and
putting a good interface on them that makes it easier for people to
use and realise that these things can be done.
It's easy for *us* to say "well, of course, you can already do that!"
but many list admins and even site admins are not aware that these
things can be done, and perhaps wouldn't even think to do them. If
there was a big "spam management" section in the admin interface,
it'd make a lot of people happy. Even just a simple "discard as
spam" option which (a) discarded the message (b) saved it somewhere,
ran a trainer on it, then deleted it (or not, but these things pile
up) would be a useful change to many people.
I don't think it's a waste of time at all to make it easier for
people to use existing hooks and functions. And I think a lot of the
reason we haven't integrated the existing spam-related patches is
just that no one's had the time to look through them all and figure
out which ones are the best and/or what elements we want from each of
them. Again, I think doing so would be quite the worthwhile endeavour.
Not all development has to be innovative research! :)
Terri

On Thu, Mar 27, 2008 at 07:37:29PM +0100, Martin Schütte wrote:
Back in January I told our 500+ list admins that they could do this:
http://lists.ibiblio.org/pipermail/ibiblio-announce/2008-January/000210.html
And as of yesterday (March 27) fewer than 20 had done anything. Yesterday I ran a script that imposed that filtering on all lists because we have been blacklisted by SpamCop yet again. The message we were blacklisted for had been tagged as spam by SA on the list server, but still got bounced out to an innocent third party (who then reported us).
Anything that makes spam filtering smarter and better-integrated in mailman is a Good Thing (tm). Providing the *option* to have good and sane filtering integration is definitely not enough. Even with hand-holding and encouragement, the vast majority of our list admins are not going to do nearly as good a job as mailman can do if it accepts this GSOC project. I'll be happy to provide whatever feedback or support I can in making that happen.
For the curious, the script I ran was based closely on what's in the wiki for changing generic_nonmember_action:
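    #!/bin/bash
    # Apply one header filter rule to every list named in the file below.
    cd /usr/local/mailman/bin || exit 1
    f=$(mktemp)
    # 3 is the DISCARD action code in Mailman 2.1.
    echo "header_filter_rules = [('^X-Spam-Status: Yes', 3, 0)]" > "$f"
    for list in $(cat /root/no-filter-rules.txt)
    do ./config_list -i "$f" "$list"
    done
    rm "$f"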
where "/root/no-filter-rules.txt" is a list of lists that had not heeded my advice.
Cheers,
Cristóbal Palmer ibiblio.org systems administrator

Cristóbal Palmer writes:
Why not just enforce this in SpamAssassin?
If you've got lists that require special treatment and have trustworthy admins, get a plan from them and then add a rule that gives them a -5 or -10 bonus so that only really egregious spam gets automatically discarded, and the rest gets handled by the list-specific mechanism.
You could also enable the per-address configs in SA itself.
I don't see anything in this story that couldn't be done just as well with central control via SA at the MTA.

On Sat, Mar 29, 2008 at 01:08:14PM +0900, Stephen J. Turnbull wrote:
I don't see anything in this story that couldn't be done just as well with central control via SA at the MTA.
Part of this involves the backstory. 500+ lists that have never been in any way filtered, and many vocal list administrators concerned that having something imposed on them that they can't control will break things.
Personally, I think it's the MTA's job to reject malformed (e.g., bad HELO) mail, it's SA's job to *tag* mail, and whatever the MTA hands off to should make the decision about whether to drop, quarantine, or deliver. That's a philosophical stance, and if it's impractical and I shouldn't think that way, then so be it. I'd like to hear some arguments before I change that view, though. My current solution has the advantage that for any complaining list admin, I can point that administrator to her/his own admin panel and say, "Play with these settings."
So basically what I'm saying is that my selfish POV makes me want a mailman that has nice anti-spam policies out of the box. If it requires an admin making decisions about which addresses to protect or whether to do it from within SA, mailman, or something else, then there's a problem.
I'm still scratching my head on how this bounced its way into my inbox, for example:
http://garp.metalab.unc.edu/backscatter-example.txt
How/where do I stop that?
Cheers,
Cristóbal Palmer ibiblio.org systems administrator

Cristóbal Palmer wrote:
It doesn't look to me like backscatter at all. It looks like spam sent to cc-co-owner@lists.ibiblio.org which went to MailScanner on "malecky" which replaced the original message with a message consisting of the "notice" with the original attached. That message then continued through the delivery chain to cc-co-owner@lists.ibiblio.org which was redirected to postmaster@lists.ibiblio.org and then to admin@ibiblio.org by lists.ibiblio.org. It was then relayed to metalab.unc.edu (a bit of a puzzle as the MX for ibiblio.org is mail.metalab.unc.edu, but perhaps these are really the same machine) which redirected admin@ibiblio.org to cmpalmer@ibiblio.org which ultimately got delivered to cmpalmer@garp.metalab.unc.edu.
It also appears that the cmpalmer@ibiblio.org to cmpalmer@garp.metalab.unc.edu step involved a resend which rewrote the envelope sender to cmpalmer@metalab.unc.edu.
I don't know what there is to stop here. I may be completely wrong, but it looks like this was just mail sent to cc-co-owner@lists.ibiblio.org delivered through the chain that would apply to all such mail.
OK, I've just seen your reply to Robby Griffin's off-list message so the question is "why did cc-co-owner@lists.ibiblio.org go to postmaster@lists.ibiblio.org?"
You say "What I'm missing here is the step where the mail went from going to one of the three list admins (again, all at gmail) to going to me. Where was the forgery? How did mailman (or was it postfix?) get duped?"
There is no evidence in the Received: chain that this copy was sent to any of the three list admins. What does
/usr/local/mailman/bin/list_owners -m cc-co
show you? Assuming that doesn't list postmaster, what is in the MTA logs on lists.ibiblio.org regarding this message, and what's in Mailman's smtp log regarding this message? There's actually no indication that this ever went to Mailman. How is list mail delivered to Mailman on this machine? Is it possible that cc-co-owner@lists.ibiblio.org is mis-interpreted as trying to deliver to the 'co-owner' address of the cc list and this mis-delivery goes to postmaster?
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Cristóbal Palmer writes:
"Beggars can't be choosers."
That is the stance I take personally. It's also the one described here:
http://mayfirst.org/?q=node/180
(this URL is from Ian Eiloart, but I don't know if he endorses the stance himself).
I know no other list-admins (not Mailman site admins or postmasters, list-admins) who take that stance. They simply want as little in their moderation queues as possible, and many ignore those until somebody complains. Don't you get the "I have 1000 spams in my queue and need to find one held message that's a real post, but lists.ibiblio.org times out and the page never gets done" FAQ from some of your admins? I haven't heard it from other list admins at my site, but I know two have 500 and 1200 pending in the mod queue! They certainly won't care if my site goes to a "shoot first, moderate later" policy.
There is a technical problem with our stance, which is that there is a difference between an SMTP reject (permanent failure status) and a bounce message. The SMTP reject *will* be heard by the spammer, and it is in his interest to prune such addresses from the list, at least the one he uses personally. (Not all are smart enough to recognize that, of course.)
Bounce messages, if sent, will almost certainly go to a forged address as backscatter :-(, and will not be heard by the spammer. In fact, since the spam was accepted, he is likely to consider the address to have been validated, whether you try to send a bounce or not.
For this reason I am looking forward to a way to issue SMTP rejects based on content. E.g., for sendmail and postfix, this could be implemented via a Mailman-provided milter.
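
To make the reject-versus-bounce distinction concrete, here is a small Python sketch of the decision point. It is not a milter; `smtp_reply` is a hypothetical helper that a real milter or LMTP server would call from its end-of-message hook, and the 15.0 threshold is only an example value.

```python
def smtp_reply(score, reject_over=15.0):
    """Choose the reply to give at end of DATA, before accepting the mail."""
    if score is not None and score > reject_over:
        # Permanent failure: the sending client sees it directly, and we
        # never generate a bounce message ourselves.
        return "550 5.7.1 Message refused as spam"
    # Once we answer 250 we own the message; refusing it later would mean
    # sending a bounce to a possibly forged sender address.
    return "250 2.0.0 Ok"


print(smtp_reply(22.4))  # 550 5.7.1 Message refused as spam
print(smtp_reply(3.1))   # 250 2.0.0 Ok
```

The key design point is timing: the content check has to finish while the SMTP session is still open, otherwise only the bounce path remains.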
Unfortunately, tuning list settings that have to do with filtering is not and never really was something that you want people who have never even set up an MTA to do. Understanding what happens is quite complex.
I don't see why that would be needed. If you have list-specific tweaks, then either all the SAs are feeding Mailman, or the ones that aren't won't care. I do this all the time.

On Mar 29, 2008, at 11:17 PM, Stephen J. Turnbull wrote:
What about the Mailman 3 LMTP server? I plan on backporting this to
2.2.
The solution in Mailman 3 will be to allow for defining named styles.
A style is simply a collection of some subset of all the configuration
variables on a mailing list. So you could imagine a site that cans a
few common styles for spam filtering and lets their list admins choose
which they want. If they really want to let them have full control,
they could do that through an 'advanced' tab.
-Barry
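
As a rough illustration of the "named styles" idea (a style as a named subset of a list's configuration variables), here is a minimal Python sketch. It is not the Mailman 3 API: the style names, the variable names, and `apply_style` are all placeholders invented for this example.

```python
# Hypothetical canned styles a site might offer; every name here is a
# placeholder, not an actual Mailman 3 setting.
STYLES = {
    "spam-relaxed": {"hold_over": 10.0, "discard_over": None},
    "spam-strict": {"hold_over": 5.0, "discard_over": 15.0},
}


def apply_style(list_config, style_name, styles=STYLES):
    """Return a copy of list_config with the style's variables overridden."""
    updated = dict(list_config)
    updated.update(styles[style_name])
    return updated


config = {"name": "test-list", "hold_over": None, "discard_over": None}
strict = apply_style(config, "spam-strict")
print(strict["hold_over"], strict["discard_over"])  # 5.0 15.0
```

Because a style touches only the variables it names, everything else on the list (here, `name`) stays as the list admin set it.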
participants (12)
- Barry Warsaw
- Cristóbal Palmer
- Dale Newfield
- Eino Tuominen
- Ian Eiloart
- Jason Pruim
- Jo Rhett
- Mark Sapiro
- Martin Schütte
- Stephen J. Turnbull
- Terri Oda
- Timo Wingender