Mailman 3 anti-spam filter - Mailman-Developers

Stephen J. Turnbull

April 14, 2013

11:42 p.m.

Pratik Sarkar writes:

...

> Is the anti-spam/abuse filter still being seriously considered as a gsoc project this year?

I would say so, yes. Personally, I am fundamentally opposed to it; I think it's wrong in principle (filtering of this kind should be done by the incoming MTA) and inappropriate for the 3.0 release. *But* there is clearly user demand for it, and among the actually signed-up mentors[1] there are at least two who have shown some support for it.

OTOH, you should be aware that although nobody except Barry has a veto (Terri has one ex officio, but I doubt she'd exercise it if Barry were in favor), projects that all the mentors favor are likely to be ranked higher. And judging by the quality of the student posts so far, there is going to be competition for Mailman slots.

If you have a solid proposal for anti-spam already worked out (the idea the guy proposed with a Bayesian filter based on word triads comes close to what I'd call solid, see also Terri's post where she proposed a schedule for the kind of additional detail needed), then that's probably your best bet.

But if you're looking for a project, and all the thinking you've done so far is that anti-spam is cool, I'd say it's a very risky proposal. There are a lot of things we need in the UI (both subscriber-oriented and admin-oriented) that are interesting and higher-priority.

Footnotes: [1] Terri (org admin), Barry (project lead), Wacky, Pingu, Florian, and me.


Patrick Ben Koetter

April 2013

2:28 a.m.

Stephen J. Turnbull <stephen@xemacs.org>:

...

Perhaps the integration could itself provide an interface that makes it easy to add other filters in the future. I am thinking of Postfix's 'content filter' mechanism, which uses SMTP/LMTP to hand messages to an external filter, which then sends them back via SMTP.

The first Mailman content filter could be the one you proposed. Others could add their filters later.
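To make Patrick's idea concrete: the plug-in contract could be as small as "take a message, return a verdict", with the integration running whatever filters are registered. The Python sketch below illustrates that assumption only; `Verdict`, `ContentFilter`, and `FilterChain` are hypothetical names, not part of Mailman, and a Postfix-style external filter would sit behind such a callable and talk SMTP/LMTP instead of running in-process.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from enum import Enum
from typing import Callable, List


class Verdict(Enum):
    ACCEPT = "accept"
    HOLD = "hold"      # e.g. send to the moderation queue
    REJECT = "reject"


# Hypothetical contract: a filter is any callable from message to Verdict.
ContentFilter = Callable[[EmailMessage], Verdict]


@dataclass
class FilterChain:
    filters: List[ContentFilter] = field(default_factory=list)

    def register(self, f: ContentFilter) -> ContentFilter:
        """Add a filter; returns it unchanged so this works as a decorator."""
        self.filters.append(f)
        return f

    def run(self, msg: EmailMessage) -> Verdict:
        # Filters run in registration order; the first non-ACCEPT verdict wins.
        for f in self.filters:
            verdict = f(msg)
            if verdict is not Verdict.ACCEPT:
                return verdict
        return Verdict.ACCEPT


chain = FilterChain()


@chain.register
def require_subject(msg: EmailMessage) -> Verdict:
    # Toy filter, for illustration only: hold messages with no Subject header.
    return Verdict.ACCEPT if msg.get("Subject") else Verdict.HOLD
```

With this chain, a message lacking a Subject: header comes back as `Verdict.HOLD`, and one with a subject passes as `Verdict.ACCEPT`. Letting the first non-accepting verdict win keeps the semantics simple; a score-summing chain would be the obvious alternative.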

p@rick

-- [*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263 Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer Aufsichtsratsvorsitzender: Joerg Heidrich

Stephen J. Turnbull

3:22 a.m.

Patrick Ben Koetter writes:

...

> Perhaps the integration could create an interface itself that makes it easy to add other filters in the future. I am thinking Postfix 'content filter', which uses SMTP/LMTP to send messages to an external filter and they send it back then using SMTP.
>
> The first Mailman content filter could be the one you proposed. Others could add their filter later.

I don't see any Mailman-specific issues in content-based spam filtering, though, and very little Mailman-specific coding. I also worry about reinventing the wheel, with corners. I.e., why do we think a GSoC student is going to be able to produce something worth putting up against the very effective filters with huge user bases, like SpamAssassin and SpamBayes, that are already out there? Wouldn't a half-baked summer project just sit there and bitrot?

OTOH, a generic interface to the Mailman REST API, which could be used by Sendmail milters or whatever else is out there and somewhat standard (is it Postfix or Exim that can handle milters? I forget), with example implementation of a milter that checks whether the poster is subscribed by asking Mailman, would be useful as an extension to any MTA used with Mailman.

Or Terri's through-the-web interface to the Mailman Handler pipeline(s), with an example Handler installable from PyPI which wraps SpamAssassin or SpamBayes.
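On the Mailman side, the milter Stephen describes boils down to one question asked over HTTP: is this sender a member of this list? The sketch below uses only the Python standard library and rests on several assumptions to check against your installation: that Mailman 3 core's REST server listens at `http://localhost:8001` under API version `3.0`, that it accepts basic-auth credentials `restadmin`/`restpass`, and that a per-list member resource lives at `/lists/<list>/member/<address>`. The milter glue itself is omitted, and the HTTP call is injectable so the decision logic can be exercised offline.

```python
import base64
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# All of these are assumptions about a local Mailman 3 core installation.
BASE_URL = "http://localhost:8001/3.0"
USER, PASSWORD = "restadmin", "restpass"


def member_url(list_id: str, address: str) -> str:
    # Assumed per-list member resource; quote both path segments fully.
    return "{}/lists/{}/member/{}".format(
        BASE_URL,
        urllib.parse.quote(list_id, safe=""),
        urllib.parse.quote(address, safe=""),
    )


def http_status(url: str) -> int:
    """GET the URL with HTTP basic auth and return the status code."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def is_subscribed(list_id: str, address: str,
                  fetch: Callable[[str], int] = http_status) -> bool:
    # 200: the member resource exists; 404: no such member.
    # Anything else is treated as an error rather than as "not subscribed".
    status = fetch(member_url(list_id, address))
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status {status} from Mailman REST API")
```

Raising on unexpected statuses leaves the fail-open versus fail-closed choice (accept, or answer with a temporary 4xx) to the caller.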

Richard Wackerbarth

4:11 a.m.

FWIW,

I tend to support Stephen's view with respect to usefulness and interface strategy.

Wacky

On Apr 15, 2013, at 5:22 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:

...

Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/rkw%40dataplex.net

Security Policy: http://wiki.list.org/x/QIA9

Pratik Sarkar

4:32 a.m.

Can someone please give me a link to the existing Mailman spam-filtering techniques?

Pratik

On Mon, Apr 15, 2013 at 4:11 AM, Richard Wackerbarth <rkw@dataplex.net> wrote:

...

Ian Eiloart

4:48 a.m.

On 15 Apr 2013, at 11:22, Stephen J. Turnbull <stephen@xemacs.org> wrote:

...

> OTOH, a generic interface to the Mailman REST API, which could be used by Sendmail milters or whatever else is out there and somewhat standard (is it Postfix or Exim that can handle milters? I forget), with example implementation of a milter that checks whether the poster is subscribed by asking Mailman, would be useful as an extension to any MTA used with Mailman.

I think Mailman supports SMTP/LMTP calls to discover whether a sender is permitted to post to a list, doesn't it?

Exim doesn't handle milters, but it can do callout (call-forward) verification. Provided Mailman makes the judgement and issues L/SMTP rejects during the L/SMTP session, before accepting the message, Exim is fine.
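Seen from the MTA's side, a callout of the kind Ian describes opens an L/SMTP session to Mailman, offers the envelope sender and recipient, and looks only at the reply code, never sending DATA. This Python sketch uses the standard library's `smtplib.LMTP` and assumes Mailman's LMTP runner listens on `localhost:8024`; that endpoint is an assumption, and the code illustrates the idea rather than how Exim implements callouts. The connection factory is a parameter so the logic can be tried without a live server.

```python
import smtplib
from typing import Callable, Tuple

LMTP_HOST, LMTP_PORT = "localhost", 8024  # assumed Mailman LMTP endpoint


def connect() -> smtplib.LMTP:
    # smtplib.LMTP greets with LHLO automatically before MAIL FROM.
    return smtplib.LMTP(LMTP_HOST, LMTP_PORT)


def callout(sender: str, recipient: str,
            connect_fn: Callable[[], smtplib.SMTP] = connect) -> Tuple[int, bytes]:
    """Ask whether the server would take this envelope; never send DATA."""
    client = connect_fn()
    try:
        code, reply = client.mail(sender)
        if code == 250:
            # Only probe the recipient if the sender was accepted.
            code, reply = client.rcpt(recipient)
        client.rset()  # abandon the transaction either way
        return code, reply
    finally:
        client.quit()


def verdict(code: int) -> str:
    # Map the reply class onto what the front-end MTA should do.
    if 200 <= code < 300:
        return "accept"
    if 400 <= code < 500:
        return "defer"
    return "reject"
```

`callout('alice@example.org', 'ant@example.com')` returns the final reply code and text, and `verdict()` maps 2xx/4xx/5xx onto accept, defer, and reject at the front-end MTA.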

Content filtering *could* also be done at L/SMTP time. I think that where the Mailman and MTA installations are managed by the same person or organisation, the better place for content filtering is the MTA, but there might be exceptions to this.

For example, a medical mailing list might want to be more liberal with regard to drugs that are commonly marketed in spam. Conversely, a list might have a particular subscriber demographic that makes it more sensitive to bad language. Or perhaps different lists might have different primary languages, and therefore different views on the value of messages in that language.

So, I can see that different lists on the same system might have different requirements for spam filtering. However, the solution is probably to provide hooks into SpamAssassin, or another existing spam solution, and to give list owners a way to manage a configuration file on a per-list basis.
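A hook into SpamAssassin, such as the PyPI-installable Handler Stephen mentioned earlier, could start by shelling out to `spamc -c`, which, per its documentation, prints `score/threshold` on stdout, exits with status 1 for spam and 0 otherwise, and reports `0/0` if it could not get a verdict. The class below only mirrors the general shape of an MM3 pipeline handler (`name`, `description`, and `process(mlist, msg, msgdata)`); it does not import Mailman or register itself, and the `msgdata` keys it sets are hypothetical names. It assumes spamc is installed and spamd is reachable with default settings; the runner is injectable for testing.

```python
import subprocess
from email.message import EmailMessage
from typing import Callable, Dict, Tuple

# A runner takes raw message bytes and returns (exit_code, stdout_text).
Runner = Callable[[bytes], Tuple[int, str]]


def run_spamc(raw: bytes) -> Tuple[int, str]:
    # Assumes spamc is on PATH and spamd is reachable with default settings.
    proc = subprocess.run(["spamc", "-c"], input=raw,
                          capture_output=True, timeout=30)
    return proc.returncode, proc.stdout.decode("ascii", "replace").strip()


def parse_check_output(out: str) -> Tuple[float, float]:
    # spamc -c prints "score/threshold"; "0/0" signals a processing error.
    score, threshold = out.split("/", 1)
    return float(score), float(threshold)


class SpamAssassinHandler:
    """Shaped like an MM3 pipeline handler, but not wired into Mailman."""

    name = "spamassassin"  # hypothetical handler name
    description = "Score messages with SpamAssassin via spamc."

    def __init__(self, runner: Runner = run_spamc):
        self.runner = runner

    def process(self, mlist, msg: EmailMessage, msgdata: Dict) -> None:
        code, out = self.runner(msg.as_bytes())
        score, threshold = parse_check_output(out)
        if threshold == 0:
            # No verdict from spamd; leave the message untouched.
            return
        # Record the result for later pipeline stages (hypothetical keys).
        msgdata["spam_score"] = score
        msgdata["is_spam"] = code == 1
```

Recording the score in `msgdata` rather than discarding the message leaves the final decision (hold, reject, or accept) to later stages of the pipeline.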

-- Ian Eiloart Postmaster, University of Sussex +44 (0) 1273 87-3148

Patrick Ben Koetter

5:13 a.m.

Ian Eiloart <iane@sussex.ac.uk>:

...

> Exim doesn't handle Milters, but can do the calls forward. Provided Mailman is making the judgement, and issuing L/SMTP rejects at L/SMTP time before accepting the message, Exim is fine.

It would be great if Exim, as the third MTA in the OSS troika of MTAs, supported milters too.

...

> Content filtering *could* also be done at L/SMTP time. I think that where the Mailman and the MTA installations are managed by the same person or organisation, then the better place to have content filtering performed is at the MTA, but there might be exceptions to this.
>
> For example, a medical mailing list might want to be more liberal with regard to drugs that are commonly marketed in spam. Conversely, a list might have a particular subscriber demographic that makes it more sensitive to bad language. Or perhaps different lists might have different primary languages, and therefore different views on the value of messages in that language.

ACK

...

> So, I can see that different lists on the same system might have different requirements for spam filtering. However, the solution is probably to provide hooks into Spamassassin, or another existing spam solution, and to provide ways that list owners can manage a configuration file on a per list basis.

ACK. SpamAssassin knows how to apply different policies per recipient (or recipient domain). Policies can be provided via configuration files, SQL, or LDAP.
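Per-list policy could then live in a small table that points each list at its own SpamAssassin preferences, which spamc can select with its `-u` (username) option when spamd is set up for per-user settings. The names, addresses, and thresholds below are hypothetical and purely illustrative: a medical list tolerates drug names that usually score as spam, and a list with a more sensitive audience gets a stricter cut-off.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ListSpamPolicy:
    threshold: float  # score at or above which a post counts as spam
    sa_user: str      # SpamAssassin preferences to load via spamc -u


DEFAULT = ListSpamPolicy(threshold=5.0, sa_user="mailman")

# Hypothetical per-list overrides, keyed by lower-cased posting address.
POLICIES: Dict[str, ListSpamPolicy] = {
    # A medical list tolerates drug names that usually score as spam.
    "pharmacology@lists.example.org": ListSpamPolicy(8.0, "list-pharmacology"),
    # A list with a more sensitive audience gets a stricter cut-off.
    "kids-club@lists.example.org": ListSpamPolicy(3.0, "list-kids-club"),
}


def policy_for(list_address: str) -> ListSpamPolicy:
    return POLICIES.get(list_address.lower(), DEFAULT)


def spamc_argv(list_address: str) -> List[str]:
    # -c: report score/threshold only; -u: which user's preferences spamd loads.
    return ["spamc", "-c", "-u", policy_for(list_address).sa_user]


def is_spam(list_address: str, score: float) -> bool:
    return score >= policy_for(list_address).threshold
```

Here the threshold is applied to the score spamc returns; an alternative is to set `required_score` in each list's SpamAssassin preferences and trust spamc's own verdict, keeping a single source of truth.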

p@rick

Barry Warsaw

7:39 a.m.

On Apr 19, 2013, at 11:48 AM, Ian Eiloart wrote:

...

> I think Mailman supports SMTP/LMTP calls to discover whether a sender is permitted to post to a list, doesn't it?

MM3's LMTP server currently does only a limited sanity check on messages, e.g. whether the To: field names an existing mailing list.[1]

...

> Exim doesn't handle Milters, but can do the calls forward. Provided Mailman is making the judgement, and issuing L/SMTP rejects at L/SMTP time before accepting the message, Exim is fine.

As a side note, right now only Postfix is officially supported, mostly because that's what I use so I can easily debug it. I would love to have contributions to support at least Exim and Sendmail out of the box. If you're an expert willing to contribute that code, please get in touch.

...

> Content filtering *could* also be done at L/SMTP time. I think that where the Mailman and the MTA installations are managed by the same person or organisation, then the better place to have content filtering performed is at the MTA, but there might be exceptions to this.

Currently, I'm trying to keep the processing that the LMTP server does at acceptance time to a minimum, because I'm concerned about its single-threaded performance. Although it does asynchronous I/O and runs in a separate process, time-consuming processing of a single message will still block acceptance of all other messages.

The answer is to somehow multiplex the LMTP server, ideally without using multiple threads (MM3 is currently single-threaded everywhere). In any case, this would also be interesting to work on.
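The blocking problem Barry describes can be shown with asyncio: anything slow done on the accept path runs on the single event loop and stalls every other session. A common mitigation, sketched here with hypothetical names and a toy content check, keeps acceptance down to cheap checks plus an enqueue and lets a separate consumer do the filtering afterwards. This is not MM3's LMTP runner; note also that within one process the consumer still shares the loop, so genuinely CPU-heavy checks would have to move to another process.

```python
import asyncio
from typing import Callable, Dict, List, Set, Tuple

KNOWN_LISTS: Set[str] = {"ant@example.com"}  # hypothetical list registry


async def accept(msg: Dict, queue: "asyncio.Queue[Dict]") -> str:
    """Acceptance path: only cheap checks, then hand off and reply at once."""
    if msg["rcpt"] not in KNOWN_LISTS:
        return "550 No such list"
    await queue.put(msg)
    return "250 Ok"


async def filter_worker(queue: "asyncio.Queue[Dict]",
                        check: Callable[[Dict], bool],
                        results: List[Tuple[str, bool]]) -> None:
    """Consumer: runs the content checks after the session has replied."""
    while True:
        msg = await queue.get()
        try:
            results.append((msg["id"], check(msg)))
        finally:
            queue.task_done()
        await asyncio.sleep(0)  # yield so pending sessions get a turn


async def main() -> Tuple[List[str], List[Tuple[str, bool]]]:
    queue: "asyncio.Queue[Dict]" = asyncio.Queue()
    results: List[Tuple[str, bool]] = []
    # Toy content check: True means "looks clean".
    worker = asyncio.create_task(
        filter_worker(queue, lambda m: "spam" not in m["body"], results))
    msgs = [
        {"id": "1", "rcpt": "ant@example.com", "body": "hello"},
        {"id": "2", "rcpt": "bee@example.com", "body": "hi"},
        {"id": "3", "rcpt": "ant@example.com", "body": "buy spam"},
    ]
    replies = await asyncio.gather(*(accept(m, queue) for m in msgs))
    await queue.join()  # wait until every accepted message has been checked
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return list(replies), results
```

`asyncio.run(main())` returns the three session replies (`250`, `550`, `250`) together with the later filtering results for the two accepted messages.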

-Barry

[1] I just noticed https://bugs.launchpad.net/mailman/+bug/1170726

Stephen J. Turnbull

9:26 p.m.

Barry Warsaw writes:

...

> I would love to have contributions to support at least Exim and Sendmail out of the box. If you're an expert willing to contribute that code, please get in touch.

I'm not an Exim expert, but my production[1] system uses Exim. I'm working (slowly) on Mailman 3 integration.

Footnotes: [1] It's part of the daily workflow, but high availability is not a requirement. :-)

Pratik Sarkar

5:44 a.m.

Okay, so what should a GSoC student concentrate on for the project?

1. A standardized interface (e.g. milter, SMTP/LMTP transport)
2. A Handler that delegates to external spam-filtering packages
3. A totally new spam filter
4. An interface where users can manually tag messages that got past the filter as spam, to improve the existing spam database

On Sat, Apr 20, 2013 at 9:56 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:

...

Ian Eiloart

8:44 a.m.

On 19 Apr 2013, at 15:39, Barry Warsaw <barry@list.org> wrote:

...

> MM3's LMTP server currently only does a limited sanity check on the messages. E.g. does the To: field name an existing mailing list[1]

The "To: field"? Does that mean the argument of the "RCPT TO" command in the LMTP session? Or does it mean the "To:" message header? The two aren't necessarily related.

And doesn't it also check the argument of the "MAIL FROM" command, to ensure that the sender is permitted to post to the list specified in RCPT TO? That check is hugely important. It's what keeps mailing lists spam-free.
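Ian's point can be made concrete: the list is identified by the envelope recipient (the RCPT TO argument), not by the To: header, and the sender check uses the MAIL FROM argument, so both decisions can be made before DATA. The sketch below is illustrative only; the `POSTERS` registry and the administrative suffixes are hypothetical stand-ins, not MM3's LMTP implementation.

```python
from typing import Dict, Set

# Hypothetical registry: list posting address -> envelope senders allowed to post.
POSTERS: Dict[str, Set[str]] = {
    "ant@example.com": {"alice@example.org", "bob@example.net"},
}
# Administrative subaddresses anyone may write to (illustrative subset).
ADMIN_SUFFIXES = ("-request", "-owner", "-bounces")


def rcpt_reply(mail_from: str, rcpt_to: str) -> str:
    """Decide at RCPT TO time, using only the envelope (never the headers)."""
    local, _, domain = rcpt_to.lower().partition("@")
    # Administrative addresses of an existing list are open to everyone.
    for suffix in ADMIN_SUFFIXES:
        if local.endswith(suffix) and f"{local[:-len(suffix)]}@{domain}" in POSTERS:
            return "250 Ok"
    allowed = POSTERS.get(f"{local}@{domain}")
    if allowed is None:
        return "550 No such list"
    if mail_from.lower() not in allowed:
        return "550 Sender not permitted to post to this list"
    return "250 Ok"
```

The envelope sender is unauthenticated, so this check stops blind spam rather than determined forgers, and rejecting at RCPT time gives up the option of holding non-member posts for moderation.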

...

> Currently, I'm trying to keep the processing that the LMTP server does at acceptance time to a minimum, just because I'm concerned about its single threaded performance.

That's a very good argument for limiting the checks to the RCPT TO phase. Exim can call forward at MAIL FROM, and reject the message if necessary without ever seeing the message body.

...

> [1] I just noticed https://bugs.launchpad.net/mailman/+bug/1170726

-- Ian Eiloart Postmaster, University of Sussex +44 (0) 1273 87-3148

participants (6)

Barry Warsaw
Ian Eiloart
Patrick Ben Koetter
Pratik Sarkar
Richard Wackerbarth
Stephen J. Turnbull
