[Mailman-Users] GSOC idea: The central scrutinizer ;)

Rich Kulawiec rsk at gsp.org
Tue Apr 17 10:56:38 EDT 2018


I have a partially-completed spec for a module that will examine
messages for various issues but my Python-fu is likely not sufficient
to realize it and I'm busy writing anyway.  This is probably a GSOC-size
and GSOC-scope project, so if anybody is game, below is a poorly-written
and largely incomplete description of what I have in mind.

1. As I've said for years, anti-abuse controls should be layered and
should start at the network perimeter (with things like the Spamhaus
DROP list in border routers).  Additional layers may be in firewalls,
in the MTA, in the MLM (such as Mailman).  No one layer can catch
everything because it doesn't have contextual knowledge, e.g., the
firewall doesn't know who the members of a mailing list are or even
that a particular SMTP connection it's allowing contains traffic for
a mailing list.

Unfortunately, email accounts have been hijacked by the billions (and
that's just counting Yahoo).  The new owners of those accounts likely
enjoy the same SMTP reachability that the old owners did and can thus
drop messages onto mailing lists.  (I'm going to omit the long explanation
about why context-sensitive measures in the MTA are not only inadequate
to stop this, but are also highly undesirable.)  To put it another way:
we don't have many defenses against our friends.  But we need them.

2. We all have our own views on proper netiquette for mail/mailing lists,
and of course, mine are the only correct ones. ;)  But regardless of
those, it'd be useful to have a mechanism to scrutinize messages for
things like "800 lines of quoted digest with one top-posted line above it".
Of course some of this is easy to see and hard to code, but I think a
modest attempt in this direction would be helpful.  (Besides, if the
action taken on detection of these is to merely hold the message,
then little harm is done by false positives.)
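
As a rough illustration of the "800 quoted lines under one top-posted
line" case, here is a minimal Python sketch.  Everything in it is an
assumption made for the example: the function name, the two thresholds,
and the idea that a quoted digest can be recognized by a quoted
"Digest, Vol N, Issue N" line.  It is a heuristic and will misfire.

```python
import re
from typing import Tuple

# Hypothetical thresholds; real values would need tuning per list.
MIN_QUOTED_LINES = 100
MAX_NEW_LINES = 5

# Assumed marker: a digest title line such as
# "Mailman-Users Digest, Vol 170, Issue 12" appearing inside the quote.
DIGEST_MARKER = re.compile(r"Digest, Vol \d+, Issue \d+")


def looks_like_full_digest_quote(body: str) -> Tuple[bool, str]:
    """Return (flagged, reason) for a plain-text message body."""
    lines = body.splitlines()
    # Lines starting with ">" are treated as quoted material.
    quoted = [ln for ln in lines if ln.lstrip().startswith(">")]
    # Non-blank, unquoted lines are treated as the poster's own text.
    new = [ln for ln in lines
           if ln.strip() and not ln.lstrip().startswith(">")]
    has_marker = any(DIGEST_MARKER.search(ln) for ln in quoted)

    if (has_marker
            and len(quoted) >= MIN_QUOTED_LINES
            and len(new) <= MAX_NEW_LINES):
        reason = (f"{len(quoted)} quoted lines, {len(new)} new lines, "
                  "digest header quoted")
        return True, reason
    return False, "none detected"
```

Since the only action is "hold", a false positive here costs the list
owner one click, which is why a crude ratio test may be good enough to
start with.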

3. 1 and 2 are related.  The same kinds of criteria that are useful
in detecting and putting a hold on spam are useful in detecting undesirable
content in messages, like URL shorteners and third-party tracking links.
So it makes sense to have a highly configurable module that comes with
a minimal/loose default configuration that can be salted to taste.

So what I have in mind is a module that scrutinizes messages based
on a set of (enabled/disabled) criteria, each one of which is configurable.

I know that's not very clear.  Let me make up an example and see if
that helps.

Let's suppose we call this module the Central Scrutinizer (CS) because
oblique tributes to Frank Zappa are always in order.  The CS might have
a list of dozens of checks like this:

	- URL shorteners (1)
	- tracking links (2)
	- full digest quoting (3)

Check (1) would consult a list of known URL shortening domains and
do pattern matching of any URLs in the message against them.  Check (2)
would attempt to detect tracking links/"web bugs".  Check (3) would
check each message to see if it includes an entire quoted digest.
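
Check (1) is the easiest to sketch.  The following is an illustrative
Python version, not a finished check: the function name is made up, the
URL regex is deliberately crude, and the domain set is a tiny sample
standing in for whatever maintained list a real deployment would load.

```python
import re
from urllib.parse import urlsplit

# Illustrative sample only; a real check would load a maintained list.
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}

# Crude URL matcher: good enough for a sketch, not for production.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def find_shorteners(body):
    """Return the shortener domains found in body, in order, no repeats."""
    hits = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            # Malformed URL (e.g. a broken IPv6 literal); skip it.
            continue
        host = host.lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        # Exact host match, so "notbit.ly" does not match "bit.ly".
        if host in SHORTENER_DOMAINS and host not in hits:
            hits.append(host)
    return hits
```

Returning the matched domains, rather than just True/False, is what lets
the report say *which* shortener tripped the check.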

Each one of these would be associated with an action -- and I strongly
think that "hold" would be best, because these tests are going to make
lots of mistakes for a while.  The results would be presented to list
owners (in email notifications and in the browser interface) with
something like:

	Central scrutinizer report:
		- URL shorteners - fail, example.net detected
		- tracking links - pass, none detected
		- full digest quoting - pass, none detected

with appropriate presentation so that it's easy to read and so that
when a test fails, it reports *why* it failed.
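
One way to get a report like that is to have every check return the same
small result record and format them all in one place.  A minimal Python
sketch follows; the CheckResult type and both function names are
hypothetical, and nothing here is existing Mailman API.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str      # human-readable check name, e.g. "URL shorteners"
    passed: bool   # True if nothing objectionable was found
    detail: str    # why it failed, or "none detected"


def format_report(results):
    """Render results in the plain-text layout shown above."""
    lines = ["Central scrutinizer report:"]
    for r in results:
        status = "pass" if r.passed else "fail"
        lines.append(f"\t- {r.name} - {status}, {r.detail}")
    return "\n".join(lines)


def should_hold(results):
    """Hold the message if any check failed."""
    return any(not r.passed for r in results)
```

Keeping the "why" as a field on the result means the email notification
and the browser interface can both show it without re-running the check.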

Let me note that the MTA is the wrong place to do this stuff for a whole
bunch of reasons.  Among other things, lots and lots of people want
different policies applied to mailing list traffic (which will result
in outbound mail traffic) than they do to local traffic (which won't).
This is a serious issue for anybody concerned with their mail system's
reputation, which is based on what it emits, not what it accepts.

Of course not everyone will agree with every check, and not every
check is appropriate on every mailing list.  (For example, a one-way
announce-only list is unlikely to need most of these.)  But if this
is modular, and if individual checks can be switched on/off readily,
then it can ship with everything off and folks can enable whatever
subset they find palatable.
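
The "ships with everything off" idea could look something like the
Python sketch below.  The check names, the option keys, and the
per-list override mechanism are all assumptions for illustration; this
is not how Mailman stores list settings today.

```python
# Shipped defaults: every check disabled, every action "hold".
DEFAULT_CONFIG = {
    "url_shorteners":      {"enabled": False, "action": "hold"},
    "tracking_links":      {"enabled": False, "action": "hold"},
    "full_digest_quoting": {"enabled": False, "action": "hold"},
}


def effective_config(list_overrides):
    """Merge one list's overrides onto a copy of the defaults."""
    merged = {name: dict(opts) for name, opts in DEFAULT_CONFIG.items()}
    for name, opts in list_overrides.items():
        if name not in merged:
            # Fail loudly on typos rather than silently ignoring them.
            raise KeyError(f"unknown check: {name}")
        merged[name].update(opts)
    return merged


def enabled_checks(config):
    """Names of the checks that are switched on."""
    return [name for name, opts in config.items() if opts["enabled"]]
```

An announce-only list would pass no overrides and run nothing; a
discussion list might pass {"url_shorteners": {"enabled": True}} and
leave the rest alone.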

Let me also note that the motivation for this comes not just from
things I've bumped into while running lists, but things I've seen on
other lists.  (And I'm on rather a lot of them.)  I've been thinking
about this for years, but have become convinced in the last six months
or so that the problem has now reached a point where a solution is
worth constructing.

---rsk

