[Mailman-Developers] FYI -- mailback validations no longer safe?

J C Lawrence claw@kanga.nu
Sat, 09 Dec 2000 11:24:10 -0800


On Sat, 9 Dec 2000 03:09:26 -0600 
Christopher Lindsey <lindsey@ncsa.uiuc.edu> wrote:

>> I'm passing this along mostly as a FYI, but also as a sanity
>> check. I sent this out to list-managers tonight, to bring up an
>> issue that sort of crystalized this afternoon and made me realize
>> that I think we have the beginnings of a problem in mail list
>> land. Your thoughts are welcome....If I'm right, well, oh,
>> boy. If I'm wrong -- I'd love to find out my idea won't work, but
>> I think it's not only possible, but fairly easy.

> Hi Chuq,

>    Yes, this has definitely been troublesome.  I've blocked many
> commercial sites like findmail.com (egroups) and remarq.com from
> my lists because of their secret archiving that displays email
> addresses to the public, but at least they don't spam the lists
> back.  But of course anyone can browse these sites and get
> addresses to their heart's content, then forge MAIL FROM: to sneak
> mail into the lists.

>    I'm not sure what the right thing is to do.  MLMs like sympa (

>       http://listes.cru.fr/sympa/

>    ) are definitely moving in the right direction with S/MIME
> signatures/encryption and X509 user certs, but that still doesn't
> stop someone from using throwaway certs to spam several lists or
> from harvesting addresses.  

You are failing to distinguish between two problems:

  1) Is this post from someone I know?
  2) Is this post from who I think it is from?

#1 is handled by any form of digital signature, and is handled
especially well by non-centrally signed/verified forms (eg PGP, GPG,
etc).  All that's needed is for the list member to convey or have
conveyed to you their public key.

#2 is handled by having a public key for a member.  It doesn't
matter if it is signed or if Verisign or some other set of pains
vouch for it.  It merely matters that a current post from them
cross-checks with their signature.
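
A minimal sketch of those two checks, under loud assumptions: the
keyring, the addresses and the function names are hypothetical, and
an HMAC over the post body stands in for a real PGP/GPG signature so
that the example runs on the Python standard library alone.  It is
not public-key cryptography; only the shape of the decision is the
point.

```python
import hashlib
import hmac


def toy_sign(key: bytes, body: str) -> str:
    """Stand-in signature: HMAC-SHA256 over the body (NOT a PGP signature)."""
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def classify_post(sender: str, body: str, signature: str, keyring: dict) -> str:
    """Return 'unknown', 'mismatch' or 'verified' for an incoming post."""
    key = keyring.get(sender)
    if key is None:
        # Question 1: we hold no key for this sender at all.
        return "unknown"
    expected = toy_sign(key, body)
    # Question 2: does this post cross-check against the key on file?
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return "mismatch"
    return "verified"


# Hypothetical keyring: member address -> key material conveyed by the member.
keyring = {"joeblow@example.org": b"joe-key"}
good = toy_sign(b"joe-key", "hello list")
```

Note that "verified" says nothing about whether the post is SPAM; it
only says the post came from a key we already hold.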

So what problem does that leave?

  Is this post, which comes from a member for whom I have a key, and
  which checks against his key, SPAM?

It doesn't matter where the keys or certs come from.  It doesn't
matter if a trusted authority is involved or not.  It is a human
question.  I'm sure that spammers are just as capable as we are at
getting demo certs from Thawte, or in cooking up new GPG keys willy
nilly.  

Again, this is a question of trust networks, and is a subset of the
problem of reputational systems.  Unfortunately, unlike the systems
needing reputational systems that I normally spend my time looking
at, this is not an area which panders to central databases and
over-arching solutions.  SPAM detection at the individual membership
level is a question of individual evaluation.

> The problem is that when these methods are used for authentication
> they just prove that the email address sending the stuff is who we
> think he or she is.  But at least you can't forge the source email
> address to look like it's coming from a list member who is allowed
> to post (well, it's harder :)

It raises the bar to requiring that SPAMers compromise members' keys
in a wholesale fashion.  While certainly possible (I spent an hour
last weekend seeing how many exploitable Windows systems I could
find within that hour.  I gave up after about 20 minutes when the
count passed treble digits), the barrier to entry is much larger and
the ROI is much smaller (a compromised key only gains access to a
few lists and posting venues).

> I think that there's an implicit level of trust that has to be
> honored in mailing list management.  Even SASL-based SMTP
> authentication from ISPs isn't going to prevent throw-away
> accounts from being used.  Until we can get a fingerprint or
> cornea scan (or even a driver's license) with each mailing list
> subscription and compare it against a master database (which I'm
> not advocating), you can't be 100% sure of the users.

No.  Think in terms of trust, not identity.  Do you really care that
you can track that particular membership back to one identifiable
human body?  Really?  Over the entire planet?  Or do you really just
care that JoeBlow has posted signal in the past and you feel that
you can trust him to post signal in the future?

Or more simply, if you are not going to operate on past posting
behaviour as your trust metric:

  Why do you trust member X to not post spam?  What are the criteria
  you use for making that decision?  Why are those criteria
  trustable?  What are the risks?

I like past behaviour as it is simple, non-invasive (I don't need to
know who anybody is), and it fits transparently in as an invisible
extension of traditional list moderation models.
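
As a rough sketch of past behaviour as the trust metric: count the
moderator's verdicts per key and let a key skip moderation only once
it has enough history.  The class name, the thresholds and the key
ids are all hypothetical, chosen purely for illustration.

```python
from collections import defaultdict


class PostingHistory:
    """Counts moderator verdicts per key id; no real-world identity is stored."""

    def __init__(self, min_posts: int = 5, min_signal_ratio: float = 0.9):
        self.min_posts = min_posts                # assumed threshold
        self.min_signal_ratio = min_signal_ratio  # assumed threshold
        self._counts = defaultdict(lambda: [0, 0])  # key_id -> [signal, total]

    def record(self, key_id: str, was_signal: bool) -> None:
        """Store the moderator's verdict on one post from this key."""
        counts = self._counts[key_id]
        if was_signal:
            counts[0] += 1
        counts[1] += 1

    def route(self, key_id: str) -> str:
        """Return 'auto-approve' for keys with a good record, else 'moderate'."""
        signal, total = self._counts.get(key_id, (0, 0))
        if total >= self.min_posts and signal / total >= self.min_signal_ratio:
            return "auto-approve"
        return "moderate"
```

A key with no history simply lands in the moderation queue, which is
what a hand-moderated list does already.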

>    For now I'd say that the best method is a social one; require
> references when people want to subscribe to your list.  Ask them
> which lists they participate on, an example post from another
> list, etc.  But ultimately it becomes a judgement call by the
> listowner either way.

For a few years I ran a list where membership was by invitation only
-- a current member had to invite you.  It worked well.  Membership
grew steadily, more than 70% of the members regularly posted to the
list, and signal was high.

Later I moved the list to free subscription with posting authority
granted on application only, with applications needing to be
accompanied by a proposed first post.  This too worked well with
minor caveats.  List membership grew roughly twice as fast, but the
posting percentage fell to around 40%.  The caveats were in
maintaining the approved poster list; in particular, determining
and enforcing policy for removing posting authority was painful.

I currently run an open subscription model with hand moderation of
all posts.  Again, this has worked well.  Subscription rates are
roughly 4 times higher than ever before, but posting percentages are
down in the <10% range.  It is of course labour intensive.

Recently I've moved to not only hand moderating the list, but hand
editing posts (eg to remove HTML, over quoting, inflammatory
content, etc) marking each post I edit with a comment as to the
changes.  While I haven't been doing this for long, subscription
rates have noticeably increased by perhaps 20% (tho the sample time
is small).  The work load on me is non-trivial.  It moved me from
the position of gatekeeper to editor, a position I'm willing to
occupy, but am not keen on.

In the end you are trading editorial control (your trust model and
signal definition) for work.  Ultimately you are attempting to
automate the process of determining signal.  The initial
approximation is by determining signal sources.  The approximation
we are discussing above is in determining if signal sources really are
signal sources.

-- 
J C Lawrence                                       claw@kanga.nu
---------(*)                        : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--