[Mailman-Developers] UI for Mailman 3.0 update

Wed Jun 16 06:33:51 CEST 2010

On Tue, Jun 15, 2010 at 10:44:03PM -0400, Barry Warsaw wrote:
> 
> Given that all signups require an email validation step, and that we'll
> rate-limit that to prevent using signups as a spam vector, what additional
> protection does captcha provide?

Are you saying that no scripts/bots can automatically sign up for
mailman lists? I get plenty of signups like "qneu456na at nanke62w.net"
that suggest otherwise. I should take the time to log those and send
them to you, perhaps? After my master's paper...
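
For concreteness, here is one hypothetical shape the rate limit Barry mentions could take: a per-key sliding window over confirmation emails. The class name, the choice of key (requested address, its domain, or the requesting IP), and the limits are assumptions for illustration, not Mailman 3's actual design. Note that a key the limiter has never seen before is always allowed through.

```python
import time
from collections import defaultdict, deque


class ConfirmationRateLimiter:
    """Allow at most `limit` confirmation emails per key within `window` seconds.

    Hypothetical sketch: a real deployment would also expire idle keys so
    the table doesn't grow without bound.
    """

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = defaultdict(deque)  # key -> timestamps of recent sends

    def allow(self, key):
        now = self.clock()
        recent = self._sent[key]
        # Forget sends that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

The injectable `clock` is there so the window can be tested without actually waiting an hour.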

Most of these numbers are educated guesses; if you want real,
validated numbers, they'll have to wait, again, until I turn in my
master's paper. With that...

Let's say I have a large list that receives 16 signups a day, and of
those only two are actual humans rather than scripts. The list owner,
having had trouble with spammy signups in the past, has set the list
to require moderator approval before users can post. What are the
human costs? We'll say that the two human signups took 40s each (80s)
and the moderator also spent 40s on each of the 16 signups (640s), for
a total of 720s = 12 minutes.

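Now let's assume reCAPTCHA adds 13s[0] to each real human signup and
cuts total signups down to 4 per day (our two humans plus two scripts
that still get through), and re-run our math. The two people now spend
106s (2 x 53s) and the moderator spends 160s (4 x 40s), for a total of
266s, or about 4.43 minutes.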
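
The arithmetic above is easy to re-run with other guesses. This sketch (the function name is made up for illustration) charges the moderator for every signup and the subscriber only for human signups, and assumes, as the 160s moderator figure implies, that four signups in total reach the moderator once reCAPTCHA is in place.

```python
def daily_cost_seconds(humans, total_signups, human_secs, moderator_secs):
    """Return (subscriber, moderator, total) seconds spent per day.

    The moderator reviews every signup, human or script; only the
    human signups cost subscribers any time.
    """
    subscriber = humans * human_secs
    moderator = total_signups * moderator_secs
    return subscriber, moderator, subscriber + moderator


# No CAPTCHA: 16 signups a day, 2 of them human, 40s per signup for each party.
before = daily_cost_seconds(humans=2, total_signups=16, human_secs=40, moderator_secs=40)
# reCAPTCHA: 13s more per human, and 4 signups in total reach the moderator.
after = daily_cost_seconds(humans=2, total_signups=4, human_secs=40 + 13, moderator_secs=40)

for label, (subs, mod, total) in (("before", before), ("after", after)):
    print(f"{label}: subscribers {subs}s, moderator {mod}s, total {total}s = {total / 60:.2f} min")
# before: subscribers 80s, moderator 640s, total 720s = 12.00 min
# after: subscribers 106s, moderator 160s, total 266s = 4.43 min

print(f"subscriber cost x{after[0] / before[0]:.3f}, moderator cost x{after[1] / before[1]:.2f}")
# subscriber cost x1.325, moderator cost x0.25
```

If you read the scenario as four scripts getting through in addition to the two humans, pass `total_signups=6` instead; the moderator figure then becomes 240s.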

Yes, we've shifted some costs onto our subscribers, but they pay them
once, while the moderator gets time back every day. What's more, we've
increased the subscribers' burden by about a third (80s to 106s) and
cut the moderator's burden to a quarter (640s to 160s). And we haven't
even mentioned the increased cost to the spammer or, in the case of
reCAPTCHA, the benefit to society of the CAPTCHA-solving work.

That's the real point of all this: drive up the cost to spammers as
much as possible while imposing as little burden as is reasonable on
list owners, moderators, subscribers, site admins, etc. We can't
exactly follow the MetaFilter model[1] here, and I think this is as
good an idea as I have seen, but I'd love for others to propose
something that imposes less of a burden on subscribers and that we
know will drive up costs to spammers over the longer term.

Again, I don't even propose we turn this on by default. I would just
like to see this as a documented, tested option that can be enabled by
site admins and cleanly upgraded without extra work.
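
To make "off by default, enabled by site admins, and tested" concrete, here is a hypothetical sketch of how a signup CAPTCHA check might be gated. None of these names are Mailman 3 APIs. The verifier is passed in as a plain callable, so a real one (which would call the CAPTCHA service's verification endpoint, not shown here) can be swapped for a fake in tests.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical verifier type: takes the client's CAPTCHA response token and
# the client IP, and returns True if the CAPTCHA service accepted it.
Verifier = Callable[[str, str], bool]


@dataclass
class SignupCaptchaSettings:
    # Off by default; a site admin has to opt in explicitly.
    enabled: bool = False
    verifier: Optional[Verifier] = None


def captcha_ok(settings: SignupCaptchaSettings, token: Optional[str], remote_ip: str) -> bool:
    """Return True if the signup may proceed to the email-confirmation step."""
    if not settings.enabled:
        # Feature disabled: signups behave exactly as they do today.
        return True
    if settings.verifier is None:
        # Misconfiguration: refuse to silently skip the check.
        raise ValueError("signup CAPTCHA enabled but no verifier configured")
    if not token:
        # Enabled, but the form came back without a CAPTCHA response.
        return False
    return settings.verifier(token, remote_ip)
```

Keeping the network call behind `verifier` is what makes the option cheap to test: the tests exercise the gating logic with a fake that never touches the CAPTCHA service.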

Okay... now that I've put all this energy into this explanation, I'll
admit: spam to list owners, especially of the "Dear $LISTNAME owner,
we at $SITENAME security need you to reset your password. Please find
instructions in the attached .zip file..." variety, was a much bigger
problem a couple of years ago (surprisingly, even after implementing
SpamAssassin) until I decided to block .zip and several other MIME
types at the MTA level. So if y'all have no interest in doing any reCAPTCHA
integration, I'll just spend that much more time making anti-spam
tweaks at the MTA level, and I'll field one or two more "I'm a
moderator and I'm dealing with a lot of spam here" tickets every now
and then.
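
The blocking described above was done in MTA configuration, which isn't reproduced here. As a rough illustration of the same check, this sketch uses Python's standard `email` package to flag parts whose filename extension or content type is on a blocklist. The blocklist is only an example, since the exact types blocked aren't listed above, and running it as a content filter in front of the list is an assumption.

```python
import email
from email import policy
from email.message import EmailMessage

# Example blocklist only; adjust to taste.
BLOCKED_EXTENSIONS = (".zip", ".exe", ".scr", ".pif")
BLOCKED_CONTENT_TYPES = {"application/zip", "application/x-zip-compressed"}


def blocked_attachments(raw: bytes) -> list:
    """Return the filename (or content type) of every leaf part on the blocklist."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    hits = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        filename = (part.get_filename() or "").lower()
        ctype = part.get_content_type()
        if ctype in BLOCKED_CONTENT_TYPES or filename.endswith(BLOCKED_EXTENSIONS):
            hits.append(filename or ctype)
    return hits


if __name__ == "__main__":
    # Build a phishing-style message with a .zip attachment and check it.
    sample = EmailMessage()
    sample["Subject"] = "Dear list owner"
    sample.set_content("Please find instructions in the attached .zip file.")
    sample.add_attachment(b"PK\x03\x04", maintype="application",
                          subtype="zip", filename="Instructions.zip")
    print(blocked_attachments(sample.as_bytes()))  # ['instructions.zip']
```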

That's another point, come to think of it: I've had plenty of time and
experience running a couple of decently sized mailman installs, but
what about the many, many people who have less experience running
mailman? The easier we make it for them to make it hard on spammers,
the better.

A final note: are there any published user studies on mailman? I see
your ATEC '03 and LISA '98 presentations in the ACM portal, and I see
http://www.gnu.org/software/mailman/otherstuff.html ... but nothing
else turns up in Google Scholar. Please point me to other research on
mailman and its user base if it exists. If it doesn't, maybe I need to
make that happen....

Thanks so much for all the work all of you do. It really is a pleasure
and a privilege to be involved.

Cheers,
-- 
Cristóbal Palmer
ibiblio.org
metalab.unc.edu

[0] http://www.sciencemag.org/cgi/content/full/321/5895/1465
"reCAPTCHA: Human-Based Character Recognition via Web Security Measures."
Originally published in Science Express on 14 August 2008
Science 12 September 2008:
Vol. 321, no. 5895, pp. 1465-1468
DOI: 10.1126/science.1160379

Quoting:

User testing on our site (http://captcha.net) showed that it took
13.51 s on average (SD = 6.37) for 1000 randomly chosen users to solve
a seven-letter conventional CAPTCHA (25th percentile was 8.28 s,
median was 12.62 s, and 75th percentile was 17.12 s), whereas it took
13.06 s on average (SD = 7.67) for a different set of 1000 randomly
chosen users (also from http://captcha.net) to solve a reCAPTCHA (25th
percentile was 5.79 s, median was 12.64 s, and 75th percentile was
18.91 s).

[1] The MetaFilter model: charge five US dollars (via PayPal) for an account.