[Python-Dev] Summary of Tracker Issues

Wed May 16 11:07:36 CEST 2007

Georg Brandl writes:

 > By requesting a registration form over and over, and recording all
 > questions. A human would then answer them, which is easily done for
 > 50 questions (provided that they are *not* targeted at experienced
 > Python programmers, which shouldn't be done).

We are not going to publish all 50 questions at once.

ISTM you need only one question requiring human attention at a time,
because once a spammer assigns a human (or inhuman of equivalent
intelligence) to cracking you, you're toast.  Use it for a short
period of time (days, maybe even weeks).[2]  The crucial thing is that
questions (or site-specific answers that require reading comprehension
to obtain from the page) differ across sites; they must not be shared.

Now it's much faster for the human to simply do the registration on
the current question, and then point the spambot at the site and
vandalize 10,000 or so issues.  If we can force them to do that,
though, I think we're going to win.  (In a "scorched earth" sense,
maybe, but the spammers will get burned too.)

Note that one crucial aspect is to record the ID of the question that
each account authenticated with (at creation, not at login -- the
person's password is a different token).  Then have a Big Red Switch
that hides[1] all data entered by accounts that authenticated with that
question.  Of course admins only throw the switch on actually seeing
the spam, but since all data is associated with a creation token, you
can nuke all of it, even if the spammer has had the forethought to create
multiple accounts with the Question of the Day, with *one* switch.
And if they try to save such an account for tomorrow, cool! they're
busted right there.
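
To make the bookkeeping concrete, here is a rough sketch of the
creation-token idea in plain Python.  It is not roundup code: the
names Account, Item, Tracker, creation_question_id and throw_switch
are all invented for illustration, and a real implementation would
live in the tracker's own schema.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    # Hypothetical attribute: ID of the question answered at account
    # creation (not at login; the password is a different token).
    creation_question_id: int


@dataclass
class Item:
    author: Account
    text: str


@dataclass
class Tracker:
    items: list = field(default_factory=list)
    burned_questions: set = field(default_factory=set)

    def post(self, author, text):
        self.items.append(Item(author, text))

    def throw_switch(self, question_id):
        """The Big Red Switch: an admin marks one question as cracked."""
        self.burned_questions.add(question_id)

    def is_hidden(self, item):
        # Nothing is deleted; the item is only treated as "possible spam"
        # because its author registered with a burned question.
        return item.author.creation_question_id in self.burned_questions

    def visible_items(self):
        return [item for item in self.items if not self.is_hidden(item)]


tracker = Tracker()
alice = Account("alice", creation_question_id=1)
bot_a = Account("bot_a", creation_question_id=2)
bot_b = Account("bot_b", creation_question_id=2)  # second spammer account
tracker.post(alice, "real bug report")
tracker.post(bot_a, "spam")
tracker.post(bot_b, "more spam")
tracker.throw_switch(2)  # one switch hides both spammer accounts' data
print([item.text for item in tracker.visible_items()])
```

Because the check is made when an item is displayed, an account
created under a burned question stays hidden even if it first posts
days later.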

You can get smarter than that (ie, by only barring access to data by
accounts that touch more than a small number of issues in a short
period of time), if you wish, but that should be sufficient unless
you're getting dozens of new users during the validity period for a
given question.  I guess there will need to be a special token,
available only to accounts confirmed by admins, to recover accounts
for people who happen to have the same "birthday" as a spammer.
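
The "smarter" variant might look something like the sketch below.
The function name, the event format, and the threshold and window
values are all assumptions made up for the example; nothing here is
taken from roundup.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MAX_ISSUES = 5                  # assumed "small number of issues"
WINDOW = timedelta(minutes=10)  # assumed "short period of time"


def suspicious_accounts(events, max_issues=MAX_ISSUES, window=WINDOW):
    """Return accounts that touched more than max_issues distinct issues
    within any single time window.

    events: iterable of (account, issue_id, timestamp) tuples.
    """
    by_account = defaultdict(list)
    for account, issue_id, ts in events:
        by_account[account].append((ts, issue_id))

    flagged = set()
    for account, touches in by_account.items():
        touches.sort(key=lambda touch: touch[0])
        start = 0
        for end in range(len(touches)):
            # Advance the left edge until the span fits inside the window.
            while touches[end][0] - touches[start][0] > window:
                start += 1
            distinct = {issue for _, issue in touches[start:end + 1]}
            if len(distinct) > max_issues:
                flagged.add(account)
                break
    return flagged
```

Only the flagged accounts would then be hidden when the switch is
thrown, so an innocent user who shares a "birthday" with a spammer is
left alone unless they also behave like one.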

Footnotes: 
[1]  Ie, requires user action to become visible, and is tagged as
"possible spam".  This requires a new attribute on data items, and
some programming, but since roundup has to recreate the page for every
request (even if it caches, it has to do so for every new item; it's
not a problem to invalidate the cache and recreate, I bet), I think
it's probably not going to require huge amounts of extra effort or
changes in the basic design.

[2]  Probabilistically.  If the spammers are cracking your site on
average every 10 days, rotate the question every 5 days.  50 questions
means protection for most of a year in that case.
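
The arithmetic, as a quick sketch.  The safety factor of 2 is just my
reading of "cracked every 10 days, rotate every 5"; the function names
are made up for the example.

```python
def rotation_period_days(mean_days_to_crack, safety_factor=2):
    # Rotate well before the average time it takes spammers to crack
    # a question; safety_factor=2 is an assumption.
    return mean_days_to_crack / safety_factor


def coverage_days(num_questions, period_days):
    # Each question is used for one period and then retired.
    return num_questions * period_days


period = rotation_period_days(10)    # 5.0 days per question
print(coverage_days(50, period))     # 250.0 days, most of a year
```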