ACTIVITY SUMMARY (05/06/07 - 05/13/07)
Tracker at http://bugs.python.org/
To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message.

1650 open ( +0) / 8584 closed ( +0) / 10234 total ( +0)

Average duration of open issues: 791 days.
Median duration of open issues: 743 days.

Open Issues Breakdown
   open  1650 ( +0)
pending     0 ( +0)
On 5/12/07, Tracker <status@bugs.python.org> wrote: [...]

I clicked on the tracker link out of curiosity and noticed that the tracker has been spammed -- issues 1028, 1029 and 1030 are all spam (1028 seems to be a test by the spammer). These issues should be deleted and their creators' accounts disabled.

BTW, what's the hold-up for making roundup live? I'm sick of sf.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On 5/12/07, Guido van Rossum <guido@python.org> wrote:
On 5/12/07, Tracker <status@bugs.python.org> wrote: [...]
I clicked on the tracker link out of curiosity and noticed that the tracker has been spammed -- issues 1028, 1029 and 1030 are all spam (1028 seems to be a test by the spammer).
We know. Skip is working on something for this.

> These issues should be deleted and their creators' accounts disabled.

The user accounts will be created from scratch when we do the actual transition. Plus, all of the existing tracker items will be wiped.

> BTW, what's the hold-up for making roundup live? I'm sick of sf.

Well, the tracker you are sick of is holding us up. The data dump that we were depending on stopped working properly last month. We keep bugging them to fix it; they don't see any issues with it, while none of us can get a complete dump.

-Brett
I clicked on the tracker link out of curiosity and noticed that the tracker has been spammed -- issues 1028, 1029 and 1030 are all spam (1028 seems to be a test by the spammer).
These issues should be deleted and their creators' accounts disabled.
(Notice that the spammer hasn't been as successful as he thinks - the spam downloads as plain text, not as HTML as he had hoped.)

That's actually an issue that will likely require continuous volunteer effort. Unless automated spam filtering materializes (which may or may not happen), people will need to clean out spam manually.

It's not that easy for a spammer to submit the spam: we require a registration with an email roundtrip - which used to be sufficient, but isn't anymore, as the spammers now have email accounts which they use for signing up. We have some machinery to detect spambots performing registration, and that already filters out a lot of attempts (at least the spam frequency went down when this got activated), but some spammers still get past it.

Now it's up to volunteers to do ongoing spam clearing, and we don't have that many volunteers. I think a single-click "Spammer" button should allow committers to lock an account and hide all messages and files that he sent, but that still requires somebody to implement it.

Regards, Martin
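A rough sketch of what such a one-click "Spammer" action might look like on the backend, assuming a hypothetical "hidden" property on messages and files; the accessors below are modeled loosely on Roundup's hyperdb, not its actual detector/action API:

    def mark_spammer(db, userid):
        # Lock the account: demote it so it can no longer post.
        db.user.set(userid, roles='Anonymous')
        # Hide every message and file the account ever submitted
        # ("hidden" is an assumed, added property, not a stock one).
        for msgid in db.msg.find(author=userid):
            db.msg.set(msgid, hidden=True)
        for fileid in db.file.find(creator=userid):
            db.file.set(fileid, hidden=True)
        db.commit()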
On 5/13/07, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Now it's up to volunteers to do ongoing spam clearing, and we don't have that many volunteers. I think a single-click "Spammer" button should allow committers to lock an account and hide all messages and files that he sent, but that still requires somebody to implement it.
I'd expect that to be pretty effective -- like graffiti artists, spammers want their work to be seen, and a site that quickly removes them will not be worth the effort for them. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
>> Now it's up to volunteers to do ongoing spam clearing, and we don't
>> have that many volunteers. I think a single-click "Spammer" button
>> should allow committers to lock an account and hide all messages and
>> files that he sent, but that still requires somebody to implement it.

    Guido> I'd expect that to be pretty effective -- like graffiti artists,
    Guido> spammers want their work to be seen, and a site that quickly
    Guido> removes them will not be worth the effort for them.

I'm still (slowly) working on adding SpamBayes to Roundup. I've exchanged one or two messages with Richard Jones.

In the meantime (thinking out loud here), would it be possible to keep search engines from seeing a submission or an edit until a trusted person has had a chance to approve it? It should also be possible for trusted users to mark other users as trusted. Trusted users' submissions and edits should not require approval. In a rather short period of time I think you'd settle on a fairly static group of trusted users who are responsible for most changes. Only new submissions from previously unknown users would require approval.

Skip
In the meantime (thinking out loud here), would it be possible to keep search engines from seeing a submission or an edit until a trusted person has had a chance to approve it?
It would be possible, but I would strongly oppose it. A bug tracker where postings need to be approved is just unacceptable. Regards, Martin
On Mon, May 14, 2007, "Martin v. Löwis" wrote:
Skip(?):
In the meantime (thinking out loud here), would it be possible to keep search engines from seeing a submission or an edit until a trusted person has had a chance to approve it?
It would be possible, but I would strongly oppose it. A bug tracker where postings need to be approved is just unacceptable.
Could you expand this, please? It sounds like Skip is just talking about a dynamic robots.txt, essentially. Anyone coming in to the tracker itself should still see everything. Moreover, this restriction only comes into play for postings from new people, which should limit the load. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet
Aahz schrieb:
On Mon, May 14, 2007, "Martin v. Löwis" wrote:
Skip(?):
In the meantime (thinking out loud here), would it be possible to keep search engines from seeing a submission or an edit until a trusted person has had a chance to approve it?

It would be possible, but I would strongly oppose it. A bug tracker where postings need to be approved is just unacceptable.
Could you expand this, please? It sounds like Skip is just talking about a dynamic robots.txt, essentially. Anyone coming in to the tracker itself should still see everything.
I must have misunderstood Skip then - I thought he had a scheme in mind where an editor would have to approve postings before they become visible to tracker users; the tracker itself cannot distinguish between a search engine and a regular (anonymous) user.

As for a dynamically-expanding robots.txt - I think that would be difficult to implement (close to impossible). At best, we can have robots.txt filter out entire issues, not individual messages within an issue. So if a spammer posts to an existing issue, no proper robots.txt can be written. Even for new issues: they can be added to robots.txt only after they have been created. As search engines are allowed to cache robots.txt, they might not see that it has changed, and fetch the issue that was supposed to be blocked.

Regards, Martin
>> On Mon, May 14, 2007, "Martin v. Löwis" wrote:
>>> Skip(?):
>>>> In the meantime (thinking out loud here), would it be possible to
>>>> keep search engines from seeing a submission or an edit until a
>>>> trusted person has had a chance to approve it?
>>> It would be possible, but I would strongly oppose it. A bug tracker
>>> where postings need to be approved is just unacceptable.
>>
>> Could you expand this, please? It sounds like Skip is just talking
>> about a dynamic robots.txt, essentially. Anyone coming in to the
>> tracker itself should still see everything.

    Martin> I must have misunderstood Skip then - I thought he had a scheme
    Martin> in mind where an editor would have to approve postings before they
    Martin> become visible to tracker users; the tracker itself cannot
    Martin> distinguish between a search engine and a regular (anonymous)
    Martin> user.

Okay, let me expand. ;-) I didn't mean to dynamically update robots.txt. I meant to modify Roundup to restrict view of items which have not yet been explicitly or implicitly approved. I envision three classes of users:

1. People with no special credentials (I include anonymous users such as search engine spiders in this class)
2. Tracker admins (Erik, Aahz, Martin, me, etc.)
3. Other trusted users (include admins in this group - they are the root of the trust network).

Anyone can submit an item or edit an item; however, if that person is not trusted, their submissions need to be approved by a trusted user before they are made visible to the unwashed masses in group 1. Also, such users will not be able to see any unapproved items. (That thwarts the spammers' desire for visibility - search engine spiders will not know their submissions exist, and anonymous users will just get 404 responses when they try to access unapproved attachments or submissions.) The intent is that this would be done by modifying Roundup.

True, initially lots of submissions would be held for review, but I think we would fairly quickly expand the trust network to a larger, fairly static group of users. Once someone adds Guido to the trust network, any pending and future modifications of his will be visible to the world. Once trusted, Guido can extend the trust network himself, by, for example, adding Georg to the network. Also, once trusted, a user would see everything and would be able to approve individual submissions.

Again, as I indicated, I was thinking out loud. I don't think this is a trivial undertaking. I suspect the approach might work for a number of similar systems (Trac, MoinMoin, etc.), not just Roundup.

Skip
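A minimal sketch of the visibility rule and trust network described above, with made-up names and an in-memory set standing in for whatever Roundup would persist:

    # The trust network is rooted at the tracker admins.
    trusted_users = {"erik", "aahz", "martin", "skip"}

    def visible_to(item, username):
        # Trusted users see everything, including unapproved items;
        # everyone else (spiders, anonymous visitors) sees only
        # approved items and gets a 404 otherwise.
        return username in trusted_users or item["approved"]

    def grant_trust(granter, grantee):
        # Only an already-trusted user may extend the trust network.
        if granter not in trusted_users:
            raise PermissionError("only trusted users may grant trust")
        trusted_users.add(grantee)

    def approve(approver, item):
        if approver not in trusted_users:
            raise PermissionError("only trusted users may approve items")
        item["approved"] = True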
I think a single-click "Spammer" button should allow committers to lock an account and hide all messages and files that he sent, but that still requires somebody to implement it.
I'd expect that to be pretty effective -- like graffiti artists, spammers want their work to be seen, and a site that quickly removes them will not be worth the effort for them.
Unfortunately, the spammers are using automated tools to locate, register on and post to victim sites. The tools are distributed (running on compromised PCs) and massively parallel, so they really don't care that some of their posts are never seen. I'm reluctant to mention the name of one particular tool I'm aware of, but as well as the above, it also has OCR to defeat CAPTCHA, and automatically creates throw-away e-mail accounts with a range of free web-mail providers for registration purposes. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
I'm reluctant to mention the name of one particular tool I'm aware of, but as well as the above, it also has OCR to defeat CAPTCHA, and automatically creates throw-away e-mail accounts with a range of free web-mail providers for registration purposes.
Right. We considered CAPTCHA, but some people were immediately opposed to using it, both for the reason that spammers still get past it in an automated manner, and that it might lock out certain groups of legitimate users. So I have personally given up on that path. Regards, Martin
""Martin v. Löwis"" <martin@v.loewis.de> wrote in message news:46494156.30406@v.loewis.de... |> I'm reluctant to mention the name of one particular tool I'm aware | > of, but as well as the above, it also has OCR to defeat CAPTCHA, and | > automatically creates throw-away e-mail accounts with a range of free | > web-mail providers for registration purposes. | | Right. We considered CAPTCHA, but some people were immediately opposed | to using it, both for the reason that spammers still get past it in | an automated manner, and that it might lock out certain groups of | legitimate users. So I have personally given up on that path. I have not noticed any spam on the very public SF tracker (or have I just missed it?) while I saw some my first visit to our hardly public trial site. Any ideas on why the difference? tjr
"Andrew McNamara" <andrewm@object-craft.com.au> wrote in message news:20070515034743.2030F5CC4B5@longblack.object-craft.com.au... | I'm reluctant to mention the name of one particular tool I'm aware | of, but as well as the above, it also has OCR to defeat CAPTCHA, and How about asking a Python specific question, with answered filled in rather that multiple choice selected: I would be willing to make up a bunch. The initials of Python's founder. ____ The keyword for looping by condition. ____ The char that signals a name-binding statement. ____ (I am intentionally avoiding question words and ? that would signal Test Question to automated software.) If we anticipate users rather than programmers to register (as if so, it would be nice to collect that info to formulate sensible responses), then questions like The orb that shines in the sky during the day. ____ | automatically creates throw-away e-mail accounts with a range of free | web-mail providers for registration purposes. Either don't accept registrations from such accounts (as other sites have done), or require extra verification steps or require approval of the first post. How many current legitimate registered users use such? Terry Jan Reedy
Terry Reedy schrieb:
"Andrew McNamara" <andrewm@object-craft.com.au> wrote in message news:20070515034743.2030F5CC4B5@longblack.object-craft.com.au... | I'm reluctant to mention the name of one particular tool I'm aware | of, but as well as the above, it also has OCR to defeat CAPTCHA, and
How about asking a Python-specific question, with the answer filled in rather than a multiple choice selected? I would be willing to make up a bunch.
The initials of Python's founder. ____
The keyword for looping by condition. ____
The char that signals a name-binding statement. ____
(I am intentionally avoiding question words and "?" that would signal Test Question to automated software.)
There are two problems with this:

* The set of questions is limited, and bots can be programmed to know them all.
* Even programmers might not immediately know an answer, and I can understand them turning away on that occasion (take for example the "name-binding" term).
If we anticipate users rather than programmers registering (and if so, it would be nice to collect that info to formulate sensible responses), then questions like: The orb that shines in the sky during the day. ____
| automatically creates throw-away e-mail accounts with a range of free
| web-mail providers for registration purposes.
Either don't accept registrations from such accounts (as other sites have done), or require extra verification steps, or require approval of the first post. How many current legitimate registered users use such accounts?
This is impossible to find out, I think, since SF.net does not publicly show real e-mail addresses, instead, each user has an alias username@sourceforge.net. Georg
On 15-May-07, at 12:32 AM, Georg Brandl wrote:
There are two problems with this:
* The set of questions is limited, and bots can be programmed to know them all.
Sure, but if someone is customizing their bot for Python's issue tracker, in all likelihood they would have to be dealt with specially anyway. Foiling automated bots should be the first priority--they should represent the vast majority of cases.
* Even programmers might not immediately know an answer, and I can understand them turning away on that occasion (take for example the "name-binding" term).
It shouldn't be hard to make it so easy that anyone with business submitting a bug report would know the answer:

What Python keyword is used to define a function?
What file extension is typically used for Python source files?
etc.

If there is still worry, then a failed answer could simply be the moderation trigger.

-Mike
"Georg Brandl" <g.brandl@gmx.net> wrote in message news:f2bnlr$b14$1@sea.gmane.org... | Terry Reedy schrieb: | > How about asking a Python specific question, with answered filled in rather | > that multiple choice selected: I would be willing to make up a bunch. And I would spend longer than a couple of minutes at 3am to do so. | There are two problems with this: | * The set of questions is limited, but unbounded. I would aim for, say, 50 to start | and bots can be programmed to know them all. by hacking into the site? or by repeated failed attempts? Then someone has to answer the questions correctly to put the correct answers in. A lot of work for very temporary access (a day?) to one site. Then maybe I reword the questions or add new ones, so more programming is needed. | * Even programmers might not immediately know an answer, and I can | understand them turning away on that occasion (take for example the "name- | binding" term). I would expect and want review by others, including non-American, non-native-English speakers to weed out unintended obscurities and ambiguities. | > | automatically creates throw-away e-mail accounts with a range of free | > | web-mail providers for registration purposes. | > | > Either don't accept registrations from such accounts (as other sites have | > done), or require extra verification steps or require approval of the first | > post. How many current legitimate registered users use such? | | This is impossible to find out, I think, since SF.net does not publicly show | real e-mail addresses, instead, each user has an alias username@sourceforge.net. If the list of registrants we got from SF does not have real emails, we will need such to validate accounts on our tracker so *it* can send emails. Terry
Terry Reedy schrieb:
"Georg Brandl" <g.brandl@gmx.net> wrote in message news:f2bnlr$b14$1@sea.gmane.org... | Terry Reedy schrieb: | > How about asking a Python specific question, with answered filled in rather | > that multiple choice selected: I would be willing to make up a bunch.
And I would spend longer than a couple of minutes at 3am to do so.
| There are two problems with this:
| * The set of questions is limited,
but unbounded. I would aim for, say, 50 to start
| and bots can be programmed to know them all.
by hacking into the site? or by repeated failed attempts?
By requesting a registration form over and over, and recording all questions. A human would then answer them, which is easily done for 50 questions (provided that they are *not* targeted at experienced Python programmers, which shouldn't be done).
Then someone has to answer the questions correctly to put the correct answers in. A lot of work for very temporary access (a day?) to one site.
Assuming that you don't replace all questions after killing the bot account, he can get access again very easily.
Then maybe I reword the questions or add new ones, so more programming is needed.
| * Even programmers might not immediately know an answer, and I can
| understand them turning away on that occasion (take for example the
| "name-binding" term).
I would expect and want review by others, including non-American, non-native-English speakers to weed out unintended obscurities and ambiguities.
That's necessary in any case. Georg
Georg Brandl writes:
By requesting a registration form over and over, and recording all questions. A human would then answer them, which is easily done for 50 questions (provided that they are *not* targeted at experienced Python programmers, which shouldn't be done).
We are not going to publish all 50 questions at once. ISTM you need only one question requiring human attention at a time, because once a spammer assigns a human (or inhuman of equivalent intelligence) to cracking you, you're toast. Use it for a short period of time (days, maybe even weeks).[2] The crucial thing is that questions (or site-specific answers that require reading comprehension to obtain from the page) differ across sites; they must not be shared.

Now it's much faster for the human to simply do the registration on the current question, and then point the spambot at the site and vandalize 10,000 or so issues. If we can force them to do that, though, I think we're going to win. (In a "scorched earth" sense, maybe, but the spammers will get burned too.)

Note that one crucial aspect is to record the ID of the question that each account authenticated with (at creation, not at login -- the person's password is a different token). Then have a Big Red Switch that hides[1] all data entered by accounts that authenticated with that question. Of course admins only throw the switch on actually seeing the spam, but since all data is associated with a creation token, you can nuke all of it, even if the spammer has had the forethought to create multiple accounts with the Question of the Day, with *one* switch. And if they try to save such an account for tomorrow, cool! They're busted right there.

You can get smarter than that (e.g., by only barring access to data by accounts that touch more than a small number of issues in a short period of time), if you wish, but that should be sufficient unless you're getting dozens of new users during the validity period for a given question.

I guess there will need to be a special token, available only to accounts confirmed by admins, to recover accounts for people who happen to have the same "birthday" as a spammer.

Footnotes:
[1] I.e., requires user action to become visible, and is tagged as "possible spam". This requires a new attribute on data items, and some programming, but since Roundup has to recreate the page for every request (even if it caches, it has to do so for every new item; it's not a problem to invalidate the cache and recreate, I bet), I think it's probably not going to require huge amounts of extra effort or changes in the basic design.
[2] Probabilistically. If the spammers are cracking your site on average every 10 days, rotate the question every 5 days. 50 questions means protection for most of a year in that case.
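A sketch of the bookkeeping this scheme needs, with invented names: rotate the question on a fixed period, record which question each account registered with, and let one switch hide everything from accounts sharing a "birthday" question:

    import datetime

    QUESTIONS = ["q0", "q1", "q2"]  # placeholder question IDs

    accounts = {}        # userid -> question index used at registration
    quarantined = set()  # question indexes hit by the Big Red Switch

    def current_question(epoch=datetime.date(2007, 5, 1), period_days=5):
        # Rotate through the bank, one question per period.
        days = (datetime.date.today() - epoch).days
        return (days // period_days) % len(QUESTIONS)

    def register(userid):
        accounts[userid] = current_question()

    def big_red_switch(question_index):
        # Admin saw spam: hide data from every account that registered
        # with the same question, even ones saved for later use.
        quarantined.add(question_index)

    def is_hidden(author):
        return accounts.get(author) in quarantined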
Stephen J. Turnbull wrote:
ISTM you need only one question requiring human attention at a time, because once a spammer assigns a human (or inhuman of equivalent intelligence) to cracking you, you're toast.
I can't believe this is still profitable. It's either lucrative or fulfilling, and malice, if the latter.
ISTM you need only one question requiring human attention at a time, because once a spammer assigns a human (or inhuman of equivalent intelligence) to cracking you, you're toast.
I can't believe this is still profitable. It's either lucrative or fulfilling, and malice, if the latter.
At any rate, it is hardly such an urgent problem that it needs all this brainpower poured into it. And it almost certainly doesn't require novel solutions. Kristján
Kristján Valur Jónsson wrote:
ISTM you need only one question requiring human attention at a time, because once a spammer assigns a human (or inhuman of equivalent intelligence) to cracking you, you're toast.

I can't believe this is still profitable. It's either lucrative or fulfilling, and malice, if the latter.
At any rate, it is hardly such an urgent problem that it needs all this brainpower poured into it. And it almost certainly doesn't require novel solutions.
Possibly so, but I can't see c.l.p.dev passing up the chance to discuss this particular bicycle shed. It gets kind of personal when someone is spamming *your* tracker ... ;-)

I have already been criticized on c.l.py for suggesting there should be at least one day of the year when we should be allowed to hang spammers up by the nuts (assuming they have any) - "not very welcoming" was the phrase, IIRC. So maybe I'm no longer rational on this topic.

or-any-other-for-that-matter-ly y'rs - steve

--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Aaron Brady writes:
ISTM you need only one question requiring human attention at a time, because once a spammer assigns a human (or inhuman of equivalent intelligence) to cracking you, you're toast.
I can't believe this is still profitable. It's either lucrative or fulfilling, and malice, if the latter.
That's precisely my point. I don't think it is profitable, and therefore at a reasonable expense to us (one of us makes up a question every couple of days) we can make the tracker an unprofitable target for spammers, and probably avoid most spam. There's ample evidence of malicious behavior by spammers who feel threatened or thwarted, though.
Stephen J. Turnbull wrote:
Aaron Brady writes:
ISTM you need only one question requiring human attention at a time, because once a spammer assigns a human (or inhuman of equivalent intelligence) to cracking you, you're toast.
I can't believe this is still profitable. It's either lucrative or fulfilling, and malice, if the latter.
That's precisely my point. I don't think it is profitable, and therefore at a reasonable expense to us (one of us makes up a question every couple of days) we can make the tracker an unprofitable target for spammers, and probably avoid most spam.
There's ample evidence of malicious behavior by spammers who feel threatened or thwarted, though.
Can we spam back? /blink/ Click here for free therapy. //blink/
If we anticipate users rather than programmers registering (and if so, it would be nice to collect that info to formulate sensible responses), then questions like: The orb that shines in the sky during the day. ____
This question I could not answer, because I don't know what an orb is (it's not an object request broker, right?) Is the answer "sun"? Regards, Martin
>> The orb that shines in the sky during the day. ____

    Martin> This question I could not answer, because I don't know what an
    Martin> orb is (it's not an object request broker, right?)

    Martin> Is the answer "sun"?

It is indeed. I would use "star" instead of "orb".

It might be reasonable to translate the questions into a handful of other languages and let the user select the language.

Skip
skip@pobox.com writes:
>> The orb that shines in the sky during the day. ____
Martin> This question I could not answer, because I don't know what an
Martin> orb is (it's not an object request broker, right?)
Martin> Is the answer "sun"?
It is indeed. I would use "star" instead of "orb".
And what happens if the user writes "the sun"? Everyday knowledge is pretty slippery.
It might be reasonable to translate the questions into a handful of other languages and let the user select the language.
Since English is the common language used in the community, I think a better source of questions would be the English language itself, such as:

How many words are in the question on this line? ___ten___
John threw a ball at Mark. Who threw it? __John___
John was thrown a ball by Mark. Who threw it? __Mark___

I think most human readers able to use the tracker would be able to handle even the passive "was thrown" construction without too much trouble.

We could also use easy "reading comprehension" questions, say from the Iowa achievement test for 11-year-olds. :-) Or even the SAT (GMAT, LSAT); there must be banks of practice questions for those. (Copyright might be a problem, though. Any fifth-grade teachers who write drill programs for their kids out there?)

You could also have the user evaluate a simple Python program fragment. Probably it should contain an obvious typo or two to foil a program that evals it.

It would be sad if somebody who could write a program to handle any of those couldn't find a better job than working for spammers ....
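A sketch of such self-answering questions: the accepted answers are computed from the question text itself, so nothing needs to be stored. The wording and the tiny number-word table are made up:

    WORDS = {8: "eight", 9: "nine", 10: "ten", 11: "eleven"}

    def word_count_challenge():
        question = "How many words are in the question on this line?"
        n = len(question.rstrip("?").split())
        # Accept both the digits and the spelled-out number.
        return question, {str(n), WORDS.get(n, "")}

    def nth_word_challenge():
        question = "What is the third word of this question?"
        return question, {question.split()[2].lower()}

    def check(answer, accepted):
        return answer.strip().lower() in accepted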
""Martin v. Löwis"" <martin@v.loewis.de> wrote in message news:464A29E2.1060207@v.loewis.de... |> If we anticipate users rather than programmers to register (as if so, it | > would be nice to collect that info to formulate sensible responses), then | > questions like | > The orb that shines in the sky during the day. ____ | | This question I could not answer, because I don't know what an orb is | (it's not an object request broker, right?) Ugh. The sort of reason why I would want review both by myself when not half asleep and by others ;-) | Is the answer "sun"? Yes. Skip is right about 'star' My underlying point: seeing porno spam on the practice site gave me a bad itch both because I detest spammers in general and because I would not want visitors turned off to Python by something that is completely out of place and potentially offensive to some. So I am willing to help us not throw up our hands in surrender. Terry Jan Reedy
Terry Reedy wrote:
My underlying point: seeing porno spam on the practice site gave me a bad itch both because I detest spammers in general and because I would not want visitors turned off to Python by something that is completely out of place and potentially offensive to some. So I am willing to help us not throw up our hands in surrender.
Typically spammers don't go through the effort to do a custom login script for each different site. Instead, they do a custom login script for each of the various software applications that support end-user comments. So for example, there's a script for WordPress, and one for PHPNuke, and so on.

For applications that allow entries to be added via the web, the solution to spam is pretty simple, which is to make the comment submission form deviate from the normal submission process for that package. For example, in WordPress, you could rename the PHP URL that posts a comment to an article to a non-standard name. The spammer's script generally isn't smart enough to figure out how to post based on an examination of the page; it just knows that for WordPress, the way to submit comments is via a particular URL with particular params.

There are various other solutions. The spammer's client isn't generally a full browser, it's just a bare HTTP robot, so if there's some kind of Javascript that is required to post, then the spammer probably won't be able to execute it. For example, you could have a hidden field which is a hash of the bug summary line, calculated by the Javascript in the web form, which is checked by the server. (For people who have JS turned off, failing the check would fall back to a captcha or some other manual means of identification.)

Preventing spam that comes in via the email gateway is a little harder. One method is to have email submissions mail back a confirmation mail which must be responded to in some semi-intelligent way. Note that this confirmation step need only be done the first time a new user submits a bug, which can automatically add them to a whitelist for future bug submissions.

-- Talin
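A sketch of the hidden-field check described above, assuming the page's JavaScript computes the same digest of the summary line client-side; the field names are made up, and the captcha fallback for JS-off users is omitted:

    import hashlib

    def expected_token(summary):
        # A bare HTTP robot that never runs the form's JavaScript
        # cannot fill this field in correctly.
        return hashlib.sha1(summary.encode("utf-8")).hexdigest()

    def submission_ok(form):
        summary = form.get("summary", "")
        return form.get("js_token") == expected_token(summary)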
Talin <talin@acm.org> wrote:
Terry Reedy wrote:
My underlying point: seeing porno spam on the practice site gave me a bad itch both because I detest spammers in general and because I would not want visitors turned off to Python by something that is completely out of place and potentially offensive to some. So I am willing to help us not throw up our hands in surrender.
There are various other solutions. The spammer's client isn't generally a full browser, it's just a bare HTTP robot, so if there's some kind of Javascript that is required to post, then the spammer probably won't be able to execute it. For example, you could have a hidden field which is a hash of the bug summary line, calculated by the Javascript in the web form, which is checked by the server. (For people who have JS turned off, failing the check would fall back to a captcha or some other manual means of identification.)
I'm not sure how effective the question/answer stuff is, but a bit of javascript seems to be a good idea.

What has also worked on a phpbb forum that I admin is "Stop Spambot Registration". As the user is registering, it tells them not to enter any profile information; they should do that later. Anyone who enters any profile information is flagged as a spammer and their registration rejected, and I get an email. (Of the 35 rejections I've received, none have been legitimate users, and only one smart spambot got through, but he had a drug-related name and was easy to toss.) If we include fake profile entries during registration that we tell people not to fill in (like 'web page', 'interests', etc.), we may catch some foolish spambots.

Of course there is the other *really* simple option of just renaming registration form entry names. Have a 'username' field, but make it hidden and empty by default, rejecting registration if it is not empty. The real login form name could be generated uniquely for each registration attempt, and verified against another hidden field with minimal backend database support. While it would only take a marginally intelligent spambot to defeat it, it should thwart the stupid spambots.

- Josiah
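A sketch of the decoy-field trick with hypothetical field names; real users are told to leave these blank, so any non-empty decoy flags the registration:

    HONEYPOT_FIELDS = ("web_page", "interests")

    def looks_like_spambot(form):
        # Foolish bots fill in every field they can find; humans
        # leave the decoys empty, so anything here means rejection.
        return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)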
On Wed, May 16, 2007, Josiah Carlson wrote:
I'm not sure how effective the question/answer stuff is, but a bit of javascript seems to be a good idea.
Just for the record (and to few people's surprise, I'm sure), I am entirely opposed to any use of JavaScript. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet
On Thursday 17 May 2007, Aahz wrote:
On Wed, May 16, 2007, Josiah Carlson wrote:
I'm not sure how effective the question/answer stuff is, but a bit of javascript seems to be a good idea.
Just for the record (and to few people's surprise, I'm sure), I am entirely opposed to any use of JavaScript.
What about flash, instead, then? /ducks -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood.
Typically spammers don't go through the effort to do a custom login script for each different site. Instead, they do a custom login script for each of the various software applications that support end-user comments. So for example, there's a script for WordPress, and one for PHPNuke, and so on.
In my experience, what you say is true - the bulk of the spam comes via generic spamming software that has been hard-coded to work with a finite number of applications. However - once you knock these out, there is still a steady stream of what are clearly human generated spams. The mind boggles at the economics or desperation that make this worthwhile. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Andrew McNamara wrote:
Typically spammers don't go through the effort to do a custom login script for each different site. Instead, they do a custom login script for each of the various software applications that support end-user comments. So for example, there's a script for WordPress, and one for PHPNuke, and so on.
In my experience, what you say is true - the bulk of the spam comes via generic spamming software that has been hard-coded to work with a finite number of applications.
However - once you knock these out, there is still a steady stream of what are clearly human generated spams. The mind boggles at the economics or desperation that make this worthwhile.
Actually, it doesn't cost that much, because typically the spammer can trick other humans into doing their work for them.

Here's a simple method: Put up a free porn site, with a front page that says "you must be 18 or older to enter". The page also has a captcha to verify that you are a real person. But here's the trick: the captcha is actually a proxy to some other site that the spammer is trying to get access to. When the human enters the correct word, the spammer's server sends that word to the target site, which results in a successful login/registration. Now that the spammer is in, they can post comments or do whatever they need to do.

-- Talin
However - once you knock these out, there is still a steady stream of what are clearly human generated spams. The mind boggles at the economics or desperation that make this worthwhile.
Actually, it doesn't cost that much, because typically the spammer can trick other humans into doing their work for them.
Here's a simple method: Put up a free porn site, with a front page that says "you must be 18 or older to enter". The page also has a captcha to verify that you are a real person. But here's the trick: the captcha is actually a proxy to some other site that the spammer is trying to get access to. When the human enters the correct word, the spammer's server sends that word to the target site, which results in a successful login/registration. Now that the spammer is in, they can post comments or do whatever they need to do.
Yep - I was aware of this trick, but the ones I'm talking about have also got through filling out questionnaires, and whatnot. Certainly the same technique could be used, but my suspicion is that real people are being paid a pittance to sit in front of a PC and spam anything that moves. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
* Andrew McNamara <andrewm@object-craft.com.au> [2007-05-17 15:30:43 +1000]:
technique could be used, but my suspicion is that real people are being paid a pittance to sit in front of a PC and spam anything that moves.
http://www.mturk.com/mturk/welcome Complete simple tasks that people do better than computers. And, get paid for it. Learn more. Choose from thousands of tasks, control when you work, and decide how much you earn. -- mithrandi, i Ainil en-Balandor, a faer Ambar
On Wed, 2007-05-16 at 22:17 -0700, Talin wrote:
Here's a simple method: Put up a free porn site [...]
Is it known that someone actually implemented this? It's a neat trick, but as far as I know, it started as a thought experiment of what *could* be done to fairly easily defeat the captchas, as well as all other circumvention methods that make use of human intelligence.
On 5/17/07, Hrvoje Nikšić <hrvoje.niksic@avl.com> wrote:
On Wed, 2007-05-16 at 22:17 -0700, Talin wrote:
Here's a simple method: Put up a free porn site [...]
Is it known that someone actually implemented this? It's a neat trick, but as far as I know, it started as a thought experiment of what *could* be done to fairly easily defeat the captchas, as well as all other circumvention methods that make use of human intelligence.
I don't have hard data but it's been related to me as true by Googlers who should have first-hand experience. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Hrvoje Nikšić wrote:
On Wed, 2007-05-16 at 22:17 -0700, Talin wrote:
Here's a simple method: Put up a free porn site [...]
Is it known that someone actually implemented this?
I moderate a discussion forum which was abused with this exact attack. At the time, it was a phpBB forum which had the standard graphical captcha. After switching to a different forum package, the attacks went away. I assume that is because (as has been said) it was no longer a well-known and common interface. However, it may also be because, instead of using a graphic (which is easily transplanted to another page), the new package uses ascii art, which would require more effort to extract and move to another page.

-Scott

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
* Scott Dial <scott+python-dev@scottdial.com> [2007-05-17 11:04:46]:
However, it may also be because instead of using a graphic (which is easily transplanted to another page), it uses ascii art which would require more effort to extract and move to another page.
Another approach would be a 'text scrambler' logic:

You can aclltauy srlbcame the quiotesn psneeetrd wchih only a hmuan can uetrnnadsd pperlory. The quiotesn ovubolsiy slouhd be a vrey vrey slmipe one.

And you can hvae a quiotesn form the quiotesn itslef.

Site: What is the futorh word of tihs scnnteee?
Answer: fourth.
Site: You are intelligent, I shall allow you.

-- O.R.Senthil Kumaran http://uthcode.sarovar.org
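A sketch of the scrambler itself: keep the first and last letters and shuffle the interior, the trick that keeps the text readable to humans (punctuation handling is omitted for brevity):

    import random

    def scramble_word(word):
        if len(word) <= 3:
            return word
        inner = list(word[1:-1])
        random.shuffle(inner)
        return word[0] + "".join(inner) + word[-1]

    def scramble(sentence):
        return " ".join(scramble_word(w) for w in sentence.split())

    # e.g. scramble("What is the fourth word of this sentence?")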
O.R.Senthil Kumaran schrieb:
* Scott Dial <scott+python-dev@scottdial.com> [2007-05-17 11:04:46]:
However, it may also be because instead of using a graphic (which is easily transplanted to another page), it uses ascii art which would require more effort to extract and move to another page.
Another approach would be a 'text scrambler' logic:
You can aclltauy srlbcame the quiotesn psneeetrd wchih only a hmuan can uetrnnadsd pperlory. The quiotesn ovubolsiy slouhd be a vrey vrey slmipe one.
And you can hvae a quiotesn form the quiotesn itslef.
Site: What is the futorh word of tihs scnnteee?
Answer: fourth.
Site: You are intelligent, I shall allow you.
Please bear in mind that non-native speakers who haven't had much exposure to the English language should be able to solve this problem too. I doubt that is the case for the kind of challenge you propose.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
* Greg Ewing <greg.ewing@canterbury.ac.nz> [2007-05-18 13:06:41]:
Site: What is the futorh word of tihs scnnteee? Answer: fourth.
Are you sure it isn't "futorh"?-)
:-) My idea was, a human gets to answer it unscrambled as 'fourth' as he "understands" what the question is and gives the proper answer. Agreed, there could be confusion at first.

For non-native speakers of English this could be difficult if their exposure to English is limited, but we will have to take a chance that anyone capable of reading English should be able to figure it out. Again, these are my thoughts and I don't have good data to prove them. From an implementation standpoint, this is one of the easiest I can think of.

Thanks,
--
O.R.Senthil Kumaran
http://uthcode.sarovar.org
O.R.Senthil Kumaran writes:
:-) My idea was, a human got to answer it unscrambled as 'fourth' as he "understands" what the question is and gives the proper answer.
Agreed, there could be confusion at first.
But for any given user, there's only going to be a first. Either they pass the test the first time and after that authenticate via personal password, or they say WTF!! In that case we could lose all the bug reports they were ever going to write.

If we're going to do CAPTCHA, what we're looking for is something that any 4-year-old does automatically, but machines can't do at all. Visual recognition used to be one, but isn't any more. The CAPTCHA literature claims that segmentation still is (dividing complex images into letters), but that's nontrivial for humans, too, and I think that machines will eventually catch up. (I.e., within a handful of months.)

I think it would be better to do content. URLs come to mind; without something clickable, most commercial spam would be hamstrung. But few bug reports and patches need to contain URLs, except for specialized local ones pointing to related issues.

For example, how about requiring user interaction to display any post containing an URL, until an admin approves it? Or you could provide a preview containing the first two non-empty lines not containing an URL. This *would* be inconvenient for large attachments and other data where the reporter prefers to provide an URL rather than the literal data, but OTOH only people who indicate they really want to see spam would see it. ;-)
"Stephen J. Turnbull" <stephen@xemacs.org> wrote in message news:87lkfm8sds.fsf@uwakimon.sk.tsukuba.ac.jp... | I think it would be better to do content. URLs come to mind; without | something clickable, most commercial spam would be hamstrung. But | few bug reports and patches need to contain URLs, except for | specialized local ones pointing to related issues. A bug is a disparity between promise and performance. Promise is often best demonstrated by a link to the relevant section of the docs. Doc patches should also contain a such a link. So doc references should be included with local (to tracker) links and not filtered on. | For example, how about requiring user interaction to display any post | containing an URL, until an admin approves it? Why not simply embargo any post with an off-site link? Tho there might have been some, I can't remember a single example of such at SF. Anybody posting such could certainly understand "Because this post contains an off-site link, it will be embargoed until reviewed to ensure that it is legitimate." | Or you could provide a preview containing the first two non-empty lines | not containing an URL. | This *would* be inconvenient for large attachments and other | data where the reporter prefers to provide an URL rather than the | literal data, but OTOH only people who indicate they really want to | see spam would see it. ;-) I don't get this, but it sounds like more work than simple embargo. I think html attachments should also be embargoed (I believe this is what I saw a couple of months ago.) And perhaps the account uploading an html file. Terry Jan Reedy
On 5/18/07, Terry Reedy <tjreedy@udel.edu> wrote:
"Stephen J. Turnbull" <stephen@xemacs.org> wrote in message news:87lkfm8sds.fsf@uwakimon.sk.tsukuba.ac.jp... | I think it would be better to do content. URLs come to mind; without | something clickable, most commercial spam would be hamstrung. But | few bug reports and patches need to contain URLs, except for | specialized local ones pointing to related issues.
A bug is a disparity between promise and performance. Promise is often best demonstrated by a link to the relevant section of the docs. Doc patches should also contain such a link. So doc references should be included with local (to tracker) links and not filtered on.
| For example, how about requiring user interaction to display any post | containing an URL, until an admin approves it?
Why not simply embargo any post with an off-site link? Tho there might have been some, I can't remember a single example of such at SF. Anybody posting such could certainly understand "Because this post contains an off-site link, it will be embargoed until reviewed to ensure that it is legitimate."
| Or you could provide a preview containing the first two non-empty lines | not containing an URL. | This *would* be inconvenient for large attachments and other | data where the reporter prefers to provide an URL rather than the | literal data, but OTOH only people who indicate they really want to | see spam would see it. ;-)
I don't get this, but it sounds like more work than a simple embargo.
I think html attachments should also be embargoed (I believe this is what I saw a couple of months ago.) And perhaps the account uploading an html file.
If you guys want to see any of this happen please take this discussion over to the tracker-discuss mailing list. -Brett
Terry Reedy writes:
Why not simply embargo any post with an off-site link? Tho there might have been some, I can't remember a single example of such at SF.
Fine by me; if it doesn't happen often, then embargoing them would be fine. My occasional experience with distro reporting processes shows that they happen a fair amount there (often as a reference to an upstream or downstream bug report). The major ones can probably be special-cased easily as needed.
I don't get [the short preview idea], but it sounds like more work than a simple embargo.
It would be. It is a use-case that according to your explanation doesn't apply to Python's tracker, so a YAGNI until proved otherwise.
I think html attachments should also be embargoed (I believe this is what I saw a couple of months ago.) And perhaps the account uploading an html file.
Sounds OK to me, except that there are some modules that handle HTML (and XML? can that be practically distinguished from HTML?), and I would suppose people would want upload examples and test cases.
Terry Reedy wrote:
Why not simply embargo any post with an off-site link? Tho there might have been some, I can't remember a single example of such at SF.
I have often posted links off-site because the SF tracker didn't allow unrelated parties to attach things. I don't know whether the new tracker will allow that, but if it doesn't, you will again see links off-site. -Scott -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
Stephen J. Turnbull wrote:
O.R.Senthil Kumaran writes:
:-) My idea was, a human gets to answer it unscrambled as 'fourth' as he "understands" what the question is and gives the proper answer.
Agreed, there could be confusion at first.
password, or they say WTF!! In that case we could lose all the bug reports they were ever going to write.
That's bad.
If we're going to do CAPTCHA, what we're looking for is something that any 4 year old does automatically, but machines can't do at all. Visual recognition used to be one, but isn't any more. The CAPTCHA literature claims that segmentation still is (dividing complex images into letters), but that's nontrivial for humans, too, and I think that machines will eventually catch up. (Ie, within a handful of months.)
Complex backgrounds used? Colorful foreground on an interior-decorating background. Also gradient foreground, gradient background.
I think it would be better to do content. URLs come to mind; without something clickable, most commercial spam would be hamstrung. But few bug reports and patches need to contain URLs, except for specialized local ones pointing to related issues.
For example, how about requiring user interaction to display any post containing an URL, until an admin approves it? Or you could provide a preview containing the first two non-empty lines not containing an URL. This *would* be inconvenient for large attachments and other data where the reporter prefers to provide an URL rather than the literal data, but OTOH only people who indicate they really want to see spam would see it. ;-)
Block spam or hide? Maybe a reader is what you want. "Posting a URL requires heavier spam-proofing. Click here to authenticate yourself." Takes you to ours - the PL question.
"Aaron Brady" <castironpi@comcast.net> wrote:
"Stephen J. Turnbull" <stephen@xemacs.org> wrote:
If we're going to do CAPTCHA, what we're looking for is something that any 4 year old does automatically, but machines can't do at all. Visual recognition used to be one, but isn't any more. The CAPTCHA literature claims that segmentation still is (dividing complex images into letters), but that's nontrivial for humans, too, and I think that machines will eventually catch up. (Ie, within a handful of months.)
Complex backgrounds used? Colorful foreground on an interior-decorating background.
Also gradient foreground, gradient background.
Captchas like this are easily broken using computational methods, or even the porn site trick that was already mentioned. Never mind Stephen's stated belief, which you quoted, that even the hard captchas are going to be beaten by computational methods soon. Please try to pay attention to previous posts.

- Josiah

As an aside: while the '4 year old can do it' qualification is hard to meet, add 10 years and there exists a fairly sexist method (-), that can be subjective (-), that seems to work quite well (+), but requires javascript (-): the 'hot or not' captcha. It fetches 9 random pictures from hot or not (hopefully changing their file names) and asks the user to pick the 4 hottest of the 9. A variant exists that asks "choose the 4 horses" or "select all of the iguanas", but it requires an ever-evolving number of tagged input images (which is why hot or not works so well as a source).
Josiah Carlson wrote:
Captchas like this are easily broken using computational methods, or even the porn site trick that was already mentioned. Never mind Stephen's stated belief, which you quoted, that even the hard captchas are going to be beaten by computational methods soon. Please try to pay attention to previous posts.
I think people are trying too hard here - in other words, they are putting more computer-science brainpower into the problem than it really merits. While it is true that there is an arms race between creators of social software applications and spammers, this arms race is only waged at the largest scales - spammers simply won't spend the effort to go after individual sites; it's not cost effective, especially when there are much more lucrative targets. Generally, sites are only vulnerable when they have a comment submission interface that is identical to thousands of other sites.

All that one needs to do on the web side is to make the submission process slightly idiosyncratic compared to other sites. If one wants to put in extra effort, you can change the comment submission process on a regular basis.

The real issue is comment submission via email, which I believe Roundup supports (although I don't know if it's enabled for the Python tracker), because there's very little that you can do to "customize" an email submission interface (you have to work with standard email clients, after all). Do we know how these spam comments entered the system? There's no point in spending any thought securing the web interface if the comments were submitted via email.

And has there been any spam submitted since that point? If we're talking less than one spam a week on average, then this is all a moot point; it's less effort for someone to just manually delete it than it is to come up with an automated system.

-- Talin
Talin wrote:
Josiah Carlson wrote:
Captchas like this are easily broken using computational methods, or even the porn site trick that was already mentioned. Never mind Stephen's stated belief, which you quoted, that even the hard captchas are going to be beaten by computational methods soon. Please try to pay attention to previous posts.
I think people are trying too hard here - in other words, they are putting more computer-science brainpower into the problem than it really merits. While it is true that there is an arms race between creators of social software applications and spammers, this arms race is only waged at the largest scales - spammers simply won't spend the effort to go after individual sites; it's not cost effective, especially when there are much more lucrative targets.
[clip]
And has there been any spam submitted since that point? If we're talking less than one spam a week on average, then this is all a moot point; it's less effort for someone to just manually delete it than it is to come up with an automated system.
I was thinking the same thing. Once we start using it, any spam that does get through won't stay there very long - at most maybe half a day, but likely only an hour or two (or less).

If it becomes a frequent problem, then it is time to put the brain cells to work on this. So far we've only had one instance over how long? Let's spend the brain power on getting it up and running first.

Cheers, Ron
Do we know how these spam comments entered the system?
Through the web site. Submission through email is not an issue: you need to use a registered email address, and those are hard to guess.
And has there been any spam submitted since that point?
One day after the tracker was renamed to bugs.python.org, there were 10 spam submissions, and new spam was entered at a high rate. We then added some anti-spam measures, and now new spam is added very infrequently. The real problem now is that people panic when they see spam in the tracker, demanding all kinds of immediate action, and wondering what bastards let the spam in in the first place. Regards, Martin
Talin <talin@acm.org> wrote:
Josiah Carlson wrote:
Captchas like this are easily broken using computational methods, or even the porn site trick that was already mentioned. Never mind Stephen's stated belief, which you quoted, that even the hard captchas are going to be beaten by computational methods soon. Please try to pay attention to previous posts.
I think people are trying too hard here - in other words, they are putting more computer-science brainpower into the problem than it really merits. While it is true that there is an arms race between creators of social software applications and spammers, this arms race is only waged at the largest scales - spammers simply won't spend the effort to go after individual sites; it's not cost effective, especially when there are much more lucrative targets.
My point was that spending time to come up with a "better" captcha in an attempt to thwart spammers was ill advised, in particular because others brought up various reasons why captchas weren't the way to go.

- Josiah
    talin> While it is true that there is an arms race between creators of
    talin> social software applications and spammers, this arms race is only
    talin> waged at the largest scales - spammers simply won't spend the effort
    talin> to go after individual sites; it's not cost effective, especially
    talin> when there are much more lucrative targets.

The advantage of choosing a couple of simple topical questions is that, in theory, every Roundup installation can create a site-specific set of questions. If each site builds a small database of 10 or so questions, then chooses two or three at random for each submission, it seems that would make Roundup a very challenging system to hack in this regard.

It would also likely be tough to use the porn-site human-proxy idea, since the questions will (or ought to) be topical (what is the power operator? what does the "E" in R. E. Olds stand for?), not general (what star shines during the day? what day precedes Monday?).

Skip
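Building on the earlier question-bank sketches, per-submission selection might look like this; SITE_QUESTIONS stands in for each installation's own bank:

    import random

    SITE_QUESTIONS = [
        "What is the Python power operator? ____",
        'What does the "E" in R. E. Olds stand for? ____',
        # ... 10 or so site-specific questions ...
    ]

    def pick_challenges(n=3):
        # Ask two or three at random so no single registration form
        # reveals the whole site-specific bank.
        return random.sample(SITE_QUESTIONS, min(n, len(SITE_QUESTIONS)))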
Talin wrote:
Here's a simple method: Put up a free porn site, with a front page that says "you must be 18 or older to enter". The page also has a captcha to verify that you are a real person. But here's the trick: The captcha is actually a proxy to some other site that the spammer is trying to get access to.
The "python-related question" technique would probably be somewhat resistant to this, as your average porn surfer probably doesn't know anything about Python. (At least until CP4E takes off and everyone knows Python...) -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing@canterbury.ac.nz +--------------------------------------+
My underlying point: seeing porno spam on the practice site gave me a bad itch both because I detest spammers in general and because I would not want visitors turned off to Python by something that is completely out of place and potentially offensive to some. So I am willing to help us not throw up our hands in surrender.
Would that help go so far as to provide patches to the roundup installation? Regards, Martin
participants (23)

- "Martin v. Löwis"
- Aahz
- Aaron Brady
- Andrew McNamara
- Anthony Baxter
- Brett Cannon
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Hrvoje Nikšić
- Josiah Carlson
- Kristján Valur Jónsson
- Mike Klaas
- O.R.Senthil Kumaran
- Ron Adam
- Scott Dial
- skip@pobox.com
- Stephen J. Turnbull
- Steve Holden
- Talin
- Terry Reedy
- Tracker
- Tristan Seligmann