"Turing test" to reject email harvesting bots

Hi mailman-users!
I wrote a "turing test" that keeps e.g. email harvesting bots off:
http://bksys.at/bernhard/img/turing.php?who=agent%20smith
source: http://bksys.at/bernhard/img/turing.txt
Since it is trivial to collect "high quality" email addresses from mailman lists, even if they are only available to the members, I'd like such a Turing test to be in mailman, either before the "view subscribers" page or as part of the subscription process.
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=%22Click+here+for+the+list%22+%22batched+in+a+daily+digest%22&btnG=Google+Search yields 24000 hits.
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=%22The+subscribers+list+is+only+available+to+the+list+members yields 40000 hits.
The text image generation is a little CPU intensive (2 s on a 1800 MHz P4), so some measures may be desirable to prevent DoS attacks by floods of image requests. E.g. put the test after the subscription email cookie has been returned.
This test may prevent users of non-graphical web browsers or email-only subscribers from subscribing.
Bernhard
-- Webspace; low-end server housing from 15 e, etc.: http://www.bksys.at Linux admin/programmer: http://bksys.at/bernhard/services.html

- On 2003.11.16, in <3FB78C1A.5080702@gmx.at>,
This test may disable users of non graphical web browsers or email only subscribers to subscribe.
I've generally found that encoding the address as HTML character entities works fine. I've had a bait address on my web page for quite some time, and it's never received any spam. It's readable to text browsers, it doesn't affect readability with unexpected font sizes, it takes little computation, and it's trivial to write.
My address would become: dgc@uchicago.edu
I've long favored this approach where display of mail addresses offends people.
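A minimal sketch of the entity encoding described above, assuming decimal character references (the message does not say whether decimal or hexadecimal references were used). The function name `encode_as_entities` is hypothetical. The second call shows that the standard library reverses the encoding in one step, which is the objection raised later in the thread.

```python
import html


def encode_as_entities(address: str) -> str:
    """Write every character of the address as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = encode_as_entities("dgc@uchicago.edu")
# A browser renders `encoded` as the plain address; html.unescape() recovers it too.
decoded = html.unescape(encoded)
```

The encoded form stays readable in text browsers because they decode character references like any other browser.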
-- -D. dgc@uchicago.edu University of Chicago > NSIT > VDN > ENSS > ENSA > You are here . . . . . . . always line up dots

David Champion wrote:
While this approach may work in individual cases, it is trivially and quite likely defeated when the prize is 100,000 mailing lists with 1 to 5 million (!) high quality email addresses.
Since your answer is the only one and the problem does not appear to be addressed sufficiently, I wrote an example exploit program that finds mailman lists and harvests their email addresses. After about 20 minutes it collected about 30,000 email addresses: http://bksys.at/bernhard/30,000%20email%20addresses.gz
The program can be further improved. It can be a little parallelized. It can check a site for further mailing lists (the admin overview has a more complete list than the listinfo overview). And it can be made to subscribe to mailing lists where the member list is only available to the list members.
If you think the problem is worth fixing please estimate how long it will take and I will wait a reasonable time for a fix before I post the problem and the exploit code to bugtraq. Otherwise I will post to bugtraq in about 1 week.
Here is the exploit code:
#!/usr/bin/perl -w
$n=0;
$u=0;
for ($i=0;1;$i+=10) {
  $#urls=-1;
  $google=`lynx --dump 'http://www.google.com/search?q=%22Click+here+for+the+list%22+%22batched+in+a+daily+digest%22&start=$i'`;
  # print $google;
  @urls=$google=~/cache:.{12}:(.*?)\+%22/g;
  if ($#urls==-1) {last;}
  # print join("\n",@urls);
  # print "\naoeu $#urls\n";
  foreach $url (@urls) {
    $u++;
    $url=~s*/listinfo/*/roster/*;
    print "$url...\n";
    $roster=`lynx -connect_timeout=10 -dump $url`;
    # print $roster;
    @mails=$roster=~/^ +\* \(?\[\d+\](.* at .*?)\)?$/mgo;
    foreach $mail (@mails) {
      $mail=~s/ at /@/;
      print "$mail\n";
      $n++;
    }
    print "mails=".($#mails+1).", total=$n, url=$u, google=$i\n";
    # exit;
  } #foreach url
} #while google
Have a nice day, Bernhard

Barry Warsaw wrote:
Thank you. I'd appreciate being informed about the progress.
I want to remind you about my graphical turing test I proposed as solution:
http://mail.python.org/pipermail/mailman-developers/2003-November/016082.htm...
Bernhard

On Nov 25, 2003, at 11:03 AM, Barry Warsaw wrote:
Fails ADA and accessibility requirements badly. I'd argue against any
solution that fails such basic needs without any real way to fix it.
Better is to simply teach the archives not to distribute sensitive
information at all. And a lot easier to implement, actually.

On Tue, Nov 25, 2003 at 11:07:39AM -0800, Chuq Von Rospach wrote:
Fails ADA and accessibility requirements badly. I'd argue against any
solution that fails such basic needs without any real way to fix it.
What about reverse turing tests that aren't graphics-based? It's easier to beat "What is the sum of three and fifteen?" or "what is the name of this mailing list?" text-tests than the more complex RTTs, but it would make exploit code that much harder to write without sacrificing users who can't, for example, view graphics or hear sounds.
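A text-only challenge of the kind suggested above could look roughly like this. It is a sketch, not anything Mailman ships: the function names `make_challenge` and `check_answer` and the range of numbers are assumptions made for illustration.

```python
import random

# Spelling the numbers out avoids a trivial digit-matching regex in a bot.
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]


def make_challenge(rng: random.Random) -> tuple[str, int]:
    """Return a question in plain text and the integer answer to keep server-side."""
    a = rng.randrange(1, len(NUMBER_WORDS))
    b = rng.randrange(1, len(NUMBER_WORDS))
    question = f"What is the sum of {NUMBER_WORDS[a]} and {NUMBER_WORDS[b]}?"
    return question, a + b


def check_answer(expected: int, reply: str) -> bool:
    """Accept the reply only if it is the expected number written in digits."""
    try:
        return int(reply.strip()) == expected
    except ValueError:
        return False
```

Such a question works in text browsers and screen readers, but as noted above it is also easier for a bot to beat than an image-based test.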
Better is to simply teach the archives not to distribute sensitive
information at all. And a lot easier to implement, actually.
So, is anyone working on this *within* pipermail? I know there are great alternative archivers out there, but Mailman still winds up with a bad reputation if the default isn't very secure. Maybe for 2.2 we could have a "completely obscure archived email addresses" option which changed them all to user@xxxxxx.

On Nov 27, 2003, at 9:08 AM, Terri Oda wrote:
If it can be made accessible, I have no problem with it. But I think it's solving the wrong problem, because the data is still accessible to a motivated person. You're not fixing the issue, simply raising the bar and hoping they give up. It won't stop the spammer who hires a dozen temps to surf web sites and authenticate their bots through, right?
So the REAL answer, IMHO, is to not make that information available. Cloak it programmatically. If you want to have an authenticated mode for subscribers to keep it uncloaked, that's fine by me. But the public archives should simply recognize significant pieces of information and elide them from the output. That way, no matter what a spammer or other nasty person does, they can't get the information. It's not there.
My definition of significant pieces of information: email addresses, phone numbers, social security numbers (and if there are global equivalents to this US number, those, too). Simply replace them in the text with [[email address omitted]] as you deliver the archive. Then you stop playing this arms race with spambots completely, by removing the target they're after. No need to, two years from now, rip out the work you did and come up with a new temporary fix because spammers got around to implementing new OCR techniques.
Remember challenge/response? When everyone thought it was the solution to all of our problems? Took the spammers under six weeks to crack it once they decided to try. (Answer: send spam as being "From:" you, "To:" you. Most C/R systems have the user's email address whitelisted. End of story.)
That would be the answer, or throw it out (I'm not a huge fan of pipermail; its only advantage to mailman is it's written in Python) and do something else. Or leave pipermail alone, and write a CGI that all archives exit through that does the filtering, which is, IMHO, how you ought to do it. That way, you can authenticate via that CGI to a level of access, change the filtering on the fly, and leave the archives unedited (as I think they ought to be).
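The filtering step of such a CGI might be sketched as follows. This is an illustration under assumptions, not Mailman or pipermail code: the function name `filter_archive_text`, the `authenticated` flag and the address pattern are all hypothetical, and only email addresses are handled here (phone numbers and similar data would need their own patterns).

```python
import re

# Deliberately simple pattern: local part, "@", and a dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PLACEHOLDER = "[[email address omitted]]"


def filter_archive_text(text: str, authenticated: bool = False) -> str:
    """Return archive text as it should be delivered to this visitor.

    The stored archive is never modified; addresses are elided only in the
    copy sent to visitors who have not authenticated.
    """
    if authenticated:
        return text
    return EMAIL_RE.sub(PLACEHOLDER, text)
```

Because the filter runs at delivery time, the rules can change later without regenerating the archive.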

On Thu, Nov 27, 2003 at 09:17:33AM -0800, Chuq Von Rospach wrote:
Of course. We should remember that *that's* the reason not to do turing tests.
Incidentally, I don't advocate their use either --I think a CGI address-eater as Chuq describes is probably the way to go, and given more time I'd even write it-- but I wanted to make sure they weren't dismissed out of hand for technical reasons.
Terri

On Nov 27, 2003, at 9:52 AM, Terri Oda wrote:
It's a great example of people solving problems before they actually define them, and throwing resources at symptoms, not really solving what's at root cause.
Now sometimes you have no alternative than a continuing arms race of escalation, like in the current spam/anti-spam wars. But it's always useful to sit back and see if you can figure out what the real problem is and whether you can circumvent it at a basic level and not just run around patching the latest version of it.
And it's also important to not over-fix a problem. After all, there's still nothing stopping spammers from simply subscribing to mailing lists and harvesting addresses from postings directly, other than it's simply easier and more anonymous to grab archives. So don't waste time OVER-securing the archives, since that just leads to a false sense of security anyway. If you really want to secure this, you'll have to tear down mailman to square one, and re-engineer it to obscure mail addresses on all traffic, and replace them with mapped addresses that forward through the server. That means all 1-to-1 traffic (replies, etc.) also needs to travel through the server, and effectively, Mailman starts becoming an anonymous remailer type of beast as well as a mail server. Which creates a whole new class of problems while solving this one...
(and yes, that's actually a design paradigm I'm noodling on, in what little time I have to noodle right now.)

On Thu, 2003-11-27 at 12:08, Terri Oda wrote:
No one's working on it AFAIK, but I agree that this is the right approach. I'm not sure how to go about this within the Mailman 2.1 series though, because currently only the private archives are accessed programmatically. That may be a good first step though -- add the obscuring stuff to the private archive cgi and then if that works out well, provide a way to make a public archive vend through the private archive cgi (one way: enable private archives with no password). It's still arguably a new feature, but perhaps we could sneak it in as a bug fix.
-Barry

On Fri, 2003-11-28 at 06:08, Terri Oda wrote:
On the copy of Mailman I run here, I just went through Mailman/Archiver/HyperArch.py and replaced all the occurrences of re.sub('@', _(' at ') with re.sub(r'([\w\.-]+@.)[\w\.-]+', r'\1...' which achieves a similar effect with ARCHIVER_OBSCURES_EMAILADDRS turned on.
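To see what the two substitutions do to an address, they can be tried outside Mailman. The wrapper functions below are hypothetical, and the stock pattern is shown with a plain ' at ' string instead of the translated `_(' at ')`; only the regular expressions come from the message above.

```python
import re


def stock_obscure(text: str) -> str:
    """The substitution being replaced: "@" becomes " at "."""
    return re.sub('@', ' at ', text)


def truncating_obscure(text: str) -> str:
    """The replacement: keep the local part and one character of the domain."""
    return re.sub(r'([\w\.-]+@.)[\w\.-]+', r'\1...', text)


sample = "Contact colinp@waikato.ac.nz today"
print(stock_obscure(sample))       # Contact colinp at waikato.ac.nz today
print(truncating_obscure(sample))  # Contact colinp@w... today
```

The first form can be reversed mechanically, while the second discards most of the domain, so the full address cannot be rebuilt from the page.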
(then you just need to add an ACL to the webserver to stop someone downloading the listname.mbox file that has all the unmunged addresses still in it)
-- Colin Palmer <colinp@waikato.ac.nz> University of Waikato, ITS Division

On Nov 27, 2003, at 2:26 PM, Colin Palmer wrote:
Which is a no-op, since spambots learned how to de-obfuscate that stuff years ago. False sense of security. All it really does is make it more difficult for people reading it, not the computers harvesting it.

On Fri, 2003-11-28 at 06:26, Colin Palmer wrote:
I'd consider turning this off for 2.1.4 if people agree. Perhaps making it available only through a site config var. I'm not sure how easy that is, but it seems important enough to close off access to the mbox file.
Opinions? -Barry

On Thursday 27 November 2003 11:05 pm, Barry Warsaw wrote:
I'd prefer it gone. If someone needs it badly enough and they can convince me, I can make it available by some other method.
--
"The true measure of a man is how he treats someone who can do him absolutely no good."
- Samuel Johnson (1709-1784)

- On 2003.11.27, in <1069992315.19968.8.camel@anthem>,
- "Barry Warsaw" <barry@python.org> wrote:
I *really* value this ability, but I understand the arguments for not making it downloadable. How hard would it be to make it available to subscribers but deny it to anonymous visitors? And would that be sufficient for most people?
And while on the topic, I'd like to see munging in the anonymous filter and original text in the authenticated filter, too, as someone else has described.
-- -D. dgc@uchicago.edu University of Chicago > NSIT > VDN > ENSS > ENSA > You are here . . . . . . . always line up dots

On Fri, 2003-11-28 at 20:39, David Champion wrote:
Probably more than I can realistically do for 2.1.4. However I did implement the ability for the site admin to turn off public mboxes. You'll see it after I catch up on all my turkey-induced hacking.
-Barry

On Fri, 2003-11-28 at 17:05, Barry Warsaw wrote:
Maybe just have ARCHIVE_TO_MBOX default to 0?
I deliberately want Mailman to keep creating mbox archives in case I want to regenerate the list archives completely using a newer version of HyperArch, or switch to something else entirely, I just don't want to offer them for download, so having them created outside of /pipermail/ if they are turned on would be nice, but not an urgent thing since it's easy enough to block access at the webserver.
-- Colin Palmer <colinp@waikato.ac.nz> University of Waikato, ITS Division

- On 2003.11.25, in <3FC39580.2020605@gmx.at>,
- "Bernhard Kuemel" <darsie@gmx.at> wrote:
You have way too much time on your hands.
-- -D. dgc@uchicago.edu University of Chicago > NSIT > VDN > ENSS > ENSA > You are here . . . . . . . always line up dots

David Champion wrote:
You are absolutely right. Does anyone have a (paid) (Linux) job for me?
Also - sorry for being annoying, but I try to keep the spam down and when the graphical turing test I offered to block email harvesting bots was just turned away I was a little disappointed. Well, maybe I am wrong and this is not really an issue, then sit back and relax. Maybe my contribution will also be rejected from bugtraq. But I feel millions of email addresses are worth a bit more security/privacy.
Bernhard

On 25 Nov 2003, at 17:46, Bernhard Kuemel wrote:
I am just a spectator but this doesn't look like a major contribution
to the Open Source movement by you.
As a way of getting your code and ideas adopted it is one hell of an
approach.
A better approach might be to work up a patch for the current Mailman
release that will demonstrably function in practice (how are we going
to manage all those images your original "Turing test" proposal will
lead to) and submit that like any other contributor. You can program in
Perl so using Python should be a snap for a clever fellow like you.
But I confess if it were for me to decide on a response to your
threats, which it is not, I'd say sex and travel fits the bill.
There's irony for you.
Richard Barrett http://www.openinfo.co.uk

Richard Barrett wrote:
You are right. It is a small contribution. I also filed a PHP bug today. Another small contribution. Makes 2 today. But not all days are as productive as this one.
As a way of getting your code and ideas adopted it is one hell of an
approach.
Well, I'm not sure if a graphical turing test makes up for the drawbacks I mentioned so I'm not sure it will make it to mailman. But I'm glad that the email harvesting problem gets some attention now.
It would probably be more efficient if some who are familiar with the mailman code fixed its "security flaws". Also we first need to find out what should be done about it. A graphical turing test may rule out users of non graphical web browsers and maybe we can come up with something better. Implementing it prematurely might be a waste of human resources.
You can program in Perl so using Python should be a snap for a clever fellow like you.
Maybe. However, I don't like python as on our old P60 server it burned up so much CPU time (15 s/min). I can also program in C so I could probably fix the PHP bug as well. However, I do not always feel like doing everything, especially if the others don't like it.
But I confess if it were for me to decide on a response to your threats,
I was looking for a better word than 'warning', however, none of the alternatives seemed to fit. Also I tried to make my announcement of my bugtraq post as little offensive as possible. If you are a native English speaker maybe you can show me an even better way.
which it is not, I'd say sex and travel fits the bill.
Well, well, if you prefer some hints about sex over my bug reports maybe we should change the forum. About travelling, if you want you can join the next European Rainbow gathering, my summer highlight every year. See my rainbow website for details: http://rainbow.bksys.at .
Have a nice day,
There's irony for you.
That was not meant ironically. Hmm, maybe 'cheers' would have been less ambiguous, but only '(kind) regards' came to my mind at that time and that sounded too formal to me. Other suggestions?
Cheers, Bernhard

On 25 Nov 2003, at 20:06, Bernhard Kuemel wrote:
It would be interesting to see you present convincing evidence that Python runs slower than Perl which you seem happy to rely on.
Maybe dilettante springs to mind as a description that fits.
How about acting like a contributor who produces solutions instead of being a smart guy: why just contribute to the pool of problems when you could contribute to the pool of solutions to problems? Why should anybody take your proposals seriously, and invest their unpaid effort into proving their worth, when you cannot be bothered to invest the effort yourself?

Richard Barrett wrote:
That can be difficult as different programming languages are designed for different tasks so they all have their strengths and weaknesses. That also pretty much makes such a comparison senseless. It would e.g. be a good choice to make an mp3 encoder in C and my mailman exploit in perl and not the other way round. I would without doubt claim that C runs faster and many tasks are coded quicker in perl. It is my impression that python is slow, at least it has a lengthy startup. It may still be suitable for certain tasks, however I have no idea which as I don't speak python. Mailman was run once per minute from cron on my old server. Maybe Mailman was coded inefficiently. However, I read it scales better than Majordomo, a perl program. That difference is probably a design issue rather than to blame on the programming language.
Anyways, since you asked for a benchmark, here is a quick start. These programs were run repeatedly so the perl interpreter was already loaded from disk and cached. The shortest run was picked to minimize interference from other processes.
Detected 735.005 MHz processor.
Calibrating delay loop... 1468.00 BogoMIPS
CPU: Intel Celeron (Coppermine) stepping 06
bernhard@b:~/t$ time perl -e 'print "hello world\n";'
hello world
real 0m0.017s
user 0m0.000s
sys 0m0.010s
This shows the small overhead on perl startup.
bernhard@b:~/t$ time perl -e 'for ($i=1;$i<=1000000;$i++) {print "$i: hello world\n";}' >/dev/null
real 0m2.147s
user 0m2.090s
sys 0m0.010s
A million string interpolations and file accesses in 2.1 s - not bad.
Compare this to C:
#include <stdio.h>
int main(void) {
  puts("hello world");
  return 0;
}
bernhard@b:~/src/benchmark$ time ./hello
hello world
real 0m0.007s
user 0m0.000s
sys 0m0.000s
#include <stdio.h>
int main(void) {
  int i;
  for (i=1;i<=1000000;i++) {
    printf("%d: hello world\n",i);
  }
  return 0;
} //main
bernhard@b:~/src/benchmark$ time ./1Mhello >/dev/null
real 0m1.007s
user 0m0.960s
sys 0m0.010s
Give some examples for python. I'll run them on my machine if you don't have a comparable one.
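A Python counterpart to the Perl and C loops might look like the sketch below. It is offered only as a starting point for the comparison requested above; the function name `hello_loop` is hypothetical, and no timings are claimed because none were reported in the thread.

```python
import sys


def hello_loop(count: int, out=sys.stdout) -> None:
    """Write "<i>: hello world" for i = 1..count, like the Perl and C loops."""
    write = out.write  # local alias avoids an attribute lookup per iteration
    for i in range(1, count + 1):
        write(f"{i}: hello world\n")
```

To time it the same way as the other runs, one could add a call such as `hello_loop(1000000)` at the bottom of the file and run the script under `time` with output redirected to /dev/null.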
Sure, this benchmark sucks, but it's not completely bogus. Any ideas about some more serious benchmarks? Going towards the strengths of each language would be somehow unfair, so a rather complex problem may be best but also difficult to implement. OTOH picking the language that offers the most advantages for a given problem is the way to go.
Maybe dilettante springs to mind as a description that fits.
Your insults are getting boring.
Bernhard

Bernhard Kuemel wrote:
A million string interpolations and file accesses in 2.1 s - not bad.
Hmm, maybe the startup overhead of python is still significant with 1,000,000 iterations so here are 10,000,000 timings:
bernhard@b:~/src/benchmark$ time perl -e 'for ($i=1;$i<=10000000;$i++) {print "$i: hello world\n";}' >/dev/null
real 0m21.400s
user 0m21.130s
sys 0m0.030s
bernhard@b:~/src/benchmark$ time ./10Mhello >/dev/null
real 0m9.932s
user 0m9.760s
sys 0m0.020s
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

On Wed, 2003-11-26 at 05:36, Bernhard Kuemel wrote:
We don't need to get into lengthy language wars here, but I submit that there's no practical difference in performance between Python and Perl, especially in the problem domain that Mailman addresses. Anyone who wants to recode Mailman in C, Java, ML, Haskell, Smalltalk, Objective-C, C++, Perl, Ruby or whatever has my blessing (just don't call it Mailman :).
see-you-in-10-years-ly y'rs, -Barry

On Nov 27, 2003, at 10:32 AM, Barry Warsaw wrote:
Sorry, given that Mailman is almost always rate limited by I/O and the MTA, it's mostly irrelevant to talk about performance. Mailman is almost NEVER the bottleneck when you use 2.1. So why argue about upgrading a six lane highway to eight lanes when a mile down the road it all turns back into four?

Richard Barrett wrote:
Eric Steven Raymond says in "The Art of Unix Programming" (http://www.faqs.org/docs/artu/index.html) on http://www.faqs.org/docs/artu/ch14s04.html :
"Python cannot compete with C or C++ on raw execution speed (though using a mixed-language strategy on today's fast processors probably makes that relatively unimportant). In fact it's generally thought to be the least efficient and slowest of the major scripting languages, a price it pays for runtime type polymorphism. Beware of rejecting Python on these grounds, however; most applications do not actually need better performance than Python offers, and even those that appear to are generally limited by external latencies such as network or disk waits that entirely swamp the effects of Python's interpretive overhead. Also, by way of compensation, Python is exceptionally easy to combine with C, so performance-critical Python modules can be readily translated into that language for substantial speed gains."
Bernhard

On Tue, Dec 16, 2003 at 11:44:14PM +0100, Bernhard Kuemel wrote:
Uhh... duh? There isn't a scripting language out there that's faster than C (I mean, there are things that scripting languages can *make* faster, like preventing ppl from doing stupid sorts by only putting in well-optimized sort algo's but that's not what most ppl mean by "speed"). I can't believe ESR wrote that for python and not just for scripting languages in general.
Really? I never thought that python was slower than, say, tcl prior to 8.x. I thought that the general consensus is that python and perl are about neck-and-neck for the fastest possible interpreter.
But in short you're taking something written for popular consumption instead of writing a test case in a few languages and seeing how fast it runs. That's not helpful at all.
-Peter
-- The 5 year plan: In five years we'll make up another plan. Or just re-use this one.

On Tue, 2003-11-25 at 15:06, Bernhard Kuemel wrote:
It would probably be more efficient if some who are familiar with the mailman code fixed its "security flaws".
Just to be snitty and pedantic, I don't consider email address leaks in Pipermail to be security flaws. Not that I don't consider them serious enough to address (I do), but it's a different class of problem than some exploit that could be used to subvert the Mailman system or the machine it's running on.
-Barry

It's not a security issue. It's a privacy issue. Very different beasts. Very important beasts, but the only thing they have in common is the number of legs they have.
The underlying issue is similar to many bugtraq issues: what used to be a common, acceptable coding practice no longer is. But man, things like "security issue" are rapidly becoming an aspect of Godwin's law: phrases that are overused and used improperly, and mostly indicating the end of useful discussion and the entrance into the world of political and emotional rhetoric.
On Nov 27, 2003, at 10:28 AM, Barry Warsaw wrote:
Just to be snitty and pedantic, I don't consider email address leaks in Pipermail to be security flaws.

On Tuesday, Nov 25, 2003, at 11:46 US/Central, Bernhard Kuemel wrote:
It seems, Bernhard, that you may as well have posted it to bugtraq immediately, since your posting of the code to this list will likely make the exploit code accessible via a google search for "mailman exploit" within a matter of hours...

Doug Selph wrote:
Well, it's a different thing being told about something or having to look for it without even knowing that it exists. But if you are worried, feel free to remove the exploit from the archive. I guess you know how to do that.
Bernhard

- On 2003.11.16, in <3FB78C1A.5080702@gmx.at>,
This test may disable users of non graphical web browers or email only subscribers to subscribe.
I've generally found that encoding the address as HTML character entities works fine. I've had a bait address on my web page for quite some time, and it's never received any spam. It's readable to text browsers, it doesn't affect readability with unexpected font sizes, it takes little computation, and it's trivial to write.
My address would become: dgc@uchicago.edu
I've long favored this approach where display of mail addresses offends people.
-- -D. dgc@uchicago.edu University of Chicago > NSIT > VDN > ENSS > ENSA > You are here . . . . . . . always line up dots

David Champion wrote:
While this approach may work in individual cases it is trivially and quite likely defeatet when the prize is 100,000 mailing lists with 1 to 5 million (!) high quality email addresses.
Since your answer is the only one and the problem does not appear to be addressed sufficiently I wrote an example exploit program that finds mailman lists and harvests their email addresses. After about 20 minutes it collected about 30.000 email addresses: http://bksys.at/bernhard/30,000%20email%20addresses.gz
The program can be further improved. It can be a little parallelized. It can check a site for further mailing lists (the admin overview has a more complete list than the listinfo overview). And it can be made to subscribe to mailing lists where the member list is only available to the list members.
If you think the problem is worth fixing please estimate how long it will take and I will wait a reasonable time for a fix before I post the problem and the exploit code to bugtraq. Otherwise I will post to bugtraq in about 1 week.
Here is the exploit code:
#!/usr/bin/perl -w
$n=0;
$u=0;
for ($i=0;1;$i+=10) {
$#urls=-1;
$google=lynx --dump 'http://www.google.com/search?q=%22Click+here+for+the+list%22+%22batched+in+a+daily+digest%22&start=$i'
;
# print $google;
@urls=$google=~/cache:.{12}:(.*?)\+%22/g;
if ($#urls==-1) {last;}
# print join("\n",@urls);
# print "\naoeu $#urls\n";
foreach $url (@urls) {
$u++;
$url=~s*/listinfo/*/roster/*;
print "$url...\n";
$roster=`lynx -connect_timeout=10 -dump $url`;
# print $roster;
@mails=$roster=~/^ +\* \(?\[\d+\](.* at
.*?)\)?$/mgo; foreach $mail (@mails) { $mail=~s/ at /@/; print "$mail\n"; $n++; } print "mails=".($#mails+1).", total=$n, url=$u, google=$i\n"; # exit; } #foreach url
} #while google
Have a nice day, Bernhard
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

Barry Warsaw wrote:
Thank you. I'd appreciate being informed about the progress.
I want to remind you about my graphical turing test I proposed as solution:
http://mail.python.org/pipermail/mailman-developers/2003-November/016082.htm...
Bernhard
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

On Nov 25, 2003, at 11:03 AM, Barry Warsaw wrote:
Fails ADA and accessibility requirements badly. I'd argue against any
solution that fails such basic needs without any real way to fix it.
Better is to simply teach the archives not to distribute sensitive
information at all. And a lot easier to implement, actually.

On Tue, Nov 25, 2003 at 11:07:39AM -0800, Chuq Von Rospach wrote:
Fails ADA and accessibility requirements badly. I'd argue against any
solution that fails such basic needs without any real way to fix it.
What about reverse turing tests that aren't graphics-based? It's easier to beat "What is the sum of three and fifteen?" or "what is the name of this mailing list?" text-tests than the more complex RTTs, but it would make exploit code that much harder to write without sacrificing users who can't, for example, view graphics or hear sounds.
Better is to simply teach the archives not to distribute sensitive
information at all. And a lot easier to implement, actually.
So, is anyone working on this *within* pipermail? I know there are great alternative archivers out there, but Mailman still winds up with a bad reputation if the default isn't very secure. Maybe for 2.2 we could have a "completely obscure archived email addresses" option which changed them all to user@xxxxxx.

On Nov 27, 2003, at 9:08 AM, Terri Oda wrote:
if it can be made accessible, I have no problem with it. But I think it's solving the wrong problem, because the data is still accessible to a motivated person. you're not fixing the issue, simply raising the bar and hoping they give up. It won't stop the spammer who hires a dozen temps to surf web sites and authenticate their bots through, right?
So the REAL answer, IMHO, is to not make that information available. Cloak it programattically. If you want to have an authenticated mode for subscribers to keep it uncloaked, that's fine by me. But the public archives should simply recognize significant pieces of information and elide them from the output. That way, no matter what a spammer or other nasty person does, they can't get the information. It's not there.
my definition of significant pieces of information: email addresses, phone numbers, social security nubmers (and if there are global equivalents to this US number, those, too). Simply replace them in the text with [[email address omitted]] as you deliver the archive. Then you stop playing this arms war with spambots completely, by removing the target they're after. No need to, two years from now, rip out the work you did and come up with a new temporary fix because spammers got around to implementing new OCR techniques.
Remember challenge/response? When everyone thought it was the solution to all of our problems? Took the spammers under six weeks to crack it once they decided to try. (answer: send spam as being "From:" you, "To:" you. Most C/R systems have the user's email address whitelisted. end of story.
that would be the answer, or throw it out (I'm not a huge fan of pipermail; it's only advantage to mailman is it's written in Python) and do something else. Or leave pipermail alone, and write a CGI that all archives exit through that does the filtering, which is IMHO, how you ought to do it. That way, you can authenticate via that CGI to a level of access, change the filtering on the fly, and leave the archives unedited (as I think they ought to be).

On Thu, Nov 27, 2003 at 09:17:33AM -0800, Chuq Von Rospach wrote:
Of course. We should remember that *that's* the reason not to do turing tests.
Incidentally, I don't advocate their use either --I think a CGI address-eater as Chuq describes is probably the way to go, and given more time I'd even write it-- but I wanted to make sure they weren't dismissed out of hand for technical reasons.
Terri

On Nov 27, 2003, at 9:52 AM, Terri Oda wrote:
It's a great example of people solving problems before they actually define them, and throwing resources at symptoms, not really solving what's at root cause.
Now sometimes you have no alternative than a continuing arms race of escalation, like in the current spam/anti-spam wars. But it's always useful to sit back and see if you can figure out what the real problem is and whether you can circumvent it at a basic level and not just run around patching the latest version of it.
And it's also important to not over-fix a problem. After all, there's still nothing stopping spammers from simply subscribing to mailing lists and harvesting addresses from postings directly, other than it's simply easier and more anonymous to grab archives. So don't waste time OVER-securing the archives, since that just leads to a false sense of security anyway. If you really want to secure this, you'll have to tear down mailman to square one, and re-engineer it to obscure mail addresses on all traffic, and replace them with mapped addresses that forward through the server. That means all 1-to-1 traffic (replies, etc.) also needs to travel through the server, and effectively, Mailman starts becoming an anonymous remailer type of beast as well as a mail server. Which creates a whole new class of problems while solving this one...
(and yes, that's actually a design paradigm I'm noodling on, in what little time I have to noodle right now.)
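To make the "mapped addresses" idea concrete, here is a rough sketch under stated assumptions: the class `AddressMapper`, the `member-` alias prefix, the HMAC-based alias derivation, and the in-memory lookup table are all invented for illustration. The message above describes only the general design, not any implementation.

```python
import hashlib
import hmac


class AddressMapper:
    """Map real addresses to opaque aliases at the list domain, and back."""

    def __init__(self, secret, list_domain):
        self._secret = secret        # bytes, known only to the server
        self._domain = list_domain
        self._real_by_alias = {}     # in-memory; a real server would persist this

    def alias_for(self, real_address):
        """Return a stable alias that does not reveal the real address."""
        digest = hmac.new(
            self._secret,
            real_address.lower().encode("utf-8"),
            hashlib.sha256,
        ).hexdigest()[:12]
        alias = "member-%s@%s" % (digest, self._domain)
        self._real_by_alias[alias] = real_address
        return alias

    def resolve(self, alias):
        """Return the real address for an alias, or None if unknown."""
        return self._real_by_alias.get(alias)
```

Every reply addressed to an alias would have to pass through the server for `resolve` to run, which is exactly the remailer-like burden the message warns about.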

On Thu, 2003-11-27 at 12:08, Terri Oda wrote:
No one's working on it AFAIK, but I agree that this is the right approach. I'm not sure how to go about this within the Mailman 2.1 series though, because currently only the private archives are accessed programmatically. That may be a good first step though -- add the obscuring stuff to the private archive cgi and then if that works out well, provide a way to make a public archive vend through the private archive cgi (one way: enable private archives with no password). It's still arguably a new feature, but perhaps we could sneak it in as a bug fix.
-Barry

On Fri, 2003-11-28 at 06:08, Terri Oda wrote:
On the copy of Mailman I run here, I just went through Mailman/Archiver/HyperArch.py and replaced all the occurrences of re.sub('@', _(' at ') with re.sub(r'([\w\.-]+@.)[\w\.-]+', r'\1...' which achieves a similar effect with ARCHIVER_OBSCURES_EMAILADDRS turned on.
(then you just need to add an ACL to the webserver to stop someone downloading the listname.mbox file that has all the unmunged addresses still in it)
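For readers who want to see what that substitution does, the pattern and replacement string quoted above can be tried in isolation. The wrapper function `truncate_addresses` and the sample sentence are illustrative assumptions; only the two regex strings come from the message.

```python
import re

# Pattern and replacement exactly as quoted in the message above.
TRUNCATE_RE = re.compile(r'([\w\.-]+@.)[\w\.-]+')


def truncate_addresses(text):
    """Keep the local part, the '@', and one more character; drop the rest."""
    return TRUNCATE_RE.sub(r'\1...', text)


print(truncate_addresses("Contact colinp@waikato.ac.nz for details"))
# prints: Contact colinp@w... for details
```

Unlike the " at " substitution, this discards most of the domain, so the original address cannot be rebuilt from the archived page alone.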
-- Colin Palmer <colinp@waikato.ac.nz> University of Waikato, ITS Division

On Nov 27, 2003, at 2:26 PM, Colin Palmer wrote:
which is a no-op, since spambots learned how to de-obfuscate that stuff years ago. False sense of security. All it really does is make it more difficult for people reading it, not the computers harvesting it.
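As a sketch of why simple word substitution offers little protection, assuming the obfuscated form in question is the "user at domain" style produced by replacing '@' with ' at ': a harvester needs only one substitution to undo it. The pattern and the function name `deobfuscate` are illustrative assumptions, not code from any known spambot.

```python
import re

# Assumes the obfuscated form is "local at domain.tld"; other variants exist.
AT_WORD_RE = re.compile(r"\b([\w.+-]+)\s+at\s+([\w-]+(?:\.[\w-]+)+)\b")


def deobfuscate(text):
    """Turn 'user at example.org' back into 'user@example.org'."""
    return AT_WORD_RE.sub(r"\1@\2", text)
```

A harvester does not care about the occasional false positive (ordinary prose such as "look at example.com"), so the pattern can afford to be sloppy.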

On Fri, 2003-11-28 at 06:26, Colin Palmer wrote:
I'd consider turning this off for 2.1.4 if people agree. Perhaps making it available only through a site config var. I'm not sure how easy that is, but it seems important enough to close off access to the mbox file.
Opinions? -Barry

On Thursday 27 November 2003 11:05 pm, Barry Warsaw wrote:
I'd prefer it gone. If someone needs it badly enough and they can convince me, I can make it available by some other method.
--
"The true measure of a man is how he treats someone who can do him absolutely no good."
- Samuel Johnson (1709-1784)

- On 2003.11.27, in <1069992315.19968.8.camel@anthem>,
- "Barry Warsaw" <barry@python.org> wrote:
I *really* value this ability, but I understand the arguments for not making it downloadable. How hard would it be to make it available to subscribers but deny it to anonymous accessors? And would that be sufficient for most people?
And while on the topic, I'd like to see munging in the anonymous filter and original text in the authenticated filter, too, as someone else has described.
-- -D. dgc@uchicago.edu University of Chicago > NSIT > VDN > ENSS > ENSA > You are here . . . . . . . always line up dots

On Fri, 2003-11-28 at 20:39, David Champion wrote:
Probably more than I can realistically do for 2.1.4. However I did implement the ability for the site admin to turn off public mboxes. You'll see it after I catch up on all my turkey-induced hacking.
-Barry

On Fri, 2003-11-28 at 17:05, Barry Warsaw wrote:
Maybe just have ARCHIVE_TO_MBOX default to 0?
I deliberately want Mailman to keep creating mbox archives in case I want to regenerate the list archives completely using a newer version of HyperArch, or switch to something else entirely. I just don't want to offer them for download, so having them created outside of /pipermail/ when they are turned on would be nice. It's not urgent, though, since it's easy enough to block access at the webserver.
-- Colin Palmer <colinp@waikato.ac.nz> University of Waikato, ITS Division

- On 2003.11.25, in <3FC39580.2020605@gmx.at>,
- "Bernhard Kuemel" <darsie@gmx.at> wrote:
You have way too much time on your hands.
-- -D. dgc@uchicago.edu University of Chicago > NSIT > VDN > ENSS > ENSA > You are here . . . . . . . always line up dots

David Champion wrote:
You are absolutely right. Does anyone have a (paid) (Linux) job for me?
Also - sorry for being annoying, but I try to keep the spam down and when the graphical turing test I offered to block email harvesting bots was just turned away I was a little disappointed. Well, maybe I am wrong and this is not really an issue, then sit back and relax. Maybe my contribution will also be rejected from bugtraq. But I feel millions of email addresses are worth a bit more security/privacy.
Bernhard
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

On 25 Nov 2003, at 17:46, Bernhard Kuemel wrote:
I am just a spectator but this doesn't look like a major contribution
to the Open Source movement by you.
As a way of getting your code and ideas adopted it is one hell of an
approach.
A better approach might be to work up a patch for the current Mailman
release that will demonstrably function in practice (how are we going
to manage all those images your original "Turing test" proposal will
lead to) and submit that like any other contributor. You can program in
Perl so using Python should be a snap for a clever fellow like you.
But I confess if it were for me to decide on a response to your
threats, which it is not, I'd say sex and travel fits the bill.
There's irony for you.
Richard Barrett http://www.openinfo.co.uk

Richard Barrett wrote:
You are right. It is a small contribution. I also filed a PHP bug today. Another small contribution. Makes 2 today. But not all days are as productive as this one.
As a way of getting your code and ideas adopted it is one hell of an
approach.
Well, I'm not sure if a graphical turing test makes up for the drawbacks I mentioned so I'm not sure it will make it to mailman. But I'm glad that the email harvesting problem gets some attention now.
It would probably be more efficient if some who are familiar with the mailman code fixed its "security flaws". Also we first need to find out what should be done about it. A graphical turing test may rule out users of non graphical web browsers and maybe we can come up with something better. Implementing it prematurely might be a waste of human resources.
You can program in Perl so using Python should be a snap for a clever fellow like you.
Maybe. However, I don't like python as on our old P60 server it burned up so much CPU time (15 s/min). I can also program in C so I could probably fix the PHP bug as well. However, I do not always feel like doing everything, especially if the others don't like it.
But I confess if it were for me to decide on a response to your threats,
I was looking for a better word than 'warning', however, none of the alternatives seemed to fit. Also I tried to make my announcement of my bugtraq post as inoffensive as possible. If you are a native English speaker maybe you can show me an even better way.
which it is not, I'd say sex and travel fits the bill.
Well, well, if you prefer some hints about sex over my bug reports maybe we should change the forum. About travelling, if you want you can join next European Rainbow gathering, my every year summer highlight. See my rainbow website for details: http://rainbow.bksys.at .
Have a nice day,
There's irony for you.
That was not meant ironically. Hmm, maybe 'cheers' would have been less ambiguous, but only '(kind) regards' came to my mind at that time and that sounded too formal to me. Other suggestions?
Cheers, Bernhard
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

On 25 Nov 2003, at 20:06, Bernhard Kuemel wrote:
It would be interesting to see you present convincing evidence that Python runs slower than Perl which you seem happy to rely on.
Maybe dilettante springs to mind as a description that fits.
How about acting like a contributor who produces solutions instead of being a smart guy: why just contribute to the pool of problems when you could contribute to the pool of solutions to problems? Why should anybody take your proposals seriously, and invest their unpaid effort into proving their worth, when you cannot be bothered to invest the effort yourself?

Richard Barrett wrote:
That can be difficult as different programming languages are designed for different tasks so they all have their strengths and weaknesses. That also pretty much makes such a comparison senseless. It would e.g. be a good choice to make an mp3 encoder in C and my mailman exploit in perl and not the other way round. I would without doubt claim that C runs faster and many tasks are coded quicker in perl. It is my impression that python is slow, at least it has a lengthy startup. It may still be suitable for certain tasks, however I have no idea which as I don't speak python. Mailman was run once per minute from cron on my old server. Maybe Mailman was coded inefficiently. However, I read it scales better than Majordomo, a perl program. That difference is probably a design issue rather than to blame on the programming language.
Anyways, since you asked for a benchmark, here is a quick start. These programs were run repeatedly so the perl interpreter was already loaded from disk and cached. The shortest run was picked to minimize interference from other processes.
Detected 735.005 MHz processor.
Calibrating delay loop... 1468.00 BogoMIPS
CPU: Intel Celeron (Coppermine) stepping 06
bernhard@b:~/t$ time perl -e 'print "hello world\n";'
hello world
real 0m0.017s
user 0m0.000s
sys 0m0.010s
This shows the small overhead on perl startup.
bernhard@b:~/t$ time perl -e 'for ($i=1;$i<=1000000;$i++) {print "$i: hello world\n";}' >/dev/null
real 0m2.147s
user 0m2.090s
sys 0m0.010s
A million string interpolations and file accesses in 2.1 s - not bad.
Compare this to C:
#include <stdio.h>
int main(void) { puts("hello world"); return 0; }
bernhard@b:~/src/benchmark$ time ./hello
hello world
real 0m0.007s
user 0m0.000s
sys 0m0.000s
#include <stdio.h>
int main(void) {
    int i;
    /* print one numbered line per iteration */
    for (i=1;i<=1000000;i++) {
        printf("%d: hello world\n",i);
    }
    return 0;
} //main
bernhard@b:~/src/benchmark$ time ./1Mhello >/dev/null
real 0m1.007s
user 0m0.960s
sys 0m0.010s
Give some examples for python. I'll run them on my machine if you don't have a comparable one.
Sure, this benchmark sucks, but it's not completely bogus. Any ideas about some more serious benchmarks? Going towards the strengths of each language would be somewhat unfair, so a rather complex problem may be best but also difficult to implement. OTOH picking the language that offers the most advantages for a given problem is the way to go.
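A Python counterpart to the Perl and C loops might look like the sketch below. It assumes a current Python 3 interpreter rather than the Python of 2003, and the function name `hello_loop` is made up for illustration. Note that it times only the loop itself; to include interpreter startup, as the `time perl` runs above do, the whole script would be run under the shell's `time` instead. No timings are given here because none were measured in the thread.

```python
import os
import time


def hello_loop(count, out):
    """Write 'N: hello world' lines to out, mirroring the Perl and C loops."""
    for i in range(1, count + 1):
        out.write("%d: hello world\n" % i)


if __name__ == "__main__":
    # Send output to the null device, like '>/dev/null' in the shell runs.
    with open(os.devnull, "w") as sink:
        start = time.perf_counter()
        hello_loop(1000000, sink)
        elapsed = time.perf_counter() - start
    print("1,000,000 lines in %.3f s" % elapsed)
```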
Maybe dilettante springs to mind as a description that fits.
Your insults are getting boring.
Bernhard
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

Bernhard Kuemel wrote:
A million string interpolations and file accesses in 2.1 s - not bad.
Hmm, maybe the startup overhead of python is still significant with 1,000,000 iterations so here are 10,000,000 timings:
bernhard@b:~/src/benchmark$ time perl -e 'for ($i=1;$i<=10000000;$i++) {print "$i: hello world\n";}' >/dev/null
real 0m21.400s
user 0m21.130s
sys 0m0.030s
bernhard@b:~/src/benchmark$ time ./10Mhello >/dev/null
real 0m9.932s
user 0m9.760s
sys 0m0.020s
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

On Wed, 2003-11-26 at 05:36, Bernhard Kuemel wrote:
We don't need to get into lengthy language wars here, but I submit that there's no practical difference in performance between Python and Perl, especially in the problem domain that Mailman addresses. Anyone who wants to recode Mailman in C, Java, ML, Haskell, Smalltalk, Objective-C, C++, Perl, Ruby or whatever has my blessing (just don't call it Mailman :).
see-you-in-10-years-ly y'rs, -Barry

On Nov 27, 2003, at 10:32 AM, Barry Warsaw wrote:
Sorry, given that Mailman is almost always rate limited by I/O and the MTA, it's mostly irrelevant to talk about performance. Mailman is almost NEVER the bottleneck when you use 2.1. So why argue about upgrading a six lane highway to eight lanes when a mile down the road it all turns back into four?

Richard Barrett wrote:
Eric Steven Raymond says in "The Art of Unix Programming" (http://www.faqs.org/docs/artu/index.html) on http://www.faqs.org/docs/artu/ch14s04.html :
"Python cannot compete with C or C++ on raw execution speed (though using a mixed-language strategy on today's fast processors probably makes that relatively unimportant). In fact it's generally thought to be the least efficient and slowest of the major scripting languages, a price it pays for runtime type polymorphism. Beware of rejecting Python on these grounds, however; most applications do not actually need better performance than Python offers, and even those that appear to are generally limited by external latencies such as network or disk waits that entirely swamp the effects of Python's interpretive overhead. Also, by way of compensation, Python is exceptionally easy to combine with C, so performance-critical Python modules can be readily translated into that language for substantial speed gains."
Bernhard
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html

On Tue, Dec 16, 2003 at 11:44:14PM +0100, Bernhard Kuemel wrote:
Uhh... duh? There isn't a scripting language out there that's faster than C (I mean, there are things that scripting languages can *make* faster, like preventing ppl from doing stupid sorts by only putting in well-optimized sort algo's but that's not what most ppl mean by "speed"). I can't believe ESR wrote that for python and not just for scripting languages in general.
Really? I never thought that python was slower than, say, tcl prior to 8.x. I thought that the general consensus is that python and perl are about neck-and-neck for the fastest possible interpreter.
But in short you're taking something written for popular consumption instead of writing a test case in a few languages and seeing how fast it runs. That's not helpful at all.
-Peter
-- The 5 year plan: In five years we'll make up another plan. Or just re-use this one.

On Tue, 2003-11-25 at 15:06, Bernhard Kuemel wrote:
It would probably be more efficient if some who are familiar with the mailman code fixed its "security flaws".
Just to be snitty and pedantic, I don't consider email address leaks in Pipermail to be security flaws. Not that I don't consider them serious enough to address (I do), but it's a different class of problem than some exploit that could be used to subvert the Mailman system or the machine it's running on.
-Barry

It's not a security issue. It's a privacy issue. Very different beasts. Very important beasts, but the only thing they have in common is the number of legs they have.
The underlying issue is similar to many bugtraq issues: what used to be a common, acceptable coding practice no longer is. But man, things like "security issue" are rapidly becoming an aspect of Godwin's law: phrases that are overused and used improperly, and mostly indicating the end of useful discussion and the entrance into the world of political and emotional rhetoric.
On Nov 27, 2003, at 10:28 AM, Barry Warsaw wrote:
Just to be snitty and pedantic, I don't consider email address leaks in Pipermail to be security flaws.

On Tuesday, Nov 25, 2003, at 11:46 US/Central, Bernhard Kuemel wrote:
It seems, Bernhard, that you may as well have posted it to bugtraq immediately, since your posting of the code to this list will likely make the exploit code accessible via a google search for "mailman exploit" within a matter of hours...

Doug Selph wrote:
Well, it's a different thing being told about something or having to look for it without even knowing that it exists. But if you are worried, feel free to remove the exploit from the archive. I guess you know how to do that.
Bernhard
-- Webspace; Low end Serverhousing ab 15 e, etc.: http://www.bksys.at Linux Admin/Programmierer: http://bksys.at/bernhard/services.html
participants (10):
- Barry Warsaw
- Bernhard Kuemel
- Chuq Von Rospach
- Colin Palmer
- David Champion
- Doug Selph
- Peter C. Norton
- Phil Barnett
- Richard Barrett
- Terri Oda