Re: [Mailman-Developers] Proposed: remove address-obfuscation code from Mailman 3
--On 24 August 2009 13:15:03 -0500 "Hopkins, Justin" <hopkinsju@umsystem.edu> wrote:
Thanks for such a detailed and compelling post, but I must disagree. I can't refute any of the arguments you made; they are all quite sound. But I do take issue with your conclusion.
Obfuscating the email addresses is just a part of 'defense in depth' - same as patching your computer, using a firewall, etc. Each layer, no matter how thin, still adds something.
Cheers, Justin
Quite right. Rich's argument is, essentially, that obfuscation isn't 100% effective so it shouldn't be used. Frankly, if it's 10% effective, then it's worth doing in my view.
Further, Rich offers no evidence of significant harm done by obfuscation.
Finally, there are other privacy concerns than spam harvesting that may also be mitigated by address obfuscation.
-- Ian Eiloart IT Services, University of Sussex 01273-873148 x3148 For new support requests, see http://www.sussex.ac.uk/its/help/
Ian> Quite right. Rich's argument is, essentially, that obfuscation
Ian> isn't 100% effective so it shouldn't be used. Frankly, if it's 10%
Ian> effective, then it's worth doing in my view.
I would be quite surprised if address obfuscation is anywhere close to 10% effective. Maybe 0.01%.
The problem I see with Barry's argument that users demand it so Mailman must provide it is that position just propagates misinformation about the ineffectiveness of the "feature". I would vote for tossing it out, or at the very least making it a per-list flag which admins could disable if they wanted.
The other thing about Mailman's obfuscation is that I sorta think that by now the spammers have figured it out. I mean, "skip at pobox.com"? Come on. Even Barry stands a good chance of writing a regular expression that can locate something like that, his self-deprecation about his r.e. prowess notwithstanding. :-) If nothing else, all an enterprising spammer would have to do is steal Mailman's email address matcher and replace "@" with " at ". Oh, wait, it's open source. They wouldn't even have to steal the code.
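For what it's worth, here is a minimal sketch of the kind of regular expression Skip is describing. It is illustrative only, not Mailman's actual address matcher: one extra alternation is all it takes to treat "skip at pobox.com" and "skip@pobox.com" identically.

```python
import re

# Illustrative harvester pattern (an assumption, not Mailman's code):
# matches plain "local@domain" as well as the "local at domain"
# obfuscation used in the archives.
PATTERN = re.compile(
    r"\b([A-Za-z0-9._%+-]+)(?:@|\s+at\s+)([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b"
)

def harvest(text):
    """Return plain addresses recovered from (de)obfuscated text."""
    return ["%s@%s" % pair for pair in PATTERN.findall(text)]
```

Which is to say: the obfuscation costs an enterprising harvester one alternation in a regex.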
-- Skip Montanaro - skip@pobox.com - http://www.smontanaro.net/ Getting old sucks, but it beats dying young
You are presuming too much on spammers as a whole. I've dealt with a couple spammers, and they just used some tools they got online that search for username@domain.something. Everything else is ignored.
I don't for a minute doubt that the advanced spammers will snag anything and everything no matter how strangely it is obfuscated. But there are a LOT of low-tech spammers still out there, and there is enough "low hanging fruit" for them that this little bit we are discussing may be over their heads.
Bob
Bob Puff wrote:
You are presuming too much on spammers as a whole. I've dealt with a couple spammers, and they just used some tools they got online that search for username@domain.something. Everything else is ignored.
I don't for a minute doubt that the advanced spammers will snag anything and everything no matter how strangely it is obfuscated. But there are a LOT of low-tech spammers still out there, and there is enough "low hanging fruit" for them that this little bit we are discussing may be over their heads.
It's not. Spammers usually don't do address harvesting themselves nowadays, but outsource it to botnets (just like they outsource the spamming itself to botnets) that run "off the shelf" software tailored to the task. Today, as a spammer, you go out and buy those services in online shops, paying by credit card. And parsing "localpart at domain" is among the most trivial things current harvester modules do.
Any wanna-be spammers who still run their garage business with self-written tools are pretty much meaningless in terms of magnitude.
If anything, this kind of obfuscation is an inconvenience to legitimate users, but certainly not to spammers.
-Julian
--On 25 August 2009 21:02:01 +0000 Julian Mehnle <julian@mehnle.net> wrote:
There's recently published research which suggests that simple obfuscation can be effective. Concealment, presumably, is more effective. At <http://www.ceas.cc/> you can download "Spamology: A Study of Spam Origins" <http://www.ceas.cc/papers-2009/ceas2009-paper-18.pdf>
They say "Surprisingly, even simple email obfuscation approaches are still sufficient today to prevent spammers from harvesting emails." and "Commonly-used email obfuscation techniques are offering protection (for now). It is common practice to replace the conventional @ in email addresses by an AT in order to defeat email harvesting. We found that the spammers are still not parsing simple obfuscations as of now. However, one should not count on the protection offered by such simple obfuscation schemes, for they are trivial to defeat."
Of course, list posts hang around for a long time, and may be mirrored (eg by Google caching). Therefore, concealment seems more sensible than obfuscation. Perhaps a captcha could be used to reveal sender addresses, for example.
The paper might be more interesting for its discussion of techniques for detecting (eg with honeypots) and defeating harvesters.
-- Ian Eiloart IT Services, University of Sussex 01273-873148 x3148 For new support requests, see http://www.sussex.ac.uk/its/help/
On Wed, Aug 26, 2009 at 10:57:06AM +0100, Ian Eiloart wrote:
There's recently published research which suggests that simple obfuscation can be effective. Concealment, presumably, is more effective. At <http://www.ceas.cc/> you can download "Spamology: A Study of Spam Origins" <http://www.ceas.cc/papers-2009/ceas2009-paper-18.pdf>
I'm composing a combined reply to all of the comments here, but wish to reply to this single point separately.
This paper seems well-intentioned, but has some very serious problems -- any one of which is sufficient to dismiss its conclusions entirely. Let me just enumerate a few of them; I'll spare you the entire list.
- The authors presume that they can tell that an address has been harvested *and* added to at least one spammer database (or not) by observing spam sent to it. But that's wrong: we know that many addresses are harvested and never spammed, or not spammed for a very long time (as in "years"). Conversely, many addresses are spammed that have *never* been harvested. And some addresses that are harvested are spammed, but not because they were harvested. [1] And some addresses are picked up by routine/ordinary web crawlers, and then subsequently spammed, but not by the people running those crawlers. [2]
This invalidates their measurement technique.
There's a major methodology error here:
"We began by registering a dedicated domain for this project, which we hosted on servers in our department."
We know that some spammers -- the competent ones, who are the ones that matter -- use suppression lists based not just on domains, but TLDs, IP addresses, network allocations, ASNs, NS records, MX records, etc. We further know that anything tracing to a .edu or a network allocation/ASN associated with a .edu is quite likely to appear on those suppression lists. (This is an "old tradition" among spammers. Not all of them follow it, but quite a few do.)
This also invalidates their measurement technique.
Statistics from any single domain are often wildly skewed one way or another. For example: I happen to host three domains which have the same name, but in three different TLDs. Everything else about them is exactly the same: NS, MX, web content, valid email addresses, etc. The spam they receive varies over three orders of magnitude.
And then there's this: it doesn't cover use of the single largest current vector for address harvesting -- zombied systems. No discussion of contemporary address harvesting techniques can even be begun without considering this. It's like writing a paper on tides without factoring in the moon's gravitation. [3]
(I checked to see if perhaps this paper's publication predated the rise of the zombies earlier this decade, but it's from 2009.)
To put it another way: yes, there are still address harvesters using the techniques that these researchers were looking for. But these harvesters are outdated and unimportant; they're only used by spammers who don't have the expertise and resources to do better. And not only is that class of spammer steadily shrinking, it's NOT the class of spammer we need to worry about, as it's quite easy to block just about all their traffic whether they have valid addresses or not. (C'mon, these are people who can't decode rskATgsp.org, do you really think they constitute a serious threat?)
So like I said above, I'll spare you points 5-N, but they're similar. None of what I've said here is new or novel: it's common knowledge among experienced people working in the field. I think perhaps in the future that people trying to conduct this kind of research should spend a few years reading spam-l and other similar lists before diving in.
The bottom line is that (a) the numbers they've produced have no meaning and (b) their conclusions are all wrong.
Notes:
[1] As an example: consider joe@example.com, and let's suppose that it's been deliberately exposed to one method of harvesting because it's published at http://www.example.com. If spam arrives, then it may be because the address was harvested by a web crawler and added to a spammer database -- or it may be because "joe" is a very common LHS string and thus one that spammers are very likely to try in *any* domain. Note that while spammers' lists of such likely LHS strings were quite limited years ago, they're not any more: spammers now have the resources to try all known and all plausible LHS strings if they wish. And they are: check your logs. You may be surprised at which LHS strings are being tried: what was computationally infeasible a decade ago is now routine.
[2] It's not difficult to figure out who's running a web crawler: just setting up a web site, making sure it's linked to, waiting, and then analyzing logs will reveal a candidate list. It's somewhat more work to figure out which of those crawler operations can be broken into, but it has significant advantages: it allows one to mine all their data without the expense/hassle of collecting it, and it conceals the source/use of that data.
There are a lot of crawler operations out there. It would be silly to think that they're *all* secure.
[3] Harvesting addresses on zombies has quite a few advantages over other techniques: It uses the host's own resources. It's unlikely to be detected. It won't be stopped by firewalls or rate-limiting at the network level. It provides social graph information. It provides timestamp information. It provides MUA information. It may yield useful phishing information. It may yield useful identity theft information. It may yield useful blackmail information. And all of this can be bundled up by suitable extraction software and delivered as a package back to a C&C node.
For example, from a single email message sitting on Fred's computer:
Fred last received email from Barney at 2009-08-11 07:32:12 UTC,
thus Barney's address is known-good as of that time, Fred will
probably accept suitably-forged mail from Barney, and vice-versa.
And of course since Fred's computer is now owned by spammers,
no anti-forgery mechanism of any kind will detect the latter.
And maybe an appropriate malware payload from Fred to Barney
will yield another zombie, where "appropriate" may be partially
inferred by checking the headers and seeing what MUA Barney
is using. Maybe those headers will also identify what MTA
and associated anti-malware software Barney's site is using, so
that the payload can be appropriately chosen. Phishing bonus if
Barney's address is barney@some-bank or similar. Blackmail bonus
if Barney's address is on an "adult dating" or "escort" site.
Identity theft bonus if regexp matching on message-body turns
up NNN-NN-NNNN (US social security number) and the like. &etc.
Now multiply this by a billion. At least -- because there are at least a hundred million zombies and estimating only 10 stored messages per zombie gets us to a billion. This is why the serious/"professional" address harvesting operations have shifted from some of the older and less efficient techniques to this one, and why defending against those methods is now pointless.
---Rsk
Something else that occurs to me.
If we accept that obfuscation is worthless and stop doing it, then
there's no reason we shouldn't make the raw mbox files available for
anyone to download. Mailman used to do this, but we removed the
feature due to user outcry. Now you can download the gzip
monthly .txt files, but they are sanitized. If we stop obfuscating,
is there any reason not to make the raw messages available for download?
-Barry
That's the logical progression of that argument, and is a good reason why obfuscation or even removal of parts is not only a good idea, it's a necessity. Exposing raw email addresses in their normal form is real low-hanging fruit.
Regardless of what I think, my clients will cry bloody murder if emails leak out. I had one person recently google their email address, and found a link to an archive file that should have been private. I had removed all links to the archives, but somehow Google found it, indexed it, and the guy threatened me with bloody murder if I didn't take it down. Sheesh.
Bob
Bob Puff wrote:
That's the logical progression of that argument, and is a good reason why obfuscation or even removal of parts is not only a good idea, it's a necessity. Exposing raw email addresses in their normal form is real low-hanging fruit.
Regardless of what I think, my clients will cry bloody murder if emails leak out. I had one person recently google their email address, and found a link to an archive file that should have been private. I had removed all links to the archives, but somehow Google found it, indexed it, and the guy threatened me with bloody murder if I didn't take it down. Sheesh.
There's robots.txt, you know? If this is just about user outcry, then robots.txt will fix it (since all legitimate search engines honor it).
-Julian
I am pretty sure allowing the raw email addresses to be available is going to go over like a lead balloon here. Anything (however minor) to help protect the users'/clients' email addresses is helpful despite what others think. It is fine if someone considers the obfuscation that Mailman uses trivial; however, anything I can do to make it harder or more computationally time-invested to get the email address is better than giving it away. Sure, bots are out there, but if what I do helps slow down someone's system to make them look at it (and hopefully get rid of the bot), then great. But at least give me the choice to be able to do it.
I happened to like Barry's (?) earlier comment about the "send me this message" link. Or maybe a "send my message to the original poster" link where you can click on the link, compose your message, and send it through Mailman, all without the original sender's address. Mailman or whatever process can figure out the original sender and pass on your message. Yes, I know it is more work; that is why we have computers :)
As for using robots.txt, hmm, it is not the legitimate search engines I care about, it is the search engines/crawlers that do not respect my robots.txt file that I care about. If I had an effective way to consistently identify those non-legitimate crawlers, I would add what I needed to drop them into my firewall as I recognized them.
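One common way to catch such crawlers is a trap URL: list a path in robots.txt that nothing visible links to, so any client that fetches it has by definition read and ignored robots.txt. A sketch of the log-scanning half (the "/trap/" path and the Apache-style log format are assumptions for illustration):

```python
import re

# Assumes Apache-style access logs and a trap path ("/trap/") that
# robots.txt disallows and that no visible page links to.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+)')

def offending_ips(log_lines):
    """IPs that requested the trap path despite robots.txt."""
    hits = set()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group(2).startswith("/trap/"):
            hits.add(m.group(1))
    return hits
```

The resulting IPs are exactly the candidates for the firewall drop rules described above.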
Chris
Julian Mehnle wrote:
Bob Puff wrote:
That's the logical progression of that argument, and is the good reason why obfuscation or even removal of parts is not only a good idea, its a necessity. Exposing raw email addresses in their normal form is real low-hanging fruit.
Regardless of what I think, my clients will cry bloody murder if emails leak out. I had one person recently google their email address, and found a link to an archive file that should have been private. I had removed all links to the archives, but somehow Google found it, indexed it, and the guy threatened me with bloody murder if I didn't take it down. Sheesh.
There's robots.txt, you know? If this is just about user outcry, then robots.txt will fix it (since all legitimate search engines honor it).
-Julian
Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/cnulk%40scu.edu
Security Policy: http://wiki.list.org/x/QIA9
On Aug 31, 2009, at 1:15 PM, C Nulk wrote:
I am pretty sure allowing the raw email addresses to be available is going to go over like a lead balloon here. Anything (however minor) to help protect the users'/clients' email addresses is helpful despite what others think. It is fine if someone considers the obfuscation that Mailman uses trivial; however, anything I can do to make it harder or more computationally time-invested to get the email address is better than giving it away. Sure, bots are out there, but if what I do helps slow down someone's system to make them look at it (and hopefully get rid of the bot), then great. But at least give me the choice to be able to do it.
Agreed.
I happened to like Barry's (?) earlier comment about the "send me this message" link. Or maybe a "send my message to the original poster" link where you can click on the link, compose your message, and send it through Mailman, all without the original sender's address. Mailman or whatever process can figure out the original sender and pass on your message. Yes, I know it is more work; that is why we have computers :)
The difficult part about the latter is that I hate web interfaces for
reading/composing email (Gmail included). I want to use my mail
reader for that!
As for using robots.txt, hmm, it is not the legitimate search
engines I care about, it is the search engines/crawlers that do not respect my robots.txt file that I care about. If I had an effective way to consistently identify those non-legitimate crawlers, I would add
what I needed to drop them into my firewall as I recognized them.
Agreed. -Barry
Barry Warsaw wrote:
On Aug 31, 2009, at 1:15 PM, C Nulk wrote:
As for using robots.txt, hmm, it is not the legitimate search
engines I care about, it is the search engines/crawlers that do not respect my robots.txt file that I care about. If I had an effective way to consistently identify those non-legitimate crawlers, I would add
what I needed to drop them into my firewall as I recognized them.
Agreed.
The point in the original post about robots.txt was that if you think obfuscation is undesirable and don't do it, but you get complaints from people who find their unobfuscated addresses on your pages via legitimate search engines, you can use robots.txt to keep the search engines out.
However, robots.txt is not completely effective in this. You can use it to prevent Google from crawling your site or portions thereof, but it won't prevent Google from indexing your pages that it finds via external links. To prevent this, you need a <meta name="robots" content="noindex"> tag on the pages themselves.
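Concretely, the two mechanisms described here might look like this (the "/pipermail/" path is just Mailman's conventional archive location; adjust to taste):

```
# robots.txt -- keeps compliant crawlers from crawling the archives
User-agent: *
Disallow: /pipermail/
```

```html
<!-- on each archive page: prevents indexing even when the page is
     discovered via an external link rather than a crawl -->
<meta name="robots" content="noindex">
```

Note the two are complementary: the robots.txt rule stops crawling, while the meta tag stops indexing of pages found some other way.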
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mark Sapiro wrote:
Barry Warsaw wrote:
On Aug 31, 2009, at 1:15 PM, C Nulk wrote:
As for using robots.txt, hmm, it is not the legitimate search
engines I care about, it is the search engines/crawlers that do not respect my robots.txt file that I care about. If I had an effective way to consistently identify those non-legitimate crawlers, I would add
what I needed to drop them into my firewall as I recognized them.
Agreed.
The point in the original post about robots.txt was that if you think obfuscation is undesirable and don't do it, but you get complaints from people who find their unobfuscated addresses on your pages via legitimate search engines, you can use robots.txt to keep the search engines out.
I understood the original post and I agree.
However, robots.txt is not completely effective in this. You can use it to prevent Google from crawling your site or portions thereof, but it won't prevent Google from indexing your pages that it finds via external links. To prevent this, you need a <meta name="robots" content="noindex"> tag on the pages themselves.
I agree with you here.
The robots.txt and the "<meta" HTML header work great for search engines that respect those conventions. My point is that neither of them is effective for crawlers that do not respect the conventions. Putting raw email addresses in the archives without a means to obfuscate them simply hands over the addresses to those disreputable crawlers. And, if I were writing a web crawler to harvest email addresses, I am pretty sure I would ignore any convention which stops me from getting what I want. BTW, I DON'T WRITE WEB CRAWLERS, so no yelling at me. :)
It is those disreputable crawlers I was addressing in my comment - robots.txt and the "<meta" header are insufficient in that particular case.
Chris
Barry Warsaw wrote:
On Aug 31, 2009, at 1:15 PM, C Nulk wrote:
I am pretty sure allowing the raw email addresses to be available is going to go over like a lead balloon here. Anything (however minor) to help protect the users'/clients' email addresses is helpful despite what others think. It is fine if someone considers the obfuscation that Mailman uses trivial; however, anything I can do to make it harder or more computationally time-invested to get the email address is better than giving it away. Sure, bots are out there, but if what I do helps slow down someone's system to make them look at it (and hopefully get rid of the bot), then great. But at least give me the choice to be able to do it.
Agreed.
I happened to like Barry's (?) earlier comment about the "send me this message" link. Or maybe a "send my message to the original poster" link where you can click on the link, compose your message, and send it through Mailman, all without the original sender's address. Mailman or whatever process can figure out the original sender and pass on your message. Yes, I know it is more work; that is why we have computers :)
The difficult part about the latter is that I hate web interfaces for reading/composing email (Gmail included). I want to use my mail reader for that!
Actually, I had more of a mailto-style link in mind that sends the message to the list (run by Mailman, naturally) and as part of the body/subject includes an encrypted form of the message id (providing it is unique). You would use your mail client to read/compose. Maybe something similar to a list's listname-bounces address, but with the message id, could be done. Don't know. Mailman would receive your message, decrypt the message id, look up the message, then forward your message to the original sender.
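A rough sketch of the token half of that idea (all names hypothetical, not Mailman API). This version signs the message id rather than encrypting it, which makes the token tamper-evident but does not hide the id; a real implementation might encrypt as described:

```python
import base64
import hashlib
import hmac

SECRET = b"per-list secret key"  # assumption: one secret per list

def encode_token(message_id: str) -> str:
    """Pack a message id into an opaque, tamper-evident token."""
    data = message_id.encode()
    mac = hmac.new(SECRET, data, hashlib.sha256).digest()[:8]
    return base64.urlsafe_b64encode(mac + data).decode().rstrip("=")

def decode_token(token: str) -> str:
    """Recover the message id, refusing tokens with a bad MAC."""
    raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    mac, data = raw[:8], raw[8:]
    want = hmac.new(SECRET, data, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(mac, want):
        raise ValueError("bad token")
    return data.decode()
```

Mailman could then mint such a token into a listname-reply style address, decode it on receipt, look up the archived message, and forward the reply to the original sender without ever exposing that sender's address.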
I am not particularly fond of web interfaces for reading/composing email. Well, maybe when I travel overseas without a laptop, then it is minimally okay.
As for using robots.txt, hmm, it is not the legitimate search engines I care about, it is the search engines/crawlers that do not respect my robots.txt file that I care about. If I had an effective way to consistently identify those non-legitimate crawlers, I would add what I needed to drop them into my firewall as I recognized them.
Agreed. -Barry
Now, totally off-topic, anyone have a recommendation for a book on learning Python so I am no longer truly dangerous, just slightly.
Thanks, Chris
On Aug 31, 2009, at 4:39 PM, C Nulk wrote:
Now, totally off-topic, anyone have a recommendation for a book on learning Python so I am no longer truly dangerous, just slightly.
There are zillions of books available now for learning Python (I think
there was only 1 when I first learned it 15 years ago :).
http://wiki.python.org/moin/PythonBooks
For various reasons, it's difficult for me to recommend one over the
other.
-Barry
--On 31 August 2009 10:15:43 -0700 C Nulk <CNulk@scu.edu> wrote:
I am pretty sure allowing the raw email addresses to be available is going to go over like a lead balloon here.
Here, too. Our site would probably deploy some other mailing list software.
Anything (however minor) to help protect the users/clients email addresses is helpful despite what others think.
All the published research evidence is that email address obfuscation helps a lot. At a University site, most student email addresses won't be published anywhere EXCEPT in our mailing list archives. That means that the best way for spammers to acquire student email addresses is to harvest their addresses from our list archives. Students get a lot less spam than academic staff whose addresses appear all over the place. So much so that everyone who's ever fallen foul of phishing here has been a staff member, despite being outnumbered 10:1 by students.
-- Ian Eiloart IT Services, University of Sussex 01273-873148 x3148 For new support requests, see http://www.sussex.ac.uk/its/help/
--On 29 August 2009 04:19:58 +0000 Julian Mehnle <julian@mehnle.net> wrote:
Bob Puff wrote:
That's the logical progression of that argument, and is a good reason why obfuscation or even removal of parts is not only a good idea, it's a necessity. Exposing raw email addresses in their normal form is real low-hanging fruit.
Regardless of what I think, my clients will cry bloody murder if emails leak out. I had one person recently google their email address, and found a link to an archive file that should have been private. I had removed all links to the archives, but somehow Google found it, indexed it, and the guy threatened me with bloody murder if I didn't take it down. Sheesh.
There's robots.txt, you know? If this is just about user outcry, then robots.txt will fix it (since all legitimate search engines honor it).
-Julian
But, the legitimate search engines aren't the problem. It's the harvesters, which probably don't honour robots.txt. If you prevent Google from indexing the archive, then you just hide the problem.
-- Ian Eiloart IT Services, University of Sussex 01273-873148 x3148 For new support requests, see http://www.sussex.ac.uk/its/help/
the archives, but somehow Google found it, indexed it, and the guy threatened me with bloody murder if I didn't take it down.
Yes. It is critical to keep user perception in mind. Specifically, if you don't keep email addresses off the global search engines, there will be a deluge of vocal complaints from users who neither care about nor understand the technical aspects. That can be as simple as robots.txt configuration, or as fancy as using a captcha-based system to reveal addresses, like the one offered by reCaptcha. But my main point is you need to cover the user perception angle almost independently from the core technical aspects of anti-harvesting.
For the record, I prefer keeping data as unadulterated as possible because it helps interoperability. But we also need to keep users happy.
-Jeff
On Aug 29, 2009, at 12:21 AM, Jeff Breidenbach wrote:
Trust me, I'm keenly aware of this as I probably get 3x the nasty hate
mail that most of you get. I try to be nice and patient and that
usually calms people down. :)
Mailman will always still collect the raw data for messages sent to
the list. There are legitimate uses for allowing outsiders access to
that data (say, the list is moving and you want to migrate the
archives), so I think we always want to support this. The question is
how much, if any, of the raw data does the general public get access to?
-Barry
- On 31 Aug 2009, Barry Warsaw wrote:
Mailman will always still collect the raw data for messages sent to the list. There are legitimate uses for allowing outsiders access to that data (say, the list is moving and you want to migrate the archives), so I think we always want to support this. The question is how much, if any, of the raw data does the general public get access to?
It seems clear that there are legitimate use cases for raw archives, so I'll skip the justifications and just address how we can strike a balance between transparency and security.
I'm going to embrace and extend something Barry suggested in private mail. He suggested a list setting that permits signed-in list subscribers to download raw archives if they have some 'archive-approved' status. What if that is a three-way switch: approved, unapproved, and blacklisted? New subscribers would always be unapproved. An unapproved subscriber who successfully posted to the list, clearing any approval mechanisms in place and subject to a list configuration option, would get approved for raw archive access. (Automatic posting-equals-approval would not be desirable for all lists, but it would be for many.) An approved user could be blacklisted by moderator action or by an automated moderation filter. Coming off blacklist status would require manual action by the moderator.
And there could be a form in the application to request approval or de-blacklisting, of course.
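The three-way switch described above could be sketched roughly as follows. This is an illustrative sketch only; the class and method names are invented here and are not part of any actual Mailman API:

```python
from enum import Enum

class ArchiveAccess(Enum):
    UNAPPROVED = "unapproved"    # default for new subscribers
    APPROVED = "approved"        # may download raw archives
    BLACKLISTED = "blacklisted"  # cleared only by moderator action

class Subscriber:
    def __init__(self, address):
        self.address = address
        self.access = ArchiveAccess.UNAPPROVED

    def on_successful_post(self, posting_equals_approval=True):
        # An unapproved subscriber whose post clears the list's approval
        # mechanisms may be auto-approved, subject to list configuration.
        if posting_equals_approval and self.access is ArchiveAccess.UNAPPROVED:
            self.access = ArchiveAccess.APPROVED

    def blacklist(self):
        # Moderator action or an automated moderation filter.
        self.access = ArchiveAccess.BLACKLISTED

    def moderator_clear(self):
        # Coming off the blacklist requires manual moderator action.
        if self.access is ArchiveAccess.BLACKLISTED:
            self.access = ArchiveAccess.UNAPPROVED

    def may_download_raw_archive(self):
        return self.access is ArchiveAccess.APPROVED
```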
-- -D. dgc@uchicago.edu NSIT University of Chicago
On Aug 31, 2009, at 4:48 PM, David Champion wrote:
I'm going to embrace and extend something Barry suggested in private mail. He suggested a list setting that permits signed-in list subscribers to download raw archives if they have some 'archive-approved' status. What if that is a three-way switch: approved, unapproved, and blacklisted? New subscribers would always be unapproved. An unapproved subscriber who successfully posted to the list, clearing any approval mechanisms in place and subject to a list configuration option, would get approved for raw archive access. (Automatic posting-equals-approval would not be desirable for all lists, but it would be for many.) An approved user could be blacklisted by moderator action or by an automated moderation filter. Coming off blacklist status would require manual action by the moderator.
And there could be a form in the application to request approval or de-blacklisting, of course.
Launchpad's mailing lists have a very similar concept, although it's
not used for access to the archives. The concept there is called
"standing" and currently has four levels: excellent, good, poor, and
unknown. You start out with unknown standing, but after you prove
yourself (in much the same way as you describe above), you get to be
in good standing, which gives you other benefits, such as being able
to email a list you're not on without moderation. You can't get to
excellent standing on your own and there are currently no benefits of
excellent over good standing. Poor standing is much like your
blacklist idea.
The way I look at it is that Launchpad prototyped this concept and I
do think it could be useful in Mailman itself.
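The four standing levels Barry describes could be modeled as an ordered enumeration, since good and excellent currently confer the same benefits. This is an illustrative sketch of the concept as described in this thread, not Launchpad's actual implementation:

```python
from enum import IntEnum

class Standing(IntEnum):
    # Launchpad-style "standing", ordered so that privileges can be
    # checked with a simple comparison.
    POOR = 0       # much like the blacklist idea above
    UNKNOWN = 1    # where every new participant starts
    GOOD = 2       # earned by proving yourself on the list
    EXCELLENT = 3  # not reachable on your own; no extra benefits yet

def may_post_unmoderated(standing):
    # Good and excellent standing currently confer the same benefit:
    # posting to a list you're not on without moderation.
    return standing >= Standing.GOOD
```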
-Barry
On Aug 25, 2009, at 7:42 AM, skip@pobox.com wrote:
The other thing about Mailman's obfuscation is that I sorta think
that by now the spammers have figured it out. I mean, "skip at pobox.com"?
Come on. Even Barry stands a good chance of writing a regular expression
that can locate something like that, his self-deprecation about his r.e.
prowess notwithstanding. :-) If nothing else, all an enterprising spammer
would have to do is steal Mailman's email address matcher and replace "@"
with " at ". Oh, wait, it's open source. They wouldn't even have to steal
the code.
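Skip's point is easy to demonstrate: a single, fairly loose regular expression (a rough sketch, much less thorough than a real address matcher) is enough to reverse the " at " obfuscation:

```python
import re

# Rough sketch of a harvester's de-obfuscation pass: find
# "user at example.com" style text and restore the "@".
OBFUSCATED = re.compile(r'\b([\w.+-]+) at ([\w-]+(?:\.[\w-]+)+)\b')

def deobfuscate(text):
    return OBFUSCATED.sub(r'\1@\2', text)
```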
I've always wanted to re-architect the archives so that they would /always/ vend the messages from an active process. I wouldn't have any
static files, except a cache for efficiency, and I would generate the
HTML on demand. My guess is that 99% of all archived messages are
never read by a human. The problem of course is spiders but I guess
they'll just warm up your cache. ;/
This would allow:
- easy redeployment of new obfuscation techniques
- on demand take downs or sanitization
- easy site regeneration for style changes.
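The on-demand vending idea above could look roughly like this. It is an illustrative sketch only; all the names here are invented, and none of this is actual Mailman code:

```python
import html

class ArchiveVendor:
    """Vend archived messages from an active process, rendering HTML
    on demand with a cache, so obfuscation policy and styling can
    change without regenerating static files."""

    def __init__(self, store, obfuscate):
        self.store = store          # message-id -> raw message text
        self.obfuscate = obfuscate  # current obfuscation policy (callable)
        self.cache = {}             # rendered-HTML cache

    def render(self, msgid):
        if msgid not in self.cache:
            raw = self.store[msgid]
            # Apply the current obfuscation policy at render time,
            # then escape for HTML.
            self.cache[msgid] = "<pre>%s</pre>" % html.escape(self.obfuscate(raw))
        return self.cache[msgid]

    def set_policy(self, obfuscate):
        # Deploying a new obfuscation technique just invalidates the cache.
        self.obfuscate = obfuscate
        self.cache.clear()

    def take_down(self, msgid):
        # On-demand take downs: drop the raw message and any cached render.
        self.store.pop(msgid, None)
        self.cache.pop(msgid, None)
```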
-Barry
participants (10)
- Barry Warsaw
- Bob Puff
- C Nulk
- David Champion
- Ian Eiloart
- Jeff Breidenbach
- Julian Mehnle
- Mark Sapiro
- Rich Kulawiec
- skip@pobox.com