Proposed: remove address-obfuscation code from Mailman 3
Summary: Spammers now have so many ways of "harvesting" addresses from so many systems, and so many ways of exchanging those with each other, that any email address which is actually used WILL eventually be harvested. (Where what "eventually" means varies widely, of course, but can be expected to steadily decrease.) Pretending that address obfuscation in mailing list [or newsgroup] archives will have any meaningful effect on this process gives users a false sense of security and has zero anti-spam value.
Summary of summary: It's pointless.
Explanation: Spammers maintain extensive databases of email addresses. Some of those databases are merely lists of addresses; others are more sophisticated and include data such as "harvested-date", "harvesting-method", "last-seen date", "last-seen-context", "last-known-valid date" and more. Some of these databases are private; others are available for sale/lease. Some are maintained by spammers themselves, others by spammer support services that don't directly engage in spamming.
The harvesting engines used to acquire email addresses are myriad, as are the methods by which spammers acquire the raw data to use as input to them. *Some* of those methods, and there are many more, include:
- subscribing to mailing lists
- acquiring Usenet news (NNTP) feeds
- querying mail servers
- acquiring corporate email directories
- insecure LDAP servers
- insecure AD servers
- use of backscatter/outscatter
- use of auto-responders
- use of mailing list mechanisms
- use of abusive "callback" mechanisms
- dictionary attacks
- construction of plausible addresses (e.g. "firstname.lastname")
- purchase of addresses in bulk on the open market.
- purchase of addresses from vendors, web sites, etc.
- purchase of addresses from registrars, ISPs, web hosts, etc.
- domain registration (some registrars ARE spammers) [1]
- misplaced/lost/sold media (hard disk, tape, CD, DVD,
USB stick, etc.)
and perhaps most significantly:
- harvesting of the mail, address books and any other files
present on any of the hundreds of millions of compromised
Windows systems [2]
Consider for example: the first time a newly-created address is used by someone (who is sending a message to it), it's now present on their system: in their saved outbound mail, or perhaps in their address book (if they have one), or in some cache. Any sensible malware resident on their system will of course pick it up and eventually hand it over to a harvesting agent. (Competent malware will harvest it in real time *and* associate it with the sender's address.)
And if that particular system happens to be clean? Doesn't help much, because the more times that address is used, the more systems it's present on. And the more systems it's present on, the greater the probability that one of them is already compromised or will be soon.
Thus even if we eliminate the originating end-user system as a possible source, we still have to consider the outbound mail server used by that end-user system, which is also a candidate for compromise. And then the inbound mail server used by the recipient, and then the recipient end-user system. And if there's some filtering appliance or intermediate system in place at either end, then it's there too. If the message is forwarded to a third party, then another set of systems is in play. If mail server logs are rolled up and moved to some central location, then it's there too. If backups are made, then it's present there, and subsequently may be present on any system where the backups are read/restored. And finally, if the destination of a mail message isn't an individual user, but an entire mailing list, then we must multiply the number of possible harvesting points by at least the number of people on the mailing list plus a factor for mail servers/gateways/filters/etc. (modulo overlaps). This in turn means that messages sent to lists of any appreciable size (say, 1000 members) will turn up on considerably more than 1000 systems -- and the chances that all 1000-plus are secure are microscopic.
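To put rough numbers on that last claim, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every system holding a copy of the address is compromised independently with the same probability p. Neither the independence assumption nor the sample values of p come from the message above; the function name is made up for this example.

```python
def prob_all_clean(n_systems: int, p_compromised: float) -> float:
    """Probability that none of n independent systems is compromised.

    Assumes each system is compromised independently with probability
    p_compromised (an illustrative simplification, not a measured figure).
    """
    return (1.0 - p_compromised) ** n_systems


# Hypothetical per-system compromise rates, chosen only to show the trend.
for p in (0.001, 0.01, 0.05):
    print(f"p = {p}: P(all 1000 systems clean) = {prob_all_clean(1000, p):.3g}")
```

Under these assumptions, even a 1% per-system compromise rate leaves less than a one-in-ten-thousand chance that all 1000 systems are clean, and the figure only shrinks as the number of systems grows.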
Please note that the previous paragraph's recitation only covered the last vector I enumerated in the [indented] list above: compromised systems. That laundry list of methods also affords many other opportunities for addresses to find their way into spammers' hands. As just one pointed example out of a great many more that could be cited: how do you know that the address user@example.com which has just subscribed to the list you run is a real person and not just the front-end for an address-harvester that will pick up every address used to send traffic to the list?
And so on. There are far too many others to enumerate, all of which have been discussed at great length in anti-spam forums for many years, and are depressingly familiar to experienced practitioners working in the field.
The bottom line is that any email address which is actually used [3], *especially* any email address used to send traffic to a mailing list, is going to be harvested. It's only a matter of when, not if, and "when" is getting sooner all the time.
Incidentally, everyone (including me) can produce anecdotal tales of addresses that have remained surprisingly under-targeted by spammers over long periods of time. But this is clearly not the way to bet: it is in spammers' interests to ferret out as many addresses as possible and to use them as soon and as often as possible. Note, however, that some addresses are *deliberately* un-/under-targeted, so lack of substantial spam traffic to a given address is NOT an indicator that the address hasn't been harvested. That's because along with target lists, spammers maintain "suppression" lists, which they use to avoid hitting the addresses of people they think are likely to cause issues for them. [4] And obviously, people with postmaster or mailing list roles would be good candidates for membership on those lists. I know that if I were in their shoes, I'd add everyone who's ever sent a message to the mailman-* mailing lists to mine: a quick check indicates that it's on the order of only 10K addresses. Skipping those would be inconsequential when sending spam to a few hundred million addresses, and I trust it's obvious why spammers would benefit from doing so.
With all this in mind, it's clearly pointless to pretend that address obfuscation in archives provides any protection at all. [5] It would be better to remove the code entirely than to continue to maintain the facade that it actually has any anti-spam value. Everyone should simply presume that all email addresses are in the hands of spammers and prepare defenses accordingly -- because even if that's not quite true yet, it will be soon enough.
Notes:
[1] I deliberately didn't mention mass WHOIS queries. While some efforts in this direction were made by spammers years ago, they've found it far more efficient and cost-effective to simply buy WHOIS data in bulk. There's always someone who wants to sell, and a CD/DVD or USB stick will suffice. This is why attempts by registrars to rate-limit queries or restrict access are not only foolish, but disingenuous: spammers already *have* the data, and can acquire updates at will, and they are clearly doing so via processes that lead back to registrars themselves.
[2] The exact number of such systems is not only unknown, but unknowable, since any compromised system which (a) doesn't make its presence known (b) to a suitable detector will remain undetected indefinitely. However, two things are clear: (1) any estimate under 100 million should be laughed out of the room, and (2) there is no reason to suspect that the number is decreasing, and there are numerous reasons to suspect that it's increasing. Note, incidentally, that some detectors have reported observing 200,000 new such systems in a single day; and further note that it's now quite routine for individual botnets with several million *known* members to turn up.
[3] Addresses which aren't used may remain out of spammer view for considerable time, depending on the care with which they're selected and maintained. However, this obviously excludes addresses used for participation in mailing lists.
[4] For the purpose of this discussion, I'm just talking about suppression lists which enumerate individual email addresses. It's well-known that spammers also maintain suppression lists of MX's, domains, network allocations, ASNs, etc., in an attempt to avoid hitting spamtraps and/or hitting the mailboxes of those who might be in a position to file complaints or take action against them.
[5] The only people left who are impeded in the slightest by obfuscation code are NON-spammers: that is, people who are trying to contact someone who has previously sent a message to some mailing list.
---Rsk
Thanks for such a detailed and compelling post..but I must disagree. I can't refute any of the arguments you made, they are all quite sound, but I do take issue with your conclusion.
Obfuscating the email addresses is just a part of 'defense in depth' - same as patching your computer, using a firewall, etc. Each layer, no matter how thin, still adds something.
Cheers, Justin
On Mon, Aug 24, 2009 at 10:37 -0400, Rich Kulawiec wrote:
Summary: Spammers now have so many ways of "harvesting" addresses from so many systems, and so many ways of exchanging those with each other, that any email address which is actually used WILL eventually be harvested. (Where what "eventually" means varies widely, of course, but can be expected to steadily decrease.) Pretending that address obfuscation in mailing list [or newsgroup] archives will have any meaningful effect on this process gives users a false sense of security and has zero anti-spam value.
Just a little sample: usually I obfuscate addresses in my .signature. Prompted by the same arguments you are elaborating, I used an unobfuscated one for 3 days. Now this address is contaminated and I'm going back to obfuscating.
my 2¢ Siggy
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org+ |53 days until|Open Source in Northern Germany: www.free-it.org| |www.Ubucon.de| tech contact: bsb-at-free-dash-it-dot-de| +-------> ceterum censeo javascriptum esse restrictam <--------+
Rich Kulawiec writes:
Pretending that address obfuscation in mailing list [or newsgroup] archives will have any meaningful effect on this process gives users a false sense of security and has zero anti-spam value.
You're missing the point. Our (often non-technical) users demand this feature. Even our technical audience (see Siggy's parallel post for example) perceives benefits from obfuscation, based on empirical tests.
So you can explain why, in theory and in practice, obfuscation doesn't work. But the user base will (stubbornly, if you like) refuse to accept your logic.
On Aug 25, 2009, at 1:35 AM, Stephen J. Turnbull wrote:
Rich Kulawiec writes:
Pretending that address obfuscation in mailing list [or newsgroup] archives will have any meaningful effect on this process gives users a false sense of security and has zero anti-spam value.
You're missing the point. Our (often non-technical) users demand this feature. Even our technical audience (see Siggy's parallel post for example) perceives benefits from obfuscation, based on empirical tests.
So you can explain why, in theory and in practice, obfuscation doesn't work. But the user base will (stubbornly, if you like) refuse to accept your logic.
As usual, Stephen hits the nail on the head.
I can't disagree with much in Rich's post, and yet it's likely that
we'll still obfuscate and/or conceal email addresses in the archives
because users will demand it. You can and should educate them, but
this is not a battle I wish to fight because I think we can't win it.
The costs of obfuscation are 1) increased code complexity; 2) denying
legitimate third party uses. 1) is not insignificant. Regexp filters
are tricky/impossible to get 100% right, but not too bad to get maybe
90% right. They are low fidelity because scanning headers isn't
enough; people embed email addresses in all kinds of weird places in
the body and HTML filtering is brain hurty. Obfuscation techniques
will be busted so only concealment is future proof. This is all
pretty boring coding though.
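To illustrate why a regexp filter is low fidelity, here is a minimal sketch. The pattern and the obfuscate() helper are hypothetical and are not Mailman's actual code. The pattern is deliberately simple, since real address syntax is much broader.

```python
import re

# Deliberately simple, hypothetical pattern: local-part @ dotted domain.
# Real-world address syntax (RFC 5322) allows far more than this matches.
ADDRESS_RE = re.compile(
    r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\b"
)


def obfuscate(text: str) -> str:
    """Rewrite 'user@example.com' as 'user at example.com'."""
    return ADDRESS_RE.sub(r"\1 at \2", text)


samples = [
    "From: alice@example.com",                              # plain header
    "write to bob(at)example.org",                          # hand-munged in the body
    '<a href="mailto:carol&#64;example.net">mail me</a>',   # HTML entity for '@'
]
for line in samples:
    print(obfuscate(line))
```

Only the first sample is rewritten. The hand-munged address and the HTML-encoded one pass through untouched, which is the "weird places in the body" problem in miniature: each new embedding style needs its own rule.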
2) is more interesting. What kinds of uses are we talking about? You
see a message in an archive from three years ago and you want to
contact the OP about it? Why not just follow up and contact the
mailing list? IOW, if there was an easy way to inject yourself into
an old thread, perhaps one that was created before you joined the
list, wouldn't that cover a large part of the use case?
Do you want to be contacted off-list for on-list topics? Well, things
like an email forwarding service could solve that, although I think
it's not worth the effort as much as the first use case. What other
kinds of legitimate third party uses does obfuscation/concealment
prevent?
-Barry
Barry Warsaw writes:
2) is more interesting. What kinds of uses are we talking about? You
see a message in an archive from three years ago and you want to
contact the OP about it? Why not just follow up and contact the
mailing list?
For all the reasons why Reply-To Munging Considered Harmful.
Do you want to be contacted off-list for on-list topics? Well, things
like an email forwarding service could solve that, although I think
it's not worth the effort as much as the first use case. What other
kinds of legitimate third party uses does obfuscation/concealment
prevent?
Obfuscation is a minor annoyance, but concealment is problematic in cases where the email is the identity, eg, matching list posts to issue tracker IDs.
For example, I signed up for and log in to Launchpad as "stephen@xemacs.org", but I have to tell bzr that my ID is "stephen-xemacs". Wow, that's transparent. But at least it's guessable. Getting from "Stephen J. Turnbull <email concealed>" to "stephen-xemacs" is not going to be easy if you don't already know me.
On Aug 25, 2009, at 8:30 AM, Stephen J. Turnbull wrote:
2) is more interesting. What kinds of uses are we talking about? You see a message in an archive from three years ago and you want to contact the OP about it? Why not just follow up and contact the mailing list?
For all the reasons why Reply-To Munging Considered Harmful.
What I'm thinking is that there should be a "send me this message"
link in the archive, which gets you a copy as it was originally sent
to the list. That lets you jump into a conversation as if you'd been
there originally.
Something like this would be cool for another reason. Assuming you
could trust the long term storage at the archive site (enough) it
would eliminate the last reason why I locally archive any public
mailing list messages.
Do you want to be contacted off-list for on-list topics? Well, things like an email forwarding service could solve that, although I think it's not worth the effort as much as the first use case. What other kinds of legitimate third party uses does obfuscation/concealment prevent?
Obfuscation is a minor annoyance, but concealment is problematic in cases where the email is the identity, eg, matching list posts to issue tracker IDs.
For example, I signed up for and log in to Launchpad as "stephen@xemacs.org", but I have to tell bzr that my ID is "stephen-xemacs". Wow, that's transparent. But at least it's guessable. Getting from "Stephen J. Turnbull <email concealed>" to "stephen-xemacs" is not going to be easy if you don't already know me.
True. -Barry
On Fri, Aug 28, 2009 at 18:03 -0400, Barry Warsaw wrote:
What I'm thinking is that there should be a "send me this message" link in the archive, which gets you a copy as it was originally sent to the list. That lets you jump into a conversation as if you'd been there originally.
Another use case comes up when coming back from temporarily disabled delivery where you want to participate in an ongoing discussion. I've always dreamed of a ml-request@listdomain function that retransmits any messages in References to me. It's clear that MM has to delegate this to the archiver.
Something like this would be cool for another reason. Assuming you could trust the long term storage at the archive site (enough) it would eliminate the last reason why I locally archive any public mailing list messages.
... indicating your internet connection is by orders of magnitude better than mine :)
To get on topic again: regarding address obfuscation in the archives, I noted:
- obfuscate by default,
- the archive admin may choose not to obfuscate but this fact will be stated clearly on every archive page à la: Email addresses are visible per choice of mailto:archiv-owner.
Regards Siggy
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org+ |48 days until|Open Source in Northern Germany: www.free-it.org| |www.Ubucon.de| tech contact: bsb-at-free-dash-it-dot-de| +-------> ceterum censeo javascriptum esse restrictam <--------+
On Aug 29, 2009, at 1:10 AM, Bernd Siggy Brentrup wrote:
On Fri, Aug 28, 2009 at 18:03 -0400, Barry Warsaw wrote:
What I'm thinking is that there should be a "send me this message" link in the archive, which gets you a copy as it was originally sent to the list. That lets you jump into a conversation as if you'd been there originally.
Another use case comes up when coming back from temporarily disabled delivery where you want to participate in an ongoing discussion. I've always dreamed of a ml-request@listdomain function that retransmits any messages in References to me. It's clear that MM has to delegate this to the archiver.
I dream of a 'vacation' setting where you could tell Mailman the start
and end dates of your "delivery stop" and then those messages would
just be forwarded to you (perhaps as a digest) upon your return.
Almost exactly like what the US Post Office does IRL.
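A minimal sketch of how such a "vacation" hold could select what to forward, assuming the archive can hand over each message's received date. The HeldMessage type and vacation_digest() function are hypothetical names for this illustration and do not exist in Mailman.

```python
from datetime import date
from typing import Iterable, List, NamedTuple


class HeldMessage(NamedTuple):
    """Hypothetical stand-in for an archived list message."""
    received: date
    subject: str


def vacation_digest(messages: Iterable[HeldMessage],
                    start: date, end: date) -> List[HeldMessage]:
    """Return messages received during [start, end], oldest first."""
    held = [m for m in messages if start <= m.received <= end]
    return sorted(held, key=lambda m: m.received)


archive = [
    HeldMessage(date(2009, 8, 20), "before the trip"),
    HeldMessage(date(2009, 8, 27), "second during the trip"),
    HeldMessage(date(2009, 8, 25), "first during the trip"),
    HeldMessage(date(2009, 9, 2), "after the trip"),
]
for m in vacation_digest(archive, date(2009, 8, 24), date(2009, 8, 31)):
    print(m.received.isoformat(), m.subject)
```

Whether the result is sent as individual messages or bundled into one digest is a separate choice; the sketch only covers picking the messages that fall inside the delivery stop.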
Something like this would be cool for another reason. Assuming you could trust the long term storage at the archive site (enough) it would eliminate the last reason why I locally archive any public mailing list messages.
... indicating your internet connection is by orders of magnitude better than mine :)
And yet, it's never enough! :)
To get on topic again: regarding address obfuscation in the archives, I noted:
- obfuscate by default,
- the archive admin may choose not to obfuscate but this fact will be stated clearly on every archive page à la: Email addresses are visible per choice of mailto:archiv-owner.
Yep, something like that.
-Barry
Barry Warsaw writes:
What I'm thinking is that there should be a "send me this message"
link in the archive, which gets you a copy as it was originally sent
to the list. That lets you jump into a conversation as if you'd been
there originally.
I don't understand. Do you mean the raw message received by the list, or the processed message as distributed by the list? The former means you don't have RFC 2369 headers, etc. I'm not sure I understand what the efficacy of the latter is; does address-munging happen only in the archives? I find it hard to believe that could be at all effective, except for what I would think is an unusual case (a closed-subscription list with public archives).
On Aug 29, 2009, at 3:01 AM, Stephen J. Turnbull wrote:
Barry Warsaw writes:
What I'm thinking is that there should be a "send me this message" link in the archive, which gets you a copy as it was originally sent to the list. That lets you jump into a conversation as if you'd been there originally.
I don't understand. Do you mean the raw message received by the list, or the processed message as distributed by the list? The former means you don't have RFC 2369 headers, etc. I'm not sure I understand what the efficacy of the latter is; does address-munging happen only in the archives? I find it hard to believe that could be at all effective, except for what I would think is an unusual case (a closed-subscription list with public archives).
Yes, address munging only happens in the HTML archives and in the
outgoing queue processor. Mailman keeps a copy of the raw received
message which for MM2 is only in the mbox file, but for MM3 will be in
a "message store".
Let's say I just joined the XEmacs development mailing list after a
long absence. I find a message in the archive from two years ago that
is relevant to an issue I'm having. I'd like to follow up to that
message using my normal mail toolchain, but I found the archive page
through Google. I should be able to click on a link on that page,
enter my email address (perhaps through some validation dance, or
subject to a request governor) and then the message -- as it was
originally copied to the list membership -- would show up in my inbox,
exactly as if I were a list member at the time.
Now I can hit 'reply' and inject myself seamlessly into that 2 year
old thread.
-Barry
Barry Warsaw wrote:
Let's say I just joined the XEmacs development mailing list after a long absence. I find a message in the archive from two years ago that is relevant to an issue I'm having. I'd like to follow up to that message using my normal mail toolchain, but I found the archive page through Google. I should be able to click on a link on that page, enter my email address (perhaps through some validation dance, or subject to a request governor) and then the message -- as it was originally copied to the list membership -- would show up in my inbox, exactly as if I were a list member at the time.
Now I can hit 'reply' and inject myself seamlessly into that 2 year old thread.
As long as the mailing list name/address hasn't migrated/changed in the interim...
...perhaps the original message munged to ensure current accuracy of the to/cc/reply-to fields?
-Dale
On Aug 31, 2009, at 3:00 PM, Dale Newfield wrote:
Barry Warsaw wrote:
Let's say I just joined the XEmacs development mailing list after a
long absence. I find a message in the archive from two years ago
that is relevant to an issue I'm having. I'd like to follow up to
that message using my normal mail toolchain, but I found the
archive page through Google. I should be able to click on a link
on that page, enter my email address (perhaps through some
validation dance, or subject to a request governor) and then the
message -- as it was originally copied to the list membership --
would show up in my inbox, exactly as if I were a list member at
the time.
Now I can hit 'reply' and inject myself seamlessly into that 2 year old thread.
As long as the mailing list name/address hasn't migrated/changed in the interim...
Good point.
...perhaps the original message munged to ensure current accuracy of
the to/cc/reply-to fields?
Not sure I understand; can you elaborate?
-Barry
Barry Warsaw wrote:
Now I can hit 'reply' and inject myself seamlessly into that 2 year old thread.
As long as the mailing list name/address hasn't migrated/changed in the interim...
Good point.
...perhaps the original message munged to ensure current accuracy of the to/cc/reply-to fields?
Not sure I understand; can you elaborate?
We can tell from a mailing list's configuration what the distribution address should be, but I guess we don't know what previous addresses it had, so it's not as simple as I was thinking to do this munging (I was thinking just a search/replace).
Maybe the appropriate modifications from the original message would be to add as a "To" address the current list address iff it does not appear in the To or CC addresses in the archived message (and to re-set ReplyTo, if reply-to-munging is set).
-Dale
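The header adjustment Dale describes could be sketched roughly as follows, using only Python's standard email package. The retarget() function, its parameters and the example addresses are hypothetical; this is not how Mailman implements anything.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses


def retarget(msg: Message, list_address: str,
             munge_reply_to: bool = False) -> Message:
    """Make sure a re-sent archived message points at the current list.

    Adds list_address to To: only if it is in neither To: nor Cc:, and
    re-sets Reply-To: when the (hypothetical) munging flag is on.
    """
    recipients = {
        addr.lower()
        for _name, addr in getaddresses(
            msg.get_all("To", []) + msg.get_all("Cc", [])
        )
    }
    if list_address.lower() not in recipients:
        existing = msg.get_all("To", [])
        del msg["To"]  # no error if the header is absent
        msg["To"] = ", ".join(existing + [list_address])
    if munge_reply_to:
        del msg["Reply-To"]
        msg["Reply-To"] = list_address
    return msg


raw = "From: a@example.com\nTo: old-list@example.org\nSubject: hi\n\nbody\n"
fixed = retarget(message_from_string(raw), "new-list@example.org",
                 munge_reply_to=True)
print(fixed["To"])
print(fixed["Reply-To"])
```

Note that this only appends the current address; it does not remove an obsolete list address from To: or Cc:, since, as Dale points out, the previous addresses are not known from the list's configuration.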
On Aug 31, 2009, at 4:41 PM, Dale Newfield wrote:
Maybe the appropriate modifications from the original message would
be to add as a "To" address the current list address iff it does not
appear in the To or CC addresses in the archived message (and to re-set ReplyTo, if reply-to-munging is set).
That seems reasonable.
-Barry
Barry Warsaw writes:
Let's say I just joined the XEmacs development mailing list after a
long absence.
Hey, welcome back! Do you plan to return to Supercite maintenance?<wink>
I find a message in the archive from two years ago that is relevant to an issue I'm having. I'd like to follow up to that message using my normal mail toolchain, but I found the archive page through Google.
Sure, that's a valid use case. I'm not sure that it couldn't be handled by an appropriate mailto URL, though. And I suspect it's less common than the case of private messages (no evidence, just introspection).
On Tue, Aug 25, 2009 at 06:39:29AM -0400, Barry Warsaw wrote:
So you can explain why, in theory and in practice, obfuscation doesn't work. But the user base will (stubbornly, if you like) refuse to accept your logic.
As usual, Stephen hits the nail on the head.
I can't disagree with much in Rich's post, and yet it's likely that
we'll still obfuscate and/or conceal email addresses in the archives
because users will demand it. You can and should educate them, but this is not a battle I wish to fight because I think we can't win it.
I've thought this over for quite some time (obviously), and have done some homework elsewhere to ascertain whether both Stephen's and your (Barry's) comments are accurate. They are. Very much so.
There now exists a "cargo cult" mentality which insists that obfuscation has some anti-spam/security value, in spite of overwhelming evidence and experience that conclusively proves it has none whatsoever.
(As an aside, not to either of you but in response to other comments in the thread, I'm well aware of the concept of defense-in-depth and practiced it years before the term became common. But for any measure to be part of defense-in-depth, it must first qualify as a defense, albeit perhaps a weak or half-hearted one. Address obfuscation obviously fails to clear this bar, even as low as it's set.)
I don't know how to dispel this widely-shared delusion. It may not be possible, at least in the near future. And it's probably not the role of Mailman's (or any other software package's) developers to tackle this issue; there's only so much policy that can be promulgated by code.
I think perhaps the best that can be done is to insert a statement in Mailman's documentation indicating that this measure is provided for people who want to use it, but that it really has zero value. Whether or not y'all want to do that is of course up to you, but I think at least a nod to reality in the documentation might get some of the better mail system admins to at least start thinking about the issue. And maybe that's the best that can be done for now.
---Rsk
participants (7)
- Barry Warsaw
- Barry Warsaw
- Bernd Siggy Brentrup
- Dale Newfield
- Hopkins, Justin
- Rich Kulawiec
- Stephen J. Turnbull