Summary: Spammers now have so many ways of "harvesting" addresses from so
many systems, and so many ways of exchanging those with each other, that
any email address which is actually used WILL eventually be harvested.
(Where what "eventually" means varies widely, of course, but can be
expected to steadily decrease.) Pretending that address obfuscation
in mailing list [or newsgroup] archives will have any meaningful effect
on this process gives users a false sense of security and has zero
Summary of summary: It's pointless.
Explanation: Spammers maintain extensive databases of email addresses.
Some of those databases are merely lists of addresses; others are
more sophisticated and include data such as "harvested-date",
"harvesting-method", "last-seen date", "last-seen-context",
"last-known-valid date" and more. Some of these databases are private;
others are available for sale/lease. Some are maintained by spammers
themselves, others by spammer support services that don't directly engage
The harvesting engines used to acquire email addresses are myriad, as
are the methods by which spammers acquire the raw data to use as input
to them. *Some* of those methods, and there are many more, include:
- subscribing to mailing lists
- acquiring Usenet news (NNTP) feeds
- querying mail servers
- acquiring corporate email directories
- insecure LDAP servers
- insecure AD servers
- use of backscatter/outscatter
- use of auto-responders
- use of mailing list mechanisms
- use of abusive "callback" mechanisms
- dictionary attacks
- construction of plausible addresses (e.g. "firstname.lastname")
- purchase of addresses in bulk on the open market
- purchase of addresses from vendors, web sites, etc.
- purchase of addresses from registrars, ISPs, web hosts, etc.
- domain registration (some registrars ARE spammers) 
- misplaced/lost/sold media (hard disk, tape, CD, DVD,
USB stick, etc.)
and perhaps most significantly:
- harvesting of the mail, address books and any other files
present on any of the hundreds of millions of compromised
Windows systems 
Consider for example: the first time a newly-created address is used
by someone (who is sending a message to it), it's now present on their
system: in their saved outbound mail, or perhaps in their address book
(if they have one), or in some cache. Any sensible malware resident on
their system will of course pick it up and eventually hand it over to a
harvesting agent. (Competent malware will harvest it in real time *and*
associate it with the sender's address.)
And if that particular system happens to be clean? Doesn't help much,
because the more times that address is used, the more systems it's
present on. And the more systems it's present on, the greater the
probability that one of them is already compromised or will be soon.
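
The compounding effect described above can be put into rough numbers. This is
only a sketch: it assumes each system holding the address is compromised
independently with the same probability, and the per-system probability is a
made-up input, not a measured figure.

```python
def p_any_compromised(n_systems, p_each):
    """Probability that at least one of n_systems is compromised.

    Assumes (hypothetically) that systems are compromised independently,
    each with probability p_each. Real systems are not independent, so
    treat the result as an illustration, not an estimate.
    """
    if not 0.0 <= p_each <= 1.0:
        raise ValueError("p_each must be between 0 and 1")
    # P(at least one) = 1 - P(none) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p_each) ** n_systems


if __name__ == "__main__":
    # Hypothetical per-system compromise rate of 1%.
    for n in (1, 10, 100, 1000):
        print(n, p_any_compromised(n, 0.01))
```

Even with a small per-system probability, the result climbs toward 1 as the
number of systems holding the address grows.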
Thus even if we eliminate the originating end-user system as a possible
source, we still have to consider the outbound mail server used by that
end-user system, which is also a candidate for compromise. And then
the inbound mail server used by the recipient, and then the recipient
end-user system. And if there's some filtering appliance or intermediate
system in place at either end, then it's there too. If the message
is forwarded to a third party, then another set of systems is in play.
If mail server logs are rolled up and moved to some central location,
then it's there too. If backups are made, then it's present there,
and subsequently may be present on any system where the backups are
read/restored. And finally, if the destination of a mail message isn't
an individual user, but an entire mailing list, then we must multiply
the number of possible harvesting points by at least the number of people
on the mailing list plus a factor for mail servers/gateways/filters/etc.
(modulo overlaps). This in turn means that messages sent to lists
of any appreciable size (say, 1000 members) will turn up on considerably
more than 1000 systems -- and the chances that all 1000-plus are secure
Please note that the previous paragraph's recitation only covered the last
vector I enumerated in the [indented] list above: compromised systems.
That laundry list of methods also affords many other opportunities for
addresses to find their way into spammers' hands. As just one pointed
example out of a great many more that could be cited: how do you know that
the address user(a)example.com which has just subscribed to the list you
run is a real person and not just the front-end for an address-harvester
that will pick up every address used to send traffic to the list?
And so on. There are far too many others to enumerate, all of which
have been discussed at great length in anti-spam forums for many years, and are
depressingly familiar to experienced practitioners working in the field.
The bottom line is that any email address which is actually used,
*especially* any email address used to send traffic to a mailing list,
is going to be harvested. It's only a matter of when, not if, and "when"
is getting sooner all the time.
Incidentally, everyone (including me) can produce anecdotal tales of
addresses that have remained surprisingly under-targeted by spammers
over long periods of time. But this is clearly not the way to bet: it
is in spammers' interests to ferret out as many addresses as possible
and to use them as soon and as often as possible. Note, however,
that some addresses are *deliberately* un-/under-targeted, so lack of
substantial spam traffic to a given address is NOT an indicator that the
address hasn't been harvested. That's because along with target lists,
spammers maintain "suppression" lists, which they use to avoid hitting
the addresses of people they think are likely to cause issues for them. 
And obviously, people with postmaster or mailing list roles would be
good candidates for membership on those lists. I know that if I were in
their shoes, I'd add everyone who's ever sent a message to the mailman-*
mailing lists to mine: a quick check indicates that it's on the order
of only 10K addresses. Skipping those would be inconsequential when
sending spam to a few hundred million addresses, and I trust it's
obvious why spammers would benefit from doing so.
With all this in mind, it's clearly pointless to pretend that address
obfuscation in archives provides any protection at all.  It would be
better to remove the code entirely than to continue to maintain the
facade that it actually has any anti-spam value. Everyone should simply
presume that all email addresses are in the hands of spammers and prepare
defenses accordingly -- because even if that's not quite true yet, it will
be soon enough.
 I deliberately didn't mention mass WHOIS queries. While some efforts
in this direction were made by spammers years ago, they've found it far
more efficient and cost-effective to simply buy WHOIS data in bulk.
There's always someone who wants to sell, and a CD/DVD or USB stick
will suffice. This is why attempts by registrars to rate-limit queries
or restrict access are not only foolish, but disingenuous: spammers
already *have* the data, and can acquire updates at will, and they
are clearly doing so via processes that lead back to registrars themselves.
 The exact number of such systems is not only unknown, but unknowable,
since any compromised system which (a) doesn't make its presence
known (b) to a suitable detector will remain undetected indefinitely.
However, two things are clear: (1) any estimate under 100 million should
be laughed out of the room, and (2) there is no reason to suspect that
the number is decreasing, and there are numerous reasons to suspect that
it's increasing. Note, incidentally, that some detectors have reported
observing 200,000 new such systems in a single day; and further note that
it's now quite routine for individual botnets with several million
*known* members to turn up.
 Addresses which aren't used may remain out of spammer view for
considerable time, depending on the care with which they're selected
and maintained. However, this obviously excludes addresses used for
participation in mailing lists.
 For the purpose of this discussion, I'm just talking about
suppression lists which enumerate individual email addresses. It's
well-known that spammers also maintain suppression lists of MX's, domains,
network allocations, ASNs, etc., in an attempt to avoid hitting
spamtraps and/or hitting the mailboxes of those who might be in a position
to file complaints or take action against them.
 The only people left who are impeded in the slightest by obfuscation
code are NON-spammers: that is, people who are trying to contact someone
who has previously sent a message to some mailing list.
Am looking for an experienced mailman developer to hire for a few small
installations/upgrade to our deployment, including: personalized
header/footer fields (using qmail injection), attachment issue for HTML
messages, and a few others.
--On 24 August 2009 13:15:03 -0500 "Hopkins, Justin"
> Thanks for such a detailed and compelling post..but I must disagree. I
> can't refute any of the arguments you made, they are all quite sound, but
> I do take issue with your conclusion.
> Obfuscating the email addresses is just a part of 'defense in depth' -
> same as patching your computer, using a firewall, etc. Each layer, no
> matter how thin, still adds something.
Quite right. Rich's argument is, essentially, that obfuscation isn't 100%
effective so it shouldn't be used. Frankly, if it's 10% effective, then
it's worth doing in my view.
Further, Rich offers no evidence of significant harm done by obfuscation.
Finally, there are other privacy concerns than spam harvesting that may
also be mitigated by address obfuscation.
IT Services, University of Sussex
For new support requests, see http://www.sussex.ac.uk/its/help/
Justin Hopkins writes:
> Obfuscating the email addresses is just a part of 'defense in
> depth' - same as patching your computer, using a firewall,
> etc. Each layer, no matter how thin, still adds something.
That's true. Rich's argument is more subtle than a claim that
obfuscation is worth nothing, though. It is that benefits to
obfuscation are small, and the cost is significantly larger than the
benefit. You have to address the issue of the cost (obfuscating the
address obstructs legitimate third-party users) as well.
Note that the other strategies you mention -- patches, firewalls, etc
-- do not impose costs on third parties, only on you.
Personally, I subscribe to Rich's argument. I do not obfuscate my own
addresses, and I argue against it when I have input into policy for
processes like archiving mailing list posts. But Mailman needs to
serve people who have different cost/benefit tradeoffs than Rich and I
do -- I agree with you and Bernd that Mailman should provide the
facility (though I would advise against relying on it, and generally
deprecate its use, myself).
I am happy to announce the release of the third alpha version of
Mailman 3, code named "Working Man".
This is primarily a preview release so that developers and other
interested people can download the code and participate in Mailman 3's
further development. I believe we are on track for a final release by
the end of the year, and your contributions of code, feedback,
documentation, etc. will be welcome and appreciated!
Please note that this is an alpha release and as such is not ready for
You can get the code from the Cheeseshop:
Mailman 3 is buildout based and requires Python 2.6. To build it, run
this after unpacking the tarball and cd'ing into it
% python bootstrap.py
From there you can run the tests
and build the documentation
Highlights in this release include the start of a REST admin server
for integrating Mailman with external web sites, a combined bin/mailman
uber-command, configuration now done through ini-files using
lazr.config, and better LMTP support.
On Sat, Aug 8, 2009 at 1:25 AM, Malveeka Tewari<malveeka(a)gmail.com> wrote:
> I am working on writing a storm based member adaptor for mailman so that the
> mailman membership data can be stored in a database instead of .pck files.
> The reason for choosing storm is that it provides an abstraction to use any
> underlying db language- either mysql, postgresql or sqllite.
Hi, I'm one of the GSOC mentors for Systers, all of whose students are
working on Mailman enhancements. Some of these (like Malveeka's) would
be suitable for merging upstream, but the review and acceptance
process is not clear.
* Do people on this list regularly provide review of proposed
* Is there a set of people who explicitly ACK or NAK all proposed patches?
* Once reviewed, merging is via LP merge requests, right? Or does one
do the merge request and *then* get reviewed?
* More fundamentally, is Mailman currently accepting patches from
We have code to contribute. It's not perfect yet but we will make it
better. We need some Official Mailman participation or our
improvements will never make it out of our branch. Even a response of
"we're not interested in X, thanks" would be much superior to no
response at all.
Thanks -- Regards -- Andy
Args, I knew I forgot sth, I should have selected MIME digest :)
On Thu, Aug 06, 2009 at 22:19:30 -0700, Jordan Hayes wrote:
> Here's my solution:
Nice solution Jordan, but I am thinking about a Pythonic way to fully
integrate searchable archives into MM. I was using the Debian archive
search only as an example of what can be achieved on big archives.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
I am working on writing a storm based member adaptor for mailman so that the
mailman membership data can be stored in a database instead of .pck files.
The reason for choosing storm is that it provides an abstraction to use any
underlying database: mysql, postgresql or sqlite.
I have created a branch on launchpad for this adaptor and it would be great if
I could get a code review of the adaptor, the schema of the db that I am
using for storing membership data, and an opinion on whether such a member
adaptor is desired at all. Presently the storm adaptor is written for
Postgresql but it will support mysql and sqlite with slight modifications.
Please take a look at *Mailman/PgsqlMemberships.py* at *
https://code.launchpad.net/~malveeka/+junk/StormMemberAdaptor* under the
The work is in progress and for now I am using storm to create a
database with membership data alongside the pickle files. The
Mailman/PgsqlMemberships.py subclasses OldStyleMemberships.py and still uses
the accessor methods in OldStyleMemberships instead of accessing the data
from the postgres db.
The ideal case would be to use only a database and no pickle files for
membership data, but I have not gotten there yet.
I tried to read and use the data from the database instead of the pickle
files and that broke my Mailman, which leaves me with a few questions.
In OldStyleMemberships.py the lower cased email address is used as a key for
accessing the membership properties.
However in my schema, I am using the (listname, case preserved email
address) as the PK.
Is it possible that not storing and using LCE as a key might break
I also want to make sure that the database is capturing all the
membership data. Presently my database uses the following class as a storm
abstraction for the database. Do I need to add/remove anything?
__storm_table__ = "mailman_test"
__storm_primary__ = "listname","address"
listname = Unicode()
address = Unicode()
password = Unicode()
lang = Unicode()
name = Unicode()
digest = Unicode()
delivery_status = Int()
user_options = Int()
topics_userinterest = Unicode()
bounce_info = Unicode()
delivery_status_timestamp = Unicode()
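
On the key question raised above, one possible arrangement is to store both
forms of the address: the lower-cased one as part of the primary key, and the
case-preserved one as an ordinary column. The sketch below only illustrates
that idea. It uses the standard-library sqlite3 module rather than storm, and
the table, column and function names are hypothetical, not taken from the
adaptor or from Mailman.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical members table."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE members (
            listname TEXT NOT NULL,
            lce      TEXT NOT NULL,  -- lower-cased address, used as lookup key
            address  TEXT NOT NULL,  -- case-preserved address as subscribed
            PRIMARY KEY (listname, lce)
        )""")
    return db


def add_member(db, listname, address):
    """Store the address in both lower-cased and case-preserved form."""
    db.execute("INSERT INTO members VALUES (?, ?, ?)",
               (listname, address.lower(), address))


def get_case_preserved(db, listname, address):
    """Look up by lower-cased key; return the case-preserved address or None."""
    row = db.execute(
        "SELECT address FROM members WHERE listname = ? AND lce = ?",
        (listname, address.lower())).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_db()
    add_member(db, "test", "Alice@Example.com")
    print(get_case_preserved(db, "test", "alice@example.com"))
```

With this layout, two subscriptions that differ only in case collide on the
primary key, while the address as originally typed remains available.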
Looking forward to your reviews and finally getting the thing in place!!
Thanks a lot all
*shimmied over to mailman-devs* (from -users)
On Fri, Aug 07, 2009 at 10:33:06AM -0400, Barry Warsaw wrote:
> It kind of sucks that there are so many other Mailman command line
> scripts, which is one reason why I've always put them in a separate
> Mailman specific bin directory. With MM3 though I intend to use a
> 'subcommand' approach so that there's only one 'mailman' command.
> Think things like 'mailman listmembers foo'. I'll probably keep
> mailmanctl separate though I haven't decided about that yet.
Would it be too much to ask for the 'old' (read 'current') scripts/commands
to be aliased, at least in the first few releases?
e.g., for 'list_members' to link to 'mailman listmembers'
Perhaps as an install/configure/compilation option?
"Create old-style links?"
(maybe that's more for people who package: ISTR a couple of packagers
are on list, yes?)
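
A rough sketch of what such an alias could look like, assuming the old script
name is installed as a link to a small shim. Only the list_members to
listmembers pair comes from this thread. The function name and the idea of
dispatching on argv[0] are assumptions for illustration, not anything MM3
provides.

```python
import os

# Old script name -> new subcommand. Only this one pair is mentioned in the
# thread; any further entries would have to come from the real MM3 command set.
OLD_TO_NEW = {
    "list_members": "listmembers",
}


def translate(argv):
    """Turn an old-style invocation into the equivalent 'mailman' argv.

    argv[0] is the name the shim was invoked under (for example through a
    symlink called list_members); the remaining arguments pass through as-is.
    """
    old_name = os.path.basename(argv[0])
    try:
        subcommand = OLD_TO_NEW[old_name]
    except KeyError:
        raise SystemExit("no alias defined for %r" % old_name)
    return ["mailman", subcommand] + list(argv[1:])


if __name__ == "__main__":
    # Example invocation through a hypothetical old-style path.
    print(translate(["/usr/lib/mailman/bin/list_members", "foo"]))
```

A packager could create one link per old name pointing at such a shim, which
would keep existing scripts working for a few releases.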
(wonders how many other people have scripts that would need re-writing