On Mon, Feb 25, 2002 at 12:27:23PM -0500, Barry A. Warsaw wrote:
I /think/ I've caught up on this thread, but I'm sure I've missed a bunch. As I see it there are really these issues to protecting email addresses in Mailman:
1. list admin addresses
2. public archives
3. private archives
4. raw archive
5. list rosters
I believe you've synopsized it correctly, yes.
For #1, MM2.1 changes what gets included at the bottom of list pages. The admin's personal address is no longer included in the link's text or in the mailto: href. In the mailto: you'll see something like email@example.com and in the text you'll see something like "barry at zope.com". I see no point in trying to obscure the former -- or put it behind a web form -- because it's easily guessed given a probe of existing lists, as is every other list-related email address. More on protecting the -owner from spam below. I claim that the guessability is a feature, btw.
And, of course, if it *will* degrade, then address-snarfers will figure out how to *make* it degrade, so it's not worth doing in the first place, at least not for *that* reason.
MM3 will likely integrate admin addresses and list memberships into an object called a "roster" (essentially just a list of email addresses). This will let us define a pipeline for each roster, which could include a spam filter that performs an action based on some criteria (e.g. drop it, reject it, mark a header, etc.). So we can do more protection on the -owner address than we can do now (without hacking).
I do see one problem here, and I don't know if you already address it below. [ looks ] You don't; it's this: if the list-owner addresses go through the MM machinery, as well, then they too can die if MM crashes the wrong way.
This implies, as I believe has already been discussed, that the *server* admin address must be publicly accessible, not be piped into MailMan at all, and preferably, should actually not even be handled by the same machine... ("Single point of failure")
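To make the roster-with-a-pipeline idea concrete, here is a rough sketch in Python. Nothing here is real Mailman code: the `Roster` class, the `process` method, the `spam_filter` handler and the verdict strings are all hypothetical names made up for illustration, and the message is modelled as a plain dict of headers. The only real-world name is the X-Spam-Flag header that SpamAssassin adds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A handler inspects a message and returns a verdict string, or None to pass it on.
Handler = Callable[[Dict[str, str]], Optional[str]]


@dataclass
class Roster:
    """Hypothetical roster: a list of addresses plus a per-roster pipeline."""
    name: str
    addresses: List[str]
    pipeline: List[Handler] = field(default_factory=list)

    def process(self, msg: Dict[str, str]) -> str:
        # The first handler that returns a verdict wins; otherwise deliver.
        for handler in self.pipeline:
            verdict = handler(msg)
            if verdict is not None:
                return verdict
        return "deliver"


def spam_filter(msg: Dict[str, str]) -> Optional[str]:
    # Assumption: an external filter has already tagged the message.
    if msg.get("X-Spam-Flag", "").upper() == "YES":
        return "drop"
    return None


owners = Roster("test-owner", ["owner@example.com"], [spam_filter])
print(owners.process({"Subject": "help", "X-Spam-Flag": "YES"}))
print(owners.process({"Subject": "help"}))
```

Note that this does nothing about the single-point-of-failure concern: if the code running the pipeline dies, so does delivery to every roster that goes through it.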
Rosters and the improved user database will allow us to actually equate admin email addresses with Real Names, so you could conceivably see something like
List run by <a href="mailto:firstname.lastname@example.org">Barry Warsaw</a>
at the bottom of the pages. You'd be within your rights to argue that end users never even need know who admins the list, but I think it helps to avoid the "faceless droid" syndrome.
Mailman should avoid getting deeply into the spam detection and prevention business, except for some really really basic stuff (probably not much more or less than it does now). It should integrate well with external spam detection programs like SpamAssassin or commercial equivalents. E.g. if we always send the message through SA, and the message gets some score, we could decide to hold messages below say 5.0 on the Spamster Scale, discard anything above 5.0, etc.
That sounds good, and if there isn't already a "plugin" API for that, we ought to give some thought to that...
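The score-based decision described above might look something like the following sketch. The 5.0 cutoff comes from the example in the quoted text; everything else is an assumption for illustration: the function name `disposition`, the lower `hold_from` threshold below which a message is simply accepted, and the choice to treat a score of exactly 5.0 as a discard. It presumes the score has already been obtained from SpamAssassin (or an equivalent) by some other means.

```python
def disposition(score, hold_from=1.0, discard_from=5.0):
    """Map an external spam score to an action.

    hold_from is a hypothetical lower bound (not in the original proposal);
    discard_from is the 5.0 cutoff used in the example.
    """
    if score >= discard_from:
        return "discard"
    if score >= hold_from:
        return "hold"      # held for moderator approval
    return "accept"


for s in (0.2, 3.0, 7.5):
    print(s, disposition(s))
```

Keeping the thresholds as parameters means a plugin API would only need to hand Mailman a number, and each list could pick its own cutoffs.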
As for #2, I'd go for the low-tech approach of simply discarding the hostname part of the email address in all public archives. Certainly this is easy in the headers, and we'll have to decide whether we're going to expend the resources to do body searches for email addresses, and obfuscate those as well. If people want to make contacts based on some public archive message, they can email the list. Until we've got web-posting, I don't think it matters if they lose the full email address in the public archives.
Well, personally, I don't ever assume that someone who posted a message a year ago with 95% of the answer to my question is even *on* the list anymore -- a situation I don't think you thought of -- but...
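For what it's worth, the low-tech "drop the hostname" approach for #2 could be sketched roughly as below. This is not Pipermail code; the function name `strip_hosts` and the regular expression are assumptions, and the pattern is deliberately loose (it is nowhere near a full RFC 2822 address parser), so it will miss odd addresses and may catch things that only look like addresses.

```python
import re

# Loose pattern: local part, "@", then two or more dot-separated domain labels.
# Only the local part is captured, so the replacement drops "@host".
ADDRESS_RE = re.compile(r"([A-Za-z0-9._%+-]+)@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def strip_hosts(text):
    """Return text with the hostname part of each email address removed."""
    return ADDRESS_RE.sub(r"\1", text)


print(strip_hosts("From: barry@zope.com (Barry Warsaw)"))
print(strip_hosts("Mail me at someone@lists.example.org."))
```

Running the same function over message bodies is where the resource question comes in, since every archived message would have to be scanned rather than just a handful of headers.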
As for #3, I don't mind not obscuring the email addresses since a login will be required. If we think we don't trust the current private archive login procedures to be secure against bots, then we can fix that, but I don't see it as a high priority.
#4 is interesting too. I'm not against putting the raw archive behind a Turing test, since I suspect that very few people will ever want it. It means that we won't be able to write an automated wget-ish script to do off-site backups, but so be it.
Is there a difference between raw and private that I'm missing? Do you mean the mbox format files?
Things to note for #'s 2-4:
- The Pipermail implementation has lots of well-known problems. I'm personally not willing to spend a lot of time fixing them, and I still recommend Real Sites use a Real Archiver. I've just thrown the majority of the email obfuscation problems over the fence into someone else's back yard <wink>.
- Adding public archive obfuscation is fine and dandy for new messages added to the archives, but what about all the existing archived messages? Re-running Pipermail (i.e. bin/arch) to regenerate from scratch has two significant drawbacks: 1) message URLs can change, especially if you also fix broken From_ delimiters, and that in turn breaks bookmarks; 2) on large mboxes, you simply can't do bin/arch because of memory problems.
See above. :-)
- Someone needs to step up and "own" Pipermail if any of these problems are going to be fixed, or if the obfuscation is going to happen.
Not much danger of that, is there?
- Remember that Pipermail itself is completely optional. An API is defined between Mailman and the archiver and that's all the interaction they have. Maybe the API needs to be more elaborate to support obfuscation. It definitely needs some changes if we ever want to add some of the features I'd like to add (but that's off-topic here).
Well, that's probably the best point yet: this isn't *MailMan's* problem, except to the extent that we "recommend" Piper as our archiver.
- I'll note that one of the early design decisions for Pipermail was that public archives should be vended directly from the file system for performance reasons. That decision may not be appropriate for today's operations. Certainly maintaining two static versions of the pages isn't feasible, so I think you have to vend one or the other (probably the obfuscated version) from a cgi.
No, but the performance reasons aren't as much of an issue now...
Nobody's even mentioned #5, the list rosters, which are available publicly via the "Visit Subscriber List" button, or the email command "who" to the -request address. If I were a spam harvester, I wouldn't even bother with scanning the archives if either of these were publicly enabled. When you turn them off, especially the former, just remember that you've now made it much harder for Joe User to unsubscribe themselves. Catch 22.
Not enough experience in the field, or I'd probably have mentioned that already.
Jay R. Ashworth                                      email@example.com
Member of the Technical Staff     Baylink                     RFC 2100
The Suncoast Freenet         The Things I Think
Tampa Bay, Florida        http://baylink.pitas.com     +1 727 647 1274
"If you don't have a dream; how're you gonna have a dream come true?" -- Captain Sensible, The Damned (from South Pacific's "Happy Talk")