"CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:
CVR> First, a minor announcement. I'm no longer in charge of the
CVR> mailing lists at apple, sort of. We've hired a person
CVR> full-time, and he's been taking over the lists server as his
CVR> full-time responsibility, allowing me to go off and work on
CVR> other projects. I'm still in the loop, just not "it". I'm
CVR> still going to be heavily involved as we move that box to
CVR> Mailman 2.1, and after that, probably fade a bit more into
CVR> the woodwork (I still run my Mailman box at home, however, so
CVR> I'm not going away. JC, quit jeering.)
Congratulations! I think. ;)
CVR> One thing we're definitely doing is moving to a cloaked
CVR> archive. Since we already distribute all archives out of
CVR> HTTP, not FTP, we're working on a CGI that'll strip all
CVR> e-mail information out of messages on the fly (among other
CVR> things, like header cleanup and some trivial formatting
CVR> fixes). The idea is simple -- we've finally hit the point
CVR> where you can't put an e-mail address up on a public site
CVR> under any circumstance safely, so we're having to move to a
CVR> system where we simply don't do that.
So these are public archives that need to be scrubbed, right? Until now, Mailman has taken the approach that public archives are fed right off the file system by the HTTP server. We could still do that if we scrubbed the messages before we archived them, although that doesn't help with existing archives unless you regenerate them.
So one question is: does the performance trade-off we made 5 years ago still make sense? Should we just be vetting all archives through a CGI, where we can do fun stuff like cleansing them of email addresses?
We'd obviously have to get rid of easy access to the raw mbox file, so another question is whether that's still useful. Occasionally it's damn handy if you're moving a list or gathering statistics on it, but on the other hand, it's a rich source of addresses to mine. Again, if we scrubbed the messages before archiving, we'd likely be OK.
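To make the CGI option concrete, here's a minimal sketch of a filter that serves archive files with addresses scrubbed on every request. It's written as a WSGI app so it can run as a plain CGI via the stdlib `wsgiref.handlers.CGIHandler`. The function name `make_archive_app`, the one-line address regex, and the replacement text are all hypothetical, not anything in Mailman today.

```python
import os
import re

# Deliberately simple heuristic: anything shaped like local@domain.tld.
ADDR_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HIDDEN = "[hidden email address]"


def make_archive_app(archive_root):
    """Return a WSGI app serving files under archive_root with addresses scrubbed."""
    root = os.path.realpath(archive_root)

    def app(environ, start_response):
        rel = environ.get("PATH_INFO", "/").lstrip("/")
        path = os.path.realpath(os.path.join(root, rel))
        # Refuse anything that escapes the archive tree ("../" tricks) or is missing.
        if not path.startswith(root + os.sep) or not os.path.isfile(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found\n"]
        with open(path, encoding="utf-8", errors="replace") as fp:
            body = ADDR_RE.sub(HIDDEN, fp.read()).encode("utf-8")
        ctype = "text/html" if path.endswith(".html") else "text/plain"
        start_response("200 OK", [("Content-Type", ctype + "; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

Run as a CGI it would be something like `wsgiref.handlers.CGIHandler().run(make_archive_app("/path/to/archives/public"))`, with the path being whatever your install uses. Note that every page hit now costs a file read plus a regex pass, which is exactly the performance trade-off in question.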
Also, what heuristic do you use to search for email addresses, and what do you scrub them with? Do you want to obscure the address (e.g. "barry--at--python--dot--org"), replace it altogether (e.g. "[hidden email address]"), or maybe just replace it with a truncation (e.g. "[localpart's email address]")?
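For what it's worth, here's a minimal sketch of the three options as interchangeable replacement functions over one regex. The pattern is an assumption on my part: it will also catch Message-ID-looking strings like `1234@host.example.com` and won't catch hand-obscured forms like "barry at python dot org".

```python
import re

# Hypothetical heuristic: local part, "@", then a dotted domain.
ADDR_RE = re.compile(r"(?P<local>[\w.+-]+)@(?P<domain>[\w-]+(?:\.[\w-]+)+)")


def obscure(m):
    # Reversible by anyone who knows the pattern, so the weakest option.
    domain = m.group("domain").replace(".", "--dot--")
    return f"{m.group('local')}--at--{domain}"


def hide(m):
    return "[hidden email address]"


def truncate(m):
    # Keeps just enough to tell posters apart in a thread.
    return f"[{m.group('local')}'s email address]"


STYLES = {"obscure": obscure, "hide": hide, "truncate": truncate}


def scrub(text, style="hide"):
    return ADDR_RE.sub(STYLES[style], text)
```

So `scrub("mail barry@python.org", "obscure")` gives `mail barry--at--python--dot--org`, and the `truncate` style gives `mail [barry's email address]`. Making the style a per-list setting seems like the natural knob.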
CVR> I think the Mailman stuff needs to think about this, also. It
CVR> impacts the archiving setup and other issues, but the
CVR> harvesters have hit the point where we simply can't risk
CVR> disclosing that info. It creates other problems -- you can't
CVR> see a posting in the archive and send email to that person
CVR> with more questions (or answers), but that seems trivial
CVR> compared to the problems the spammers are causing.
It kind of plays into Reply-To: munging, doesn't it? If you won't be able to reply to the original author because we're anonymizing messages, then you might as well munge Reply-To: to point back at the list, because that's the only posting address that makes sense. And what if the original poster isn't a member of the list?
Or should Mailman get into the anonymous resender game? There's probably a lot we could do here, but given the political risks of anonymous resenders, do we even want to go there?
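A minimal sketch of what anonymize-plus-munge could look like with the stdlib `email` package: keep the poster's real name for readability, but make both From: and Reply-To: point at the list. The "Real Name via listname" display form and the `anonymize_for_list` name are just one hypothetical convention, not anything Mailman does today.

```python
from email.message import EmailMessage
from email.utils import formataddr, parseaddr


def anonymize_for_list(msg, list_address, list_name):
    """Strip the poster's address and route replies back to the list."""
    realname, _ = parseaddr(str(msg.get("From", "")))
    # Headers that would otherwise leak the poster's address.
    for name in ("From", "Reply-To", "Sender", "Return-Path"):
        del msg[name]  # deleting a missing header is a no-op
    display = f"{realname} via {list_name}" if realname else list_name
    msg["From"] = formataddr((display, list_address))
    msg["Reply-To"] = list_address
    return msg
```

Bodies can still contain addresses (signatures, attributions), so this would need to be paired with a body scrub.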
CVR> A secondary issue here is the problem of disclosing admins
CVR> and admin addresses.
Note that in MM2.1 we go about halfway here. We include the obscured email addresses of the list owners as the text of a mailto: link, but we actually use the list-owner@ address as the mailto: target. That might not be enough, though. When we actually have a Real Database backend, we can keep a roster of email+realname and then just include the realname inside the mailto: link.
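Roughly, the roster-based version would render like this. It's a sketch assuming a hypothetical `(realname, address)` roster; owners without a realname fall back to an "x at y" obscured form, and the link target is always the list's -owner address.

```python
from html import escape


def owner_link(list_name, host, owners):
    """owners: iterable of (realname, address) pairs from a hypothetical roster."""
    target = f"{list_name}-owner@{host}"
    labels = [realname or address.replace("@", " at ")
              for realname, address in owners]
    return f'<a href="mailto:{escape(target)}">{escape(", ".join(labels))}</a>'
```

With a full roster, no owner address ever appears in the page at all.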
CVR> I know we've hashed that through once, but we've come to the
CVR> (somewhat reluctant) decision to whitelist all public,
CVR> non-personal email addresses. We're going to be implementing
CVR> TMDA to do this, and will be switching all admin to generic
CVR> addresses that filter through TMDA, as well as things like
CVR> postmaster@ and the like. While I hate making users jump
CVR> through hoops to get through to a real person (for those that
CVR> don't know, TMDA is an overt whitelist. If you're not on the
CVR> whitelist, you get mail back telling you to take some action,
CVR> and until you do, the mail isn't delivered), but the abuse by
CVR> the spammers on admin addresses is now so bad I'm declaring
CVR> defeat and going to the whitelist.
Have you looked at SpamAssassin, Chuq? It's really done wonders to reduce the amount of spam actually getting through to any python.org or zope.org address. I know 'cause I see the daily reports of quarantined messages. Very few false positives too (usually it's email amongst our postmasters talking about spam or SA ;). I feel a lot better about this approach than TMDA'ing essential addresses like postmaster or mailman-owner.
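The nice part is that nothing has to bounce back to the sender. Assuming SpamAssassin has already run upstream and added its `X-Spam-Flag` / `X-Spam-Status` headers, the delivery side only has to route on them, roughly like this. The `classify` name and the local threshold are hypothetical, and since SA versions have written the score as either `hits=` or `score=`, the sketch accepts both.

```python
import re
from email import message_from_string

SCORE_RE = re.compile(r"(?:hits|score)=(-?\d+(?:\.\d+)?)")


def classify(raw, quarantine_at=5.0):
    """Return "quarantine" or "deliver" based on headers SpamAssassin added."""
    msg = message_from_string(raw)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    match = SCORE_RE.search(msg.get("X-Spam-Status") or "")
    score = float(match.group(1)) if match else None
    if flag == "YES" or (score is not None and score >= quarantine_at):
        return "quarantine"
    return "deliver"
```

postmaster@ stays open to everyone; the quarantine just gets a daily human skim for false positives.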
CVR> I'm going to look and see if I can interface TMDA to the
CVR> subscriber databases so that subscribers are by definition
CVR> whitelisted, but we've hit the point where we have to do
CVR> this. I'm not happy about it, but the war is lost, I think.
Sigh.
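If you do go that way, the subscribers-are-whitelisted rule is simple to state. Here's a minimal sketch of a challenge/response gate in the spirit of what you describe. To be clear, this is not TMDA's API; `Gatekeeper` and its methods are hypothetical, and a real hookup would query the list rosters live rather than take a snapshot.

```python
import secrets
from email.utils import parseaddr


class Gatekeeper:
    """Hypothetical challenge/response gate; list subscribers pass automatically."""

    def __init__(self, subscribers, whitelist=()):
        self.subscribers = {a.lower() for a in subscribers}
        self.whitelist = {a.lower() for a in whitelist}
        self.pending = {}  # token -> (address, held message)

    def receive(self, from_header, message):
        addr = parseaddr(from_header)[1].lower()
        if not addr:
            return ("reject", None)  # nowhere to send a challenge
        if addr in self.subscribers or addr in self.whitelist:
            return ("deliver", message)
        token = secrets.token_hex(8)
        self.pending[token] = (addr, message)
        return ("challenge", token)  # caller mails the token back to addr

    def confirm(self, token):
        """Release a held message and whitelist its sender."""
        addr, message = self.pending.pop(token)
        self.whitelist.add(addr)
        return message
```

Once a sender confirms, they're on the whitelist, so the hoop is only jumped once per sender.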
CVR> So what he did was open up his address book and send his
CVR> message to everyone in it. And he's running one of these new
CVR> e-mail clients that happily caches addresses it sees in case
CVR> you want them again. So all of the addresses of people
CVR> posting to the mailing lists he subscribed to were in his
CVR> address book cache, so when he grabbed his address book, he
CVR> grabbed all of those addresses, too.
Wonderful. I think this was presaged by Klez, which does essentially the same thing w/o human intervention or such good intent. ;)
CVR> But now we're wondering if we have to go to some sort of
CVR> address cloaking ON lists, maybe some kind of address
CVR> remapping through the server for replies, something. And I'm
CVR> gritting my teeth at the developers who created those
CVR> @#$@$#@$#23 caches (which are nice in some ways) for not also
CVR> creating some way to flag addresses as not
CVR> cacheable. Because, IMHO, that'd solve this problem.
Yup, but of course that assumes the clients play by the rules, and we know they don't all, so the question is what we're willing to give up for the security of our online personas. Kinda mirrors today's larger questions in the WoT(tm), eh? Maybe people are more willing to give up their rights than their conveniences for some added security.
CVR> Are we hitting a point where mail list servers have to act as
CVR> blind front ends for all of the subscribers, where replies
CVR> are processed by those servers, and the server then takes on
CVR> the job of acting as a troll-exterminator and spam blocker?
CVR> And what does that really mean for things like Mailman?
World domination, of course. Because we /could/ add that stuff fairly easily if we had the resources to expend on it. Would it still be usable? For some audiences yes, for others no. I'm fairly sure the kind of anonymizing we're talking about would never fly in the Python and Zope communities, whereas it's probably essential in a less cloistered environment like lists.apple.com. Which leads me to believe that we need to make it much easier to install themes or styles of lists, from the paranoid anonymizer to the laissez-faire discussion list.
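The core of the blind-front-end idea is an address remapping table. Here's a minimal sketch, assuming each subscriber gets a stable opaque alias at a list-owned domain, derived with an HMAC so the same address always maps to the same alias and threads stay coherent. Replies to an alias land on the server, which can apply spam/troll rules before resolving and forwarding. `AliasMap`, the `u-` prefix, and the domain are all hypothetical.

```python
import hashlib
import hmac


class AliasMap:
    """Hypothetical mapping between real subscriber addresses and opaque aliases."""

    def __init__(self, secret, list_domain):
        self._secret = secret  # bytes; keeps aliases unguessable from addresses
        self._domain = list_domain
        self._reverse = {}

    def alias_for(self, address):
        address = address.lower()
        digest = hmac.new(self._secret, address.encode("utf-8"),
                          hashlib.sha256).hexdigest()[:12]
        alias = f"u-{digest}@{self._domain}"
        self._reverse[alias] = address
        return alias

    def resolve(self, alias):
        """Return the real address behind an alias, or None if unknown."""
        return self._reverse.get(alias.lower())
```

The reverse table would really live in the Real Database backend; an in-memory dict is only here to keep the sketch self-contained.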
CVR> Happy Macworld Expo week, all. If you need me, I'll be in the
CVR> war room, beating my head against a wall.
Any chance you could make it down to DC for a side trip? We could have a Mailman hacking sprint over a few dozen steamed Maryland blue crabs and some cold ones. :)
-Barry