[Mailman-Developers] Opening up a few can o' worms here...

Tue, 16 Jul 2002 17:07:48 -0700

On 7/16/02 2:37 PM, "Barry A. Warsaw" <barry@zope.com> wrote:

>   CVR> the woodwork (I still run my Mailman box at home, however, so
>   CVR> I'm not going away. JC, quit jeering)
> 
> Congratulations!  I think. ;)

Actually, yes. I won't be working 65+ hours a week any more, so I sort of
get my life back, and may actually have time to think stuff through and do
more than emergency patching... (for more, see
<http://www.chuqui.com/cgi-bin/mwf/topic_show.pl?tid=348>). Also means I can
actually start some non-Apple hacking again, I hope. And what I'll be doing
is lots of fun, although the next six weeks are going to be a crunch. Still
doing email, just off building a new custom system for stuff I can't talk
about...

>   CVR> One thing we're definitely doing is moving to a cloaked
>   CVR> archive. Since we already distribute all archives out of

> So these are public archives that need to be scrubbed, right?  Until
> now, Mailman has taken the approach that public archives are fed
> right off the file system by the http server.  We could still do that
> if we scrubbed the messages before we archived them, although that
> doesn't help with existing archives unless you re-generate them.

Here's why I won't do that. I want to keep ONE set of archives. You can't
scrub those archives for two reasons. What if someone writes looking to get
in contact with the author of a message? If the archive is scrubbed, that
info is gone. And what if (god forbid) you get into a legal tangle? That's your
legal record of what was said on the mail list and who said it. If you scrub
it, and someone does something actionable or libelous and you get a court
order to provide that data? You're hosed.

On a more likely note -- I can see where you might want the option to show
the archives unscrubbed to validated users, and only scrub the public
archives. As paranoid as I'm being today, I'd STILL like to find a way to
let subscribed users see the archives unscrubbed. Which you could do by
setting a cookie that the CGI could accept and use to change its behavior.

So I really like leaving the archives unmodified, and doing the scrubbing
via CGI. It also allows you to do other things, like header cleanups (and
you could potentially let a user set a cookie to define minimal or full
headers, say...) and some quickie cleanup against unwrapped text and some
other incidental archive glitches.

I come from a newspaper family, so I have a bias towards "you don't
unpublish stuff, you don't change it once it's published". But I think there
are good reasons to avoid sanitizing the archives, and instead sanitizing
the delivery of those archives -- if only because if your policies change,
all you need to change is the CGI. And it gives you the ability to set up
different sets of abilities per user or per list if you want, too.
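As a rough illustration of sanitizing on delivery rather than in storage, here
is a minimal Python sketch. It assumes the CGI has already decided (from a
cookie, say) whether the reader is a validated subscriber and whether they want
full headers. The function name, the flag names, and the list of "minimal"
headers are all made up for the example, not anything in Mailman or pipermail.

```python
import re

# Loose address heuristic; an assumption for this sketch, not a tuned pattern.
ADDRESS = re.compile(r"\b\S+@\S+\.\S+\b")

# Hypothetical choice of which headers count as "minimal".
MINIMAL_HEADERS = ("from", "date", "subject")


def render_message(raw, validated_subscriber=False, full_headers=False):
    """Return the text to serve for one stored message.

    The stored message (raw) is never modified; only the served copy is.
    """
    head, _, body = raw.partition("\n\n")
    lines = head.split("\n")
    if not full_headers:
        # Simplification: folded (continuation) header lines are dropped too.
        lines = [line for line in lines
                 if line.split(":", 1)[0].lower() in MINIMAL_HEADERS]
    text = "\n".join(lines) + "\n\n" + body
    if not validated_subscriber:
        # Guests get the scrubbed view; subscribers see the original addresses.
        text = ADDRESS.sub("[email address deleted]", text)
    return text
```

If the policy changes, only this function changes; the archive on disk stays
exactly as it was received.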

> So one question is: does the performance trade-off we made 5 years ago
> still make sense?  Should we just be vetting all archives through a
> cgi, in which we can do fun stuff like cleanse it of email addresses?

One of the big things I dislike about Mhonarc is that archives are a rather
low-usage system, but maintaining the Mhonarc index pages is rather
intensive use of system resources. Sort of like usenet -- you do a lot of
work on everything, in case someone wants anything. I think simply storing
the archives and sanitizing on demand is lower overhead. It also means
pipermail won't need ANY changes -- you simply feed it out through the CGI
instead of directly, and everything magically sanitizes...

> We'd obviously have to get rid of the easy access to the raw mbox
> file, so another question is whether that's still useful.

Honestly? I don't think so. I find them real kludgy. I ended up doing a new
archiving system (one file per message) via a perl script. We're about to
take our new search engine out of beta with the thing, finally.

> Also, what heuristic do you use to search for email addresses, and
> what do you scrub them with?

Still being worked on. Right now, I'm basically doing a
<wordboundary><nonwhitespace>@<nonwhitespaceordot><dot><nonwhitespace><wordboundary>.
I don't know how strongly we'll refine it.
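For what it's worth, that heuristic transcribes fairly directly into a regular
expression. This is a sketch in Python rather than the perl actually in use,
and the replacement string and function name are assumptions for the example.
Like the heuristic itself it is deliberately loose: it can swallow punctuation
glued to the front of an address, and it will hit anything else shaped like
x@y.z.

```python
import re

# <wordboundary><nonwhitespace>@<nonwhitespaceordot><dot><nonwhitespace><wordboundary>
# \S already includes ".", so "nonwhitespace or dot" collapses to \S+.
ADDRESS = re.compile(r"\b\S+@\S+\.\S+\b")


def scrub(text, replacement="[email address deleted]"):
    """Replace anything that looks like an email address with a fixed marker."""
    return ADDRESS.sub(replacement, text)


print(scrub("mail barry@python.org, thanks"))
```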

>Do you want to attempt to obscure the
> address (e.g. "barry--at--python--dot--org")

Anything you programmatically obscure will be programmatically de-obscured.
This technique is bogus and guaranteed to fail as soon as the spammers care
enough. It's pretty clear even the "randomized obscuring" of slashdot is a
failed technique, since spambots don't have to decode ALL of those formats,
just some of them, and then cycle through the site enough times....

Sorry, I find this is a false security. Makes the users feel better,
accomplishes nothing useful, so in reality, users get lazy and careless. So
to some degree, I feel it's worse than nothing. I'm planning on replacing
email addresses with something useful like [email address deleted].

>   CVR> disclosing that info. It creates other problems -- you can't
>   CVR> see a posting in the archive and send email to that person
>   CVR> with more questions (or answers), but that seems trivial
>   CVR> compared to the problems the spammers are causing.
> 
> It kind of plays into Reply-To: munging doesn't it?  If you won't be
> able to reply to the original author, because we're anonymizing
> messages, then you might as well munge Reply-To: to go back to the
> list because that's the only posting address that makes sense.

Yes (he says, grimacing).

If you sanitize the archives, I don't think it affects the list. There are
simply NO mailtos any more in the archives.

If you go the step further and anonymize the postings ON the list, so
subscriber email addresses simply are never shown to other subscribers under
any circumstances (ugh. Urp. I can't believe I'm saying that. This is so
anti-community it hurts), you have no choice and reply-to has to point to
the list, since it's the only contact point left.
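To make that last case concrete, here is a minimal sketch using Python's
standard email package. It is not how Mailman does (or would do) it; the
function name and the choice of which headers to strip are assumptions. A real
version would also have to deal with Received:, Message-ID:, signatures in the
body, and everything else that leaks an address.

```python
from email import message_from_string


def anonymize_for_list(raw, list_address):
    """Strip the poster's identity and point all replies at the list."""
    msg = message_from_string(raw)
    for name in ("From", "Reply-To", "Sender"):
        # Deleting a header that is not present is a no-op, not an error.
        del msg[name]
    msg["From"] = list_address
    # With the author hidden, the list is the only contact point left.
    msg["Reply-To"] = list_address
    return msg
```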

If you instead turn the list server into a forwarding agent, as in:

> Or should Mailman get into the anonymous resender game?  There's
> probably a lot we could do here, but given the political risks of
> anonymous resenders, do we even want to go there?

Is it an anonymous remailer? We're making no pretense of anonymity here.
We're acting as a forwarding agent, ala hotmail.com or mac.com. You mail to
id13194@python.org, and it ends up in my mailbox. The fact that we're not
explicitly denoting the real email address doesn't make us an anonymous
remailer -- that'd be a policy issue, actually. I suppose you could take it
that step further, but you could also set it up so validated subscribers
could get to the real addresses.

The model I'm thinking of is like many forum systems. If you're a guest, you
don't get access to email info. If you're a subscriber, you log on, and they
magically appear. In the case of mailing lists, since you lose control of
the e-mail address once it leaves the site again, you handle this by only
using the remailer address in mail that leaves the site, but a subscriber
could go to the list system and look a user up. That gets us away from the
politics of the anonymous stuff.
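A minimal sketch of that forwarding model in Python, assuming a simple
in-memory table and sequential "idNNNNN" aliases. The class and method names
are invented for the example, and a real system would need persistent storage
and a real notion of who counts as a validated subscriber.

```python
class AliasTable:
    """Map real subscriber addresses to forwarding aliases and back."""

    def __init__(self, domain, start=1):
        self.domain = domain
        self._next = start
        self._alias_for = {}   # lowercased real address -> alias
        self._real_for = {}    # alias -> real address as first seen

    def alias(self, real_address):
        """Return the alias to use in mail that leaves the site."""
        key = real_address.lower()
        if key not in self._alias_for:
            alias = "id%d@%s" % (self._next, self.domain)
            self._next += 1
            self._alias_for[key] = alias
            self._real_for[alias] = real_address
        return self._alias_for[key]

    def resolve(self, alias):
        """Used by the forwarder itself: where should this mail really go?"""
        return self._real_for.get(alias.lower())

    def lookup(self, alias, validated_subscriber):
        """Used by the web side: guests get nothing, subscribers get the address."""
        return self.resolve(alias) if validated_subscriber else None
```

The point of keeping resolve() and lookup() separate is that the mapping is
never secret from the site itself, so this stays a forwarding service with an
access policy rather than an anonymous remailer.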

>   CVR> A secondary issue here is the problem of disclosing admins
>   CVR> and admin addresses.
> 
> Note that in MM2.1 we go about 1/2 way here.  We include the obscured
> email addresses of the list owners as the text in a mailto: tag but we
> actually use the list-owner@ address as the mailto: target.  That
> might not be enough though.  When we actually have a Real Database
> backend we can keep a roster of email+realname and then just include
> the realname inside the href:mailto tag.

I think six months ago it was enough. Now, I just don't think it is. Sigh.
Grumble. 

> Have you looked at SpamAssassin Chuq?

See my other message. SA is a good tool, if you have someone around willing
to update it, monitor it, and make sure it stays up to date technologically
with current releases that are updated to match the spammers' changes. Do you
want to require SA to be installed as a requirement for Mailman? What about
sites where they don't have an admin to keep updating it?

SA is only as good as its latest release is at blocking spam. So you have to keep
updating it. Is that a realistic (and ultimately successful) strategy? I
HATE WHITELISTS. But in the case of public addresses, I'm now convinced
they're needed, because otherwise, you're committing to an ever-escalating
war to stay ahead of the spammers. At best, that's going to cost continuing
manpower and energy and be zero sum. You won't win, you simply continue
surviving by sticking thumbs in the dike.

> Very few false positives too (usually it's
> email amongst our postmasters talking about spam or SA ;).

All it takes is one. Have you seen these stories?

>>Some stuff I've run across while digging out from being on vacation...
>>
>>An interesting take on collaborative anti-spam issues -- that forging email
>>headers to test/validate an open relay is an illegal trespass on a mail
>>server:
>>
>><http://www.newarchitectmag.com/documents/s=2442/na0802g/index.html>
>>
>>Lincoln Stein saying the heck with it and deciding that manual filtering is
>>better than the alternatives:
>>
>><http://www.newarchitectmag.com/documents/s=2445/na0802h/index.html>
>>
>>And in case you didn't see it, cNet's article on why the RBLs are creating
>>false positive problems. It really looks like the blackhole systems have now
>>hit a critical mass where they're being noticed, and not favorably. The folks
>>at SPEWS, if you read what has happened through their stuff and how their
>>attitude leaked all over their responses, haven't helped their cause much.
>>
>><http://news.com.com/2100-1023-943337.html?tag=fd_lede>
>>
>>Finally, another article, this from TidBits, about the growing problem of BAD
>>filtering and false positives, and how it creates another set of (probably
>>even worse) problems.....
>>
>><http://db.tidbits.com/getbits.acgi?tbart=06866>

>   CVR> @#$@$#@$#23 caches (which are nice in some ways) for not also
>   CVR> creating some way to flag addresses as not
>   CVR> cacheable. Because, IMHO, that'd solve this problem.
> 
> Yup, but of course it implies that the clients play by the rules, and
> we know that they don't all, so the question is what we're willing to
> give up for the security of our online personas.  Kinda mirrors
> today's large questions in the WoT(tm), eh?  Maybe people are more
> willing to give up their rights than their conveniences for some added
> security.

Yeah. I see your Sigh and raise you.

> World domination of course.  Because we /could/ add that stuff fairly
> easily if we had the resources to expend on it.  Would it still be
> useable?  For some audiences yes, others no.  I'm fairly sure the
> kind of anonymizing we're talking about would never fly in the Python
> and Zope community, where as it's probably essential in a less
> cloistered environment like lists.apple.com.  Which leads me to
> believe that we need to make it much easier to install themes or
> styles of lists, from the paranoid anonymizer to the laissez-faire
> discussion list.

You have nailed it on the head. Which is why I brought it up. Not because
this is the way it has to be in the future, but because all this is making
Mailman's job a whole lot more complex (we were whining about that at work
today, or at least I was and everyone was nodding sympathetically and
looking for an open window -- email used to be pretty easy and
straightforward. And now.....). But not just because all this crap is getting in the
way, but also that fixing this crap is overkill for some environments, and
going to be NOT ENOUGH in others.

>   CVR> Happy Macworld Expo week, all. If you need me, I'll be in the
>   CVR> war room, beating my head against a wall.
> 
> Any chance you could make it down to DC for a side trip?  We could
> have a Mailman hacking sprint over a few dozen steamed Maryland blue
> crabs and some cold ones. :)

Damn, that sounds good, but -- I've had to give up crab and shellfish (I've
developed an intermittent sensitivity to it. Sigh!) and I'm staying in
Cupertino where I'll be manning the war room this week making sure buttons
get pushed when they need pushed, and not a minute before....

-- 
Chuq Von Rospach, Architech
chuqui@plaidworks.com -- http://www.chuqui.com/

No! No! Dead girl, OFF the table! -- Shrek