I was thinking about the security issues with HTML-encoded mail. One
thing you could do is automatically strip out every <SCRIPT> element.
Normal HTML mail shouldn't contain any, and scripts are one of the
main ways of doing something malicious when a user opens a message.
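
To make the idea concrete, here is a minimal sketch of such a filter,
written against the html.parser module in today's Python standard
library rather than anything in Mailman itself. The names
ScriptStripper and strip_scripts are mine, purely for illustration.
It only approximates a pass-through (end tags come out lowercased and
processing instructions are dropped), and <SCRIPT> is not the only
vector: event-handler attributes and javascript: URLs would need
separate handling.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML, dropping <script> elements and their contents."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; pass through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <SCRIPT> arrives as "script".
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br/>; a self-closing script is dropped.
        if tag != "script" and not self.in_script:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_script:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.in_script:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        if not self.in_script:
            self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        if not self.in_script:
            self.out.append("<!%s>" % decl)


def strip_scripts(html_text):
    """Return html_text with every <script> element removed."""
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling strip_scripts() on a message body before it is archived would
remove the script blocks while leaving the rest of the markup alone.
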
> Message: 1
> Date: Fri, 26 Oct 2001 17:22:07 -0400
> To: mailman-developers(a)python.org
> Subject: Re: [Mailman-Developers] New Pipermail hacks (was Re: Ok, it works!
> From: barry(a)zope.com (Barry A. Warsaw)
> Thanks for the really great feedback. I'm about to check in a new
> version of Scrubber.py that addresses the many issues brought up.
> Apologies for not quoting everything.
> - permission problems: fixed
> - problems with multipart/mixed containing gif, html, and jpeg parts:
> - text/html decoding: there's now a new global variable
> ARCHIVE_HTML_SANITIZER which can be 0, 1, or a string.
> # This variable defines what happens to text/html subparts. They can be
> # stripped completely, escaped, or filtered through an external program. The
> # legal values are:
> # 0 - Strip out text/html parts completely, leaving a notice of the removal in
> #     the message. If the outer part is text/html, the entire message is
> #     discarded.
> # 1 - Remove any embedded text/html parts, leaving them as HTML-escaped
> #     attachments which can be separately viewed. Outer text/html parts are
> #     simply HTML-escaped.
> # The value can also be a string, in which case it is the name of a command to
> # filter the HTML page through. The resulting output is left in an attachment
> # or as the entirety of the message when the outer part is text/html. The
> # format of the string must include a "%(filename)s" which will contain the
> # name of the temporary file that the program should operate on. It should
> # write the processed message to stdout.
> ARCHIVE_HTML_SANITIZER = '/usr/bin/lynx -dump %(filename)s'
> This seems to work pretty well (will provide examples shortly). As
> with the rest of Scrubber, it's a bit of a kludge, but perhaps not
> horrible. It could definitely use more testing by you guys.
> It's actually rather difficult to get Pipermail to /not/ HTML-escape
> attachments, so I'm punting on that for now. Plus, I just feel it's
> way too dangerous to support.
> - storing in get_filename() if available: fixed, and I've also
> implemented the idea of sticking each message's attachments in a
> separate subdir off of archives/private/mylist/attachments. The
> subdir is based on the Message-ID: and files inside there are
> uniquified if necessary.
> - problems with the attachment url: what we really needed was a more
> elaborate PUBLIC_ARCHIVE_URL format string. It now accepts
> %(hostname)s as well as %(listname)s, and the former gets
> interpolated with the list's web host name (as looked up in the
> inverted VIRTUAL_HOSTS dictionary, and defaulting to
> Watch for checkins shortly.
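
On the ARCHIVE_HTML_SANITIZER setting quoted above: as I read the
description, the three cases could be dispatched roughly as in the
sketch below. This is not the code in Scrubber.py. The function name
sanitize_html and the wording of the removal notice are my own
assumptions, it uses modern Python library calls, and it ignores the
difference between outer and embedded text/html parts. The command
string is split into arguments before "%(filename)s" is filled in, so
no shell is involved.

```python
import html
import os
import shlex
import subprocess
import tempfile


def sanitize_html(html_text, sanitizer):
    """Apply an ARCHIVE_HTML_SANITIZER-style value to one text/html part."""
    if sanitizer == 0:
        # Strip the part, leaving only a notice (wording is hypothetical).
        return "An HTML attachment was scrubbed.\n"
    if sanitizer == 1:
        # Keep the part, but HTML-escape it so it shows up as plain text.
        return html.escape(html_text)

    # Otherwise the value is a command template containing "%(filename)s".
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fp:
            fp.write(html_text)
        # Split first, then interpolate each argument separately, so a
        # temporary file name with odd characters stays a single argument.
        argv = [arg % {"filename": path} for arg in shlex.split(sanitizer)]
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        # The filter is expected to write the processed text to stdout.
        return result.stdout
    finally:
        os.unlink(path)
```

With the value from the quoted config, sanitize_html(body,
'/usr/bin/lynx -dump %(filename)s') would return whatever lynx prints
for the temporary file, assuming lynx is installed at that path.
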
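
On the per-message attachment subdirectories: the message above says
the subdir is based on the Message-ID: and that file names are
uniquified if necessary, but not how. The sketch below is one way it
could work, not a description of the checked-in code. Hashing the
Message-ID with SHA-1, the "-0001" suffix scheme, and the function
names attachment_dir and unique_path are all my assumptions.

```python
import hashlib
import os


def attachment_dir(archive_root, message_id):
    """Return a per-message directory under <archive_root>/attachments."""
    # Assumption: hash the Message-ID so the directory name is always
    # filesystem-safe, whatever characters the header contains.
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return os.path.join(archive_root, "attachments", digest)


def unique_path(directory, filename):
    """Create and return a path in directory that does not clash."""
    os.makedirs(directory, exist_ok=True)
    # Keep only the last path component so a hostile file name such as
    # "../../etc/passwd" cannot escape the attachment directory.
    base = os.path.basename(filename) or "attachment"
    stem, ext = os.path.splitext(base)
    candidate = base
    counter = 1
    while True:
        path = os.path.join(directory, candidate)
        try:
            # O_EXCL makes creation fail if the name is already taken.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            candidate = "%s-%04d%s" % (stem, counter, ext)
            counter += 1
        else:
            os.close(fd)
            return path
```

Two attachments named photo.gif in the same message would then be
stored as photo.gif and photo-0001.gif in that message's directory.
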