[Mailman-Developers] New Pipermail hacks (was Re: Ok, it works! ...)

Donal Hunt donal.hunt2@mail.dcu.ie
Sun, 28 Oct 2001 12:34:29 +0000

hey everyone...

I was thinking about the security issues behind HTML mail, and one of
the things you could do is automatically strip out all "<SCRIPT>"
elements.  Normal HTML mail shouldn't contain them, and scripts are one
of the main ways of doing malicious things when a user opens a message.
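
To make the idea concrete, here is a rough sketch of what I mean.  It is
not taken from Scrubber.py; the function name strip_script_elements and
the regular expressions are my own assumptions, and a regex pass like
this is only a first line of defence (it does nothing about on* event
handler attributes or javascript: URLs).

```python
import re

# Matches a complete <script ...>...</script> element, case-insensitively,
# including attributes and content that spans several lines.
_SCRIPT_ELEMENT = re.compile(
    r'<script\b[^>]*>.*?</script\s*>', re.IGNORECASE | re.DOTALL)

# Matches any leftover, unpaired opening or closing script tag.
_STRAY_SCRIPT_TAG = re.compile(r'</?script\b[^>]*>', re.IGNORECASE)


def strip_script_elements(html):
    """Return html with script elements and stray script tags removed.

    Hypothetical helper for illustration only.  The loop repeats until
    the text stops changing, so that removing one tag cannot splice a
    new "<script>" together out of the surrounding fragments.
    """
    previous = None
    while previous != html:
        previous = html
        html = _SCRIPT_ELEMENT.sub('', html)
        html = _STRAY_SCRIPT_TAG.sub('', html)
    return html
```

Something like strip_script_elements(part_text) could be run over each
text/html part before it is archived.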



mailman-developers-request@python.org wrote:
> Message: 1
> Date: Fri, 26 Oct 2001 17:22:07 -0400
> To: mailman-developers@python.org
> Subject: Re: [Mailman-Developers] New Pipermail hacks (was Re: Ok, it works!
>  ...)
> From: barry@zope.com (Barry A. Warsaw)
> Folks,
>
> Thanks for the really great feedback.  I'm about to check in a new
> version of Scrubber.py that addresses the many issues brought up.
> Apologies for not quoting everything.
>
> - permission problems: fixed
> - problems with multipart/mixed containing gif, html, and jpeg parts:
>   fixed.
> - text/html decoding: there's now a new global variable
>   ARCHIVE_HTML_SANITIZER which can be 0, 1, or a string.
>
> # This variable defines what happens to text/html subparts.  They can be
> # stripped completely, escaped, or filtered through an external program.  The
> # legal values are:
> # 0 - Strip out text/html parts completely, leaving a notice of the removal in
> #     the message.  If the outer part is text/html, the entire message is
> #     discarded.
> # 1 - Remove any embedded text/html parts, leaving them as HTML-escaped
> #     attachments which can be separately viewed.  Outer text/html parts are
> #     simply HTML-escaped.
> #
> # The value can also be a string, in which case it is the name of a command to
> # filter the HTML page through.  The resulting output is left in an attachment
> # or as the entirety of the message when the outer part is text/html.  The
> # format of the string must include a "%(filename)s" which will contain the
> # name of the temporary file that the program should operate on.  It should
> # write the processed message to stdout.
> ARCHIVE_HTML_SANITIZER = '/usr/bin/lynx -dump %(filename)s'
>
>   This seems to work pretty well (will provide examples shortly).  As
>   with the rest of Scrubber, it's a bit of a kludge, but perhaps not
>   horrible.  It could definitely use more testing by you guys.
>
>   It's actually rather difficult to get Pipermail to /not/ HTML-escape
>   attachments, so I'm punting on that for now.  Plus, I just feel it's
>   way too dangerous to support.
> - storing in get_filename() if available: fixed, and I've also
>   implemented the idea of sticking each message's attachments in a
>   separate subdir off of archives/private/mylist/attachments.  The
>   subdir is based on the Message-ID: and files inside there are
>   uniquified if necessary.
> - problems with the attachment url: what we really needed was a more
>   elaborate PUBLIC_ARCHIVE_URL format string.  It now accepts
>   %(hostname)s as well as %(listname)s, and the former gets
>   interpolated with the list's web host name (as looked up in the
>   inverted VIRTUAL_HOSTS dictionary, and defaulting to
>
> Watch for checkins shortly.
>
> -Barry
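
For anyone trying to picture how the three ARCHIVE_HTML_SANITIZER
settings quoted above might behave, here is a minimal sketch.  It is not
the code in Scrubber.py: the function name sanitize_html, the wording of
the removal notice, and the use of a temporary file with subprocess are
all my assumptions, based only on the comment block in the quoted
message.

```python
import html
import os
import shlex
import subprocess
import tempfile


def sanitize_html(html_text, sanitizer):
    """Apply an ARCHIVE_HTML_SANITIZER-style setting to one HTML part.

    Hypothetical helper: `sanitizer` is 0, 1, or a command template
    containing "%(filename)s", as described in the quoted comments.
    """
    if isinstance(sanitizer, str):
        # Write the part to a temporary file and run the external filter.
        fd, path = tempfile.mkstemp(suffix='.html')
        try:
            with os.fdopen(fd, 'w', encoding='utf-8') as fp:
                fp.write(html_text)
            # Quote the path so shlex.split() keeps it as one argument.
            command = sanitizer % {'filename': shlex.quote(path)}
            result = subprocess.run(
                shlex.split(command),
                capture_output=True, text=True, check=True)
            # The filter is expected to write its output to stdout.
            return result.stdout
        finally:
            os.unlink(path)
    if sanitizer == 0:
        # Strip the part entirely, leaving only a notice (wording assumed).
        return 'An HTML part was removed from this message.\n'
    if sanitizer == 1:
        # Keep the part, but HTML-escape it so it is shown as text.
        return html.escape(html_text)
    raise ValueError('unsupported sanitizer setting: %r' % (sanitizer,))
```

With the setting from the quoted message, the call would look like
sanitize_html(part_text, '/usr/bin/lynx -dump %(filename)s'), which of
course needs lynx installed at that path.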