[Mailman-Developers] New Pipermail hacks (was Re: Ok, it works! ...)

Barry A. Warsaw barry@zope.com
Fri, 26 Oct 2001 17:22:07 -0400


Thanks for the really great feedback.  I'm about to check in a new
version of Scrubber.py that addresses the many issues brought up.
Apologies for not quoting everything.

- permission problems: fixed

- problems with multipart/mixed containing gif, html, and jpeg parts:

- text/html decoding: there's now a new global variable
  ARCHIVE_HTML_SANITIZER which can be 0, 1, or a string.

# This variable defines what happens to text/html subparts.  They can be
# stripped completely, escaped, or filtered through an external program.  The
# legal values are:
# 0 - Strip out text/html parts completely, leaving a notice of the removal in
#     the message.  If the outer part is text/html, the entire message is
#     discarded.
# 1 - Remove any embedded text/html parts, leaving them as HTML-escaped
#     attachments which can be separately viewed.  Outer text/html parts are
#     simply HTML-escaped.
# The value can also be a string, in which case it is the name of a command to
# filter the HTML page through.  The resulting output is left in an attachment
# or as the entirety of the message when the outer part is text/html.  The
# format of the string must include a "%(filename)s" which will contain the
# name of the temporary file that the program should operate on.  It should
# write the processed message to stdout.
ARCHIVE_HTML_SANITIZER = '/usr/bin/lynx -dump %(filename)s'

  This seems to work pretty well (will provide examples shortly).  As
  with the rest of Scrubber, it's a bit of a kludge, but perhaps not
  horrible.  It could definitely use more testing by you guys.

  It's actually rather difficult to get Pipermail to /not/ HTML-escape
  attachments, so I'm punting on that for now.  Plus, I just feel it's
  way too dangerous to support.

- storing in get_filename() if available: fixed, and I've also
  implemented the idea of sticking each message's attachments in a
  separate subdir off of archives/private/mylist/attachments.  The
  subdir name is derived from the Message-ID: header, and files inside
  it are given unique names if necessary.

- problems with the attachment url: what we really needed was a more
  elaborate PUBLIC_ARCHIVE_URL format string.  It now accepts
  %(hostname)s as well as %(listname)s, and the former gets
  interpolated with the list's web host name (as looked up in the
  inverted VIRTUAL_HOSTS dictionary, and defaulting to

Watch for checkins shortly.
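
For anyone wanting to experiment before the checkin lands: both
ARCHIVE_HTML_SANITIZER and PUBLIC_ARCHIVE_URL rely on Python's
dictionary-based "%(name)s" interpolation.  What follows is only a rough
sketch of that idea in current Python, not the code in Scrubber.py.  The
helper names sanitize_html() and archive_url(), the DEMO_FILTER stand-in
command, and the example URL layout are all made up for illustration.

```python
import os
import shlex
import subprocess
import sys
import tempfile

# Stand-in filter for machines without lynx (an assumption for this sketch):
# it upper-cases the file named on its command line and writes it to stdout.
DEMO_FILTER = (
    shlex.quote(sys.executable)
    + ' -c "import sys; sys.stdout.write(open(sys.argv[1]).read().upper())"'
    + ' %(filename)s'
)


def sanitize_html(html, command_template):
    """Filter html through an external command and return its stdout.

    command_template must contain "%(filename)s", which is replaced with
    the name of a temporary file holding the HTML.  Hypothetical helper.
    """
    fd, path = tempfile.mkstemp(suffix='.html')
    try:
        with os.fdopen(fd, 'w') as fp:
            fp.write(html)
        # Quote the path so an odd temp directory cannot break the command.
        cmd = command_template % {'filename': shlex.quote(path)}
        result = subprocess.run(
            shlex.split(cmd), capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        # Always remove the temporary file, even if the filter fails.
        os.unlink(path)


def archive_url(fmt, listname, hostname):
    """Interpolate %(listname)s and %(hostname)s into a URL format string."""
    return fmt % {'listname': listname, 'hostname': hostname}
```

With lynx installed, sanitize_html(page, '/usr/bin/lynx -dump %(filename)s')
would correspond to the setting shown above.  The command is split into an
argument list rather than handed to a shell, which is a choice made for this
sketch to keep shell metacharacters out of the picture.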