[Mailman-Developers] New Pipermail hacks (was Re: Ok, it works! ...)
Barry A. Warsaw
barry@zope.com
Fri, 26 Oct 2001 17:22:07 -0400
Folks,
Thanks for the really great feedback. I'm about to check in a new
version of Scrubber.py that addresses the many issues brought up.
Apologies for not quoting everything.
- permission problems: fixed
- problems with multipart/mixed containing gif, html, and jpeg parts:
fixed.
- text/html decoding: there's now a new global variable
ARCHIVE_HTML_SANITIZER which can be 0, 1, or a string.
# This variable defines what happens to text/html subparts. They can be
# stripped completely, escaped, or filtered through an external program. The
# legal values are:
# 0 - Strip out text/html parts completely, leaving a notice of the removal in
# the message. If the outer part is text/html, the entire message is
# discarded.
# 1 - Remove any embedded text/html parts, leaving them as HTML-escaped
# attachments which can be separately viewed. Outer text/html parts are
# simply HTML-escaped.
#
# The value can also be a string, in which case it is the name of a command to
# filter the HTML page through. The resulting output is left in an attachment
# or as the entirety of the message when the outer part is text/html. The
# string must include a "%(filename)s" placeholder, which is replaced with the
# name of the temporary file that the program should operate on. The program
# should write the processed message to stdout.
ARCHIVE_HTML_SANITIZER = '/usr/bin/lynx -dump %(filename)s'
This seems to work pretty well (will provide examples shortly). As
with the rest of Scrubber, it's a bit of a kludge, but perhaps not
horrible. It could definitely use more testing by you guys.
It's actually rather difficult to get Pipermail to /not/ HTML-escape
attachments, so I'm punting on that for now. Plus, I just feel it's
way too dangerous to support.
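To make the three settings concrete, here is a rough sketch of how a value of ARCHIVE_HTML_SANITIZER could be dispatched. This is not the code in Scrubber.py: sanitize_html() is a hypothetical helper, the removal notice is placeholder text, and it is written against a current Python standard library (html, shlex, subprocess, tempfile).

```python
import html
import os
import shlex
import subprocess
import tempfile


def sanitize_html(payload, sanitizer):
    """Hypothetical helper, not the actual Scrubber.py implementation."""
    if isinstance(sanitizer, str):
        # Write the HTML to a temporary file, interpolate its name into
        # the command string, and capture what the command prints.
        fd, filename = tempfile.mkstemp(suffix='.html')
        try:
            with os.fdopen(fd, 'w') as fp:
                fp.write(payload)
            cmd = sanitizer % {'filename': shlex.quote(filename)}
            result = subprocess.run(shlex.split(cmd), capture_output=True,
                                    text=True, check=True)
            return result.stdout
        finally:
            os.unlink(filename)
    if sanitizer == 0:
        # Strip the part, leaving only a notice (placeholder wording).
        return 'An HTML attachment was scrubbed and removed.\n'
    if sanitizer == 1:
        # Keep the part, but HTML-escape it so it displays as text.
        return html.escape(payload)
    raise ValueError('unsupported sanitizer value: %r' % (sanitizer,))
```

With the setting shown above, a call would look like sanitize_html(page, '/usr/bin/lynx -dump %(filename)s'), assuming lynx is installed at that path.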
- storing under get_filename() if available: fixed. Attachments are now
saved under the name returned by get_filename() when the part has one.
I've also implemented the idea of sticking each message's attachments
in a separate subdir off of archives/private/mylist/attachments. The
subdir name is based on the Message-ID: header, and files inside it are
uniquified if necessary.
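For illustration, the per-message layout might look something like the sketch below. How the subdir name is derived from the Message-ID: and how names are uniquified are both assumptions here (a SHA-1 hex digest and a numeric suffix); attachment_dir() and unique_path() are hypothetical names, not functions in Scrubber.py.

```python
import hashlib
import os


def attachment_dir(list_archive_dir, message_id):
    """Return a per-message directory under <archive>/attachments.

    Assumption: the subdir name is a SHA-1 hex digest of the Message-ID.
    """
    digest = hashlib.sha1(message_id.encode('utf-8')).hexdigest()
    return os.path.join(list_archive_dir, 'attachments', digest)


def unique_path(directory, filename):
    """Return a path in directory that does not collide with a file there."""
    # Keep only the final path component so a hostile filename cannot
    # escape the attachment directory.
    filename = os.path.basename(filename) or 'attachment'
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 0
    while os.path.exists(candidate):
        # Assumption: collisions are resolved with a -N suffix.
        counter += 1
        candidate = os.path.join(directory, '%s-%d%s' % (base, counter, ext))
    return candidate
```

The filename passed in would be whatever get_filename() returned for the part, with some fallback name when it returns nothing.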
- problems with the attachment url: what we really needed was a more
elaborate PUBLIC_ARCHIVE_URL format string. It now accepts
%(hostname)s as well as %(listname)s, and the former gets
interpolated with the list's web host name (as looked up in the
inverted VIRTUAL_HOSTS dictionary, and defaulting to
DEFAULT_URL_HOST).
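As a sketch of the interpolation described in the last item: the values below are made-up examples, VIRTUAL_HOSTS is assumed to map web host names to mail host names (hence the inversion), and public_archive_url() is a hypothetical helper rather than anything in the checkin.

```python
# Example values only; real settings live in the site configuration.
DEFAULT_URL_HOST = 'www.example.com'
VIRTUAL_HOSTS = {'lists.example.org': 'example.org'}  # web host -> mail host
PUBLIC_ARCHIVE_URL = 'http://%(hostname)s/pipermail/%(listname)s'


def public_archive_url(listname, mail_host):
    """Hypothetical helper: build the archive URL for a list."""
    # Invert the dictionary so the list's mail host finds its web host.
    inverted = {mail: web for web, mail in VIRTUAL_HOSTS.items()}
    hostname = inverted.get(mail_host.lower(), DEFAULT_URL_HOST)
    return PUBLIC_ARCHIVE_URL % {'hostname': hostname, 'listname': listname}
```

A list whose mail host has no entry in VIRTUAL_HOSTS falls back to DEFAULT_URL_HOST.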
Watch for checkins shortly.
-Barry