[Mailman-Users] Preserve HTML format while scrubbing file attachments

Mark Sapiro mark at msapiro.net
Sun Mar 9 04:22:38 CET 2008


Darrell Burkey wrote:
>
>I've read the FAQs until my eyes are square and I can see some discussions
>in the mailing list archives but no definitive answer to essentially this
>post in 2006:
>
>http://www.mail-archive.com/mailman-users@python.org/msg40147.html
>
>which asks if there is a way to preserve html formatted messages while still
>having file attachments scrubbed?


No. (Is that definitive enough?)


>I don't think the replies to the above
>message really understood what was being asked as I'm not worried about the
>archives at this point in time, rather I just want a message that a user
>sends that contains html formatting to retain that formatting and have any
>file attachment sent converted to a link (scrubbed).


The replies to the above-referenced post talk about archiving because
the job of the scrubber is to produce a flat, text/plain message, with
non-text/plain parts stored and replaced by links, for the pipermail
archiver. The fact that the scrubber can also be used to do the same
job on a message to be delivered to the list, and also does the same
job for the plain format digest, doesn't change its job description.
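
To make that job description concrete, here is a rough sketch in Python
of what "flattening" means. It is not Mailman's actual scrubber code:
the function name flatten, the base_url and the notice wording are made
up for illustration, and nothing is really stored anywhere.

```python
from email.message import EmailMessage


def flatten(msg, base_url="http://example.invalid/attachments"):
    """Illustrative only: reduce a MIME message to one plain-text string.

    Every leaf part that is not an inline text/plain part is replaced
    by a short notice with a (hypothetical) link.
    """
    chunks = []
    counter = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        ctype = part.get_content_type()
        if ctype == "text/plain" and part.get_filename() is None:
            chunks.append(part.get_content())
        else:
            counter += 1
            chunks.append(
                "-------------- next part --------------\n"
                "An attachment of type %s was scrubbed...\n"
                "URL: %s/%d\n" % (ctype, base_url, counter)
            )
    return "\n".join(chunks)


# An HTML message (with a plain-text alternative) plus a PDF attachment.
msg = EmailMessage()
msg.set_content("Hello list\n")
msg.add_alternative("<p>Hello list</p>", subtype="html")
msg.add_attachment(b"%PDF-1.4", maintype="application",
                   subtype="pdf", filename="CASE_SLA.pdf")

flat = flatten(msg)
print(flat)
```

The result keeps the text/plain body and carries one notice each for
the HTML part and the PDF, which is the shape of the output quoted
below.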


>What I get now if a html email with a file attachment comes through to a
>list is a message distributed with the following scrub notifications (and
>the message in plain text):
>
>-------------- next part --------------
>An HTML attachment was scrubbed...
>URL:
>http://www.communitylists.org.au/mailman/private/case-test/attachments/20080
>309/f127997e/attachment.html 
>-------------- next part --------------
>A non-text attachment was scrubbed...
>Name: CASE_SLA.pdf
>Type: application/pdf
>Size: 91582 bytes
>Desc: not available
>Url :
>http://www.communitylists.org.au/mailman/private/case-test/attachments/20080
>309/f127997e/attachment.pdf 


That's exactly what the scrubber is supposed to do.


>I'm using MailMan 2.1.9, fairly stock CentOS system with sendmail. Lovely
>mhonarc/htdig integration patches by OpenInfo. 
>
>I've set ARCHIVE_HTML_SANITIZER to various values and tried every
>combination of Content Filtering, Non-Digest file scrubbing settings a
>person could ever come up with. Many hours of testing gone by. 
>
>I would really appreciate some assistance as it just doesn't seem possible
>to me that MailMan can't preserve html messages and at the same time scrub
>file attachments given all the great work that has been done on it in the
>last few years.


It's never been a requirement. ARCHIVE_HTML_SANITIZER doesn't come into
it. That has to do with how the removed HTML parts are stored in the
archive.

The problem is, as I try to explain above, that pipermail can only
archive plain text, so the scrubber's job is to produce a flat
text/plain message suitable for pipermail archiving, and that's the
job it does whether it does it for the archive, the plain digest or
all messages.

If you want HTML delivered to the list, you have to set scrub_nondigest
to No. Then you have two options. You can pass the message as is to the
list, in which case you set filter_content to No. Or you can remove (and
not store) some MIME parts based on Content-Type or filename extension,
in which case you set filter_content to Yes, collapse_alternatives and
convert_html_to_plaintext to No, and the other content filtering
settings as you wish.
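
For what it's worth, the second option might look something like this
as a config_list input file (which is just Python). Treat it as a
sketch: the file name keep_html.cfg is hypothetical, and True/False
here are assumed to correspond to the Yes/No radio buttons in the web
interface.

```python
# keep_html.cfg (hypothetical file name), applied with something like:
#   bin/config_list -i keep_html.cfg case-test

# Don't run the scrubber on messages delivered to the list.
scrub_nondigest = False

# Turn content filtering on, but keep HTML intact.
filter_content = True
collapse_alternatives = False
convert_html_to_plaintext = False

# The remaining content filtering settings are left as you wish.
```

For the first option (pass the message through untouched), the same
file would keep scrub_nondigest = False and set filter_content = False
instead.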

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


