URL of scrubbed attachments missing in the list archive

Hi,
As list admin I received a notification that a 3 MB email was held for moderation. I found the scrubber option and activated it. Afterwards I let the mail pass and viewed it in the archive. The mail is there, but without the attachment (as intended) and without any link to it (contrary to my expectation).
I saw that the scrubbing feature was implemented in 2004 [1] and assume it is well tested. The pipeline option [2] came to my attention, but I would like to keep the current order. The only FAQ entry I found regarding attachments is this one [3]. This is not related to the long URL issue [4], as the URL does not appear at all. I also did not find it among the Launchpad bugs [5]. I also searched the archive for "missing URL" etc., so I assume this has not been reported yet.
To check whether this is an instance of #265869 (Scrubber/attachment bug on NetBSD), I asked the Mailman admins, but there was no such log entry; also, they use Debian.
To reproduce one can try:
- set up a mailing list without scrubbing enabled
- send a mail to the list that exceeds the size limit
- after the mail appeared for moderation, activate scrubbing
- accept the mail and look in the archive if the URL is there
Can somebody reproduce this?
Thanks, Kardan
[1] http://sourceforge.net/p/mailman/patches/291/
[2] http://wiki.list.org/pages/viewpage.action?pageId=7602227
[3] http://wiki.list.org/pages/viewpage.action?pageId=4030548
[4] http://wiki.list.org/display/DEV/Stable+URLs
[5] https://bugs.launchpad.net/mailman/+bug/265869
[6] https://bugs.launchpad.net/mailman/+bugs?field.searchtext=attachment+URL

On 06/25/2013 06:56 AM, kardan wrote:
Perhaps the attachment was removed by Mailman's content filtering.
I can easily reproduce this if the large MIME part in the message is a Content-Type that will be removed by Mailman's content filtering.
If not, and if by "activate scrubbing" you mean set scrub_nondigest to Yes, then it should work as you expect and the large MIME part should be removed and replaced by a URL in both the archive and in the post delivered to the list.
Note that non text/plain MIME parts that pass content filtering are always stored aside and replaced by links in the plain format digest and the pipermail archive regardless of the setting of scrub_nondigest.
So the underlying question is what was the MIME Content-Type of the large message part and what are your list's content filtering settings?
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Hi,
thanks for the fast answer.
On Tue, 25 Jun 2013 14:41:35 -0700 Mark Sapiro <mark@msapiro.net> wrote:
The tree is like this (according to claws-mail)
* message/rfc822 (3.29MB)
** multipart/alternative (3.29MB)
*** text/plain (1.14KB)
*** multipart/related (3.28MB)
**** text/html (3.83KB)
**** image/jpeg (3.28MB)
Mime-Version: 1.0 (Apple Message framework v753.1)
Content-Type: multipart/alternative; boundary=Apple-Mail-63-807922156
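A tree like the one above can also be printed without a mail client. The following is a minimal sketch using Python's standard email package; the function names and the stand-in message are made up for illustration, and the sample starts at the multipart/alternative level (claws-mail additionally shows the enclosing message/rfc822).

```python
from email.message import EmailMessage


def mime_tree(part, depth=0):
    """Return one line per MIME part, indented by nesting depth."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for sub in part.get_payload():
            lines.extend(mime_tree(sub, depth + 1))
    return lines


def build_sample():
    """Build a small stand-in message with the same shape as the tree above."""
    msg = EmailMessage()
    msg["Subject"] = "sample"
    msg.set_content("plain text body")
    msg.add_alternative("<html><body><img src='cid:bg'></body></html>",
                        subtype="html")
    # Turn the HTML alternative into multipart/related and attach an image.
    html_part = msg.get_payload()[1]
    html_part.add_related(b"not a real jpeg", maintype="image",
                          subtype="jpeg", cid="<bg>")
    return msg


if __name__ == "__main__":
    print("\n".join(mime_tree(build_sample())))
```

For a real message, parsing the raw file with email.message_from_binary_file and passing the result to mime_tree should give the corresponding listing.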
The filter options:
- filter_content: yes
- filter_mime_types: <none>
- pass_mime_types:
- filter_filename_extensions:
- pass_filename_extensions: <none>
- collapse_alternatives: yes
- convert_html_to_plaintext: yes
- filter_action: discard
The resulting mail contained only (3) and had these headers:
X-Mailman-Approved-At: Tue, 25 Jun 2013 04:35:52 +0200
X-Content-Filtered-By: Mailman/MimeDel 2.1.13
Content-Type: text/plain; charset="utf-8"; Format="flowed"; DelSp="yes"
So multipart/related is not among the allowed MIME types and was filtered. I think it is not a bad idea to have the above filenames filtered, while everything else should pass and land in the archive. Please give me a hint how to achieve this.
Thanks, Kardan

On 06/25/2013 04:32 PM, kardan wrote:
[...]
[...]
As you surmise, your settings do not pass multipart/related so the multipart/related part including its text/html and image/jpeg subparts were removed.
Note that even if you were to change your pass_mime_types to
multipart text/plain text/html image/jpeg
so that all the parts of the message are accepted, the result would still only be the text/plain part because collapse_alternatives = Yes means replace the multipart/alternative part with the first (the text/plain) sub-part.
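The interaction between pass_mime_types and collapse_alternatives described above can be modelled roughly as follows. This is a toy model for illustration only, not Mailman's MimeDel code: the Part class, the function names, and the first pass list (DEFAULT_LIKE) are assumptions, and the real handler does more (filter_mime_types, filename extensions, HTML conversion).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    ctype: str
    children: List["Part"] = field(default_factory=list)


def passes(ctype, pass_types):
    """An entry matches a full type ("text/plain") or a bare main type ("multipart")."""
    if not pass_types:          # an empty list means "no restriction"
        return True
    return ctype in pass_types or ctype.split("/")[0] in pass_types


def filter_parts(part, pass_types):
    """Drop every sub-part whose type is not passed, recursively."""
    kept = [filter_parts(c, pass_types)
            for c in part.children if passes(c.ctype, pass_types)]
    return Part(part.ctype, kept)


def collapse(part):
    """Replace each multipart/alternative by its first remaining sub-part."""
    children = [collapse(c) for c in part.children]
    if part.ctype == "multipart/alternative" and children:
        return children[0]
    return Part(part.ctype, children)


def simulate(msg, pass_types, collapse_alternatives):
    result = filter_parts(msg, pass_types)
    return collapse(result) if collapse_alternatives else result


def flatten(part):
    """List the content types depth-first, for easy comparison."""
    out = [part.ctype]
    for c in part.children:
        out.extend(flatten(c))
    return out


# The structure posted earlier in the thread, below message/rfc822.
SAMPLE = Part("multipart/alternative", [
    Part("text/plain"),
    Part("multipart/related", [Part("text/html"), Part("image/jpeg")]),
])

# Hypothetical pass list that does not include multipart/related.
DEFAULT_LIKE = {"multipart/mixed", "multipart/alternative", "text/plain"}
# The pass list suggested in the text above.
WIDE = {"multipart", "text/plain", "text/html", "image/jpeg"}

if __name__ == "__main__":
    print(flatten(simulate(SAMPLE, DEFAULT_LIKE, True)))
    print(flatten(simulate(SAMPLE, WIDE, True)))
    print(flatten(simulate(SAMPLE, WIDE, False)))
```

In this model both runs with collapsing enabled end up with only text/plain; the image survives only when all types are passed and collapsing is off.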
If you want to filter only on filename extensions and pass all MIME types that don't have associated file names with the filter_filename_extensions extensions, you want pass_mime_types to be empty and collapse_alternatives and convert_html_to_plaintext to be No, but this will potentially accept all kinds of malware which may have Content-Type: application/octet-stream and no file name.
Whether this is safe or not depends on other things like discarding non-member posts and knowing your list members.
The real question is: do you really want some list member's 3.2 MByte JPEG stationery background (if that's what it was) in your archive and distributed to your list?
-- Mark Sapiro <mark@msapiro.net>

Hi,
On Tue, 25 Jun 2013 17:50:20 -0700 Mark Sapiro <mark@msapiro.net> wrote:
I deactivated the collapse_alternatives as this was not what I intended.
colour and formatting settings as default. So far none of the list members complained. RFC8220 [1] does not say anything about MIME types and I don't know which others are possible so I better disable mime type filtering. However, accepting application/octet-stream seems risky and I see no way to handle that properly, except whitelisting all accepted types like PDF, JPG, PNG and all documents. However, ODT files with embedded macros can be harmful as well. So there is probably no easy fix for that.
Kardan

On 06/26/2013 02:57 PM, kardan wrote:
If you prefer plain text to HTML or other fancy text, you probably DO want collapse_alternatives = Yes as that will normally select a plain text alternative in preference to an HTML alternative.
> RFC8220 [1] does not say anything about MIME types and I don't know which others are possible so I better disable mime type filtering.
See <http://www.iana.org/assignments/media-types>.
Note that filtering/accepting based on file extension is not at all reliable as many inline images with media types like image/jpeg, image/gif, image/png, etc. will not have an associated file name and therefore cannot be filtered/accepted based on filename extension. The same is also sometimes true for application/pdf and many other media types.
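To illustrate why extension-based filtering misses such parts, here is a small sketch with Python's standard email package. The helper name extension_of and the two sample parts are hypothetical; the point is only that a part without a filename parameter yields no extension to match against.

```python
import os
from email.message import EmailMessage


def extension_of(part):
    """Return the lower-cased filename extension of a part, or None if it has no name."""
    name = part.get_filename()
    if name is None:
        return None
    return os.path.splitext(name)[1].lstrip(".").lower()


def inline_image():
    """An inline image as many mail clients send it: no filename at all."""
    part = EmailMessage()
    part.set_content(b"not a real jpeg", maintype="image", subtype="jpeg",
                     disposition="inline", cid="<bg>")
    return part


def named_pdf():
    """An attachment that does carry a filename."""
    part = EmailMessage()
    part.set_content(b"%PDF-", maintype="application", subtype="pdf",
                     disposition="attachment", filename="report.PDF")
    return part


if __name__ == "__main__":
    for part in (inline_image(), named_pdf()):
        print(part.get_content_type(), extension_of(part))
```

An extension list can only act on the second part; the first one has to be handled by its media type.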
[...]
If by the above, you mean the option (scrub_nondigest) to remove, store aside and link to attachments in individual messages and MIME format digests, then you are correct in what it does, however, attachments are always removed, stored aside and replaced by links in archived posts and plain format digests regardless of this option. The option only controls at what point in the process the removal/replacement occurs.
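The "remove, store aside and replace by a link" step can be sketched as below. This is not Mailman's Scrubber code, only an illustration with Python's standard email package; the scrub function, the in-memory store, the key scheme, and the base URL are all assumptions.

```python
import hashlib
from email.message import EmailMessage


def scrub(msg, store, base_url):
    """Replace every non-text/plain leaf part with a text note linking to a stored copy."""
    for part in list(msg.walk()):
        if part.is_multipart() or part.get_content_type() == "text/plain":
            continue
        data = part.get_payload(decode=True) or b""
        name = part.get_filename() or "attachment.bin"
        ctype = part.get_content_type()
        # Hypothetical storage key: short content hash plus original name.
        key = hashlib.sha1(data).hexdigest()[:12] + "/" + name
        store[key] = data
        # set_content() clears the old Content-* headers and payload first.
        part.set_content(
            "A non-text attachment was scrubbed...\n"
            "Name: %s\nType: %s\nSize: %d bytes\nURL: %s/%s\n"
            % (name, ctype, len(data), base_url, key)
        )
    return msg


def build_sample():
    msg = EmailMessage()
    msg.set_content("hello list")
    msg.add_attachment(b"JPEGDATA", maintype="image", subtype="jpeg",
                       filename="photo.jpg")
    return msg


if __name__ == "__main__":
    store = {}
    scrubbed = scrub(build_sample(), store, "http://example.invalid/attachments")
    print(scrubbed.get_payload()[1].get_content())
```

Whether this runs before delivery or only when archiving is what the option discussed above decides; the sketch itself does not model that difference.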
-- Mark Sapiro <mark@msapiro.net>

Hi,
On Wed, 26 Jun 2013 15:38:30 -0700 Mark Sapiro <mark@msapiro.net> wrote:
I found that RFC 1521 gives an overview of content and MIME types: http://www.faqs.org/rfcs/rfc1521.html
7.4.1. The Application/Octet-Stream (primary) subtype
To reduce the danger of transmitting rogue programs through the mail, it is strongly recommended that implementations NOT implement a path-search mechanism whereby an arbitrary program named in the Content-Type parameter (e.g., an "interpreter=" parameter) is found and executed using the mail body as input.
I came to the conclusion that, especially because I know many users do not care much about security or even about technology, it is my task as list admin to take most risks out of their way instead of leaving them with possibly harmful content and obligations they do not understand. Even if octet-stream garbage is probably harmless for my system, as it is treated as non-executable, I should not burden anybody with the possibility of unwanted script execution.
I cannot control the way in which users send attachments. I want them neither filtered nor forwarded, but stored aside. You said "your settings do not pass multipart/related so the multipart/related part including its text/html and image/jpeg subparts were removed", which is not what I want.
Summarizing, I need to change the following options to make Mailman
- send only plain text messages to the user
- strip all (inline) attachments, store them and link to them in both the archived and the forwarded version:
pass_mime_types = <none>
collapse_alternatives = Yes
convert_html_to_plaintext = Yes
Is there anything I missed?
Thanks for all your help so far! Kardan

On 06/26/2013 05:18 PM, kardan wrote:
Assuming you have set scrub_nondigest to Yes, then the above should do more or less what you say you want, but consider the message whose structure you posted at <http://mail.python.org/pipermail/mailman-users/2013-June/075332.html>. For this message, collapse_alternatives = Yes will keep only the text/plain alternative from the multipart/alternative part and will remove the multipart/related alternative together with its text/html and image/jpeg sub-parts.
If this is what you want in this case, then your settings are good. On the other hand, if in this case you want the image/jpeg part stored aside and linked, then you might as well set filter_content to No and not filter content at all.
-- Mark Sapiro <mark@msapiro.net>

participants (2): kardan, Mark Sapiro