
Re: [Mailman-Users] HTML content from GMX gets scrubbed in archive


Mark Sapiro

7 Oct 2014

2:14 a.m.

On 10/06/2014 04:20 AM, Peter Wetz wrote:

...

>> So the answer to your simple question is simply "you can't".

> maybe my use of "how can html-mails be properly displayed" was misleading. i don't want any html formatting to be preserved. i simply want an html mail to be converted to plain text and then shown in the archive without any scrubbed attachments. is that possible?

> i think that my problem is more specific: when i write an HTML mail from gmail to my list, it gets archived properly, i.e., i can see the html as plain text. all is fine. however, when i send an html mail via the web interface of the popular freemail provider gmx, the message's html gets scrubbed as an attachment and nothing is shown besides the scrubbed attachment.

> why is it working with html mails from gmail and why isn't it working with html mails from gmx?

Last question first. The mail from gmail is multipart/alternative with text/plain and text/html alternative parts. The archiver still scrubs the text/html part and replaces it with a link, but there is also the text/plain part which is archived inline.

The mail from gmx is html only, so there is no text/plain part to archive inline.
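To make the structural difference concrete, here is a minimal sketch using Python's standard `email` package. It is not Mailman code; the two messages are made-up stand-ins for the Gmail and GMX posts, and `describe` is a hypothetical helper that lists the content type of the message and of each part.

```python
from email.message import EmailMessage


def describe(msg):
    """Return the content type of the message and of every MIME part in it."""
    return [part.get_content_type() for part in msg.walk()]


# Shaped like the Gmail post: text/plain and text/html alternatives.
both = EmailMessage()
both["Subject"] = "test"
both.set_content("Hello list")
both.add_alternative("<p>Hello list</p>", subtype="html")

# Shaped like the GMX post: a single text/html body, no text/plain part.
html_only = EmailMessage()
html_only["Subject"] = "test"
html_only.set_content("<p>Hello list</p>", subtype="html")

print(describe(both))       # ['multipart/alternative', 'text/plain', 'text/html']
print(describe(html_only))  # ['text/html']
```

In the second message there is nothing left to archive inline once the text/html part is scrubbed.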

You can accomplish what you want in the archives using Mailman's content filtering, but this will also affect posts delivered to the list members. If that is acceptable, you want the following in the list's Content filtering settings.

filter_content = Yes

filter_mime_types -> totally empty, not even any whitespace

pass_mime_types -> at least the following 3 lines, maybe more if you want to allow other attachment types.

multipart
text/plain
text/html

filter_filename_extensions -> the default list is probably OK

pass_filename_extensions -> totally empty, not even any whitespace

collapse_alternatives -> probably Yes. If you make this No, for a multipart/alternative message with both text/plain and text/html alternatives, you will wind up with a message with both the original text/plain part and a second text/plain part containing Mailman's conversion of the text/html part.

convert_html_to_plaintext -> Yes
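For admins with shell access, the same settings could be written as a small Python settings file of the kind that Mailman 2.1's `bin/config_list -i` reads. This is only a sketch: the attribute names are the ones listed above, but the exact value formats (1 for Yes, a newline-separated string for the MIME types) are assumptions and should be compared with `bin/config_list -o` output from your own list before use.

```python
# Sketch of a settings file for bin/config_list -i (value formats assumed).

filter_content = 1              # Yes

filter_mime_types = ""          # totally empty, not even whitespace

# At least these three entries; add more to allow other attachment types.
pass_mime_types = """multipart
text/plain
text/html"""

# filter_filename_extensions is left out on purpose to keep the default list.

pass_filename_extensions = ""   # totally empty, not even whitespace

collapse_alternatives = 1       # Yes
convert_html_to_plaintext = 1   # Yes
```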

--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Peter Wetz

7 Oct 2014

7:58 a.m.

New subject: HTML content from GMX gets scrubbed in archive

Thanks for your help. I remember reading about this configuration somewhere else on this list already and I already tried it to no avail.

of course, i tried it again now, and still, it doesn't work...

this is a screenshot of the configuration, as recommended by you: http://i.imgur.com/76egNgW.png
for "collapse_alternatives" i tried both, yes and no. i have some other mime types in "pass_mime_types", however, i think they should do no harm? the "pass_filename_extensions" and "filter_mime_types" are definitely empty. i selected the text boxes and did a "mark all" and "delete" just to be 100% sure.

this is what it looks like in the archive. it is empty: http://i.imgur.com/1uIJo89.png

this is the message header source of the mail as received by list subscribers: http://pastebin.com/fM0rREKd
it only contains the footer.

i don't know what my issue is here, since i followed your proposed steps exactly. i'd be glad if you could help.

do you think there could be some inconsistency between the actual config files and what is shown in the text boxes? for instance, could it be that the parameters "pass_filename_extensions" and "filter_mime_types" are not empty in the config files themselves? since i don't have direct access to these files, i'd need to contact the admin. but i am not sure if that's even a possibility.

best, Peter

Peter Wetz

1:02 p.m.

New subject: HTML content from GMX gets scrubbed in archive

UPDATE: lynx was missing on the machine mailman was running on. since i don't have root access (or at least could not find out on my own whether lynx was installed), it was quite hard for me to figure that one out. only after i read that some others on this list had problems with "blank messages in the archive after conversion of html to plain-text mails" did i think this was worth investigating.

so now that lynx is installed and running, the html-to-plain-text conversion works.

one final question: since this requires content filtering to be turned on, i basically have to whitelist all mime-types i want to let through. is that right?

best, Peter

Mark Sapiro

3:53 p.m.

New subject: HTML content from GMX gets scrubbed in archive

On 10/07/2014 06:02 AM, Peter Wetz wrote:

...

> UPDATE: lynx was missing on the machine mailman was running on. since i don't have root access (or at least could not find out on my own whether lynx was installed), it was quite hard for me to figure that one out. only after i read that some others on this list had problems with "blank messages in the archive after conversion of html to plain-text mails" did i think this was worth investigating.

If you have access to Mailman's logs, you would see errors about this in Mailman's 'error' log.

...

> so now that lynx is installed and running, the html-to-plain-text conversion works.

Good.

N.B. I use elinks by setting

HTML_TO_PLAIN_TEXT_COMMAND = '/usr/bin/elinks -dump %(filename)s'

in mm_cfg.py. I like the plain text conversion a bit better.
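Since a missing converter was the cause of the blank messages in this thread, a quick check along the following lines may help. It is a sketch, not Mailman code: `converter_problem` is a hypothetical helper and the sample path `/tmp/example.html` is made up. The only thing taken from the setting above is that the command is a template with a `%(filename)s` placeholder.

```python
import shlex
import shutil


def converter_problem(command_template, filename="/tmp/example.html"):
    """Return None if the converter program can be found, else a short message.

    The template is expanded with Python %-formatting, and the first word of
    the resulting command line is looked up as an executable.
    """
    command = command_template % {"filename": filename}
    program = shlex.split(command)[0]
    if shutil.which(program) is None:
        return "converter not found: " + program
    return None


# The result depends on whether elinks is installed on this machine.
print(converter_problem("/usr/bin/elinks -dump %(filename)s"))
```

Running this as the user Mailman runs as gives the most meaningful answer, since that user's permissions decide whether the program is executable.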

...

> one final question: since this requires content filtering to be turned on, i basically have to whitelist all mime-types i want to let through. is that right?

It depends what you want to do. If you want to pass everything and just do the html to plaintext conversion, you can set all 4 of filter_mime_types, pass_mime_types, filter_filename_extensions and pass_filename_extensions empty. Then nothing will be removed based on MIME type or filename extension.

Otherwise, you can either blacklist or whitelist using filter_mime_types or pass_mime_types respectively. The filters are applied in the following order.

If filter_mime_types is non-empty, any part with MIME type in filter_mime_types is removed. Then, if pass_mime_types is non-empty, any part with MIME type NOT in pass_mime_types is removed. Then the filename_extensions tests are applied in the same order to parts that have an associated filename.

Note also that entries in *_mime_types can be either 'maintype' or 'maintype/subtype' (as in e.g., 'image' or 'image/jpeg'). If it is just 'maintype' it will match all parts with that maintype regardless of subtype.
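The order of the tests described above can be sketched in Python as follows. This is a simplified model and not Mailman's implementation: it looks at one part at a time, ignores nesting and whole-message checks, and assumes lower-case type names and extensions given without the leading dot. `matches` and `keep_part` are hypothetical names.

```python
def matches(ctype, entries):
    """True if ctype matches an entry given as 'maintype' or 'maintype/subtype'."""
    maintype = ctype.split("/", 1)[0]
    return ctype in entries or maintype in entries


def keep_part(ctype, filename, filter_mime_types=(), pass_mime_types=(),
              filter_filename_extensions=(), pass_filename_extensions=()):
    """Apply the four tests in the order described; empty settings are skipped."""
    if filter_mime_types and matches(ctype, filter_mime_types):
        return False
    if pass_mime_types and not matches(ctype, pass_mime_types):
        return False
    if filename:  # extension tests only apply to parts that have a filename
        ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        if filter_filename_extensions and ext in filter_filename_extensions:
            return False
        if pass_filename_extensions and ext not in pass_filename_extensions:
            return False
    return True


parts = [
    ("text/plain", None),
    ("text/html", None),
    ("image/jpeg", "photo.jpg"),
    ("application/pdf", "report.pdf"),
]
whitelist = ["multipart", "text/plain", "text/html"]
kept = [ctype for ctype, name in parts
        if keep_part(ctype, name, pass_mime_types=whitelist)]
print(kept)  # ['text/plain', 'text/html']
```

With all four settings empty, `keep_part` returns True for every part, which corresponds to the "pass everything and just convert HTML" case.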


Peter Wetz

13 Oct 2014

12:03 p.m.

New subject: HTML content from GMX gets scrubbed in archive

On Tue, Oct 7, 2014 at 5:53 PM, Mark Sapiro <mark@msapiro.net> wrote:

> If you have access to Mailman's logs, you would see errors about this in Mailman's 'error' log.

that's good to know. however, in my case, this would not have been possible.

thanks for detailed explanation. makes perfect sense.
