[Mailman-Users] problem with accented characters, converting HTML to plain text

Mark Sapiro mark at msapiro.net
Mon Jul 20 20:04:08 CEST 2015


On 7/19/15 1:13 PM, Dominique Asselineau wrote:
> Hello,
> 
> When an e-mail with text/html content-type is converted into plain text,
> the accented characters are not treated correctly.


There are potential issues with this. Mailman takes the content of the
text/html part and calls HTML_TO_PLAINTEXT_COMMAND (lynx -dump by
default) to convert the HTML to a plain text rendering. It then replaces
the content of the part with that rendering and changes the Content-Type:
to text/plain, keeping any charset= parameter.
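
As a rough illustration of those steps, here is a minimal Python sketch
using the standard email package. It is not Mailman's actual code: the
names run_lynx_dump and html_part_to_plain are hypothetical, the fallback
to utf-8 when there is no charset= parameter is an assumption of the
sketch, and the lynx call assumes lynx is installed and on the PATH.

```python
import subprocess
import tempfile


def run_lynx_dump(html_bytes):
    """Render HTML bytes as text with `lynx -dump` (assumes lynx is installed)."""
    with tempfile.NamedTemporaryFile(suffix=".html") as tmp:
        tmp.write(html_bytes)
        tmp.flush()
        result = subprocess.run(
            ["lynx", "-dump", tmp.name], check=True, capture_output=True
        )
    return result.stdout


def html_part_to_plain(part, convert=run_lynx_dump):
    """Hypothetical helper: turn a text/html part into text/plain in place."""
    if part.get_content_type() != "text/html":
        return part
    # Assumption of this sketch: fall back to utf-8 if there is no charset=.
    charset = part.get_content_charset() or "utf-8"
    html_bytes = part.get_payload(decode=True)
    # The converter output is taken to be in the same charset as the input.
    text = convert(html_bytes).decode(charset, errors="replace")
    del part["Content-Transfer-Encoding"]  # let set_payload choose a new one
    part.set_type("text/plain")            # existing parameters are preserved
    part.set_payload(text, charset)
    return part
```

Note that the charset label is carried over without checking what the
converter actually produced, which is why a mismatch can go unnoticed.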

Lynx normally does not recode any characters, so the output of lynx
-dump should be in the same charset as the input and it should be OK.

Problems arise if the input has characters represented as HTML entities
such as &aacute; or &egrave;. In this case, lynx outputs the characters
encoded in a charset which may not match the message's encoding.
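
The effect can be imitated in a few lines of Python. This is only a
simulation of the mismatch, not what lynx does internally: the function
name simulate_entity_mismatch and the choice of charsets are made up for
the example.

```python
import html


def simulate_entity_mismatch(html_text, message_charset, output_charset):
    """Show what a reader sees when entity output and label disagree."""
    rendered = html.unescape(html_text)          # entities become characters
    out_bytes = rendered.encode(output_charset)  # bytes the converter emits
    # The message still claims message_charset, so readers decode with it.
    return out_bytes.decode(message_charset, errors="replace")


# Converter emits UTF-8, but the message is labelled iso-8859-1.
print(simulate_entity_mismatch("caf&eacute;", "iso-8859-1", "utf-8"))
# Converter output and label agree.
print(simulate_entity_mismatch("caf&eacute;", "utf-8", "utf-8"))
```

The first call prints the familiar two-character garbage in place of the
accented letter; the second prints the word correctly.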

In order to say more, I would need to see a raw message as sent to the
list with all headers intact and the resultant raw message from the list
with all headers intact.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

