New subject: problem with accented characters, converting HTML to plain text

July 21, 2015


      In a message of Mon, 20 Jul 2015 11:04:08 -0700, Mark Sapiro writes:
...
On 7/19/15 1:13 PM, Dominique Asselineau wrote:
...
Hello,
When an e-mail with a text/html content type is converted to plain text,
the accented characters are not treated correctly.
There are potential issues with this. Mailman gets the content of the
text/html part and calls HTML_TO_PLAINTEXT_COMMAND (lynx -dump in the
default case) to convert the HTML to a plain text rendering. It then
replaces the content of the part with that rendering and changes the
Content-Type: to text/plain while keeping any charset= parameter.
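The steps described above can be sketched roughly as follows. This is not Mailman's actual code: it uses Python 3's email package, and the function name html_part_to_plain and the convert callable (a stand-in for running HTML_TO_PLAINTEXT_COMMAND) are hypothetical names chosen for illustration.

```python
from email.message import EmailMessage


def html_part_to_plain(part, convert):
    """Replace a text/html part with a plain text rendering, in place.

    `convert` is a hypothetical callable taking the HTML bytes and
    returning the rendered bytes; it stands in for the external command.
    """
    # Remember the charset declared on the original part.
    charset = part.get_content_charset() or "us-ascii"
    html_bytes = part.get_payload(decode=True)
    text_bytes = convert(html_bytes)
    # Assumption: the converter's output is in the same charset as its input.
    # If it is not, this is where the accented characters go wrong.
    text = text_bytes.decode(charset)
    # Resets the Content-* headers to text/plain with the same charset.
    part.set_content(text, subtype="plain", charset=charset)


def strip_paragraph_tags(data):
    """Toy converter for demonstration only; it just drops <p> tags."""
    return data.replace(b"<p>", b"").replace(b"</p>", b"")
```

The point of the sketch is that the charset= parameter is carried over unchanged, so the result is only correct if the converter really does emit bytes in that charset.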
Lynx normally does not recode any characters, so the output of lynx
-dump should be in the same charset as the input and it should be OK.
Problems arise if the input has characters represented as HTML entities
such as &aacute; or &egrave;. In this case, lynx outputs the characters
encoded in a charset which may not match the message's encoding.
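A small Python sketch of the mismatch described above. The helper names are hypothetical, and the choice of UTF-8 as the converter's output charset is only an assumption for illustration; what lynx actually emits depends on its configuration.

```python
import html


def render_entities(html_text, output_charset):
    """Decode HTML entities, then encode in the converter's output charset."""
    return html.unescape(html_text).encode(output_charset)


def read_as(raw, declared_charset):
    """Decode bytes the way a mail reader would, trusting the charset= parameter."""
    return raw.decode(declared_charset, errors="replace")


# Assumed scenario: entities are rendered as UTF-8 bytes, but the part
# still declares charset=iso-8859-1.
rendered = render_entities("caf&eacute;", "utf-8")
print(read_as(rendered, "utf-8"))       # matching charset: reads correctly
print(read_as(rendered, "iso-8859-1"))  # mismatched charset: garbled
```

With a mismatch, each accented character becomes two bytes that the reader shows as two unrelated characters.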
In order to say more, I would need to see a raw message as sent to the
list with all headers intact and the resultant raw message from the list
with all headers intact.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
I had enough trouble with lynx over this -- it used to be how I
converted all html mail my mail reader saw, but such characters
are not rare in the mail I receive -- that I gave up on lynx.
My new rule in my mailer for how to display html text is:
w3m -dump -o display_link_number=1 -cols 78 -T text/html -I "$(echo %a | sed -r 's/.*charset="?([-a-zA-Z0-9_]*).*/\1/')" -O utf-8 | less
which is one heck of a mouthful, but hasn't caused me any problems since.
Just in case somebody else wants to ditch lynx ...
Laura
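For anyone who would rather not read the sed expression, here is a rough Python equivalent of the charset extraction it performs. It assumes that %a expands to the Content-Type parameters of the part, which depends on the mailer. The function name and the us-ascii fallback are illustrative choices, not part of the rule above.

```python
import re

# Same character class as the sed expression: an optional opening quote,
# then letters, digits, hyphens and underscores.
CHARSET_RE = re.compile(r'charset="?([-a-zA-Z0-9_]*)')


def extract_charset(content_type, default="us-ascii"):
    """Return the charset named in a Content-Type value, or `default`."""
    match = CHARSET_RE.search(content_type)
    if match and match.group(1):
        return match.group(1)
    # Unlike sed, which prints the line unchanged when nothing matches,
    # this sketch falls back to an explicit default.
    return default


print(extract_charset('text/html; charset="ISO-8859-1"'))
```

The extracted value is what gets passed to w3m as the input charset with -I, while -O utf-8 fixes the output charset.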

Re: [Mailman-Users] problem with accented characters, converting HTML to plain text

Laura Creighton
