[Mailman-Users] Dealing with multiple charsets (list messages and web archive)

Mon May 12 01:30:10 CEST 2008

Stefan Förster wrote:
>
>* Mark Sapiro <mark at msapiro.net> wrote:
>
>> - signatures will get broken
>
>What kind of signatures do you mean?

PGP and other signed mail. Domain keys and DKIM.

>> - with multipart/alternative, the text/plain part will be aggregated
>> with the other text/plain parts and the text/html or other
>> alternatives will be separately attached.
>
>If this handler is called after MimeDel or Scrubber, there should be
>no more text/html parts left in the message. But then again, I'm not
>sure about that yet. Need to do more reading, I'm not sure yet where
>to add flatten.py.

Scrubber will turn anything into a single plain text message, so
calling this handler after Scrubber if scrub_nondigest is Yes will do
nothing. The main difference between this handler and Scrubber, aside
from the fact that Scrubber is more robust, is that this handler leaves
the 'other parts' attached to the message instead of storing them
aside and replacing them with links to the stored parts.

The handler could come anywhere between MimeDel and ToDigest, but
between MimeDel and Scrubber may make the most sense.
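
As a rough illustration of that placement, here is a minimal sketch that
inserts a handler name just before Scrubber in a pipeline list. The list
below is abbreviated and only illustrative, the helper insert_before is
hypothetical, and 'flatten' assumes the handler is installed under that
module name.

```python
# Abbreviated, illustrative pipeline; the real GLOBAL_PIPELINE in
# Mailman's Defaults.py has more handlers than shown here.
GLOBAL_PIPELINE = ['SpamDetect', 'MimeDel', 'Scrubber', 'CookHeaders',
                   'ToDigest', 'ToArchive', 'ToOutgoing']


def insert_before(pipeline, new_handler, before):
    """Return a copy of pipeline with new_handler placed just before `before`."""
    result = list(pipeline)
    result.insert(result.index(before), new_handler)
    return result


# 'flatten' is assumed to be the module name of the custom handler.
pipeline = insert_before(GLOBAL_PIPELINE, 'flatten', 'Scrubber')
print(pipeline)
```

In an actual installation the equivalent insert would presumably be made
in mm_cfg.py against the imported GLOBAL_PIPELINE rather than a local copy.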

>> - text/plain parts without a specified charset will not be aggregated
>> but will be separately attached. This is a difficult issue because
>> many mainstream MUAs will attach an arbitrary .txt attachment without
>> specifying a charset. If you then assume it is say iso-8859-1 and
>> convert it to unicode and in fact it was euc-jp or koi8-r or even
>> utf-8, you can garble it irreversibly.
>
>If a .txt file without encoding is attached, it is always luck if the
>receiver will be able to read the file. I'd say "gzip it". Really.

So if I understand you correctly, you could assume per standards that
any text/plain part without a charset is us-ascii (or any other
particular charset). This could be accomplished by changing

        if part.get_content_type() == 'text/plain' and part.get_content_charset():

to

        if part.get_content_type() == 'text/plain':

and

            cset = part.get_content_charset()

to

            cset = part.get_content_charset('us-ascii')
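
For reference, a small sketch of how the fallback argument behaves, using
only the standard email package. The sample parts are made up for
illustration; the point is that the default applies only when no charset
is declared.

```python
from email.message import Message

# A text/plain part with no charset parameter, as some MUAs produce
# for .txt attachments.
part = Message()
part['Content-Type'] = 'text/plain'
part.set_payload('plain ascii text')

# Without a fallback, a missing charset is reported as None ...
no_default = part.get_content_charset()
# ... while the fallback argument supplies the assumed charset.
with_default = part.get_content_charset('us-ascii')

# A part that does declare a charset is not affected by the fallback.
declared = Message()
declared['Content-Type'] = 'text/plain; charset="ISO-8859-1"'
declared_cset = declared.get_content_charset('us-ascii')

print(no_default, with_default, declared_cset)
```

Note that the assumed charset is only a guess about the sender's intent;
if the part is really in some other encoding, the conversion can still
garble it.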

>> flatten.py is written so that it could be installed as is in Mailman as
>> a custom Handler.
>
>I will try this out tomorrow.
>
>> Note that this will not address separate attachment of headers and
>> footers. If the resultant 'flattened' message is multipart for any
>> reason, msg_header and msg_footer will still be attached as separate
>> MIME parts.
>
>After rebuilding the text parts, could we call "decorate" on the
>message before we attach any other parts?

That's a bit tricky. If you were to do this, then after calling
Decorate.process, you would need to set

    msgdata['nodecorate'] = True

so that when Decorate is called again by SMTPDirect, it will just
return. Also, if you are going to call Decorate from this handler, you
have a dilemma regarding digests. If you call this handler before
ToDigest, then every message in the digest is decorated with
msg_header and msg_footer in addition to the digest itself being
decorated with digest_header and digest_footer. Of course, plain
digest messages are scrubbed anyway, so if you do defer this handler
until after ToDigest, you only have to be concerned about the MIME
digest.

You also won't be able to have any personalized substitutions in
msg_header or msg_footer because at this point, you aren't decorating
individual recipients' messages.

The biggest problem may be that as flatten.py is written, there is no
point at which msg is the plain text message without attachments. You
would have to create a text/plain message without the attached parts,
pass that message to decorate and then add the other parts to the
decorated message. Or possibly easier, you could call Decorate at the
beginning before doing anything else, and then flatten the decorated
message.
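
The first approach (decorate the plain text, then attach the other parts)
might look roughly like the sketch below. It uses only the standard email
package; decorate_text is a hypothetical stand-in for Decorate, which in
Mailman takes (mlist, msg, msgdata) and does considerably more, and
flatten_and_decorate is likewise an illustrative name, not part of
flatten.py.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def decorate_text(text, header, footer):
    """Hypothetical stand-in for Decorate: wrap text in header and footer."""
    return header + '\n' + text + '\n' + footer


def flatten_and_decorate(text, other_parts, msgdata, header, footer):
    # Step 1: decorate the aggregated plain text while it is still a
    # standalone body with no attachments.
    body = MIMEText(decorate_text(text, header, footer), 'plain', 'utf-8')
    # Step 2: record that decoration is done, so a later Decorate call
    # can see the flag and return without decorating again.
    msgdata['nodecorate'] = True
    if not other_parts:
        return body
    # Step 3: only now attach the remaining parts.
    outer = MIMEMultipart('mixed')
    outer.attach(body)
    for part in other_parts:
        outer.attach(part)
    return outer


msgdata = {}
attachment = MIMEApplication(b'\x00\x01', 'octet-stream')
msg = flatten_and_decorate('Hello list', [attachment], msgdata,
                           'HEADER', 'FOOTER')
```

Because the header and footer end up inside the first text part, they are
not attached as separate MIME parts even when the result is multipart.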

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan