[Mailman-Users] Dealing with multiple charsets (list messages and web archive)

Stefan Förster cite at incertum.net
Sun May 11 20:38:11 CEST 2008

Hello Mark,

first of all, thank you very much for your help. This looks very
promising indeed.

* Mark Sapiro <mark at msapiro.net> wrote:
> What you want is more like the attached flatten.py.txt file (.txt added
> for content filtering). Note that this is far from production quality
> and probably doesn't even work on some messages.

I will run a full set of tests then; I would have done so anyway.
Thanks for the warning, though.

> Problems I am aware of are things like
> - no i18n for canned text strings

Hm, I think I can handle that. After all, you already showed me how to
do this ;-)

> - signatures will get broken

What kind of signatures do you mean?

> - with multipart/alternative, the text/plain part will be aggregated
> with the other text/plain parts and the text/html or other
> alternatives will be separately attached.

If this handler is called after MimeDel or Scrubber, there should be
no more text/html parts left in the message. But then again, I'm not
sure about that yet. I need to do more reading before I know where
to add flatten.py.
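
To make the ordering question concrete, here is a minimal sketch of
placing a new handler right after an existing one. The list below is
only an illustrative subset standing in for Mailman's GLOBAL_PIPELINE,
the helper insert_after is hypothetical, and "flatten" is the handler
name I am assuming from the file name flatten.py.

```python
# Illustrative subset of a Mailman 2.1 style handler pipeline.
# The real GLOBAL_PIPELINE has more entries; this is a stand-in list.
GLOBAL_PIPELINE = [
    "SpamDetect", "Approve", "Moderate", "Hold",
    "MimeDel", "Scrubber", "CookHeaders",
    "ToDigest", "ToArchive", "ToOutgoing",
]


def insert_after(pipeline, existing, new):
    """Return a copy of pipeline with `new` placed right after `existing`."""
    result = list(pipeline)
    result.insert(result.index(existing) + 1, new)
    return result


# "flatten" is an assumed handler name (module flatten.py).
pipeline = insert_after(GLOBAL_PIPELINE, "Scrubber", "flatten")
print(pipeline)
```

Placing it after Scrubber is just my current guess; whether that is
the right spot is exactly what I still have to find out.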

> - text/plain parts without a specified charset will not be aggregated
> but will be separately attached. This is a difficult issue because
> many mainstream MUAs will attach an arbitrary .txt attachment without
> specifying a charset. If you then assume it is say iso-8859-1 and
> convert it to unicode and in fact it was euc-jp or koi8-r or even
> utf-8, you can garble it irreversibly.

If a .txt file without encoding is attached, it is always a matter of
luck whether the receiver will be able to read the file. I'd say
"gzip it". Really.
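
The charset problem is easy to reproduce. A minimal sketch in plain
Python 3, with a sample string I picked for illustration: a wrong
guess raises no error, it can only be undone by someone who knows both
charsets involved, and any lossy step afterwards destroys the original
bytes.

```python
original = "Привет, мир"          # sample text, chosen for illustration
raw = original.encode("koi8-r")   # what arrives when no charset is declared

# Wrong guess: every byte value is valid iso-8859-1, so no error is
# raised and the result is silently mojibake.
wrong = raw.decode("iso-8859-1")

# Still recoverable, but only if you know both the wrong guess and the
# real charset.
recovered = wrong.encode("iso-8859-1").decode("koi8-r")

# A lossy step (here: decoding as utf-8 with replacement characters)
# throws the original bytes away for good.
lossy = raw.decode("utf-8", errors="replace")

print(wrong == original)
print(recovered == original)
print("\ufffd" in lossy)
```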

> flatten.py is written so that it could be installed as is in Mailman as
> a custom Handler.

I will try this out tomorrow.
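
For my own notes, a rough sketch of what such a handler could look
like. This is not Mark's flatten.py, only my reading of the idea: a
Mailman 2.1 handler is a module with a process(mlist, msg, msgdata)
function, and this one joins all text/plain parts that declare a
charset into a single utf-8 part. It is written against the Python 3
standard library, so it would need adapting for Mailman 2.1's Python 2.
The helper collect_text is hypothetical, and the sketch ignores
signatures and attached messages.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def collect_text(msg):
    """Split leaf parts into decoded text/plain chunks and everything else."""
    texts, others = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue
        charset = part.get_content_charset()  # None if none was declared
        if part.get_content_type() == "text/plain" and charset:
            try:
                texts.append(part.get_payload(decode=True).decode(charset))
                continue
            except (LookupError, UnicodeDecodeError):
                pass  # unknown or wrong charset: keep the part as it is
        others.append(part)
    return texts, others


def process(mlist, msg, msgdata):
    """Handler entry point; rewrites msg in place. mlist/msgdata are unused."""
    texts, others = collect_text(msg)
    if not texts:
        return
    body = MIMEText("\n".join(texts), "plain", "utf-8")
    for name in ("Content-Type", "Content-Transfer-Encoding"):
        del msg[name]
    if others:
        msg["Content-Type"] = "multipart/mixed"
        msg.set_payload([body] + others)
    else:
        msg["Content-Type"] = body["Content-Type"]
        msg["Content-Transfer-Encoding"] = body["Content-Transfer-Encoding"]
        msg.set_payload(body.get_payload())


# Small demonstration message with mixed charsets.
outer = MIMEMultipart("mixed")
outer["Subject"] = "test"
outer.attach(MIMEText("Grüße", "plain", "iso-8859-1"))
outer.attach(MIMEText("Привет", "plain", "koi8-r"))
outer.attach(MIMEText("<p>hi</p>", "html", "us-ascii"))
process(None, outer, {})
print(outer.get_payload()[0].get_payload(decode=True).decode("utf-8"))
```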

> Note that this will not address separate attachment of headers and
> footers. If the resultant 'flattened' message is multipart for any
> reason, msg_header and msg_footer will still be attached as separate
> MIME parts.

After rebuilding the text parts, could we call "decorate" on the
message before we attach any other parts?
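
What I have in mind, as a sketch only: wrap the combined text in the
header and footer first, and attach everything else afterwards. This
does not call Mailman's Decorate handler. build_message and its
header/footer arguments are hypothetical stand-ins for msg_header and
msg_footer, and the code is Python 3 standard library only.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(texts, others, header="", footer=""):
    """Join text chunks, wrap them in header/footer, then attach the rest."""
    pieces = [p for p in (header, "\n".join(texts), footer) if p]
    body = MIMEText("\n".join(pieces), "plain", "utf-8")
    if not others:
        return body  # still a single text/plain part
    outer = MIMEMultipart("mixed")
    outer.attach(body)  # decorated text comes first
    for part in others:
        outer.attach(part)
    return outer


attachment = MIMEText("<p>hi</p>", "html", "us-ascii")
msg = build_message(["first part", "second part"], [attachment],
                    header="[list header]", footer="[list footer]")
print(msg.get_payload()[0].get_payload(decode=True).decode("utf-8"))
```

That way header and footer would end up inside the one text part even
when the message stays multipart.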

> The basic flow in the process is
[very clear explanation cut]

I think I'm beginning to like Python.

Stefan Förster     http://www.incertum.net/     Public Key: 0xBBE2A9E9
FdI #186: Admin-Handy - Elektronisches Würgehalsband (Holger Köpke)
