[Mailman-Users] language encoding for archives

Mark Sapiro msapiro at value.net
Fri May 26 18:11:52 CEST 2006

"kristina clair" wrote:
>What I'm still having a problem with is the display of some messages
>on the archive pages.  Some of the messages appear correctly in
>cyrillic, but some of the messages do not.  The archive menus and
>headers, etc, appear with the correct character set - it is just the
>message itself which in some cases appears as gibberish.
>Looking at the html source of the archive pages, it seems like the
>message content is inserted into the archive page with <PRE> tags?

Yes. This is how it's done.

Prior to this, the message is processed by Mailman/Handlers/Scrubber.py
(unless replaced by setting ARCHIVE_SCRUBBER in mm_cfg.py). Scrubber
removes non-text and "character set unspecified" text attachments and
replaces them with a link to a separate file where they are stored.

Scrubber then converts all remaining text parts (in the multipart case)
from their specified character set to the character set of the list.

If the message is a single text/plain part (not MIME multipart),
Scrubber doesn't change it. In this case HyperArch.py attempts to
convert the character set of the message. If the character set is
unspecified, it is assumed to be that of the list, and if the message
was actually written in some other character set, it will be garbled.
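
To illustrate that fallback, here is a minimal sketch in modern Python.
It is not Mailman's actual code. LIST_CHARSET, decode_text() and build()
are hypothetical names, and koi8-r / windows-1251 are only example
character sets. It shows the same bytes decoding correctly when the
charset is declared and incorrectly when the list's charset is assumed.

```python
from email import message_from_bytes

LIST_CHARSET = "koi8-r"  # assumption: the list's preferred character set
ORIGINAL = "Привет"      # what the sender actually typed


def decode_text(msg, list_charset=LIST_CHARSET):
    """Decode a single-part text message, assuming the list's charset
    when the message does not declare one."""
    payload = msg.get_payload(decode=True)  # raw body bytes
    charset = msg.get_content_charset() or list_charset
    return payload.decode(charset, errors="replace")


def build(content_type, body_bytes):
    """Build a tiny single-part message with the given Content-Type."""
    header = ("Content-Type: %s\n\n" % content_type).encode("ascii")
    return message_from_bytes(header + body_bytes)


# The sender's mail program really used windows-1251 in both cases.
body = ORIGINAL.encode("windows-1251")
declared = build("text/plain; charset=windows-1251", body)
undeclared = build("text/plain", body)

# True when the decoded text matches what the sender typed.
print(decode_text(declared) == ORIGINAL)
print(decode_text(undeclared) == ORIGINAL)
```

The second message is decoded as koi8-r only because nothing in its
headers says otherwise, which is the situation described above.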

You need to find one of these 'garbled' messages in the
archives/private/listname.mbox/listname.mbox file. This should be the
raw message as sent by Mailman to the list. You may be able to see
from this message what the issue is. If not, post the raw message, and
we will try to help.
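
If the mbox is large, something like the following sketch may help
narrow things down. It uses Python's standard mailbox module, not
anything from Mailman, and report_charsets() is a hypothetical helper
name. It lists the declared charset of every text part so that messages
with a missing or unexpected charset stand out.

```python
import mailbox
import sys


def report_charsets(mbox_path):
    """Return (index, subject, content type, charset) for each text part.

    charset is None when the part declares no character set.
    """
    rows = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, msg in enumerate(box):
            subject = str(msg.get("Subject", ""))
            for part in msg.walk():
                if part.get_content_maintype() != "text":
                    continue
                rows.append((index, subject,
                             part.get_content_type(),
                             part.get_content_charset()))
    finally:
        box.close()
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to the list's .mbox file as the first argument.
    for row in report_charsets(sys.argv[1]):
        print(row)
```

Rows where the last field is None are text parts with no charset
specified.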

>The list administrator claims that when the emails are sent to the
>list, they all appear with the correct character set.  I am wondering,
>though, if the problem could be that the email programs of some list
>members are setting the character set differently such that it is not
>getting through to mailman.

Could be. See above for how to check.

>Sorry to ask such a vague question, but I'm just trying to get a
>handle on how messages with different character sets get into the
>archive pages.  What factors could cause them to be displayed with the
>incorrect character set on the archive pages?

Incorrect character set specification on the message or a sub-part or
no character set specification at all on a text/plain message. Maybe
other things too.
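
As a rough sketch of checking for those two causes on one message part
(diagnose() is a hypothetical helper, not part of Mailman):

```python
import codecs
from email import message_from_bytes


def diagnose(part):
    """Classify the charset declaration of one non-multipart text part."""
    charset = part.get_content_charset()
    if charset is None:
        return "no charset declared"
    try:
        codecs.lookup(charset)
    except LookupError:
        return "unknown charset: " + charset
    payload = part.get_payload(decode=True) or b""
    try:
        payload.decode(charset)  # strict decoding
    except UnicodeDecodeError:
        return "payload is not valid " + charset
    return "ok"


# Example: windows-1251 bytes mislabelled as utf-8.
sample = message_from_bytes(
    b"Content-Type: text/plain; charset=utf-8\n\n\xcf\xf0\xe8\xe2\xe5\xf2")
print(diagnose(sample))
```

Note that this can only catch a wrong declaration when the bytes are
invalid in the declared charset. Single-byte charsets such as koi8-r
accept any byte sequence, so a mislabelled message can still report
"ok" and has to be checked by eye.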

Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
