language encoding for archives
Hello,
I've noticed that changing the language preference for a list will change the character set for the list pages, but not for the archives. Do I need to modify the archive templates in order to have the archive pages use the same character set? Or is there another way?
Thanks, Kristina
kristina clair wrote:
I've noticed that changing the language preference for a list will change the character set for the list pages, but not for the archives. Do I need to modify the archive templates in order to have the archive pages use the same character set? Or is there another way?
It depends which archive pages you're talking about. The table of contents is rebuilt with every message and should be OK. The periodic (monthly) index pages are only rebuilt when a post is added so old periods are not usually updated. The html message pages themselves are never updated.
Thus if you make changes that affect these such as changing the list language, you have to rebuild the archives to update most of the html pages. See
bin/arch --help
-- Mark Sapiro <msapiro@value.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
Hmmm, I just regenerated the archives, and the table of contents page looks the same. According to the html header, the page is using charset utf-8, so the text is just a lot of symbols. The main list info page has cyrillic characters, but no charset specified in the html. I think what this particular list manager wants is for the archive page to show cyrillic characters - is that possible?
I'm not entirely sure which language he has selected because everything is in the other language (even the output of bin/config_list?), but if I'm reading the cyrillic right then it might be Ukrainian.
Thanks, Kristina
kristina clair wrote:
On 4/27/06, Mark Sapiro <msapiro@value.net> wrote:
It depends which archive pages you're talking about. The table of contents is rebuilt with every message and should be OK. The periodic (monthly) index pages are only rebuilt when a post is added so old periods are not usually updated. The html message pages themselves are never updated.
Thus if you make changes that affect these such as changing the list language, you have to rebuild the archives to update most of the html pages. See
bin/arch --help
Hmmm, I just regenerated the archives, and the table of contents page looks the same.
As I said above, the table of contents is rebuilt with every message, so rebuilding the archive won't change it unless there were no messages since the language change.
According to the html header, the page is using charset utf-8, so the text is just a lot of symbols. The main list info page has cyrillic characters, but no charset specified in the html. I think what this particular list manager wants is for the archive page to show cyrillic characters - is that possible?
I'm not entirely sure which language he has selected because everything is in the other language (even the output of bin/config_list?), but if I'm reading the cyrillic right then it might be Ukrainian.
If the list language is Ukrainian, the character set should be utf-8. One potential issue is that the web server can be overriding this with its own Content-Type: header that specifies a charset other than utf-8.
See the material near the bottom of the page at <http://www.list.org/mailman-install/node10.html>.
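Editorial sketch (not part of the original reply): one way to check for such an override is to read the charset out of the Content-Type header the web server actually sends for an archive page. The function names and the URL in the comment are hypothetical, and utf-8 is assumed to be the expected charset because that is what the thread says the list uses.

```python
from email.message import Message
from urllib.request import urlopen


def header_charset(content_type_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_charset()


def overrides_expected(content_type_value, expected="utf-8"):
    """True when the header names a charset that differs from the expected one."""
    charset = header_charset(content_type_value)
    return charset is not None and charset != expected.lower()


def fetch_content_type(url):
    """Fetch a page and return the raw Content-Type header sent by the server."""
    with urlopen(url) as response:
        return response.headers.get("Content-Type")


# Hypothetical usage against a real archive page:
#   value = fetch_content_type("http://lists.example.org/pipermail/listname/")
#   print(value, overrides_expected(value))
```

If `overrides_expected` returns True, the server configuration is the place to look, not the Mailman templates.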
-- Mark Sapiro
Hi,
I am still having some trouble understanding encoding and archives. You were correct that the webserver was overriding the utf-8 encoding on the archive pages, so the contents pages look great now.
What I'm still having a problem with is the display of some messages on the archive pages. Some of the messages appear correctly in cyrillic, but some of the messages do not. The archive menus and headers, etc, appear with the correct character set - it is just the message itself which in some cases appears as gibberish. Looking at the html source of the archive pages, it seems like the message content is inserted into the archive page with <PRE> tags? The list administrator claims that when the emails are sent to the list, they all appear with the correct character set. I am wondering, though, if the problem could be that the email programs of some list members are setting the character set differently such that it is not getting through to mailman.
Sorry to ask such a vague question, but I'm just trying to get a handle on how messages with different character sets get into the archive pages. What factors could cause them to be displayed with the incorrect character set on the archive pages?
Thanks, Kristina
"kristina clair" wrote:
What I'm still having a problem with is the display of some messages on the archive pages. Some of the messages appear correctly in cyrillic, but some of the messages do not. The archive menus and headers, etc, appear with the correct character set - it is just the message itself which in some cases appears as gibberish. Looking at the html source of the archive pages, it seems like the message content is inserted into the archive page with <PRE> tags?
Yes. This is how it's done.
Prior to this, the message is processed by Mailman/Handlers/Scrubber.py (unless replaced by setting ARCHIVE_SCRUBBER in mm_cfg.py). Scrubber removes non-text and "character set unspecified" text attachments and replaces them with a link to a separate file where they are stored.
Scrubber then converts all remaining text parts (in the multipart case) from their specified character set to the character set of the list.
If the message is a single text/plain part (not MIME multipart), Scrubber doesn't change it. In this case HyperArch.py attempts to convert the character set of the message. If the message does not specify a character set, it is assumed to be that of the list, and if the message was actually written in a different one, it will be garbled.
You need to find one of these 'garbled' messages in the archives/private/listname.mbox/listname.mbox file. This should be the raw message as sent by Mailman to the list. You may be able to see from this message what the issue is. If not, post the raw message, and we will try to help.
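Editorial sketch (not part of the original reply): the check described above can be done with Python's standard `mailbox` module by listing the character set each text part declares. The helper names are hypothetical, and the path in the usage comment is the one given above with `listname` left as a placeholder.

```python
import mailbox


def declared_charsets(msg):
    """List (content type, declared charset) for every text part of a message.

    A charset of None means the part did not declare one.
    """
    found = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            found.append((part.get_content_type(), part.get_content_charset()))
    return found


def report(mbox_path):
    """Print the declared charsets of every message in an mbox file."""
    # create=False avoids creating an empty file if the path is wrong.
    for msg in mailbox.mbox(mbox_path, create=False):
        print(msg.get("Message-ID"), msg.get("Subject"), declared_charsets(msg))


# Hypothetical usage:
#   report("archives/private/listname.mbox/listname.mbox")
```

Messages that show `None`, or a charset that cannot represent the text they contain, are the likely candidates for garbled archive pages.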
The list administrator claims that when the emails are sent to the list, they all appear with the correct character set. I am wondering, though, if the problem could be that the email programs of some list members are setting the character set differently such that it is not getting through to mailman.
Could be. See above for how to check.
Sorry to ask such a vague question, but I'm just trying to get a handle on how messages with different character sets get into the archive pages. What factors could cause them to be displayed with the incorrect character set on the archive pages?
Incorrect character set specification on the message or a sub-part or no character set specification at all on a text/plain message. Maybe other things too.
-- Mark Sapiro
Thanks so much! I discovered that messages that are using one character set (charset="windows-1251") are appearing correctly and messages that are using another character set (charset=iso-8859-1) are not appearing correctly.
Is the only fix for this for the list members to use the character set which is appearing correctly after mailman processing?
Kristina
kristina clair wrote:
Thanks so much! I discovered that messages that are using one character set (charset="windows-1251") are appearing correctly and messages that are using another character set (charset=iso-8859-1) are not appearing correctly.
Is the only fix for this for the list members to use the character set which is appearing correctly after mailman processing?
Sorry for not answering sooner. This one got buried in my inbox.
Those users who are sending charset=iso-8859-1 need to do something different in their MUAs or with the setup of their computers. The iso-8859-1 character set is also known as Latin-1 or western. It does not contain encodings for cyrillic characters. The ISO character set with cyrillic characters is iso-8859-5. Windows-1251 is also a cyrillic character set.
See for example <http://czyborra.com/charsets/iso8859.html> for descriptions of the various iso-8859 character sets and <http://czyborra.com/charsets/cyrillic.html> for various cyrillic character sets.
Apparently your problem users are using a cyrillic setting on their computers so that they type and see cyrillic characters on their displays, but their MUAs are misidentifying this as iso-8859-1. This is basically a setup issue on the senders' computers.
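Editorial sketch (not part of the original reply): the effect of such a mislabel can be reproduced with Python's built-in codecs. The sample word and the assumption that the sender's bytes are really windows-1251 are illustrative only; the thread does not say which encoding the problem users' computers actually produce.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


text = "Привіт"  # sample Cyrillic word, chosen for illustration

# Bytes as a Cyrillic-configured computer might produce them (assumption).
sent_bytes = text.encode("windows-1251")

# Decoding with the label the MUA attached gives Latin-1 accented characters.
mislabeled = sent_bytes.decode("iso-8859-1")

# Decoding with the charset the bytes were really written in recovers the text.
correct = sent_bytes.decode("windows-1251")

print("labeled iso-8859-1:  ", mislabeled)
print("labeled windows-1251:", correct)
for name in ("iso-8859-1", "iso-8859-5", "windows-1251"):
    print(name, "can represent the text:", can_encode(text, name))
```

The bytes are identical in both cases; only the label changes, which is why the fix has to happen on the sending side.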
-- Mark Sapiro