[Mailman-i18n] How can we switch mailman to KOI8-R charset for Russian?

Sergey Maslennikov for-mm-at-python at rplab.ru
Sun Jul 2 10:25:20 EDT 2017


In general it works; thank you very much. However, we did not switch
Mailman's Russian to KOI8-R. Instead we added KOI8-R as an additional
charset for Russian. A description follows.

On Sun, 2017-06-25 at 09:18 -0700, Mark Sapiro wrote:

> 1) add this line to mm_cfg.py ...
> LC_DESCRIPTIONS['ru'] = ('Russian', 'koi8-r', 'ltr')
 
In principle, we may want to run multilingual (UTF-8 encoded) lists on
our machines. Therefore we did not replace the Russian language entry.
Instead we added a new one:

LC_DESCRIPTIONS['ru_koi8'] = ('Russian-KOI8', 'koi8-r', 'ltr')

KOI8-R is an English/Russian charset, and we usually offer both
languages as options for a mailing list. The list description may be
KOI8-R-encoded while the default charset for English is UTF-8. In that
case, if somebody chooses “English” as the preferred language at
http(s)://lists.<domain>/listinfo/<list>, the list description becomes
unreadable. Therefore we

  a) added one more language:
     LC_DESCRIPTIONS['en_koi8'] = ('English-KOI8', 'koi8-r', 'ltr')

  b) created a symlink in templates:
     ln -s en en_koi8
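
To illustrate why both languages are mapped to KOI8-R, here is a minimal
standalone sketch (Python 3, not Mailman code). The dictionary only
mimics mm_cfg.LC_DESCRIPTIONS; the 'en' entry is an assumption taken
from the statement above that English is served as UTF-8 here, and
render_description is a hypothetical helper that imitates a browser
interpreting the description bytes in the page charset.

```python
# Standalone illustration; this dict only mimics mm_cfg.LC_DESCRIPTIONS.
LC_DESCRIPTIONS = {
    # Assumed entry: English served as UTF-8 on this server.
    'en': ('English', 'utf-8', 'ltr'),
    # Entries added in mm_cfg.py as described above.
    'ru_koi8': ('Russian-KOI8', 'koi8-r', 'ltr'),
    'en_koi8': ('English-KOI8', 'koi8-r', 'ltr'),
}


def render_description(raw, lang):
    """Decode description bytes in the charset of the page language.

    Hypothetical helper: undecodable bytes become U+FFFD, roughly what a
    reader sees when the page charset does not match the stored bytes.
    """
    charset = LC_DESCRIPTIONS[lang][1]
    return raw.decode(charset, errors='replace')


text = 'Список рассылки'
stored = text.encode('koi8-r')  # description saved by a KOI8-R list

print(render_description(stored, 'ru_koi8') == text)  # True
print(render_description(stored, 'en_koi8') == text)  # True
print(render_description(stored, 'en') == text)       # False
```

With 'en_koi8' the page charset matches the stored description, so it
stays readable whichever of the two languages the visitor picks.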

> 2) recode messages/ru/LC_MESSAGES/mailman.po with 'iconv -f utf-8
> -t koi8-r' and run 'msgfmt -o messages/ru/LC_MESSAGES/mailman.mo
> messages/ru/LC_MESSAGES/mailman.po' to recompile the message catalog.

We copied messages/ru to messages/ru_koi8, then, as you wrote, recoded
messages/ru_koi8/LC_MESSAGES/mailman.po with iconv and recompiled it
with msgfmt.
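
The recoding step could also be scripted. Below is a minimal sketch in
Python 3, assuming the catalog header declares its charset in the usual
gettext form ("Content-Type: text/plain; charset=UTF-8"). The function
name recode_po is hypothetical. The sketch also rewrites the charset
named in that header so that it matches the new encoding; whether this
is needed in a given setup is an assumption to verify.

```python
import re
from pathlib import Path


def recode_po(src, dst, from_charset='utf-8', to_charset='koi8-r'):
    """Recode a .po catalog and update the charset in its header.

    Hypothetical helper, not part of Mailman or gettext.
    """
    text = Path(src).read_text(encoding=from_charset)
    # The gettext header entry contains a line such as
    # "Content-Type: text/plain; charset=UTF-8\n"; rewrite only the
    # charset name and only its first occurrence.
    text = re.sub(r'(Content-Type: text/plain; charset=)[-\w]+',
                  lambda m: m.group(1) + to_charset, text, count=1)
    # encode() raises UnicodeEncodeError if a character has no
    # equivalent in the target charset, rather than dropping it.
    Path(dst).write_bytes(text.encode(to_charset))
```

It would be called, for example, as
recode_po('messages/ru/LC_MESSAGES/mailman.po',
'messages/ru_koi8/LC_MESSAGES/mailman.po'), followed by msgfmt as
before.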

> 3) recode all the templates in templates/ru

We wrote the recoded copies to templates/ru_koi8.
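
For a whole template directory, a loop over the files does the same job.
The following is a sketch in Python 3 under the assumption that every
file under the source directory is UTF-8 text; recode_tree is a
hypothetical helper name.

```python
from pathlib import Path


def recode_tree(src_dir, dst_dir, from_charset='utf-8', to_charset='koi8-r'):
    """Copy every file under src_dir to dst_dir, recoding its text.

    Hypothetical helper. Assumes all files are text in from_charset and
    that dst_dir is not located inside src_dir.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    written = []
    for src in sorted(src_dir.rglob('*')):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Work on bytes so that line endings are preserved as they are.
        text = src.read_bytes().decode(from_charset)
        dst.write_bytes(text.encode(to_charset))
        written.append(dst)
    return written
```

For the case above the call would be
recode_tree('templates/ru', 'templates/ru_koi8').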

Because Mailman/Cgi/private.py told browsers to expect UTF-8 text
(so KOI8-R Russian text was unreadable), we changed that script to
declare the charset of the list's preferred language instead.
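
The idea behind that change can be sketched as follows. This is a
standalone Python 3 illustration, not the actual patch to private.py:
the dictionary stands in for mm_cfg.LC_DESCRIPTIONS, and
content_type_header and its default_charset parameter are hypothetical.

```python
# Stand-in for mm_cfg.LC_DESCRIPTIONS; the 'en' entry is an assumption.
LC_DESCRIPTIONS = {
    'en': ('English', 'utf-8', 'ltr'),
    'ru_koi8': ('Russian-KOI8', 'koi8-r', 'ltr'),
}


def content_type_header(preferred_language, default_charset='us-ascii'):
    """Build the HTTP Content-Type header for a list's preferred language.

    Hypothetical helper: falls back to default_charset for a language
    code that is not registered.
    """
    entry = LC_DESCRIPTIONS.get(preferred_language)
    charset = entry[1] if entry else default_charset
    return 'Content-Type: text/html; charset=%s' % charset


print(content_type_header('ru_koi8'))
# prints: Content-Type: text/html; charset=koi8-r
```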

Mailman/Cgi/listinfo.py and Mailman/Cgi/admin.py each contain an
"overview" function that lists the mailing lists and their descriptions.
Those functions use the charset of the default server language, while
the charsets of the descriptions may differ. For instance, if a
description is written in Russian and encoded in KOI8-R, and the charset
of the default server language is UTF-8, then the description is
unreadable. Therefore we changed those scripts to recode each
description into the charset of the default server language whenever the
mailing list language charset != document (default server) language
charset.
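
The recoding rule can be sketched like this. It is a standalone Python 3
illustration of the condition described above, not the code we put into
listinfo.py and admin.py; recode_description is a hypothetical helper,
and turning unrepresentable characters into HTML numeric references is
an assumption about what is acceptable on an HTML overview page.

```python
import codecs


def recode_description(raw, list_charset, page_charset):
    """Return description bytes recoded into the page charset.

    Hypothetical helper. The bytes are returned untouched when both
    charsets are the same codec.
    """
    # Normalise names so that aliases such as 'UTF-8' and 'utf8' match.
    if codecs.lookup(list_charset).name == codecs.lookup(page_charset).name:
        return raw
    text = raw.decode(list_charset, errors='replace')
    # Characters missing from the page charset become HTML numeric
    # character references, so a browser can still display them.
    return text.encode(page_charset, errors='xmlcharrefreplace')


koi8_description = 'Новости'.encode('koi8-r')
utf8_page = recode_description(koi8_description, 'koi8-r', 'utf-8')
print(utf8_page == 'Новости'.encode('utf-8'))  # True
```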

This approach may be useful for those who want to use single-byte
charsets, which are usually bilingual (English plus another language).
In particular it may be useful for Russian users: the information
density of Russian (information per character) is a bit lower than that
of English [1], while each Russian letter takes twice as much memory in
UTF-8 as in KOI8-R, which also means extra work whenever data are
compressed, copied, stored, etc.
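
The size difference is easy to check. In the small Python 3 example
below (the sample string is arbitrary), each Cyrillic letter takes two
bytes in UTF-8 and one byte in KOI8-R, while the comma and the space
take one byte in both.

```python
sample = 'Привет, мир'  # 9 Cyrillic letters, a comma and a space

utf8_size = len(sample.encode('utf-8'))
koi8_size = len(sample.encode('koi8-r'))

print(utf8_size, koi8_size)  # prints: 20 11
```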

[1] http://www.kwintessential.co.uk/blog/translation/translation-text-expansion-how-it-affects-design

Sergey Maslennikov
Moscow


