How can we switch Mailman to the KOI8-R charset for Russian?
Hi All,

We tried to upgrade Mailman from v. 2.1.9 to v. 2.1.20 (the default for Ubuntu 16.04) and discovered that, for the Russian language, Mailman's scripts tell browsers to expect UTF-8-encoded text. How can we switch Mailman back to KOI8-R?

Our problem is that Russian characters take 2 bytes each in UTF-8, and we do not want to double the size of our lists. We want to use UTF-8 _only_ for heavily mixed texts/lists (English/Russian/Chinese, for instance), while for English/Russian texts/lists we want to use a one-byte-per-character encoding such as KOI8-R or CP1251.

Sergey Maslennikov
On 6/25/17 2:15 AM, Sergey Maslennikov wrote:
We tried to upgrade Mailman from v. 2.1.9 to v. 2.1.20 (the default for Ubuntu 16.04) and discovered that, for the Russian language, Mailman's scripts tell browsers to expect UTF-8-encoded text. How can we switch Mailman back to KOI8-R?
There are multiple things going on here. In Mailman 2.1.19, the character set for Russian (and Romanian) was changed to UTF-8. This was described in the NEWS for 2.1.19 as follows:
- Mailman's character set for Russian has been changed from koi8-r to utf-8 and the templates and messages recoded. This change will require running 'bin/arch --wipe' on any existing Russian language lists in order to recode the list's archives, and will require recoding any edited templates in lists/LISTNAME/ru/*, templates/DOMAIN/ru/* and templates/site/ru/*. It may also require recoding any existing koi8-r text in list attributes. (LP: #1418448)
- Mailman's versions.py has been augmented to help with the above two character set changes. The first time a list with preferred_language of Romanian or Russian is accessed or upon upgrade to this release, any list attributes which have string values such as description, info, welcome_msg, etc. that appear to be in the old character set will be converted to utf-8. This is done recursively for the values (but not the keys) of dictionary attributes and the elements of list and tuple attributes.
Independently, Debian (upon which Ubuntu is based) responded to <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=398777> with <https://sources.debian.net/patches/mailman/1:2.1.18-2%2Bdeb8u1/91_utf8.patch/>, which forces all languages to UTF-8 encoding and, because it neglects to do the list attribute string recoding, caused other issues. See <https://mail.python.org/pipermail/mailman-users/2016-January/080278.html> and <https://bugs.launchpad.net/mailman/+bug/1462755>.

Probably the least disruptive way for you to reverse this is the following:

1) add this line to mm_cfg.py

LC_DESCRIPTIONS['ru'] = ('Russian', 'koi8-r', 'ltr')

You can't use add_language here because the Debian patch redefines that to force utf-8.

2) recode messages/ru/LC_MESSAGES/mailman.po with 'iconv -f utf-8 -t koi8-r' and run 'msgfmt -o messages/ru/LC_MESSAGES/mailman.mo messages/ru/LC_MESSAGES/mailman.po' to recompile the message catalog.

3) recode all the templates in templates/ru

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
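Step 3 (and the recoding into a separate templates/ru_koi8 directory later in this thread) amounts to running iconv over every file in a directory. A minimal Python 3 sketch of that is below. recode_tree is a hypothetical helper, not part of Mailman, and it assumes every template file is UTF-8 text. It encodes strictly, so a character with no KOI8-R equivalent raises an error rather than being silently dropped, much as iconv stops on an unconvertible character by default.

```python
import os


def recode_tree(src_dir, dst_dir, from_enc="utf-8", to_enc="koi8-r"):
    """Hypothetical helper: recode every file under src_dir into dst_dir,
    keeping the directory layout. Passing src_dir == dst_dir recodes in
    place; dst_dir must not be a subdirectory of src_dir."""
    written = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        out_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(out_root, exist_ok=True)
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                text = f.read().decode(from_enc)
            # Encode before opening the output file, so a failure cannot
            # truncate the original when recoding in place.
            data = text.encode(to_enc)  # strict: unmappable chars raise
            out_path = os.path.join(out_root, name)
            with open(out_path, "wb") as f:
                f.write(data)
            written.append(out_path)
    return written
```

For example, recode_tree("templates/ru", "templates/ru_koi8") run from Mailman's installation prefix (the exact location of templates/ depends on how Mailman was packaged) leaves the original UTF-8 templates untouched.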
In general it works. Thank you very much. However, we did not switch Mailman's existing Russian language to KOI8-R; instead we added a separate KOI8-R Russian language. The details are below.

On Sun, 2017-06-25 at 09:18 -0700, Mark Sapiro wrote:
1) add this line to mm_cfg.py ...
LC_DESCRIPTIONS['ru'] = ('Russian', 'koi8-r', 'ltr')
In principle, we may want to run multilingual (UTF-8-encoded) lists on our machines, so we did not replace the Russian language. Instead we added a new one:

LC_DESCRIPTIONS['ru_koi8'] = ('Russian-KOI8', 'koi8-r', 'ltr')

KOI8-R covers both English (it includes ASCII) and Russian, and we usually mark both languages as available options for a mailing list. The list description may be KOI8-R-encoded while the default charset for English is UTF-8. In that case, if somebody chooses “English” as their preferred language at http(s)://lists.<domain>/listinfo/<list>, the list description becomes unreadable. Therefore we

a) added one more language:
LC_DESCRIPTIONS['en_koi8'] = ('English-KOI8', 'koi8-r', 'ltr')

b) created a symlink in the templates directory: ln -s en en_koi8
2) recode messages/ru/LC_MESSAGES/mailman.po with 'iconv -f utf-8 -t koi8-r' and run 'msgfmt -o messages/ru/LC_MESSAGES/mailman.mo messages/ru/LC_MESSAGES/mailman.po' to recompile the message catalog.
We copied messages/ru to messages/ru_koi8, then, as you wrote, recoded messages/ru_koi8/LC_MESSAGES/mailman.po with iconv and recompiled it with msgfmt.
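The .po recoding step can be sketched in Python 3 as below. recode_po and recode_po_file are hypothetical helpers, not part of Mailman or gettext. Besides converting the bytes as iconv does, the sketch also rewrites the charset declared in the catalog's header entry (the "Content-Type: text/plain; charset=..." line), since msgfmt interprets the file according to that declaration and a KOI8-R file still labelled utf-8 would be mislabelled. GNU gettext's own msgconv tool (msgconv -t koi8-r) performs the conversion together with the header update and may be the simpler choice.

```python
import re

# Matches the charset declaration inside the .po header entry, e.g.
# "Content-Type: text/plain; charset=utf-8\n"
_CHARSET_RE = re.compile(r"(Content-Type: text/plain; charset=)[\w.-]+")


def recode_po(po_bytes, from_enc="utf-8", to_enc="koi8-r"):
    """Hypothetical helper: re-encode a gettext .po catalog and update
    the charset its header declares."""
    text = po_bytes.decode(from_enc)
    # Only the first match (the header entry) is rewritten.
    text, n = _CHARSET_RE.subn(r"\g<1>" + to_enc, text, count=1)
    if n == 0:
        raise ValueError("no 'charset=' declaration found in the .po header")
    # Strict encoding: raises UnicodeEncodeError for characters KOI8-R lacks.
    return text.encode(to_enc)


def recode_po_file(path, from_enc="utf-8", to_enc="koi8-r"):
    """Rewrite a .po file in place, e.g.
    messages/ru_koi8/LC_MESSAGES/mailman.po (path assumed)."""
    with open(path, "rb") as f:
        data = recode_po(f.read(), from_enc, to_enc)
    # Only truncate the file once the recoding has succeeded.
    with open(path, "wb") as f:
        f.write(data)
```

After recoding, the catalog is compiled with msgfmt exactly as in step 2, pointing -o at messages/ru_koi8/LC_MESSAGES/mailman.mo.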
3) recode all the templates in templates/ru
We recoded them into templates/ru_koi8.

Mailman/Cgi/private.py told browsers to expect UTF-8 text (so KOI8-R Russian text was unreadable); we changed that script to declare the charset of the list's preferred language instead.

In Mailman/Cgi/listinfo.py and Mailman/Cgi/admin.py, the "overview" functions list the lists and their descriptions. These functions use the charset of the default server language, while the descriptions may be in different charsets. For instance, if a description is written in Russian and encoded in KOI8-R while the default server language's charset is UTF-8, the description is unreadable. Therefore we changed these scripts to recode each description into the charset of the default server language whenever the list language's charset differs from the page's (default server language's) charset.

This approach may be useful for those who want to use one-byte (usually bilingual: English plus another language) charsets. In particular it may be useful for Russian users: the information density of Russian text (information per character) is somewhat lower than that of English [1], while each Russian character takes twice as much space in UTF-8, which also means extra computation whenever data are compressed, copied, stored, etc.

[1] http://www.kwintessential.co.uk/blog/translation/translation-text-expansion-...

Sergey Maslennikov
Moscow
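The idea behind both script changes can be sketched as a standalone Python 3 illustration; this is not the actual patch, which was not posted, and Mailman 2.1 itself runs on Python 2. The LC_DESCRIPTIONS dict below is a hypothetical subset mirroring the setup described in this thread, and charset_for, content_type_for and recode_description are hypothetical names. Characters the page charset cannot represent are turned into HTML numeric character references, which is one reasonable choice, not necessarily the one used in the modified scripts.

```python
# Hypothetical subset of LC_DESCRIPTIONS, mirroring the thread's setup:
# language code -> (description, charset, direction).
LC_DESCRIPTIONS = {
    "en": ("English (USA)", "utf-8", "ltr"),
    "ru": ("Russian", "utf-8", "ltr"),
    "en_koi8": ("English-KOI8", "koi8-r", "ltr"),
    "ru_koi8": ("Russian-KOI8", "koi8-r", "ltr"),
}


def charset_for(lang, descriptions=LC_DESCRIPTIONS):
    """Charset registered for a language code (second tuple element)."""
    return descriptions[lang][1]


def content_type_for(lang, descriptions=LC_DESCRIPTIONS):
    """Content-Type a page should declare for the list's preferred
    language (the idea behind the private.py change)."""
    return "text/html; charset=" + charset_for(lang, descriptions)


def recode_description(raw, list_lang, page_lang, descriptions=LC_DESCRIPTIONS):
    """Recode a list description (bytes in the list language's charset)
    into the charset of the page, e.g. the default server language
    (the idea behind the overview changes)."""
    src = charset_for(list_lang, descriptions)
    dst = charset_for(page_lang, descriptions)
    if src.lower() == dst.lower():
        return raw  # same charset: nothing to do
    # Unencodable characters become &#NNNN; references instead of being lost.
    return raw.decode(src, "replace").encode(dst, "xmlcharrefreplace")
```

For example, a KOI8-R description of a ru_koi8 list shown on a UTF-8 overview page (default language "en") comes back as UTF-8 bytes, while a list whose charset already matches the page is passed through unchanged.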
participants (2)
- Mark Sapiro
- Sergey Maslennikov