Hi,

I've just noticed that Mailman sends wrongly encoded German mails for admin approval of subscriptions that look like this:

---
Ihre Genehmigung ist für den folgenden Abonnementswunsch erforderlich:

    Für: xxx@lrz.de
    Liste: yyy@lists.lrz.de

Bitte besuchen Sie bei Gelegenheit

    https://lists.lrz.de/mailman/admindb/yyyy

um diese Anfrage zu beantworten.
---

The charset of the mail is UTF-8, and templates/de/subauth.txt is also UTF-8, but the text looks like the UTF-8 template has been parsed as ISO8859-1 and recoded into UTF-8 again. Recoding templates/de/subauth.txt to ISO8859-1 fixes the issue.

We have a lot of different encodings even within the same language and file type:

mailman/templates/de (1) % file *
adminaddrchgack.txt: UTF-8 Unicode text
admindbdetails.html: HTML document, ASCII text
admindbpreamble.html: HTML document, ASCII text
admindbsummary.html: HTML document, ASCII text
adminsubscribeack.txt: ASCII text, with no line terminators
adminunsubscribeack.txt: ASCII text
admlogin.html: HTML document, ASCII text
approve.txt: ISO-8859 text
archidxentry.html: HTML document, ASCII text
archidxfoot.html: HTML document, ASCII text
archidxhead.html: HTML document, ASCII text
archlistend.html: ASCII text
archliststart.html: HTML document, ASCII text
archtocentry.html: HTML document, ASCII text
archtoc.html: HTML document, ASCII text
archtocnombox.html: HTML document, ASCII text
article.html: HTML document, ASCII text
bounce.txt: ISO-8859 text
checkdbs.txt: ISO-8859 text
convert.txt: ISO-8859 text
cronpass.txt: ISO-8859 text
disabled.txt: ISO-8859 text
emptyarchive.html: HTML document, ASCII text
headfoot.html: HTML document, ASCII text
help.txt: ISO-8859 text
invite.txt: ISO-8859 text
listinfo.html: HTML document, ASCII text
masthead.txt: UTF-8 Unicode text
newlist.txt: ISO-8859 text
nomoretoday.txt: UTF-8 Unicode text
options.html: HTML document, ASCII text
postack.txt: ASCII text
postauth.txt: ISO-8859 text
postheld.txt: ISO-8859 text
private.html: HTML document, ASCII text
probe.txt: UTF-8 Unicode text
refuse.txt: UTF-8 Unicode text
roster.html: HTML document, ASCII text
subauth.txt: UTF-8 Unicode text
subscribeack.txt: ISO-8859 text
subscribe.html: HTML document, ASCII text
unsubauth.txt: ASCII text
unsub.txt: ISO-8859 text
userpass.txt: ISO-8859 text
verify.txt: ISO-8859 text

I don't quite get the code, but it looks like at least *.txt should be ISO8859-1 at the moment.

Best Regards,
Bernhard

-- 
Bernhard Schmidt
Netzbetrieb / IPv6 / DNSSEC
Leibniz-Rechenzentrum
Leibniz Supercomputing Centre
Boltzmannstr. 1
D-85748 Garching b. Muenchen
Tel: +49 89 35831-7885
E-Mail/Jabber: Bernhard.Schmidt@lrz.de
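
The symptom described in this report (a UTF-8 template read as ISO-8859-1 and then encoded as UTF-8 again) can be reproduced in a few lines. This is a minimal Python 3 sketch of the general effect, not Mailman's actual code path; the variable names are illustrative assumptions, and the sample sentence is taken from the notice quoted in the report.

```python
# Illustrative only: reproduces the double-encoding symptom, not Mailman's code.
template_text = "Ihre Genehmigung ist für den folgenden Abonnementswunsch erforderlich:"

# A UTF-8 template file stores "ü" as the two bytes C3 BC.
utf8_bytes = template_text.encode("utf-8")

# Misreading those bytes as ISO-8859-1 turns each byte into its own character.
misread = utf8_bytes.decode("iso-8859-1")

# Encoding the misread text as UTF-8 for the outgoing mail bakes the error in.
sent_bytes = misread.encode("utf-8")
received = sent_bytes.decode("utf-8")

print(received)  # "für" now appears as "fÃ¼r"

# If the file is ISO-8859-1 instead, the same read yields the intended text.
latin1_bytes = template_text.encode("iso-8859-1")
correct = latin1_bytes.decode("iso-8859-1")
print(correct == template_text)  # True
```

This matches the observation that recoding the template to ISO-8859-1 makes the problem go away: the reader's assumption and the file's real encoding then agree.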
On 07/13/2016 03:56 AM, Bernhard Schmidt wrote:
I've just noticed that Mailman sends wrongly encoded German mails for admin approval of subscriptions that look like this
...
We have a lot of different encodings even within the same language and filetype
I don't quite get the code, but it looks like at least *.txt should be ISO8859-1 at the moment.
Thank you for the report. As you surmise, all the .txt files should be iso-8859-1 encoded, not utf-8. ASCII text is OK as that is a subset of iso-8859-1.

I have reported this at <https://bugs.launchpad.net/mailman/+bug/1602779> and fixed it for the next release.

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
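
For anyone who wants to check their own templates before the fixed release, the rule above (ASCII is fine, UTF-8 needs recoding to ISO-8859-1) could be sketched roughly as follows. This is a hedged Python 3 example, not part of Mailman: the function names `classify` and `recode_utf8_to_latin1` are hypothetical, and the sample byte strings are made up for illustration (only the file names come from the listing in the report).

```python
def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for the raw bytes of a template."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8. For these templates that most likely means
        # ISO-8859-1, but the bytes alone cannot prove it.
        return "other"


def recode_utf8_to_latin1(data: bytes) -> bytes:
    """Recode UTF-8 bytes to ISO-8859-1.

    Raises UnicodeEncodeError if a character has no ISO-8859-1 form.
    """
    return data.decode("utf-8").encode("iso-8859-1")


# Made-up sample contents; real use would read each file in binary mode.
samples = {
    "subauth.txt": "für".encode("utf-8"),
    "approve.txt": "für".encode("iso-8859-1"),
    "postack.txt": b"plain ASCII",
}

for name, raw in samples.items():
    kind = classify(raw)
    if kind == "utf-8":
        raw = recode_utf8_to_latin1(raw)
    print(name, kind, raw)
```

ASCII files are left alone because every ASCII byte means the same thing in ISO-8859-1. Files classified as "other" are also left alone, since they are presumably already in a single-byte encoding.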
On 13.07.2016 at 19:17, Mark Sapiro wrote:

Hi,
Thank you for the report. As you surmise, all the .txt files should be iso-8859-1 encoded, not utf-8. ASCII text is OK as that is a subset of iso-8859-1. I have reported this at <https://bugs.launchpad.net/mailman/+bug/1602779> and fixed it for the next release.
Thanks a lot. Just so I understand: is there a per-language default charset somewhere in the code that I've missed? There are several more UTF-8 .txt files in other languages, some of which cannot be represented in ISO-8859-1 (zh_CN, for example).

Bernhard
On 07/13/2016 10:23 AM, Bernhard Schmidt wrote:
Thanks a lot. Just so I understand: is there a per-language default charset somewhere in the code that I've missed? There are several more UTF-8 .txt files in other languages, some of which cannot be represented in ISO-8859-1 (zh_CN, for example).
There is a table at the end of Defaults.py which defines the supported languages and their Mailman character sets. Many languages are already utf-8 encoded.

You might think that changing the character set for a language is a simple matter of just redefining the character set and recoding the message catalog and templates, but it's more complicated than that. See the thread at <https://mail.python.org/pipermail/mailman-users/2016-January/080275.html> and the bug report at <https://bugs.launchpad.net/mailman/+bug/1462755/>.

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
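
To illustrate why a per-language charset table is needed at all, here is a small Python 3 sketch. It does not reproduce the real table in Defaults.py: `LANGUAGE_CHARSETS`, `encode_for_language` and `representable` are hypothetical names, and the zh_CN entry is an assumption based on the remark that such templates are UTF-8. Only the German entry is confirmed by this thread.

```python
# Hypothetical stand-in for a language -> charset table. Entries are
# assumptions for illustration, except German (ISO-8859-1 per this thread).
LANGUAGE_CHARSETS = {
    "de": "iso-8859-1",
    "zh_CN": "utf-8",
}


def encode_for_language(text: str, lang: str) -> bytes:
    """Encode text in the charset configured for the given language."""
    return text.encode(LANGUAGE_CHARSETS[lang])


def representable(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


german = "für"
chinese = "列表"

print(encode_for_language(german, "de"))     # one byte per character
print(representable(german, "iso-8859-1"))   # True
print(representable(chinese, "iso-8859-1"))  # False
print(representable(chinese, "utf-8"))       # True
```

German text fits in ISO-8859-1, so a single-byte charset works for it, while Chinese text has no ISO-8859-1 representation and has to use something like UTF-8. The sketch says nothing about the extra complications of switching a language's charset that the linked thread and bug report discuss.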