Re: [Mailman-Users] Encoding problem with 2.15 to 2.18 upgrade with Finnish

I tried to approach this problem by getting the source package from the debian repository. We are still running wheezy, so we need to stick to 2.1.18 for now.
When I built the package, the resulting mailman.mo file for Finnish ended up utf-8 encoded (which I deduced by converting it back to a .po file with msgunfmt). The encoding seems to happen in debian/patches/91_utf8.patch, which does have a list of the correct encodings, but this is what seems to happen (not only to Finnish, but to all listed languages):
msgconv -t utf-8 fi/LC_MESSAGES/mailman.po | tee fi/LC_MESSAGES/mailman.po.utf-8 | msgfmt -o fi/LC_MESSAGES/mailman.mo -
Also: http://sources.debian.net/patches/patch/mailman/1:2.1.18-2/91_utf8.patch/
If this means that Debian is forcing everything to utf-8, what should I do if every package comes with this forced utf-8 encoding?
Here is the traceback we got in October:
Oct 28 10:04:28 2015 (22144) Uncaught runner exception: 'utf8' codec can't decode byte 0xe4 in position 11: invalid continuation byte
Oct 28 10:04:28 2015 (22144) Traceback (most recent call last):
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 119, in _oneloop
    self._onefile(msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 190, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 239, in process
    i18ndesc = uheader(mlist, mlist.description, 'List-Id', maxlinelen=998)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 65, in uheader
    return Header(s, charset, maxlinelen, header_name, continuation_ws)
  File "/usr/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 11: invalid continuation byte
- Eva Isaksson
2016-01-05 21:29 GMT+02:00 Mark Sapiro <mark@msapiro.net>:

On 01/07/2016 04:40 AM, Eva Isaksson wrote:
This is a Debian issue. The FAQ at <http://wiki.list.org/x/12812344> addresses that, but probably won't be much comfort to you.
You could install Mailman from source. See the FAQ at <http://wiki.list.org/x/17891606> for info.
More below ...
This is exactly the issue described at <https://bugs.launchpad.net/mailman/+bug/1462755>. Note that the person who reported that is Thijs Kinkhorst who is the Debian maintainer of Mailman, so they are well aware of this issue yet they continue to distribute a package which causes this issue for their users. See in particular <https://bugs.launchpad.net/mailman/+bug/1462755/comments/3>. Had I realized when I wrote that that it was Debian and not the user that had changed the encoding for the language, that reply would have been much stronger.
The problem in your case is that even though the character set for Finnish in your Mailman is UTF-8, your list(s) still have strings in their attributes which are ISO-8859-1 encoded. In this particular case the list's General Options -> description contains one or more characters encoded in ISO-8859-1.
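For readers who want to see the failure in isolation, here is a minimal sketch of the mismatch described above. It is written for Python 3 (Mailman 2.1 itself runs on Python 2), the description text is a made-up example rather than anything from the affected lists, and try_decode is a hypothetical helper, not a Mailman function.

```python
# Hypothetical list description; the real one is not shown in this thread.
description = u"T\u00e4htitieteen lista"

# Bytes as an older ISO-8859-1 configuration would have stored them.
stored = description.encode("iso-8859-1")


def try_decode(raw, charset):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return raw.decode(charset), None
    except UnicodeDecodeError as err:
        return None, err


# Decoding with the charset the package now assumes fails on byte 0xe4.
text_utf8, err_utf8 = try_decode(stored, "utf-8")
print(err_utf8)

# Decoding with the charset the string was actually stored in succeeds.
text_latin1, err_latin1 = try_decode(stored, "iso-8859-1")
print(text_latin1 == description)
```

The byte 0xe4 is "ä" in ISO-8859-1, but in UTF-8 it can only start a multi-byte sequence, so the plain ASCII byte that follows it is rejected as an invalid continuation byte.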
I think simply re-entering the description and saving changes will fix this, but you should probably go through the entire web admin UI and do the same for any other non-ascii strings.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On 01/07/2016 11:44 AM, Mark Sapiro wrote:
You could install Mailman from source. See the FAQ at <http://wiki.list.org/x/17891606> for info.
Just to be clear, by install from source I mean our source at e.g. <https://launchpad.net/mailman/+download>, not some packager's source package.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

2016-01-07 22:12 GMT+02:00 Mark Sapiro <mark@msapiro.net>: On 01/07/2016 11:44 AM, Mark Sapiro wrote:
Am I right in thinking that this option would mean choosing the Debian utf-8 alternative? We have some 750 lists, so checking all of them would have to happen outside the web admin UI, i.e. from the command line.
You could install Mailman from source. See the FAQ at <http://wiki.list.org/x/17891606> for info.
Does this allow for keeping all the settings and passwords from the old package installation intact? The number of lists is so big that I want to be on the safe side.
- Eva Isaksson

On 01/07/2016 12:40 PM, Eva Isaksson wrote:
You can keep the Debian package with its utf-8 character sets and programmatically change all your lists so they work. In MM 2.1.19 I changed the character sets for Russian and Romanian from koi8-r and iso-8859-2 respectively to utf-8 because I was convinced that the former character sets were not appropriate for those languages. In so doing, I augmented Mailman's version updater to identify strings in list attributes that were not valid utf-8 and convert them from the former character set.
This is an imperfect process. It does guarantee that all the strings in list attributes are valid utf-8 encodings, but not necessarily that they are 'correct'.
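The idea of "convert only what is not already valid utf-8" can be sketched as follows. This is an illustration under assumptions, not the code in Mailman/versions.py: recode_string is a hypothetical name, it handles a single byte string rather than walking list attributes, and it is written for Python 3.

```python
def recode_string(raw, old_charset="iso-8859-1"):
    """Return (new_bytes, changed) for one stored byte string.

    Bytes that already decode as UTF-8 are left alone; anything else is
    assumed to be in old_charset and is re-encoded as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw, False
    except UnicodeDecodeError:
        return raw.decode(old_charset).encode("utf-8"), True


# An ISO-8859-1 string containing 0xe4 is converted.
print(recode_string(b"T\xe4hti"))

# Pure ASCII is valid UTF-8 already, so nothing changes.
print(recode_string(b"plain text"))

# The imperfect case: these two bytes are the ISO-8859-1 characters
# U+00C3 and U+00A4, but they also form valid UTF-8, so they are kept
# as-is and will from now on be read as a single U+00E4.
print(recode_string(b"\xc3\xa4"))
```

The last call shows why the result is guaranteed to be valid utf-8 but not guaranteed to be what the list owner originally typed.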
I am working on a stand-alone script to do this. I will post it later today.
Yes. Installing the latest GNU Mailman source over the Debian package following <http://wiki.list.org/x/17891606> will not change any list data, but there may be other issues depending on what if any Debian patches you choose to install.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On 01/07/2016 01:20 PM, Mark Sapiro wrote:
There is now a first cut at this script. The functions that do the recoding are lifted straight from Mailman/versions.py so they have been fairly well exercised. I didn't just import them because I wanted the caller to know if they had changed anything and that required a minor addition to the top level function to return a flag.
The script is at <https://www.msapiro.net/scripts/recode_list> (mirrored at <http://fog.ccsf.edu/~msapiro/scripts/recode_list>)
Install the script in Mailman's bin/ directory and run it with the -h option for full info. It uses Python's argparse module and as such requires Python 2.7.
In your case, to convert the strings of all lists whose preferred_language is Finnish from iso-8859-1 encoding to utf-8
recode_list --language=fi
would suffice. There are options to specify conversions other than iso-8859-1 to utf-8 and to produce output about skipped or unchanged lists, but the defaults should be good for you and will report which lists were changed. In no case does it report the actual changes, just which lists were changed.
I recommend making backups of your lists/*/config.pck files before running this in case something goes wrong.
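One possible way to take those backups is sketched below. The function name backup_configs and both directory arguments are assumptions for illustration; the location of the lists/ directory depends on the installation, and the sketch is written for Python 3.

```python
import glob
import os
import shutil


def backup_configs(lists_dir, backup_dir):
    """Copy every <lists_dir>/<list>/config.pck to <backup_dir>/<list>/.

    Returns the names of the lists whose config.pck was copied.
    """
    copied = []
    pattern = os.path.join(lists_dir, "*", "config.pck")
    for src in sorted(glob.glob(pattern)):
        list_name = os.path.basename(os.path.dirname(src))
        dest_dir = os.path.join(backup_dir, list_name)
        os.makedirs(dest_dir, exist_ok=True)
        # copy2 also preserves timestamps and permission bits.
        shutil.copy2(src, os.path.join(dest_dir, "config.pck"))
        copied.append(list_name)
    return copied


# Example call; both paths are placeholders to adapt to the local setup:
#   backup_configs("/path/to/mailman/lists", "/path/to/backup")
```

Restoring a list is then a matter of copying its config.pck back into place while Mailman is stopped.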
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

I did the 2.1.18 upgrade this weekend, after some weighing of the options. In the long run, using the Debian package is less work-intensive, so in the end my choice was the utf-8 conversion.
The script is at <https://www.msapiro.net/scripts/recode_list> (mirrored at <http://fog.ccsf.edu/~msapiro/scripts/recode_list>)
To cut a long story short, the script worked without any problems. Taking backups took more time than running the script and doing the package upgrade.
Such a relief - thank you.
- Eva Isaksson
2016-01-08 2:55 GMT+02:00 Mark Sapiro <mark@msapiro.net>:

On 01/16/2016 06:24 AM, Eva Isaksson wrote:
I'm glad the script worked for you.
I have mentioned this issue to the Debian people who are responsible for the Mailman package, but it might be helpful if you report your experiences to Debian. They don't seem to realize that they are potentially causing this pain for large numbers of users of their Mailman package.
While I have "fixed" <https://bugs.launchpad.net/mailman/+bug/1462755>, that fix will not be released until 2.1.21 (soon I hope) and it only addresses the symptom and only in one place. Potentially, other strings can cause the same issue.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On 1/7/2016 2:40 PM, Eva Isaksson wrote:
I have posted this before, and I will post it again. When I was an administrator for a Mailman system running on Ubuntu, I decided that the Debian package had problems. Besides having patches that were undocumented, there was one patch that removed a library that sometimes was needed. So I figured out how to create a Mailman package from the SourceForge source. The last Mailman package I built was 2.1.15; I assume that the same process would work for the latest Mailman. What I did is available to anyone who wants it; send me a private e-mail to get the output from "script" of the session when I built the last package. The only Debian patch I kept was one that placed libraries in the proper directories for Debian/Ubuntu.
--Barry Finkel
