On 01/17/2017 01:05 PM, Mark Dale wrote:
Hi Mark,
I've been in touch with the Polish Language maintainer (Stefan Plewako) regarding the non-ascii characters that I am seeing.
In the language template files for Polish (Mailman 2.1.23) I see that the non-ascii characters (letters with diacritics) are replaced with question marks. These question marks get displayed in the Mailman web pages for lists using Polish.
Stefan directed me to GitHub for the Polish language files, and in those files I can see the Polish letters (letters with diacritics) okay.
I loaded Stefan's files into Mailman (replacing the existing) and all is now well. I was surprised as I was thinking that the non-ascii characters would need to be replaced with HTML entities - as you had done for the Hungarian files a couple of months ago. Stefan had advised me that doing that shouldn't be needed, and it seems he might be correct.
I can easily convert all the non-ascii characters in the Polish language templates to html entities, which is the correct way to deal with this. I have been reluctant to do this for certain languages in the past because of the sheer number of html entities involved, essentially every character in Greek for example. Polish is not so bad, but the majority of its non-ascii characters have only numeric html entities. For example, the snippet you quote becomes
<td colspan="2"> Wiadomości do wszystkich prenumeratorów listy wysyłaj na adres: <A HREF="mailto:<MM-Posting-Addr>"><MM-Posting-Addr></A>.
<p>Możesz zapisać się na listę lub zmienić opcje prenumeraty korzystając z poniższych sekcji. </td>
which will render correctly in a browser that recognizes those entities but is no more readable to humans in other contexts than the � characters are.

The underlying issue here is that Mailman's character set for Polish is iso-8859-2. Mailman sends the web pages built from those templates with a Content-Type: text/html; charset=iso-8859-2 header, but some web servers are configured to override that. E.g., see <http://httpd.apache.org/docs/2.4/mod/core.html#adddefaultcharset> for a description of the Apache AddDefaultCharset directive.

Stefan's templates are UTF-8 encoded, and the html templates will work in an environment where the web server 'forces' utf-8, but the .txt templates, if utf-8 encoded, will break in a Mailman whose character set for Polish is still iso-8859-2, because they will be sent in email with Content-Type: text/plain; charset=iso-8859-2 but with utf-8 encoded characters.

The ultimate solution is to make everything utf-8 encoded. Individual sites can do this, but I can't for the reasons discussed at <https://mail.python.org/pipermail/mailman-i18n/2015-February/001854.html>. Also see the thread "Encoding problem with 2.15 to 2.18 upgrade with Finnish" beginning at <https://mail.python.org/pipermail/mailman-users/2015-December/080221.html> and continuing at <https://mail.python.org/pipermail/mailman-users/2016-January/080275.html> for some of the fallout after Debian arbitrarily changed the character set for several languages to utf-8 in their Mailman package.

Bottom line is I have converted the Polish html templates to use html entities at <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1688> and will install those at mail.python.org with the intent of releasing that with 2.1.24. It should be OK, but if I get pushback from the Polish lists on mpo, I may have to reverse.

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
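
For anyone who wants to see the two effects discussed above side by side, here is a minimal sketch in Python 3. It is not the script used for the Launchpad revision, and the helper names `to_entities` and `mislabelled` are made up for illustration. It relies on the standard `xmlcharrefreplace` error handler, which always emits numeric references (so it would never produce a named entity even where one exists), and it simulates utf-8 bytes being read by a client that trusts a charset=iso-8859-2 label.

```python
import html

# Sample text taken from the Polish template snippet quoted in this message.
text = "Wiadomości do wszystkich prenumeratorów"


def to_entities(s):
    """Replace each non-ASCII character with a numeric character reference.

    Hypothetical helper: the result is pure ASCII, so it survives any
    charset label the web server puts on the page.
    """
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")


def mislabelled(s):
    """Simulate utf-8 encoded text being decoded as iso-8859-2.

    Hypothetical helper: this mimics a utf-8 .txt template sent with
    Content-Type: text/plain; charset=iso-8859-2.
    """
    return s.encode("utf-8").decode("iso-8859-2")


as_entities = to_entities(text)
print(as_entities)                 # hard for humans to read outside a browser
print(html.unescape(as_entities))  # what a browser would display
print(repr(mislabelled(text)))     # garbled result of the charset mismatch
```

The entity form round-trips cleanly through `html.unescape`, which is why it is safe for the html templates, while the mislabelled form shows why utf-8 .txt templates break when the list language is still declared as iso-8859-2.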