[Mailman-Developers] Mailman3: decorating non-ascii message templates
barry at list.org
Sat Oct 19 00:01:07 CEST 2013
On Oct 14, 2013, at 03:42 PM, Aurélien Bompard wrote:
>I first thought about opening a bug for this, but I think it needs a
>small discussion first.
Thanks for the thorough investigation!
>In Mailman3 the message templates are stored on disk (welcome.txt,
>footer-generic.txt, ...) However, unless I missed something, there is no
>hint about the encoding of those files. As a result, when I try the
>decorate() function (from mailman.handlers.decorate) on a non-ascii file, it
>crashes with a classic UnicodeDecodeError: 'ascii' codec can't decode byte
>[...]. Note: if the file contains no string to replace (like
>$fqdn_listname), it is passed through unchanged and it works.
>So how should we deal with this? I think that the TemplateLoader in
>mailman.app.templates should return unicode strings, because that's the
>closest to the moment when files are read, and unicode conversion should
>happen on the "external borders" of the application.
Completely agreed. We need to convert to unicode at the edges and treat
strings internally as unicode. I want MM3 to eventually be a Python 3
application, so we'll have to do this anyway. I'm sure it'll be painful, but
let's start now in any way we can.
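To make "convert at the edges" concrete, here is a minimal sketch, not Mailman's actual code. The names load_template() and decorate_text() are hypothetical stand-ins for TemplateLoader.get() and decorate(), and UTF-8 is assumed as the on-disk encoding. The bytes are decoded once, right after the file is read, so that the placeholder substitution only ever sees text:

```python
import os
import tempfile
from string import Template


def load_template(path, encoding="utf-8"):
    """Hypothetical loader: read bytes and decode them once, at the edge."""
    with open(path, "rb") as fp:
        raw = fp.read()
    # Everything downstream of this point only handles decoded text.
    return raw.decode(encoding)


def decorate_text(template_text, substitutions):
    """Hypothetical decorator: fill in $placeholders on decoded text."""
    return Template(template_text).safe_substitute(substitutions)


# Demo: a footer with non-ASCII content, stored on disk as UTF-8 bytes.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "footer-generic.txt")
    with open(path, "wb") as fp:
        fp.write("Liste de diffusion $fqdn_listname, archivée\n".encode("utf-8"))
    footer = decorate_text(load_template(path),
                           {"fqdn_listname": "test@example.com"})
```

Because the decode happens in one place, a file that is not valid UTF-8 fails at load time with a clear error instead of deep inside the substitution step.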
>Thus the TemplateLoader's get() method seems to be the right place. I see
>two options:
>- We require that all template files are stored in either ascii or
>utf-8. That's the easiest way to go, and we just decode the text after
>getting the file.
>- We use the fact that our Language entities contain encoding values.
>When the template is loaded from an internal URL containing the
>language, we add the corresponding encoding to the result metadata (to
>be retrieved with the info() method) and use that to decode the
>contents. This means that templates in non-localized directories still
>have to be ascii-only, and the same goes for templates retrieved from
>non-internal URLs (not starting with mailman://). It's more complex but
>it may seem more natural to the administrator, since he won't have to
>force UTF-8 encoding when editing a file. We must also think of not
>making it too hard for Postorius, which will probably only get UTF-8
>posted from the webpage (since it's always displayed in UTF-8 IIRC).
>I would vote for the easy way for just requiring UTF-8 encoding, but I'd
>like to hear your thoughts on this.
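For comparison, the second option could look roughly like the sketch below. It is only an illustration under stated assumptions: LANGUAGE_CHARSETS and decode_template() are hypothetical names, the charset values are illustrative, and the language code is passed in directly rather than parsed out of a mailman:// URL. When no language is known, the sketch falls back to us-ascii, which is the restriction on non-localized templates described above:

```python
# Hypothetical language-to-charset table; in practice the values would
# come from the Language entities, not from a hard-coded dict.
LANGUAGE_CHARSETS = {
    "en": "us-ascii",
    "fr": "iso-8859-1",
    "ja": "euc-jp",
}


def decode_template(raw, language=None):
    """Decode template bytes using the language's charset, else us-ascii.

    Returns the decoded text together with the charset that was used, so
    the caller could expose it as metadata.
    """
    charset = LANGUAGE_CHARSETS.get(language, "us-ascii")
    return raw.decode(charset), charset


raw = "archivée".encode("iso-8859-1")
text, charset = decode_template(raw, language="fr")
```

Calling decode_template(raw) without a language raises UnicodeDecodeError for these bytes, which shows the cost of this option: any template loaded without language information must stay ASCII-only.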
I would vote for UTF-8 for all files, internal or external, but maybe there
are some languages for which this will cause problems. We do have a `charset`
variable in the config file for languages, but I wonder if we'll actually use
anything other than UTF-8. Note that some of the charsets in MM2.1 are not
UTF-8, but I'm not sure if any of them are UTF-8 incompatible. MM3 only
defines a setting for USA English by default, and that's currently us-ascii,
but maybe even that should be UTF-8.
So I guess unless we can identify actual languages that would be harmed by
UTF-8, we should just require that. Maybe Steve can weigh in on the issue.
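One narrow way to read "UTF-8 incompatible" is whether files already stored in a given charset would still decode if they were read as UTF-8. The sketch below checks only that byte-level question, using sample strings chosen for illustration; it says nothing about whether a language is otherwise well served by UTF-8:

```python
# ASCII is a strict subset of UTF-8, so existing us-ascii templates
# decode to the same text under either codec.
ascii_bytes = "Welcome to $fqdn_listname".encode("us-ascii")
ascii_unchanged = ascii_bytes.decode("utf-8") == ascii_bytes.decode("us-ascii")

# A non-ASCII character stored as ISO-8859-1 is a different byte sequence
# than in UTF-8, so such a file would need converting, not just relabeling.
latin1_bytes = "archivée".encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```

Under this reading, changing the default English setting from us-ascii to UTF-8 would not affect existing ASCII templates, while templates kept in a legacy 8-bit charset would have to be re-encoded once.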