[Mailman-Developers] Mailman3: decorating non-ascii message templates

Sat Oct 19 00:01:07 CEST 2013

On Oct 14, 2013, at 03:42 PM, Aurélien Bompard wrote:

>I first thought about opening a bug for this, but I think it needs a
>small discussion first.

Thanks for the thorough investigation!

>In Mailman3 the message templates are stored on disk (welcome.txt,
>footer-generic.txt, ...)  However, unless I missed something, there is no
>hint about the encoding of those files.  As a result, when I try the
>decorate() function (from mailman.handlers.decorate) on a non-ascii file, it
>crashes with a classic UnicodeDecodeError: 'ascii' codec can't decode byte
>[...].  Note: if the file contains no string to replace (like
>$fqdn_listname), it is passed through unchanged and it works.
>
>So how should we deal with this? I think that the TemplateLoader in
>mailman.app.templates should return unicode strings, because that's the
>closest to the moment when files are read, and unicode conversion should
>happen on the "external borders" of the application.

Completely agreed.  We need to convert to unicode at the edges and treat
strings internally as unicode.  I want MM3 to eventually be a Python 3
application, so we'll have to do this anyway.  I'm sure it'll be painful, but
let's start now in any way we can.
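
To make "convert at the edges" concrete, here is a minimal sketch in Python 3
syntax.  The names load_template() and decorate_text() are hypothetical and
are not Mailman's actual API; the sketch only assumes that the loader hands
back raw bytes and that substitution uses string.Template.

```python
from string import Template


def load_template(raw_bytes, encoding="utf-8"):
    """Decode template bytes once, right where they enter the application.

    Hypothetical helper: stands in for whatever the template loader does
    after it has fetched the file contents.
    """
    return raw_bytes.decode(encoding)


def decorate_text(template_text, substitutions):
    """Fill in $placeholders; both arguments are already text, not bytes."""
    return Template(template_text).safe_substitute(substitutions)


# Bytes as they might be read from a footer file containing non-ASCII text.
raw = "Liste de diffusion : $fqdn_listname\nÀ bientôt\n".encode("utf-8")

footer = decorate_text(load_template(raw), {"fqdn_listname": "test@example.com"})
```

Because the decode happens before any substitution, the decorating code never
sees bytes and has no reason to fall back on an implicit ascii conversion.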

>Thus the TemplateLoader's get() method seems to be the right place.  I see
>two options:
>
>- We require that all template files are stored in either ascii or
>utf-8. That's the easiest way to go, and we just decode the text after
>getting the file.
>
>- We use the fact that our Language entities contain encoding values.
>When the template is loaded from an internal URL containing the
>language, we add the corresponding encoding to the result metadata (to
>be retrieved with the info() method) and use that to decode the
>contents. This means that templates in non-localized directories still
>have to be ascii-only, and the same goes for templates retrieved from
>non-internal URLs (not starting with mailman://). It's more complex but
>it may seem more natural to the administrator, since he won't have to
>force UTF-8 encoding when editing a file. We must also think of not
>making it too hard for Postorius, which will probably only get UTF-8
>posted from the webpage (since it's always displayed in UTF-8 IIRC).
>
>I would vote for the easy way for just requiring UTF-8 encoding, but I'd
>like to hear your thoughts on this.
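
For comparison, the two options quoted above might look roughly like this.
This is only a sketch: LANGUAGE_CHARSETS and both function names are made up
for illustration, and the real charset values would come from the language
definitions rather than from a hard-coded dict.

```python
# Hypothetical mapping from language code to charset, for illustration only.
LANGUAGE_CHARSETS = {"en": "us-ascii", "fr": "iso-8859-1", "ja": "euc-jp"}


def decode_utf8_only(raw_bytes):
    """Option 1: every template must be UTF-8 (plain ASCII also qualifies)."""
    return raw_bytes.decode("utf-8")


def decode_by_language(raw_bytes, language_code=None):
    """Option 2: use the language's charset; ASCII when no language is known."""
    charset = LANGUAGE_CHARSETS.get(language_code, "us-ascii")
    return raw_bytes.decode(charset)
```

With option 2, a template loaded without a language (a non-localized
directory or an external URL) fails to decode as soon as it contains a
non-ASCII byte, which is the restriction described above.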

I would vote for UTF-8 for all files, internal or external, but maybe there
are some languages for which this will cause problems.  We do have a `charset`
variable in the config file for languages, but I wonder if we'll actually use
anything other than UTF-8.  Note that some of the charsets in MM2.1 are not
UTF-8, but I'm not sure if any of them are UTF-8 incompatible.  MM3 only
defines a setting for USA English by default, and that's currently us-ascii,
but maybe even that should be UTF-8.
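
As a rough illustration of what "UTF-8 incompatible" means at the byte level,
the sketch below checks whether existing file contents would survive a switch
to UTF-8.  The helper name is_utf8_compatible() is hypothetical, and the
sample strings are invented.

```python
def is_utf8_compatible(raw_bytes):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# us-ascii is a strict subset of UTF-8, so ASCII-only files are unaffected.
ascii_file = "Welcome to $fqdn_listname".encode("us-ascii")

# The same French text stored in two different encodings.
latin1_file = "Bienvenue à $fqdn_listname".encode("iso-8859-1")
utf8_file = "Bienvenue à $fqdn_listname".encode("utf-8")
```

Here is_utf8_compatible() returns True for ascii_file and utf8_file but False
for latin1_file, so a template saved in a legacy 8-bit charset would need to
be re-encoded before a UTF-8-only loader could read it.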

So I guess unless we can identify actual languages that would be harmed by
UTF-8, we should just require that.  Maybe Steve can weigh in on the issue.

-Barry