[Mailman-i18n] HTML entities (&eacute;) in es, it, no translations
Martin von Loewis
loewis@informatik.hu-berlin.de
31 Jan 2002 11:13:07 +0100
Ben Gertzfield <che@debian.org> writes:
> Does anyone have any comments?
I agree that the message catalogs should use the preferred encoding of
the language, and not HTML entity or character references. There are a
few issues to double-check before going forward with that, though:
- for this to work, Mailman needs to properly declare the encoding of
each generated HTML page, and the declaration needs to match the
actual content. For Latin-1, this is not strictly necessary, since
that is the default encoding of HTML, anyway, but there may be plans
to move to XHTML some day, at which time even this assumption
breaks.
- Problems will arise if Mailman inserts strings from various sources
into the same template, especially if these use different encodings.
If that can ever happen, you need to recode all strings to the same
encoding. If that fails (e.g. because the encoding is unknown, or
because the string cannot be represented in the encoding), HTML
entities may be your only option. Please have a look at
http://www2.iro.umontreal.ca/~pinard/po/registry.cgi?team=tr
This document is encoded in ISO-8859-9 (for Turkish); but it still
contains French accents. Using entities is the only choice here,
short of using UTF-8 for the entire page.
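To make the two points above concrete, here is a minimal sketch in modern Python 3 (an assumption on my part; it is not Mailman code). The helper names recode and render_page are hypothetical. It recodes every fragment to one page encoding, falls back to numeric character references for characters that encoding cannot represent, and emits a declaration that matches the bytes actually produced.

```python
def recode(raw: bytes, source_encoding: str, page_encoding: str) -> bytes:
    """Recode a byte string from its source encoding to the page encoding.

    Characters the page encoding cannot represent are written as numeric
    character references (for example &#351;) instead of raising an error.
    """
    text = raw.decode(source_encoding)
    return text.encode(page_encoding, errors="xmlcharrefreplace")


def render_page(fragments, page_encoding: str) -> bytes:
    """Join (bytes, encoding) fragments into one page in a single encoding.

    The charset declaration is generated from the same page_encoding value
    used for recoding, so declaration and content cannot drift apart.
    """
    declaration = (
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=%s">' % page_encoding
    ).encode("ascii")
    body = b"".join(recode(raw, enc, page_encoding) for raw, enc in fragments)
    return declaration + b"\n" + body


# A Turkish fragment and a Latin-1 fragment merged into one Latin-1 page.
page = render_page(
    [("Ay\u015fe".encode("iso-8859-9"), "iso-8859-9"),
     (" caf\u00e9".encode("iso-8859-1"), "iso-8859-1")],
    "iso-8859-1",
)
```

Note that this only handles the encoding question; it does not escape markup characters such as < or &, which would need separate treatment.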
In short, using the language's preferred encoding requires Mailman
to carefully track the encoding of the message through its entire
processing chain. If the encoding is supported by the codecs
library, an alternative would be to use ugettext (so that the
encoding is implied by the string being a Unicode
object).
Unfortunately, not all encodings used in Mailman are supported (the East
Asian ones are missing). In general, I'd encourage usage of Unicode
throughout Mailman, even if this means that additional codecs must
be bundled with the distribution.
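Whether a given encoding is supported by the installed codecs can be checked at run time. The sketch below assumes Python 3 and uses a hypothetical helper name, codec_available; codecs.lookup itself is the standard library call and raises LookupError for unknown encodings. Which encodings are actually present depends on the Python installation.

```python
import codecs


def codec_available(name: str) -> bool:
    """Return True if the codecs registry knows the named encoding."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Example: find which of the catalog encodings would need a bundled codec.
wanted = ["iso-8859-1", "iso-8859-9", "no-such-encoding-xyz"]
missing = [name for name in wanted if not codec_available(name)]
```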
Regards,
Martin