[Mailman-i18n] HTML entities (é) in es, it, no translations

Martin von Loewis loewis@informatik.hu-berlin.de
31 Jan 2002 11:13:07 +0100


Ben Gertzfield <che@debian.org> writes:

> Does anyone have any comments?  

I agree that the message catalogs should use the preferred encoding of
the language, and not HTML entity or character references. There are a
few issues to double-check before going forward with that, though:

- for this to work, Mailman needs to properly declare the encoding of
  each generated HTML page, and the declaration needs to match the
  actual content. For Latin-1, this is not strictly necessary, since
  that is the default encoding of HTML anyway, but there may be plans
  to move to XHTML some day, at which point even this assumption
  breaks (XML defaults to UTF-8, not Latin-1).

- Problems will arise if Mailman inserts strings from various sources
  into the same template, especially if these use different encodings.
  If that can ever happen, you need to recode all strings to the same
  encoding. If that fails (e.g. because the encoding is unknown, or
  because the string cannot be represented in the encoding), HTML
  entities may be your only option. Please have a look at

http://www2.iro.umontreal.ca/~pinard/po/registry.cgi?team=tr

  This document is encoded in ISO-8859-9 (for Turkish), but it still
  contains French accents. Using entities is the only choice here,
  short of using UTF-8 for the entire page.

  In short, using the language's preferred encoding requires Mailman
  to carefully track the encoding of the message through its entire
  processing chain. If the encoding is supported by the codecs
  library, an alternative would be to use ugettext (so that the
  encoding is implied by the string being a Unicode
  object). 
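A minimal sketch of the approach outlined in the two points above, written in modern Python 3 rather than the Python 2 Mailman ran on at the time. The helper name `render_page` and its parameters are hypothetical, not part of Mailman. It keeps every string as Unicode (`str`) until output, declares the charset in a `<meta>` tag, and encodes once at the end, falling back to numeric character references only for characters the target encoding cannot represent:

```python
import html
from typing import Sequence


def render_page(title: str, body_parts: Sequence[str], charset: str) -> bytes:
    """Hypothetical helper: assemble a page from Unicode strings, encode once.

    All inputs are str, so strings from different sources never have to be
    recoded against each other; the charset is applied only at the very end.
    """
    # Escape markup-significant characters (&, <, >) in the text itself.
    escaped = [html.escape(part, quote=False) for part in body_parts]
    page = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        f"<title>{html.escape(title, quote=False)}</title>"
        "</head><body>"
        + "".join(f"<p>{p}</p>" for p in escaped)
        + "</body></html>"
    )
    # Characters the charset cannot represent become &#NNN; references,
    # so the declared charset always matches the bytes actually emitted.
    return page.encode(charset, errors="xmlcharrefreplace")
```

For example, `render_page("Registry", ["ğ"], "iso-8859-1")` emits the Turkish letter as `&#287;`, while the same call with `"iso-8859-9"` emits it as a single Latin-5 byte. The design choice is to push the entity fallback to the single point where the page is encoded, instead of baking entities into the message catalogs.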

Unfortunately, not all encodings used in Mailman are supported by the
codecs library (the East Asian ones are missing). In general, I'd
encourage the use of Unicode throughout Mailman, even if this means
that additional codecs must be bundled with the distribution.
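Whether a language can go through the Unicode route depends on whether Python has a codec for its charset. A minimal sketch of that check, assuming a hypothetical language-to-charset table (illustrative only, not Mailman's actual mapping); `codecs.lookup` raises `LookupError` for an unknown encoding name. Note that current Python releases bundle the common East Asian codecs (e.g. `euc-jp`, `gb2312`), so which names turn up missing depends on the Python version in use:

```python
import codecs
from typing import Dict, List

# Hypothetical language -> charset table, for illustration only.
LANGUAGE_CHARSETS: Dict[str, str] = {
    "es": "iso-8859-1",
    "tr": "iso-8859-9",
    "ja": "euc-jp",
    "zh": "gb2312",
}


def unsupported_charsets(table: Dict[str, str]) -> List[str]:
    """Return the language codes whose charset has no available codec."""
    missing = []
    for lang, charset in table.items():
        try:
            codecs.lookup(charset)
        except LookupError:
            missing.append(lang)
    return missing


def to_unicode(raw: bytes, charset: str) -> str:
    """Decode at the input boundary so the rest of the pipeline sees only str."""
    return raw.decode(charset)
```

Running `unsupported_charsets` once at startup would flag languages that need an extra codec bundled before their catalogs can be handled as Unicode.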

Regards,
Martin