[Mailman-i18n] Subject lines in Archives

31 Mar 2002 17:47:35 +0200

Ben Gertzfield <che@debian.org> writes:

>     Martin> If
>     Martin> conversion fails, HTML character references are
>     Martin> emitted.

> I'm a bit confused.  How exactly do you propose converting non-Latin
> encoded text to Latin?  Since it cannot ever be converted, are you
> going to emit Unicode HTML character references?

Yes, that's what I said, and that's what it does.
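
For illustration, here is a minimal sketch of that behaviour in current Python. It is not the code from the patch under discussion: the helper name to_latin1_html is hypothetical, and the "xmlcharrefreplace" error handler is a standard library feature that the 2002 code may not have used.

```python
def to_latin1_html(raw, charset):
    """Decode raw bytes from charset, then re-encode them as Latin-1.

    Characters with no Latin-1 equivalent are emitted as numeric
    HTML character references instead of raising an error.
    """
    text = raw.decode(charset)
    return text.encode("iso-8859-1", "xmlcharrefreplace")


# A subject line mixing a Latin-1 character with a Japanese one.
subject = "caf\u00e9 \u3053".encode("utf-8")
converted = to_latin1_html(subject, "utf-8")
print(converted)  # b'caf\xe9 &#12371;'
```

The Latin-1 character survives as a single byte, while the Japanese character becomes a decimal reference to its Unicode code point.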

> Also, what do you do to map charsets to Python Unicode codecs?

Through codecs.lookup (actually, just by calling unicode(str, encoding),
which performs that lookup itself).
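
As a rough sketch of what that lookup does, written for current Python rather than the Python 2 of this thread: codecs.lookup takes a charset name as it might appear in a mail header and returns the matching codec, or raises LookupError. The helper name resolve_charset is hypothetical.

```python
import codecs


def resolve_charset(charset):
    """Return Python's canonical codec name for charset, or None if unknown."""
    try:
        return codecs.lookup(charset).name
    except LookupError:
        return None


# Different spellings of the same charset resolve to the same codec.
same = resolve_charset("ISO-8859-1") == resolve_charset("latin-1")
unknown = resolve_charset("no-such-charset-xyz")
print(same, unknown)  # True None
```

A caller can treat a None result the same way as a failed conversion and fall back to whatever default the archiver uses.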

> They're not one-to-one; for example, ISO-2022-JP goes to
> japanese.iso-2022-jp.

That is actually a bug in the Japanese codecs package; it ought to
register a lookup function, instead of relying on the default lookup
function. If that bug is not fixed, modifying
encodings.aliases.aliases might be appropriate.
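
To make the first option concrete, here is a minimal sketch of a lookup function registered with codecs.register, written for current Python. The charset name "x-archive-japanese" and the table _EXTRA_ALIASES are made up for the example; this is not how the Japanese codecs package actually works, only the general shape of the mechanism.

```python
import codecs

# Hypothetical extra charset names, mapped to codecs Python already has.
_EXTRA_ALIASES = {"x_archive_japanese": "iso2022_jp"}


def search(name):
    """Codec search function: return a CodecInfo, or None if not ours."""
    # Normalize ourselves, since the exact form of the name passed in
    # has varied between Python versions.
    key = name.lower().replace("-", "_").replace(" ", "_")
    target = _EXTRA_ALIASES.get(key)
    if target is None:
        return None
    return codecs.lookup(target)


codecs.register(search)

# The made-up charset name now decodes like ISO-2022-JP.
raw = "\u3053\u3093\u306b\u3061\u306f".encode("iso2022_jp")
decoded = raw.decode("x-archive-japanese")
```

Returning None for names the function does not recognise lets the remaining registered search functions, including the default one, handle them. Editing the aliases table instead only affects names resolved by the default lookup.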

Regards,
Martin