Ben Gertzfield
Martin> If Martin> conversion fails, HTML character references are Martin> emitted.
I'm a bit confused. How exactly do you propose converting non-Latin encoded text to Latin? Since it cannot ever be converted, are you going to emit Unicode HTML character references?
Yes, that's what I said, and that's what it does.
Also, what do you do to map charsets to Python Unicode codecs?
codecs.lookup (actually, just unicode(str, encoding)).
They're not one-to-one; for example, ISO-2022-JP goes to japanese.iso-2022-jp.
That is actually a bug in the Japanese codecs package; it ought to register a lookup function, instead of relying on the default lookup function. If that bug is not fixed, modifying codecs.encodings.aliases.aliases might be appropriate. Regards, Martin