utf - string translation
Fredrik Lundh
fredrik at pythonware.com
Wed Nov 29 17:11:27 EST 2006
John Machin wrote:
> Another point: there are many non-latin1 characters that could be
> mapped to ASCII. For example:
> u"\u0141ukasziewicz".translate(unaccented_map())
> doesn't work unless an entry is added to the no-decomposition table:
> 0x0141: u"L", # LATIN CAPITAL LETTER L WITH STROKE
>
> It looks like generating extra entries like that could be done, with
> the aid of unicodedata.name():
>
> LATIN CAPITAL LETTER X WITH blahblah -> "X"
> LATIN SMALL LETTER X WITH blahblah -> "X".lower()
>
> This would require a fair bit of care -- obviously there are special
> cases like LATIN CAPITAL LETTER O WITH STROKE. Eyeballing by regional
> experts is probably required.
see the comments over at
http://effbot.org/zone/unicode-convert.htm
for an extended table, eyeballed by a regional expert. since he makes
the same point about OE vs Oe as you do, I'll probably have to change
the code ;-)
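
For illustration only, here is a minimal sketch of the name-based idea from the quoted message, assuming Python 3. It is not the code from the effbot page. The helper names guess_base_letter and build_candidate_table and the scanned code point range are hypothetical choices made for this example. The output is a list of candidates for a human to review, not a finished table: it maps LATIN CAPITAL LETTER O WITH STROKE to "O", which is exactly the kind of special case that needs eyeballing.

```python
import re
import unicodedata

# Matches names such as "LATIN CAPITAL LETTER L WITH STROKE".
# Only single-letter bases are accepted, so ligatures like AE are skipped.
_NAME_RE = re.compile(r"^LATIN (CAPITAL|SMALL) LETTER ([A-Z]) WITH ")


def guess_base_letter(char):
    """Return a guessed ASCII base letter for char, or None if no guess."""
    name = unicodedata.name(char, "")
    match = _NAME_RE.match(name)
    if match is None:
        return None
    case, letter = match.groups()
    return letter if case == "CAPITAL" else letter.lower()


def build_candidate_table(start=0x00C0, stop=0x0250):
    """Map code points in [start, stop) to guessed ASCII letters.

    The default range is an assumption for this sketch; it covers the
    Latin-1 letters and the Latin Extended-A/B blocks.
    """
    table = {}
    for code in range(start, stop):
        letter = guess_base_letter(chr(code))
        if letter is not None:
            table[code] = letter
    return table


candidates = build_candidate_table()
print(candidates[0x0141])                          # L
print("\u0141ukasziewicz".translate(candidates))   # Lukasziewicz
```

Since str.translate() accepts a dict keyed by code point, the reviewed entries could be merged into an existing no-decomposition table rather than used on their own.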
</F>