unicode issue
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Tue Oct 6 05:06:19 EDT 2009
On Thu, 01 Oct 2009 12:10:58 -0300, Walter Dörwald <walter at livinglogic.de>
wrote:
> On 01.10.09 16:09, Hyuga wrote:
>> On Sep 30, 3:34 am, gentlestone <tibor.b... at hotmail.com> wrote:
>>> _MAP = {
>>>     # LATIN
>>>     u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
>>>     u'Æ': 'AE', u'Ç': 'C', [...long table...]
>>> }
>>>
>>> def downcode(name):
>>>     """
>>>     >>> downcode(u"Žabovitá zmiešaná kaša")
>>>     u'Zabovita zmiesana kasa'
>>>     """
>>>     for key, value in _MAP.iteritems():
>>>         name = name.replace(key, value)
>>>     return name
>
> import unicodedata
>
> def downcode(name):
>     return unicodedata.normalize("NFD", name)\
>         .encode("ascii", "ignore")\
>         .decode("ascii")
This article [1] shows a mixed technique: it decomposes characters when the
Unicode tables provide that information, and falls back to a custom mapping
for characters that have no decomposition.
[1] http://effbot.org/zone/unicode-convert.htm
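
To illustrate the idea, here is a minimal sketch of such a mixed approach. It
is not the code from the article: the FALLBACK table and its entries are my
own assumptions, and it uses Python 3 syntax (plain str literals, no u''
prefix). Characters such as 'Æ' or 'Ł' have no decomposition in the Unicode
tables, so the NFD-only version quoted above silently drops them; the custom
table catches those first.

```python
import unicodedata

# Hypothetical fallback table for characters that have no Unicode
# decomposition. The entries are illustrative; extend as needed.
FALLBACK = {
    "Æ": "AE", "æ": "ae",
    "Ø": "O", "ø": "o",
    "Đ": "D", "đ": "d",
    "Ł": "L", "ł": "l",
    "ß": "ss",
}

def downcode(name):
    """Return an ASCII approximation of the text string `name`."""
    out = []
    for ch in name:
        if ord(ch) < 128:
            # Already ASCII: keep as is.
            out.append(ch)
        elif ch in FALLBACK:
            # No usable decomposition: use the custom mapping.
            out.append(FALLBACK[ch])
        else:
            # Decompose into base character + combining marks, then
            # discard everything that is not ASCII (the marks).
            decomposed = unicodedata.normalize("NFD", ch)
            out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)

print(downcode("Žabovitá zmiešaná kaša"))  # Zabovita zmiesana kasa
print(downcode("Æsop Łódź"))               # AEsop Lodz
```

Anything that is neither in the table nor decomposable to ASCII is still
dropped, so the table has to cover whatever your data actually contains.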
--
Gabriel Genellina