unicode issue

Thu Oct 1 11:50:15 EDT 2009

On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald <walter at livinglogic.de>  
wrote:

> On 01.10.09 16:09, Hyuga wrote:
>> On Sep 30, 3:34 am, gentlestone <tibor.b... at hotmail.com> wrote:
>>> Why doesn't this code work on Python 2.6? Or how can I do this job?
>>>
>>> [snip _MAP]
>>>
>>> def downcode(name):
>>>     """
>>>     >>> downcode(u"Žabovitá zmiešaná kaša")
>>>     u'Zabovita zmiesana kasa'
>>>     """
>>>     for key, value in _MAP.iteritems():
>>>         name = name.replace(key, value)
>>>     return name
>>
>> Though CPython is pretty optimized under the hood for this sort of
>> single-character replacement, this still seems pretty inefficient
>> since you're calling replace for every character you want to map.  I
>> think that a better approach might be something like:
>>
>> def downcode(name):
>>     return ''.join(_MAP.get(c, c) for c in name)
>>
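>> Or using unicode.translate with a dict keyed by code point
>> (string.maketrans only builds tables for byte strings):
>>
>> def downcode(name):
>>     # map each accented character's ordinal to its ASCII replacement
>>     table = dict(zip(map(ord, u'ÀÁÂÃÄÅ...'),
>>                      u'AAAAAA...'))
>>     return name.translate(table)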
>
> Or even simpler:
>
> import unicodedata
>
> def downcode(name):
>    return unicodedata.normalize("NFD", name)\
>           .encode("ascii", "ignore")\
>           .decode("ascii")
>
> Servus,
>    Walter

As I understand it, the "ignore" argument to encode() *removes* the
characters that can't be encoded, rather than replacing them with an ASCII
approximation. Is that correct? If so, wouldn't that rather defeat the
purpose?
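
For what it's worth, here is a minimal sketch of what seems to happen, assuming
the standard unicodedata module. "ignore" does drop whatever can't be encoded,
but the NFD normalization runs first and splits each accented letter into a base
letter plus combining marks, so only the marks are lost. Letters with no
canonical decomposition (ß, ø, đ) would be dropped entirely. The names
strip_marks, downcode_with_fallback and _FALLBACK are illustrative and not from
the original post.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical fallback table (not the _MAP from the original post) for
# letters that have no canonical decomposition.
_FALLBACK = {ord(u"ß"): u"ss", ord(u"ø"): u"o", ord(u"đ"): u"d"}


def strip_marks(name):
    # NFD splits e.g. u"Ž" into u"Z" followed by U+030C (combining caron).
    decomposed = unicodedata.normalize("NFD", name)
    # "ignore" drops every non-ASCII code point: the combining marks,
    # but also any letter that NFD could not split.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def downcode_with_fallback(name):
    # Replace the letters NFD cannot handle first, then strip the marks.
    return strip_marks(name.translate(_FALLBACK))


print(strip_marks(u"Žabovitá zmiešaná kaša"))  # Zabovita zmiesana kasa
print(strip_marks(u"Straße"))                  # Strae
print(downcode_with_fallback(u"Straße"))       # Strasse
```

So the approach holds up for the Slovak example in the docstring, but it needs
an explicit mapping for anything that is not a base letter plus diacritics.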

-- 
Rami Chowdhury
"Never attribute to malice that which can be attributed to stupidity" --  
Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)