Encoding/decoding: Still don't get it :-/

Mon Mar 16 11:13:12 EDT 2009

On 2009-03-13, Johannes Bauer <dfnsonfsduifb at gmx.de> wrote:
> Peter Otten schrieb:
>
>> encoding = sys.stdout.encoding or "ascii"
>> for row in rows:
>>     id, address = row[:2]
>>     print id, address.encode(encoding, "replace")
>> 
>> Example:
>> 
>>>>> u"ähnlich lölich üblich".encode("ascii", "replace")
>> '?hnlich l?lich ?blich'
>
> A very good tip, Peter - I've also had this problem before and didn't
> know about your solution.

If you know beforehand that you will be using ASCII, you can strip the
accents, so that you get the unaccented letter (followed by a question
mark if you prefer) instead of just a question mark:

>>> from unicodedata import normalize, combining
>>> example = u"ähnlich lölich üblich"
>>> normalised = normalize('NFKD', example)
>>> normalised.encode("ascii", "replace")
'a?hnlich lo?lich u?blich'
>>> eliminated = u''.join(l for l in normalised if not combining(l))
>>> eliminated.encode("ascii", "replace")
'ahnlich lolich ublich'
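
For reuse, the session above could be wrapped in a small helper. This is only a sketch, written for Python 3 (where every str is already unicode, so the u"" prefix is not needed); the name to_ascii and its errors parameter are my own invention, not anything from the standard library.

```python
import unicodedata


def to_ascii(text, errors="replace"):
    """Return an ASCII-only approximation of text (hypothetical helper)."""
    # Decompose accented characters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Whatever is still outside ASCII is handled according to `errors`.
    return stripped.encode("ascii", errors).decode("ascii")


print(to_ascii("ähnlich lölich üblich"))
```

Note that this only helps for characters that have a decomposition. Something like "ß" or "ø" has none, so it still ends up as a question mark (or is dropped if you pass errors="ignore").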

-- 
Antoon Pardon