Encoding/decoding: Still don't get it :-/
Antoon Pardon
apardon at forel.vub.ac.be
Mon Mar 16 11:13:12 EDT 2009
On 2009-03-13, Johannes Bauer <dfnsonfsduifb at gmx.de> wrote:
> Peter Otten wrote:
>
>> encoding = sys.stdout.encoding or "ascii"
>> for row in rows:
>>     id, address = row[:2]
>>     print id, address.encode(encoding, "replace")
>>
>> Example:
>>
>> >>> u"ähnlich lölich üblich".encode("ascii", "replace")
>> '?hnlich l?lich ?blich'
>
> A very good tip, Peter - I've also had this problem before and didn't
> know about your solution.
If you know beforehand that you will be using ASCII, you can strip the
accents first, so that you get the unaccented letter (followed by a
question mark, if you prefer) instead of just a question mark:
>>> from unicodedata import normalize, combining
>>> example = u"ähnlich lölich üblich"
>>> normalised = normalize('NFKD', example)
>>> normalised.encode("ascii", "replace")
'a?hnlich lo?lich u?blich'
>>> eliminated = u''.join(l for l in normalised if not combining(l))
>>> eliminated.encode("ascii", "replace")
'ahnlich lolich ublich'
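
For reuse, the same two steps can be wrapped in small helpers. This is
only a sketch, written for Python 3 (where str is already Unicode, unlike
the session above); the names strip_accents and to_ascii are made up for
illustration and are not part of any library.

```python
from unicodedata import normalize, combining


def strip_accents(text):
    """Return text with combining marks (accents, umlauts) removed."""
    # NFKD splits each accented letter into its base letter
    # followed by one or more combining marks.
    decomposed = normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not combining(ch))


def to_ascii(text):
    """Encode text as ASCII bytes after stripping accents."""
    # Characters that are still outside ASCII become '?'.
    return strip_accents(text).encode("ascii", "replace")
```

Note that this only helps for characters that decompose into a base
letter plus a combining mark. A letter such as the German sharp s has no
such decomposition, so it still ends up as a question mark.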
--
Antoon Pardon