[Tutor] Problems with encoding
mjekl at iol.pt
Fri Jul 28 21:45:42 CEST 2006
>kent37 at tds.net wrote:
>>mjekl at iol.pt wrote:
>> Hi,
>>
>>
>> My interpreter is set via sitecustomize.py to use utf-8 as the default encoding.
>>
>> I'm reading fields from a dbf table to a firebird db with encoding set to win1252.
>> I guess its original encoding is cp850, but I am not sure, and I have been addressing exceptions one by one with lines like:
>>
>> r = r.replace(u'offending_code', u'ok_code')
>>
>Why don't you just convert from cp850 to cp1252 directly? Python
>supports both encodings, it's as simple as
>some_string.decode('cp850').encode('cp1252')
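A minimal sketch of the direct conversion suggested above. It is written for Python 3, where the raw data is a `bytes` object (in the Python 2 of this thread it would be a plain `str`). The helper name `cp850_to_cp1252` and the sample bytes are made up for illustration.

```python
def cp850_to_cp1252(raw):
    """Re-encode cp850 bytes as cp1252 bytes (hypothetical helper)."""
    text = raw.decode('cp850')    # cp850 bytes -> Unicode text
    return text.encode('cp1252')  # Unicode text -> cp1252 bytes


# 0x87 is c-cedilla in cp850; the same character is 0xe7 in cp1252.
sample = b'Fran\x87a'
converted = cp850_to_cp1252(sample)
print(repr(converted))
```

Note that the encode step can raise UnicodeEncodeError when the text contains a character that cp1252 does not have (cp850 includes box-drawing characters, for example), so real data may need an explicit `errors=` argument or a try/except around the call.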
In the meantime, somewhat accidentally (after reading some material), I followed a similar approach.
[...] ORIGINAL POST DELETED HERE [...]
>My guess is a coding error on your part, otherwise something would have changed...can you show some context in import_pcfcli.py?
I also expect it's my error ;-( ;-)
The following snippet of my present code isn't giving me any problems, although I'm not really sure why it works. I also had problems when encoding to 'cp1252', but not when encoding to 'utf-8'.
Does anyone have a pointer to a nice resource that can help me understand this decode / encode business better?
try:
    # TODO: Check the str.translate() method
    r = recordSet.Fields(fieldsDict[f]).Value.strip()
    # NOTE: the result of decode() is discarded, so r itself is not changed here
    r.decode('cp850')
    # Replace the characters that came through wrong, one by one
    r = r.replace(u'\x8f', u'')
    r = r.replace(u'\u20ac', u'\xc7')
    r = r.replace(u'\xa6', u'\xaa')
    r = r.replace(u'\u2122', u'\xd5')
    r = r.replace(u'\u017d', u'\xc3')
    r = r.replace(u'\xa7', u'\xba')
    # The following line does not work with 'cp1252' !?
    rec.append(r.encode('utf-8'))  # kinterbasdb makes conversions by itself ;-)
except (UnicodeDecodeError, UnicodeEncodeError):
    # The parentheses are needed to catch both exception types
    print f
    return None
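One way to see what the replace() lines are doing is to treat each pair as a byte that was read with the wrong code page. The sketch below (Python 3) assumes the text was mis-read as cp1252 and checks which DOS code page would turn each "seen" character back into the "wanted" one. The names `PAIRS` and `count_matches` are hypothetical, and the candidate list ('cp850', 'cp860') is only a guess at likely sources.

```python
# Replacement pairs from the snippet above: (character seen, character wanted).
PAIRS = [
    (u'\u20ac', u'\xc7'),
    (u'\xa6', u'\xaa'),
    (u'\u2122', u'\xd5'),
    (u'\u017d', u'\xc3'),
    (u'\xa7', u'\xba'),
]


def count_matches(source_codec, wrong_codec='cp1252'):
    """Count pairs explained by source_codec bytes misread as wrong_codec."""
    hits = 0
    for seen, wanted in PAIRS:
        try:
            # Undo the wrong decoding, then decode with the candidate codec.
            fixed = seen.encode(wrong_codec).decode(source_codec)
        except UnicodeError:
            continue  # this character has no byte in one of the codecs
        if fixed == wanted:
            hits += 1
    return hits


for codec in ('cp850', 'cp860'):
    print(codec, count_matches(codec), 'of', len(PAIRS))
```

If one candidate explains every pair, a single `text.encode('cp1252').decode(candidate)` round trip may be able to replace the whole chain of replace() calls. By this check cp860 (the DOS Portuguese code page) seems to fit the pairs better than cp850, but that is only an observation from the sketch and would need to be verified against the actual dbf data.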
Txs,
Miguel