[Tutor] Problems with encoding

Wed Jul 26 23:03:36 CEST 2006

mjekl at iol.pt wrote:
> Hi,
>
>
> My interpreter is set via sitecustomize.py to use utf-8 as its default encoding.
>
> I'm reading fields from a dbf table and writing them to a Firebird db whose encoding is set to win1252.
> I guess the original encoding is cp850, but I'm not sure, and I have been fixing the exceptions one by one with lines like:
>
> r = r.replace(u'offending_code', u'ok_code')
>   
Why don't you just convert from cp850 to cp1252 directly? Python 
supports both encodings; it's as simple as
some_string.decode('cp850').encode('cp1252')

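In Python 3 terms (the thread is about Python 2.4, where a dbf field is a plain `str`), the raw field is a `bytes` object and the same one-liner becomes `raw.decode('cp850').encode('cp1252')`. The sketch below applies it to one field from the record printed further down. Because the source code page is only a guess, it also prints that field decoded with a few candidate DOS code pages so they can be compared by eye; the candidate list (including `cp860`, the Portuguese DOS code page) is an assumption, not something established in this thread.

```python
# Python 3 sketch; the field value comes from the record printed in the
# original message, and CANDIDATES is a hypothetical list of code pages to try.
raw = b'RUA JO\x8eO SILVA,N..4,10A'  # bytes as read from the dbf file

# bytes -> str, assuming the dbf data really is cp850, then str -> cp1252 bytes.
text = raw.decode('cp850')
win1252 = text.encode('cp1252')

# Some cp850 characters (e.g. the box-drawing ones) have no cp1252 equivalent;
# errors='replace' writes '?' for them instead of raising UnicodeEncodeError.
box = b'\xb0'.decode('cp850')
safe = box.encode('cp1252', errors='replace')

# The code page is only a guess, so compare a few candidates by eye.
CANDIDATES = ['cp437', 'cp850', 'cp860']
for codec in CANDIDATES:
    print(codec, raw.decode(codec))
```

Under cp850, byte `0x8E` decodes to `Ä`, which gives `JOÄO`; if another candidate produces the expected Portuguese spelling, that code page is the better guess for the dbf data.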
> But now it seems I've run into a brick wall!
> I've hit an exception that I can't seem to avoid with my strategy:
> """
> Traceback (most recent call last):
>   File "C:\Python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 307, in RunScript
>     debugger.run(codeObject, __main__.__dict__, start_stepping=0)
>   File "C:\Python24\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
>     _GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
>   File "C:\Python24\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 631, in run
>     exec cmd in globals, locals
>   File "C:\Documents and Settings\mel.TECNICON\Desktop\Statistics\fromDBF\import_pcfcli.py", line 216, in ?
>     fbCr.execute(sqlTemplate, rec)
>   File "C:\Python24\Lib\site-packages\kinterbasdb\typeconv_text_unicode.py", line 108, in unicode_conv_in
>     return unicodeString.encode(pyEncodingName)
>   File "C:\Python24\Lib\encodings\cp1252.py", line 18, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x99 in position 33: unexpected code byte
> """
>
> I also print the offending record so I can inspect it and apply my strategy:
> ('', 'A', '', '', 'EDIFICIOS 3B,SOCIEDADE DE CONTRUC\x99ES LDA', 'LISBOA', 'Plafond: 2494, Prazo de Pagamento: 30 Metodo de pagamento: ', '', '', 'RUA JO\x8eO SILVA,N..4,10A', '', '1900-271', '502216972', '218482733', '1663')
>
> The problem is with '\x99' :-(
> I added this line to the code:
>
> r = r.replace(u'\x99', u'O')
>
> But I get exactly the same traceback!

My guess is a coding error on your part; otherwise something would have 
changed. Can you show some context from import_pcfcli.py?
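Since import_pcfcli.py wasn't posted, the following is only a guess at one kind of coding error that would produce this symptom: strings are immutable, so `r = r.replace(...)` inside a loop rebinds the loop variable and leaves the record tuple that is later passed to `fbCr.execute()` unchanged. Note also that the printed record holds byte strings (no `u` prefix), so `u'\x99'` (the Unicode code point U+0099) is not the same thing as the byte `'\x99'`. The Python 3 sketch below reproduces the pitfall and then builds a new tuple instead; `clean_record` is a hypothetical helper, and cp850 is still only the assumed source encoding.

```python
# Python 3 sketch; rec mirrors a few fields of the printed record, and
# clean_record is a hypothetical helper, not part of kinterbasdb.
rec = (b'', b'A', b'EDIFICIOS 3B,SOCIEDADE DE CONTRUC\x99ES LDA',
       b'RUA JO\x8eO SILVA,N..4,10A')

# Pitfall: this only rebinds the name r; the tuple itself never changes.
for r in rec:
    r = r.replace(b'\x99', b'O')
print(rec[2])  # still contains the \x99 byte

def clean_record(record, source_encoding='cp850'):
    """Return a new tuple with every field decoded from source_encoding to str."""
    return tuple(field.decode(source_encoding) for field in record)

clean = clean_record(rec)
# Pass clean, not rec, on to the database, e.g. fbCr.execute(sqlTemplate, clean)
```

In the Python 2 code this would correspond to passing unicode objects, so the `unicodeString.encode(pyEncodingName)` call in kinterbasdb's `unicode_conv_in` (visible in the traceback) probably no longer has to decode a byte string with the utf-8 default encoding first.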

Kent