[Tutor] Problems with encoding
Kent Johnson
kent37 at tds.net
Wed Jul 26 23:03:36 CEST 2006
mjekl at iol.pt wrote:
> Hi,
>
>
> My interpreter is set via sitecustomize.py to use utf-8 as its default encoding.
>
> I'm reading fields from a dbf table into a Firebird database whose encoding is set to win1252.
> I guess the table's original encoding is cp850, but I'm not sure, and I have been addressing exceptions one by one with lines like:
>
> r = r.replace(u'offending_code', u'ok_code')
>
Why don't you just convert from cp850 to cp1252 directly? Python
supports both encodings; it's as simple as
some_string.decode('cp850').encode('cp1252')
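Under Python 3, where the raw dbf fields would arrive as bytes, the same conversion might be applied to a whole record roughly as follows. This is a sketch, not a tested recipe: recode_record is a hypothetical helper, cp850 is only the original poster's guess at the source encoding, and errors="replace" (which substitutes "?" for any character cp1252 cannot represent) is just one possible policy.

```python
def recode_record(record, src="cp850", dst="cp1252"):
    """Re-encode every bytes field of a record from one code page to another.

    Hypothetical helper: src defaults to cp850 only because that is the
    poster's guess. Fields that are not bytes are passed through unchanged.
    """
    out = []
    for field in record:
        if isinstance(field, bytes):
            # Decode with the source code page, then encode for the target.
            # errors="replace" writes "?" for characters cp1252 lacks.
            out.append(field.decode(src).encode(dst, errors="replace"))
        else:
            out.append(field)
    return tuple(out)


# A few fields taken from the record printed in the question.
rec = (b"", b"A", b"RUA JO\x8eO SILVA,N..4,10A", b"1663")
print(recode_record(rec))
```

Doing the conversion once per record, right after reading it, keeps the per-character replace() calls out of the import loop entirely.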
> But now it seems I've run into a brick wall!
> I have this exception that I can't seem to avoid with my strategy:
> """
> Traceback (most recent call last):
> File "C:\Python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 307, in RunScript
> debugger.run(codeObject, __main__.__dict__, start_stepping=0)
> File "C:\Python24\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
> _GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
> File "C:\Python24\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 631, in run
> exec cmd in globals, locals
> File "C:\Documents and Settings\mel.TECNICON\Desktop\Statistics\fromDBF\import_pcfcli.py", line 216, in ?
> fbCr.execute(sqlTemplate, rec)
> File "C:\Python24\Lib\site-packages\kinterbasdb\typeconv_text_unicode.py", line 108, in unicode_conv_in
> return unicodeString.encode(pyEncodingName)
> File "C:\Python24\Lib\encodings\cp1252.py", line 18, in encode
> return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x99 in position 33: unexpected code byte
> """
>
> I also print the offending record so I can inspect it and apply my strategy:
> ('', 'A', '', '', 'EDIFICIOS 3B,SOCIEDADE DE CONTRUC\x99ES LDA', 'LISBOA', 'Plafond: 2494, Prazo de Pagamento: 30 Metodo de pagamento: ', '', '', 'RUA JO\x8eO SILVA,N..4,10A', '', '1900-271', '502216972', '218482733', '1663')
>
> The problem is with '\x99' :-(
> I added this line to the code:
>
> r = r.replace(u'\x99', u'O')
>
> But I still get exactly the same traceback!
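The traceback itself hints at the cause: a 'utf8' codec error raised from inside an encode() call. Under Python 2, encoding a byte string first decodes it implicitly with the default encoding (utf-8 here, via sitecustomize.py), and byte 0x99 cannot start a UTF-8 character. The sketch below reproduces that decode explicitly in Python 3; the variable names are illustrative, and cp850 is again only the poster's guess at the source code page.

```python
field = b"EDIFICIOS 3B,SOCIEDADE DE CONTRUC\x99ES LDA"

# Decoding the raw bytes as UTF-8 fails: 0x99 is a UTF-8 continuation
# byte and cannot begin a character. This is the decode Python 2 did
# implicitly before encoding to cp1252.
failed_at = None
try:
    field.decode("utf-8")
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the offending byte
    print("utf-8 failed at position", failed_at)  # prints 33

# Decoding with the source code page first gives text that no longer
# depends on the interpreter's default encoding.
text = field.decode("cp850")
# ascii() escapes non-ASCII characters so this prints on any console.
print(ascii(text))
```

Position 33 is exactly where the \x99 sits in the company-name field, which matches "position 33" in the traceback.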
My guess is a coding error on your part; otherwise something would have
changed. Can you show some context from import_pcfcli.py?
Kent
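Since the poster is not sure the dbf data really is cp850, one way to check is to decode the offending bytes under a few candidate DOS code pages and look at which characters come out. The candidate list below is an assumption, not something established in the thread (cp850 is the poster's guess, cp860 is the Portuguese DOS code page, cp437 the original IBM PC one), and non_ascii_names is a hypothetical helper.

```python
import unicodedata

# Raw byte strings copied from the record printed in the question.
samples = [b"CONTRUC\x99ES", b"RUA JO\x8eO"]

# Candidate DOS code pages to compare (an assumption; extend as needed).
candidates = ["cp437", "cp850", "cp860"]


def non_ascii_names(raw, codec):
    """Decode raw with codec and return the Unicode name of each non-ASCII character."""
    text = raw.decode(codec)
    return [unicodedata.name(ch, "<unnamed>") for ch in text if ord(ch) > 127]


results = {}
for codec in candidates:
    results[codec] = [non_ascii_names(s, codec) for s in samples]
    print(codec, results[codec])
```

The likelier source encoding is whichever one turns both bytes into letters that make sense in the surrounding Portuguese words; that code page is then the one to pass to decode().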