[Tutor] Unicode trouble

Wed Nov 30 16:29:44 CET 2005

Øyvind wrote:
> 
>> Where are you getting these errors (what line of the program)? Do you
>> know what kind of strings objSelection.Find.Execute() is expecting?
>>
>> Kent
> 
> 
> The program stops working and gives me these errors when it encounters
> a non-English letter.
> 
> This is the full error:
> Traceback (most recent call last):
>   File
> "C:\Python23\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
> line 310, in RunScript
>     exec codeObject in __main__.__dict__
>   File "C:\Python\BA\Oversett.py", line 47, in ?
>   File "C:\Python\BA\Oversett.py", line 23, in kjor
>     en = i.split('\t')[0]
>   File "C:\Python23\lib\codecs.py", line 388, in readlines
>     return self.reader.readlines(sizehint)
>   File "C:\Python23\lib\codecs.py", line 314, in readlines
>     return self.decode(data, self.errors)[0].splitlines(1)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 168-170:
> invalid data

This is fairly strange as the line
  en = i.split('\t')[0]
should not call any method in codecs. I don't know how you can get such a stack trace. Maybe try deleting all the .pyc files to make sure they are in sync with the source and try again?
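
If you want to clear the compiled files in one go, something like this rough sketch should do it. The helper name remove_pyc is made up for illustration, and I am assuming the scripts live under C:\Python\BA as shown in your traceback:

```python
import os

def remove_pyc(root):
    """Delete every .pyc file under root and return the paths removed."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed

# Example call (directory taken from the traceback, adjust as needed):
# remove_pyc(r"C:\Python\BA")
```

Python recompiles the modules the next time they are imported, so nothing is lost by removing them.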

The actual error indicates that the input data is not valid utf-8. Are you sure that is the correct encoding for the input file? If the file is utf-8 but contains some bad bytes, you could pass errors='ignore' or errors='replace' as a parameter to codecs.open() to make the error handling more forgiving.
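
Here is a small sketch of the difference between the error handling styles. The sample data is invented (valid utf-8 followed by a stray Latin-1 byte), and it is written to a temporary file rather than your real input. Note that the keyword argument to codecs.open() is spelled errors:

```python
import codecs
import os
import tempfile

# Invented sample: "blå" in valid utf-8, a tab, then "bl" plus a stray
# Latin-1 byte 0xe5 that is not valid utf-8 at that position.
raw = u"bl\u00e5\t".encode("utf-8") + b"bl\xe5\n"

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as out:
    out.write(raw)

# Default (strict) handling: the bad byte raises UnicodeDecodeError.
try:
    with codecs.open(path, "r", "utf-8") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes U+FFFD for each undecodable byte.
with codecs.open(path, "r", "utf-8", errors="replace") as f:
    replaced = f.read()

# errors='ignore' silently drops the undecodable bytes.
with codecs.open(path, "r", "utf-8", errors="ignore") as f:
    ignored = f.read()

os.remove(path)
```

Both forgiving modes lose information, so they only make sense if the file really is utf-8 with a few damaged bytes. If the file is actually in another encoding such as Latin-1, the better fix is to open it with that encoding. On current Python versions the built-in open() accepts the same encoding and errors arguments.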
> 
> and
> 
> Traceback (most recent call last):
>   File
> "C:\Python23\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
> line 310, in RunScript
>     exec codeObject in __main__.__dict__
>   File "C:\Python\BA\Oversett.py", line 49, in ?
>   File "C:\Python\BA\Oversett.py", line 33, in kjor
>     if t % 1000 == 0:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 17:
> ordinal not in range(128)

Again, this stack trace doesn't make sense: the indicated line doesn't do any string operation.

This error message normally occurs when a non-ascii string is converted to unicode using the default encoding (which is 'ascii'). Often the conversion is implicit in some other operation but I don't see any such operation here.
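
To illustrate, here is a minimal sketch that triggers the same message with an explicit decode. The sample bytes are made up, and I am only assuming your data is Latin-1 because 0xe5 is the letter å in that encoding. In Python 2.x an expression like u'abc' + data performs this ascii decode implicitly; Python 3 refuses to mix the two types and raises TypeError instead.

```python
# Invented sample: Latin-1 bytes for "blåbær"; 0xe5 is the letter å.
data = b"bl\xe5b\xe6r"

# Decoding with the 'ascii' codec fails on the first byte above 0x7f.
try:
    data.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the encoding the bytes were really written in succeeds.
text = data.decode("latin-1")
```

The usual cure is to decode byte strings explicitly, with the right encoding, as soon as they enter the program, and to work with unicode from there on.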
> 
> objSelection.Find.Execute() is supposed to accept any kind of string. (It
> is the function Search & Replace in MS Word).

It has to make some assumption about the type of the string. Does it want unicode or encoded bytes? If encoded bytes, what encoding does it expect?

Kent
-- 
http://www.kentsjohnson.com