2to3 chokes on bad character

Frank Millman frank at chagford.com
Wed Feb 23 03:47:36 EST 2011


Hi all

I don't know if this counts as a bug in 2to3.py, but when I ran it on my 
program directory it crashed, with a traceback but without any indication of 
which file caused the problem.

Here is the traceback -

Traceback (most recent call last):
  File "C:\Python32\Tools\Scripts\2to3.py", line 5, in <module>
    sys.exit(main("lib2to3.fixes"))
  File "C:\Python32\lib\lib2to3\main.py", line 172, in main
    options.processes)
  File "C:\Python32\lib\lib2to3\refactor.py", line 700, in refactor
    items, write, doctests_only)
  File "C:\Python32\lib\lib2to3\refactor.py", line 294, in refactor
    self.refactor_dir(dir_or_file, write, doctests_only)
  File "C:\Python32\lib\lib2to3\refactor.py", line 314, in refactor_dir
    self.refactor_file(fullname, write, doctests_only)
  File "C:\Python32\lib\lib2to3\refactor.py", line 741, in refactor_file
    *args, **kwargs)
  File "C:\Python32\lib\lib2to3\refactor.py", line 336, in refactor_file
    input, encoding = self._read_python_source(filename)
  File "C:\Python32\lib\lib2to3\refactor.py", line 332, in _read_python_source
    return _from_system_newlines(f.read()), encoding
  File "C:\Python32\lib\codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 5055: invalid start byte

On investigation, I found some non-ASCII characters in docstrings that I 
copied and pasted from a PDF file.

Here are the details if they are of any use. Oddly, I found two instances 
where characters 'look like' apostrophes when viewed in my text editor, but 
one of them was accepted by 2to3 and the other caused the crash.

The one that was accepted consists of three bytes - 226, 128, 153 (as 
reported by Python 2.6) or 226, 8364, 8482 (as reported by Python 3.2).

The one that crashed consists of a single byte - 146 (Python 2.6) or 8217 
(Python 3.2).
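
The numbers above can be reproduced with a short sketch. It rests on one 
assumption that I have not verified: that Python 3.2 read the file using the 
Windows default encoding, cp1252. Under that assumption both sequences are 
the same character, U+2019 (right single quotation mark), stored once as 
UTF-8 and once as cp1252. The variable names are illustrative only.

```python
# The two "apostrophes" described above, as raw bytes.
utf8_quote = bytes([226, 128, 153])  # U+2019 encoded as UTF-8
cp1252_quote = bytes([146])          # U+2019 encoded as cp1252 (assumption)

# The three-byte form is valid UTF-8, which would explain why 2to3 accepted it.
assert utf8_quote.decode("utf-8") == "\u2019"

# Decoded as cp1252 instead, the same three bytes become three characters.
mojibake = [ord(c) for c in utf8_quote.decode("cp1252")]
print(mojibake)  # [226, 8364, 8482]

# The single byte is a valid character only under cp1252.
print(ord(cp1252_quote.decode("cp1252")))  # 8217

# Under UTF-8, 0x92 is a continuation byte and cannot start a character.
try:
    cp1252_quote.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte
```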

The issue is not that 2to3 should handle this correctly, but that it should 
give the unsuspecting user a more informative error message, one that at 
least names the file that caused the problem.
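
Until then, the offending file can be located by hand before running 2to3. 
The following is only a rough sketch, not anything 2to3 itself provides: the 
function name find_undecodable is made up for illustration, and it assumes 
that a file with no coding declaration or BOM is meant to be UTF-8, which is 
what tokenize.detect_encoding falls back to.

```python
import os
import tokenize


def find_undecodable(root):
    """Yield (path, error) for each .py file under root that fails to decode."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Honour a coding declaration or BOM; default is UTF-8.
                    encoding, _ = tokenize.detect_encoding(f.readline)
                    f.seek(0)
                    f.read().decode(encoding)
            except (UnicodeDecodeError, SyntaxError) as exc:
                # SyntaxError covers a bad byte or unknown encoding in the
                # first two lines, which detect_encoding reports itself.
                yield path, exc


if __name__ == "__main__":
    import sys

    # Directory to scan: first command-line argument, else current directory.
    start = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, exc in find_undecodable(start):
        print("{}: {}".format(path, exc))
```

Each line of output pairs a file name with the decode error, and the error 
text includes the byte position, which is the information the traceback 
above lacks.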

Frank Millman

BTW I have always waited for 'final releases' before upgrading in the past, 
but this makes me realise the importance of checking out the beta versions - 
I will do so in future.




