2to3 chokes on bad character
frank at chagford.com
Wed Feb 23 09:47:36 CET 2011
I don't know if this counts as a bug in 2to3.py, but when I ran it on my
program directory it crashed, with a traceback but without any indication of
which file caused the problem.
Here is the traceback -
Traceback (most recent call last):
  File "C:\Python32\Tools\Scripts\2to3.py", line 5, in <module>
  File "C:\Python32\lib\lib2to3\main.py", line 172, in main
  File "C:\Python32\lib\lib2to3\refactor.py", line 700, in refactor
    items, write, doctests_only)
  File "C:\Python32\lib\lib2to3\refactor.py", line 294, in refactor
    self.refactor_dir(dir_or_file, write, doctests_only)
  File "C:\Python32\lib\lib2to3\refactor.py", line 314, in refactor_dir
    self.refactor_file(fullname, write, doctests_only)
  File "C:\Python32\lib\lib2to3\refactor.py", line 741, in refactor_file
  File "C:\Python32\lib\lib2to3\refactor.py", line 336, in refactor_file
    input, encoding = self._read_python_source(filename)
  File "C:\Python32\lib\lib2to3\refactor.py", line 332, in
    return _from_system_newlines(f.read()), encoding
  File "C:\Python32\lib\codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 5055: invalid start byte
On investigation, I found some funny characters in docstrings that I
copy/pasted from a pdf file.
Here are the details if they are of any use. Oddly, I found two instances
where characters 'look like' apostrophes when viewed in my text editor, but
one of them was accepted by 2to3 and the other caused the crash.
The one that was accepted consists of three bytes - 226, 128, 153 (as
reported by python 2.6) or 226, 8364, 8482 (as reported by python3.2).
The one that crashed consists of a single byte - 146 (python 2.6) or 8217
(as reported by python3.2).
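
For anyone puzzling over those numbers, here is a small sketch that reproduces them. It assumes the python3.2 values came from reading the file with the Windows cp1252 code page, which is a guess on my part and not something verified against the original setup. Under that assumption both sequences are the same character, U+2019 (right single quotation mark), stored in two different encodings.

```python
# Byte values as reported in the post.
three_byte = bytes([226, 128, 153])   # E2 80 99, the UTF-8 encoding of U+2019
one_byte = bytes([146])               # 0x92, the cp1252 encoding of U+2019

# The three-byte form is valid UTF-8 and decodes to a single character.
print([ord(c) for c in three_byte.decode("utf-8")])    # [8217]

# Read as cp1252 (assumed here), the same bytes become three characters.
print([ord(c) for c in three_byte.decode("cp1252")])   # [226, 8364, 8482]

# The single byte is a valid cp1252 character...
print([ord(c) for c in one_byte.decode("cp1252")])     # [8217]

# ...but it is not valid UTF-8, which matches the error in the traceback.
try:
    one_byte.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)                                  # invalid start byte
```

So the accepted apostrophe was already UTF-8, while the one that crashed looks like a cp1252 byte sitting in a file that was being read as UTF-8.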
The issue is not that 2to3 should handle this correctly, but that it should
give a more informative error message to the unsuspecting user.
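
In the meantime, a rough way to find the offending file is to try decoding every source file yourself before running 2to3. The sketch below is not part of 2to3; the function name find_undecodable is made up for illustration, and it assumes every .py file is meant to be UTF-8 (it ignores any coding declaration at the top of a file).

```python
import os

def find_undecodable(root, encoding="utf-8"):
    """Return (path, byte offset, byte value) for each .py file under
    root that cannot be decoded with the given encoding.

    Hypothetical helper: only the first bad byte in each file is reported.
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read raw bytes so that the decoding step is under our control.
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first undecodable byte.
                problems.append((path, exc.start, data[exc.start]))
    return problems
```

Calling find_undecodable on the program directory and printing each tuple gives the file name alongside the position and byte value, which is the information missing from the traceback above.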
BTW I have always waited for 'final releases' before upgrading in the past,
but this makes me realise the importance of checking out the beta versions -
I will do so in future.