[issue2382] [Py3k] SyntaxError cursor shifted if multibyte character is in line.
STINNER Victor
report at bugs.python.org
Tue Mar 17 22:40:30 CET 2009
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
Proof of concept of patch fixing this issue:
- parse_syntax_error() reads the text line into a PyUnicodeObject*
instead of a "const char**"
- create utf8_to_unicode_offset(): convert byte offset to a number of
characters. The Python version should be something like:
    def utf8_to_unicode_offset(text, byte_offset):
        utf8 = text.encode("utf-8")
        utf8 = utf8[:byte_offset]
        text = str(utf8, "utf-8")
        return len(text)
- reuse adjust_offset() from
py3k_adjust_cursor_at_syntax_error_v2.patch, but force the use of
wcswidth() because HAVE_WCSWIDTH is not defined by configure
- print_error_text() works on unicode characters and not on bytes!
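
For illustration, the byte-to-character conversion behind utf8_to_unicode_offset() can be tried on a line containing a multibyte character. This is only a sketch and not the code from the patch: the errors="ignore" argument is an assumption added here so that a byte offset falling in the middle of a multibyte sequence does not raise UnicodeDecodeError.

```python
def utf8_to_unicode_offset(text, byte_offset):
    """Convert a byte offset in the UTF-8 form of text to a character count."""
    utf8 = text.encode("utf-8")[:byte_offset]
    # Assumption of this sketch: silently drop a trailing partial sequence
    # instead of raising when byte_offset splits a multibyte character.
    return len(utf8.decode("utf-8", errors="ignore"))


line = "h\u00e9llo"  # "héllo": 5 characters, 6 bytes in UTF-8
# The parser reports offsets in bytes; the cursor must be placed in characters.
print(utf8_to_unicode_offset(line, 3))  # bytes for "h" + "é" cover 2 characters
```

The byte offset 3 covers "h" (1 byte) and "é" (2 bytes), so the cursor belongs after 2 characters, not 3.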
The patch should be refactored:
- move adjust_offset(), utf8_to_unicode_offset() and utf8_len() into
unicodeobject.c. You might create a new method "width()" for the
unicode type. This method could be used to fix the center(), ljust() and
rjust() unicode methods (see issue #3446).
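
A rough Python sketch of what such a "width()" could compute, and how ljust() might use it. The names display_width() and ljust_by_width() are hypothetical, and the rule used here (East Asian wide and fullwidth characters count as 2 columns, combining characters as 0) is only an approximation of what wcswidth() does, not the behaviour of the patch:

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns needed to print text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        # "W" (wide) and "F" (fullwidth) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def ljust_by_width(text, columns, fill=" "):
    """Left-justify text in a field measured in columns, not characters."""
    return text + fill * max(0, columns - display_width(text))


print(repr(ljust_by_width("\u65e5\u672c", 6)))  # two wide characters, padded to 6 columns
```

With str.ljust() the same call would add 4 spaces, because it counts characters instead of columns.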
----------
Added file: http://bugs.python.org/file13354/issue2382.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2382>
_______________________________________