[issue2382] [Py3k] SyntaxError cursor shifted if multibyte character is in line.

Hirokazu Yamamoto report at bugs.python.org
Tue Mar 18 06:22:32 CET 2008


New submission from Hirokazu Yamamoto <ocean-city at users.sourceforge.net>:

Hello. I found another problem related to issue2301.
The SyntaxError cursor "^" is shifted when multibyte
characters appear in the line before the "^".

I think this is because err->text is stored as UTF-8,
which requires 3 bytes for each of these multibyte characters,
whereas cp932 (my console encoding) requires only 2 bytes for each.

So "^" is shifted 5 bytes to the right, because there are 5 multibyte chars.

C:\Documents and Settings\WhiteRabbit>py3k x.py
push any key....

  File "x.py", line 3
    print "あいうえお"
                          ^
SyntaxError: invalid syntax
[22567 refs]
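
The arithmetic behind the shift can be checked with a small Python sketch.
It is only an illustration, not code from the interpreter: the helper name
cursor_shift is hypothetical, and it assumes the cursor offset is counted in
UTF-8 bytes while the console renders the same text using cp932 byte widths.

```python
def cursor_shift(prefix: str, console_encoding: str = "cp932") -> int:
    """Return how many extra bytes UTF-8 uses for `prefix` compared
    with the console encoding (hypothetical helper for illustration)."""
    utf8_len = len(prefix.encode("utf-8"))
    console_len = len(prefix.encode(console_encoding))
    return utf8_len - console_len


# Text that precedes the "^" in the report above.
prefix = 'print "あいうえお"'

# Each hiragana character is 3 bytes in UTF-8 and 2 bytes in cp932,
# so five of them account for a 5-byte difference.
shift = cursor_shift(prefix)
print(shift)
```

ASCII characters take one byte in both encodings, so only the five
multibyte characters contribute to the difference.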

Sorry, I didn't know what PyTokenizer_RestoreEncoding was really doing.
That function adjusted err_ret->offset for this encoding conversion,
so Python 2.5 can output the cursor in the right place. (Of course, if the
source encoding is not compatible with the console encoding, a broken string
is printed. Anyway, the cursor is right.)

C:\Documents and Settings\WhiteRabbit>py a.py
  File "a.py", line 2
    x "、「、、、ヲ、ィ、ェ"
                 ^
SyntaxError: invalid syntax
[8728 refs]
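
The kind of offset adjustment described above might look roughly like the
following Python sketch. This is not the actual C implementation of
PyTokenizer_RestoreEncoding; the function name restore_offset and the
convention that the offset is a 0-based count of UTF-8 bytes before the
cursor are assumptions made for illustration.

```python
def restore_offset(line_utf8: bytes, offset: int, encoding: str) -> int:
    """Convert a byte offset into UTF-8 text to the equivalent byte
    offset in `encoding` (hypothetical helper, not the CPython code)."""
    # Decode only the bytes before the cursor; ignore a trailing
    # partial character if the offset falls inside one.
    prefix = line_utf8[:offset].decode("utf-8", errors="ignore")
    # Re-encode that prefix in the target encoding and measure it;
    # unencodable characters are replaced by a single "?".
    return len(prefix.encode(encoding, errors="replace"))


line = 'print "あいうえお"'.encode("utf-8")
utf8_offset = len(line)  # cursor just past the closing quote
console_offset = restore_offset(line, utf8_offset, "cp932")
print(utf8_offset, console_offset)
```

With an adjustment of this kind, the cursor would be placed according to
the console's byte widths instead of the UTF-8 byte count.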

I tried to fix this problem, but I'm not sure how to fix it.

----------
components: None
messages: 63895
nosy: ocean-city
severity: normal
status: open
title: [Py3k] SyntaxError cursor shifted if multibyte character is in line.
versions: Python 3.0

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2382>
__________________________________

