[issue1679] tokenizer permits invalid hex integer

Sat Jan 19 17:56:06 CET 2008

Malte Helmert added the comment:

I can find three places where "0x" is accepted, but probably shouldn't be:

1. Python's tokenizer:
>>> 0x
0
>>> 0xL
ValueError: invalid literal for long() with base 16: '0xL'
=> I think these should both be syntax errors.
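
One way to check how a given interpreter treats these literals is to hand
them to compile() and see which exception, if any, comes back. This is only
a sketch; the helper name classify_literal is made up for illustration. On
an interpreter that rejects a bare prefix, both "0x" and "0xL" should come
back as syntax errors.

```python
def classify_literal(source):
    """Report how the running interpreter's compiler treats `source`.

    Hypothetical helper, not part of Python itself.
    """
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        # The tokenizer/parser rejected the text outright.
        return "syntax error"
    except ValueError:
        # The text got past the tokenizer but could not be converted,
        # as in the "0xL" case shown above.
        return "value error"
    return "accepted"


if __name__ == "__main__":
    for text in ("0x1f", "0x", "0xL"):
        print("%-6s %s" % (text, classify_literal(text)))
```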

2. int builtin:
>>> int("0x", 0) == int("0x", 16) == 0
True
>>> long("0x", 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for long() with base 16: '0x'
>>> long("0x", 16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for long()

=> The long behaviour looks right to me, and I think the int behaviour
should match it.
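
To make the intended behaviour concrete, here is a minimal sketch of a
conversion that refuses a prefix with no digits after it. The name
parse_hex_strict and the error message are assumptions for illustration,
not the builtin's actual implementation.

```python
import re

# Optional sign, the "0x"/"0X" prefix, then at least one hex digit.
_HEX_LITERAL = re.compile(r"[+-]?0[xX][0-9a-fA-F]+\Z")


def parse_hex_strict(text):
    """Convert a prefixed hex string, rejecting a bare "0x".

    Hypothetical helper showing the proposed behaviour.
    """
    stripped = text.strip()
    if not _HEX_LITERAL.match(stripped):
        raise ValueError("invalid literal with base 16: %r" % (text,))
    return int(stripped, 16)


if __name__ == "__main__":
    print(parse_hex_strict("0x1f"))
    try:
        parse_hex_strict("0x")
    except ValueError as exc:
        print("rejected: %s" % exc)
```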

3. tokenize module:
This currently accepts "0x" and "0xL" as single tokens. The obvious fix
would report each of them as two separate tokens ("0": NUMBER, "x":
NAME; "0": NUMBER, "xL": NAME), as the module already does in other
cases where a name follows a number, such as "23cats". However, this
is not quite what Python's parser does: it returns an error token
instead. (Fortunately, a name directly after a number appears to be a
syntax error everywhere, so the difference doesn't really affect
behaviour; a syntax error occurs either way.)
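
The effect of the obvious fix can be sketched with a toy regex scanner.
This is not the tokenize module's code: the pattern names and the scan
function are assumptions for illustration, and the only difference between
the two hex patterns is whether zero digits are allowed after the prefix.

```python
import re

NAME = r"[A-Za-z_]\w*"
DECIMAL = r"\d+[lL]?"
LOOSE_HEX = r"0[xX][0-9a-fA-F]*[lL]?"   # accepts a bare "0x"
STRICT_HEX = r"0[xX][0-9a-fA-F]+[lL]?"  # requires at least one hex digit


def scan(source, hex_pattern):
    """Split `source` into (type, text) pairs using a toy scanner."""
    scanner = re.compile(
        "(?P<NUMBER>%s|%s)|(?P<NAME>%s)" % (hex_pattern, DECIMAL, NAME)
    )
    result = []
    pos = 0
    while pos < len(source):
        match = scanner.match(source, pos)
        if match is None:
            raise ValueError("cannot scan %r at offset %d" % (source, pos))
        # lastgroup is the name of the alternative that matched.
        result.append((match.lastgroup, match.group()))
        pos = match.end()
    return result


if __name__ == "__main__":
    for text in ("0x", "0xL", "23cats"):
        print(text, scan(text, LOOSE_HEX), scan(text, STRICT_HEX))
```

With the strict pattern, "0x" and "0xL" fall back to the decimal rule for
the leading "0" and the remainder is scanned as a name, which is the
two-token result described above.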

----------
nosy: +maltehelmert

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1679>
__________________________________