[New-bugs-announce] [issue25275] Documentation v/s behaviour mismatch wrt integer literals containing non-ASCII characters
Shreevatsa R
report at bugs.python.org
Wed Sep 30 07:19:45 CEST 2015
New submission from Shreevatsa R:
Summary: int(u'१२३४') == 1234, which the documentation of int() does not obviously allow.
At https://docs.python.org/2/library/functions.html and also https://docs.python.org/3/library/functions.html the documentation for
class int(x=0)
class int(x, base=10)
says (respectively):
> If x is not a number or if base is given, then x must be a string or Unicode object representing an integer literal in radix base.
> If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base.
If you follow the definition of "integer literal" into the reference (https://docs.python.org/2/reference/lexical_analysis.html#integers and https://docs.python.org/3/reference/lexical_analysis.html#integers respectively), the definitions ultimately involve
nonzerodigit ::= "1"..."9"
octdigit ::= "0"..."7"
bindigit ::= "0" | "1"
digit ::= "0"..."9"
So whether the behaviour of int() conforms to its documentation hinges on what "representing" means. Apparently it means something under which u'१२३४' represents the integer literal 1234, even though the grammar above admits only the ASCII digits "0" to "9". It would be great to either clarify the documentation of int() or change its behaviour.
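The mismatch can be reproduced with a short Python 3 sketch. It is illustrative only: the regular expression is a simplified stand-in for the decimal-literal grammar quoted above (no prefixes, no underscores), and the helper names matches_grammar and compiles_as_source are made up for this example rather than taken from the documentation.

```python
import re
import unicodedata

# Devanagari digits one to four, i.e. the string from this report.
DEVANAGARI = "\u0967\u0968\u0969\u096a"

# Simplified form of the documented decimal-literal grammar:
#   nonzerodigit digit* | "0"+
# Only ASCII digits are allowed; prefixes and underscores are left out.
DECIMAL_LITERAL = re.compile(r"[1-9][0-9]*|0+")


def matches_grammar(text):
    """Return True if text fits the simplified literal grammar (hypothetical helper)."""
    return DECIMAL_LITERAL.fullmatch(text) is not None


def compiles_as_source(text):
    """Return True if text is accepted as a Python source expression (hypothetical helper)."""
    try:
        compile(text, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# int() accepts the string and yields an ordinary integer.
parsed = int(DEVANAGARI)

# Each character carries a Unicode decimal digit value.
digit_values = [unicodedata.decimal(ch) for ch in DEVANAGARI]

print(parsed)                          # 1234
print(digit_values)                    # [1, 2, 3, 4]
print(matches_grammar(DEVANAGARI))     # False
print(compiles_as_source(DEVANAGARI))  # False
print(compiles_as_source("1234"))      # True
```

So the same characters are rejected by the tokenizer and by the documented grammar, yet accepted by int(), which is the gap the word "representing" has to cover.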
----------
assignee: docs at python
components: Documentation, Interpreter Core, Unicode
messages: 251915
nosy: docs at python, ezio.melotti, haypo, shreevatsa
priority: normal
severity: normal
status: open
title: Documentation v/s behaviour mismatch wrt integer literals containing non-ASCII characters
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25275>
_______________________________________