On Thu, Dec 2, 2010 at 8:23 PM, "Martin v. Löwis" firstname.lastname@example.org wrote:
In the case of number parsing, I think Python would be better if float() rejected non-ASCII strings, and any support for such parsing should be redone correctly in a different place (preferably along with printing of numbers).
+1. The set of strings currently accepted by the float constructor just seems too ad hoc to be at all useful. Apart from the decimal separator issue, and the question of exactly which decimal digits are accepted and which aren't, there are issues like this one:
>>> x = '\uff11\uff25\uff0b\uff11\uff10'
>>> float(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character '\uff25' in position 1: invalid decimal Unicode string
>>> y = '\uff11E+\uff11\uff10'
>>> float(y)
10000000000.0
That is, fullwidth *digits* are allowed, but none of the other characters can be fullwidth variants. Unfortunately, a float string doesn't consist solely of digits, and it seems to me to make little sense to allow variation in the digits without allowing corresponding variations in the other characters that might appear ('.', 'e', 'E', '+', '-').
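The asymmetry is easy to check on a current interpreter. Here is a minimal sketch, assuming Python 3; the helper name `try_float` is made up for illustration. It catches ValueError, which also covers the UnicodeEncodeError in the traceback above, since that is a ValueError subclass.

```python
def try_float(s):
    """Return float(s), or None if the float constructor rejects s."""
    try:
        return float(s)
    except ValueError:  # UnicodeEncodeError is a subclass of ValueError
        return None

# Fullwidth '1', 'E', '+', '1', '0': every character is a fullwidth variant.
all_fullwidth = '\uff11\uff25\uff0b\uff11\uff10'
# Fullwidth digits only; the exponent marker and sign are plain ASCII.
digits_only = '\uff11E+\uff11\uff10'

print(try_float(all_fullwidth))  # rejected
print(try_float(digits_only))    # accepted
```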
A couple of slightly trickier decisions:

(1) The float constructor currently does accept leading and trailing whitespace; should it allow any Unicode whitespace characters here? I'd say yes.

(2) For int() rather than float(), there's a bit more value in allowing the variant digits, since it provides an easy way to interpret those digits. The decimal module currently makes use of this, for example (the decimal spec requires that non-European digits be accepted). I'd be happier if this functionality were moved elsewhere, though. The int constructor is, if anything, currently worse off than float, thanks to its attempts to support non-decimal bases.
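To make both points concrete, here is a rough sketch, assuming a current CPython 3. The first part shows Unicode whitespace and non-European digits being accepted today. The function `digits_to_int` is purely hypothetical: it is one way the digit interpretation could live outside int(), not an existing or proposed API.

```python
import unicodedata
from decimal import Decimal

# (1) Unicode whitespace around an otherwise ASCII float string.
padded = '\u2003' + '1.5' + '\u3000'  # EM SPACE, then IDEOGRAPHIC SPACE
padded_value = float(padded)

# (2) int() and Decimal() interpreting non-European decimal digits.
arabic_indic = '\u0663\u0664'         # ARABIC-INDIC DIGIT THREE, FOUR
int_value = int(arabic_indic)
decimal_value = Decimal(arabic_indic)

def digits_to_int(s):
    """Hypothetical helper: interpret a non-empty run of Unicode decimal digits."""
    if not s:
        raise ValueError("empty digit string")
    value = 0
    for ch in s:
        # unicodedata.decimal raises ValueError for non-decimal characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value

print(padded_value, int_value, decimal_value, digits_to_int(arabic_indic))
```

A helper like this deals only with base 10, which sidesteps the question of what variant digits should mean in other bases.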
There's value in having an easy-to-specify, easy-to-maintain API for these basic builtin functions. For one thing, it helps non-CPython implementations.
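As an illustration of how small the specification could become, here is a sketch of the ASCII-only behaviour suggested at the top of this message. The name `ascii_float` is an assumption for the example, and `str.isascii()` needs Python 3.7 or later. This is not how float() is actually implemented.

```python
def ascii_float(s):
    """Hypothetical strict variant: reject non-ASCII input, then defer to float()."""
    if not s.isascii():  # str.isascii() requires Python 3.7+
        raise ValueError(f"non-ASCII float string: {s!r}")
    return float(s)

print(ascii_float(' 1E+10 '))  # ASCII digits and ASCII whitespace still work
```

Note that this variant would also reject non-ASCII whitespace, so it follows the stricter proposal rather than my preference in (1) above.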
The Python 3.x docs apparently introduced a reference to the language spec which is clearly not capturing the wealth of possible inputs.
That documentation update was my fault; I was motivated to make the update by issues unrelated to this one (mostly to do with Python 3's more consistent handling of inf and nan, as a result of all the new float<->string conversion code). If I'd been thinking harder, I would have remembered that float accepted the non-European digits and added a note to that effect. This (unintentional) omission does underline the point that it's difficult right now to document and understand exactly what the float constructor does or doesn't accept.
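For reference, the inf and nan handling mentioned above can be probed like this. It is only a quick sketch, assuming a current CPython 3, and the particular spellings tested are my own choice of examples.

```python
import math

# Spellings of infinity and nan are matched case-insensitively,
# with an optional leading sign.
special_inputs = ['inf', '-Infinity', 'INF', 'NaN', '+nan']
parsed = {text: float(text) for text in special_inputs}

for text, value in parsed.items():
    print(repr(text), '->', repr(value))

# repr() output is itself accepted by float(), so values round-trip.
roundtrip_ok = float(repr(0.1)) == 0.1 and math.isinf(float(repr(parsed['inf'])))
print(roundtrip_ok)
```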