[Python-Dev] Python and the Unicode Character Database

Thu Dec 2 22:48:33 CET 2010

On 02.12.2010 22:30, Steven D'Aprano wrote:
> Martin v. Löwis wrote:
>>>> Then these users should speak up and indicate their need, or somebody
>>>> should speak up and confirm that there are users who actually want
>>>> '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing
>>>> system in which '١٢٣٤.٥٦e4' means 12345600.0.
>>> I'm not sure what you're after here.
>>
>> That the current float() constructor accepts tons of bogus character
>> strings and accepts them as numbers, and that it should stop doing so.
> 
> What bogus characters do the float() and int() constructors accept? As
> far as I can see, they only accept numerals.

Not bogus characters, but bogus character strings. E.g. strings that mix
digits from different scripts, and mix them with the Python decimal
separator.
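
A minimal sketch of such strings, assuming the behaviour of current
CPython 3 releases (other versions or implementations may differ). The
variable names are illustrative only, and the digits are written as
escapes so that the mixing stays visible:

```python
# Arabic-Indic digits are U+0660..U+0669; Devanagari digits are U+0966..U+096F.
arabic = "\u0661\u0662\u0663\u0664.\u0665\u0666"  # Arabic-Indic 1234, ASCII '.', Arabic-Indic 56
with_exponent = arabic + "e4"                      # the same string plus an ASCII exponent
mixed = "1\u0662\u0969"                            # ASCII 1, Arabic-Indic 2, Devanagari 3

# All three conversions succeed; none of them raises ValueError.
results = [float(arabic), float(with_exponent), int(mixed)]
print(results)
```

Each string combines digits of one or more scripts with ASCII syntax
('.' and 'e'), which is the kind of input in question.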

>> Notice that Python does *not* currently support printing numbers in
>> other scripts - even though this may actually be more useful than
>> parsing.
> 
> Lack of one function, even if more useful, does not imply that an
> existing function should be removed.

No. But if the specific function(ality) is not useful and
underspecified, it should be removed.

> So your problems with the current behaviour are:
> 
> (1) in some unspecified way, it's not done correctly;

No. My main concern is that it is not properly specified. If it were
specified, I could then tell you precisely what is wrong with it.
Right now, I can only give examples for input that it should not accept,
and examples of input that it should, but does not accept.
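
One example of the second kind, as a sketch that assumes current
CPython 3 behaviour: a number written entirely in Arabic-Indic
characters, using the Arabic decimal separator U+066B instead of the
ASCII '.', is rejected.

```python
# Arabic-Indic 1234, ARABIC DECIMAL SEPARATOR (U+066B), Arabic-Indic 56.
native = "\u0661\u0662\u0663\u0664\u066b\u0665\u0666"

try:
    value = float(native)
except ValueError:
    # float() only understands '.' as the decimal separator.
    value = None

print(value)
```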

> (2) it belongs somewhere other than float() and int().

That's only because it also needs a parameter to specify what syntax to
follow, somehow. That parameter could be explicit or implicit, and it
could be to float or to some other function. But it must be available,
and is not.

> That second is awfully close to bike-shedding. Since you accept that
> Python *should* have the current behaviour

No, I don't. I think it behaves incorrectly, accepting garbage input and
guessing some meaning out of it.

> - how the current behaviour is incorrect;

See above: it accepts strings that do not denote real numbers in any
writing system, and, despite the claim that the feature is there to
support other writing systems, actually does not truly support other
writing systems.

> - your suggestions for correcting it; and

Make the current implementation exactly match the current documentation.
I think the documentation is correct; the implementation is wrong.

> - a concrete suggestion for where you would like to see the behaviour
> moved to, and why that would be better than where it currently is.

The current behavior should go nowhere; it is not useful. Something very
similar to the current behavior (but done correctly) should go into the
locale module.
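
What "done correctly" could look like, as a rough sketch only:
parse_decimal and its decimal_sep parameter are hypothetical names, not
an existing locale API. The sketch assumes that the decimal digits of
each script occupy ten consecutive code points, so the code point of a
script's zero identifies the script.

```python
import unicodedata


def parse_decimal(text, decimal_sep="."):
    """Hypothetical helper: digits from one script, explicit separator."""
    zeros = set()        # code points of the zero digit of each script seen
    ascii_chars = []
    for ch in text:
        if ch == decimal_sep:
            ascii_chars.append(".")
            continue
        value = unicodedata.decimal(ch, None)
        if value is None:
            raise ValueError(f"unexpected character {ch!r}")
        zeros.add(ord(ch) - value)
        ascii_chars.append(str(value))
    if len(zeros) > 1:
        raise ValueError("digits from more than one script")
    # float() rejects empty input and repeated separators for us.
    return float("".join(ascii_chars))


print(parse_decimal("\u0661\u0662\u0663\u0664\u066b\u0665\u0666", decimal_sep="\u066b"))
```

The separator is an explicit parameter here; a locale-based version
could derive it implicitly. Exponents, signs and grouping characters
are deliberately left out of the sketch.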

Regards,
Martin