[Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions

Steven D'Aprano steve at pearwood.info
Tue Jun 11 02:53:02 CEST 2013


On 10/06/13 05:35, Alexander Belopolsky wrote:
> On Sun, Jun 9, 2013 at 12:47 PM, Łukasz Langa <lukasz at langa.pl> wrote:
>
>> Wikipedia started serving numerical data with \N{MINUS SIGN} instead of
>> 0x2D, for instance on climate tables. I think we'll see increased usage of
>> such characters in the wild.
>
>
> While I don't think MINUS SIGN can be abused this way, you should be very
> careful when you copy numerical data from the web.   Consider this case:
>
>>>> float('123٠95')
> 123095.0
>
> Depending on your font, '123٠95' may be indistinguishable from '123.95'.

Indistinguishable *by eye* maybe, but the same applies to ASCII, 365lO98, and there are plenty of ways to distinguish them other than by a careless glance at the screen.

[Aside: I have seen users type I or O for digits, based on the fact that it works fine when using a typewriter, and I've read books from the 1970s that recommended that number parsers accept I, L and O for just that reason.]

It would be a pretty awful font that made ٠ look like . But even if it did, what is the concern here? If somebody enters a mixed script number, presumably they have some reason for it. If they don't, the point is moot. It is hardly likely that mixed script numbers will form by accident.

Postel's Law, or the Robustness Principle, supports the current behaviour: "Be conservative in what you send, be liberal in what you accept". str(number) is conservative, and emits only ASCII digits. int(string) and float(string) are liberal, and accept any valid digit as a digit.


> If you do research using numerical data published on the web, you will be
> well advised not to assume that anything that looks like a number to your
> eye can be fed to python's float().

What exactly are you concerned about? If it is a syntactically valid number made up of nothing but digits (plus a sign and decimal point), Python will convert it, using the correct value for each digit. If it contains non-digits, Python will give you a traceback.



-- 
Steven


More information about the Python-ideas mailing list