[Python-ideas] More alternate constructors for builtin type

Steven D'Aprano steve at pearwood.info
Tue May 7 20:20:13 EDT 2019


On Tue, May 07, 2019 at 07:17:00PM +0100, Oscar Benjamin wrote:

> The int function accepts all kinds of things e.g.
> 
>     >>> int('๒')
>     2
> 
> However in my own code if that character ever got passed to int then
> it would definitely indicate either a bug in the code or data
> corruption so I'd rather have an exception.

If you ever get Thai users who would like to enter numbers in their own 
language, they may be a tad annoyed that you consider that a bug.


> Admittedly the non-ASCII unicode digit example is not one that has
> actually caused me a problem 

I don't see why it should cause a problem. An int is an int, regardless 
of how it was spelled before conversion.

You probably don't lose any sleep over the relatively high probability 
of a single flipped bit changing a '7' digit into a '6' digit, say. It 
seems strange to worry about the enormously less likely data corruption 
which just so happens to result in valid non-ASCII digits.
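
To put rough numbers on that comparison, here is a small sketch. It assumes the data is stored as UTF-8, which the original message does not say; the variable names are only for illustration.

```python
# '7' is 0x37 and '6' is 0x36, so they differ in exactly one bit.
seven, six = ord('7'), ord('6')
differing_bits = bin(seven ^ six).count('1')
print(differing_bits)            # 1

# Flipping the lowest bit of '7' yields '6'.
flipped = chr(seven ^ 0b1)
print(flipped)                   # 6

# Assumption: UTF-8 storage. ASCII '2' is one byte, while the Thai digit
# two is three bytes, so random corruption would have to produce a whole
# valid multi-byte sequence to turn one into the other.
ascii_two = '2'.encode('utf-8')
thai_two = '๒'.encode('utf-8')
print(len(ascii_two), len(thai_two))   # 1 3
```
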


If your application can accept user data of "123" for the int 123, why 
would it matter if the user spelled it '๑๒๓' instead? You're not 
obligated to output Thai digits if you don't have many Thai users, but 
it just seems mean to reject Thai input if It Just Works.
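
A quick illustration of the point, using only the builtin int() and the standard unicodedata module (the variable names are mine):

```python
import unicodedata

thai = '๑๒๓'

# int() accepts any Unicode decimal digits, not just ASCII ones.
value = int(thai)
print(value)                     # 123
print(value == int('123'))       # True

# Each character carries its own decimal value in the Unicode database.
digits = [unicodedata.decimal(ch) for ch in thai]
print(digits)                    # [1, 2, 3]

# Output is still plain ASCII unless you choose otherwise.
print(str(value))                # 123
```

Once converted, nothing in the int remembers which script the input was written in.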


> but what I have had a problem with is
> floats. Given that a user of my code can pass in a float in place of a
> string the fact that int(1.5) gives 1 can lead to bugs or confusion.

So write a helper function and use that. Or specify a base:

    int(string, 0)

will support the usual Python formats (e.g. any of '123', '0x7b', 
'0o173', '0b1111011') and will raise TypeError for a non-string such as 
1.5 instead of silently truncating it. If for some reason you only want 
to support a single base, say, base 7, you can specify a non-zero 
argument as the base:

py> int('234', 7)
123
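
Both suggestions can be combined in a few lines. This is only a sketch: parse_int is a hypothetical helper name, not anything in the standard library, and the explicit isinstance check is my own addition to give a clearer error message.

```python
def parse_int(text):
    """Hypothetical helper: convert a string to int, refusing non-strings."""
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    # Base 0 means "interpret the string like a Python integer literal".
    return int(text, 0)

results = [parse_int(s) for s in ['123', '0x7b', '0o173', '0b1111011']]
print(results)                   # [123, 123, 123, 123]

# Even without the helper, an explicit base makes int() refuse a float.
try:
    int(1.5, 0)
except TypeError as exc:
    builtin_rejected = True
    print('rejected:', exc)
else:
    builtin_rejected = False
```

One thing to watch with base 0: because it follows literal syntax, a decimal string with a leading zero such as '0123' raises ValueError, whereas plain int('0123') returns 123.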

But user input seems like a good place to apply Postel's Law:

"Be conservative in what you output, be liberal in what you accept."

It shouldn't be any skin off your nose to accept '0x7b' or '๑๒๓' as well 
as '123'.



-- 
Steven

