[Tutor] Limitation of int() in converting strings

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Jan 2 16:16:48 CET 2013


On 1 January 2013 05:07, Steven D'Aprano <steve at pearwood.info> wrote:
> On 23/12/12 04:38, Oscar Benjamin wrote:
>>
>> On 22 December 2012 01:34, Steven D'Aprano<steve at pearwood.info>  wrote:
>>>
>>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>>
>>>> I think it's unfortunate that Python's int() function combines two
>>>> distinct behaviours in this way. In different situations int() is used
>>>> to:
>>>> 1) Coerce an object of some type other than int into an int without
>>>> changing the value of the integer that the object represents.
>>>
> [SNIP]
>
> Yes. And it is a demonstrable fact that int is *not* intended to coerce
> objects to int "without changing the value of the number", because
> changing the value of the number is precisely what int() does, in some
> circumstances.

I wonder how often the int() function is used in a situation where the
dual behaviour is actually desired. I imagine that the common uses for
int are (in descending order of prevalence):
1) Parse a string as an integer
2) Convert an integer-valued numeric object of some other type to the int type.
3) Truncate a non-integer valued object from a numeric type.

The int() function as it stands performs all three operations. I would
describe 1) and 2) as changing type but not value and 3) as a value
changing operation. I can see use cases where 1) and 2) go together. I
can't really envisage a case in which 1) and 3) are wanted at the same
time in some particular piece of code. In other words I can't imagine
a use case that calls for a function that works precisely as the int()
function currently does.
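
To make the three cases concrete, here is a small sketch of what the
built-in does for each (the variable names are mine, purely for
illustration):

```python
# The three uses listed above, all served by the same built-in.
parsed = int("42")       # 1) parse a string as an integer
converted = int(42.0)    # 2) integer-valued float: type changes, value does not
truncated = int(-2.7)    # 3) non-integer float: value changes (truncates toward zero)

print(parsed, converted, truncated)   # prints: 42 42 -2
```

Only the third call loses information, and nothing at the call site
distinguishes it from the other two.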

>
> If you would like to argue that it would have been better if int did
> not do this, then I might even agree with you.

That's exactly what I would argue.

> There is certainly
> precedent: if I remember correctly, you cannot convert floating point
> values to integers directly in Pascal, you first have to truncate them
> to an integer-valued float, then convert.
>
> # excuse my sloppy Pascal syntax, it has been a few years
> var
>   i: integer;
>   x: real;
> begin
>   i := integer(trunc(x));
> end;

When the idea was discussed in the run-up to Python 3, Guido raised
exactly this case and said:
"""
I still really wish I had followed Pascal's lead instead of C's here:
Pascal requires you to use trunc() to convert a real to an integer.
(BTW Pascal also had the division operator right, unlike C, and we're
finally fixing this in Py3k by following Pascal's nearly-40-year-old
lead.) If we had done it that way, we wouldn't have had to introduce
the index() builtin and the corresponding infrastructure (__index__
and a whole slew of C APIs).
"""
http://mail.python.org/pipermail/python-dev/2008-January/076546.html

> So I'm not entirely against the idea that Python should have had separate
> int() and trunc() functions, with int raising an exception on (non-whole
> number?) floats.

It's possible that the idea was rejected before because the suggestion
was that int(1.0) would raise an error, analogous to the way that
float(1+0j) raises an error, even though in both cases the conversion
can preserve the value exactly.
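
A quick check of the two conversions being compared, as a sketch (the
flag name complex_rejected is just for illustration):

```python
z = 1 + 0j   # a complex number whose imaginary part is zero

try:
    float(z)
    complex_rejected = False
except TypeError:
    # float() refuses every complex argument, even a real-valued one.
    complex_rejected = True

whole = int(1.0)   # accepted: the float is integer-valued

print(complex_rejected, whole)   # prints: True 1
```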

> But back to Python as it actually is, rather than how it might have been.
> There's no rule that int() must be numerically lossless. It is lossless
> with strings, and refuses to convert strings-that-look-like-floats to ints.
> And that makes sense: in an int, the "." character is just as illegal as
> the characters "k" or "&" or "Ω", int will raise on "123k456", so why
> wouldn't it raise on "123.456"?

I agree. To be fair, string handling is not my real complaint. I
referred to that initially since the OP asked about that case.
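
For reference, the string behaviour in question looks like this. The
helper parses_as_int is a hypothetical name used only for this sketch:

```python
def parses_as_int(text):
    """Hypothetical helper: return True if int() accepts the string."""
    try:
        int(text)
    except ValueError:
        return False
    return True

for text in ("123", " 123 ", "123k456", "123.456"):
    print(repr(text), parses_as_int(text))

# Truncating a float-looking string has to be requested explicitly,
# by parsing it as a float first.
print(int(float("123.456")))
```

Surrounding whitespace is tolerated, but "." is rejected just like any
other non-digit character.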

>
> But that (good, conservative) design decision isn't required or enforced.
> Hence my reply that you cannot safely make the assumption that int() on a
> non-numeric type will be numerically exact.

This is precisely my complaint. The currently available functions are
int, operator.index, trunc, math.ceil, math.floor and round.
Conspicuously absent is the function that simply converts an object to
an integer type at the same time as refusing to change its numeric
value, i.e. I would like the special rounding mode that is "no
rounding": just an error for non-integral values. It probably has to do
with the kind of code I write, but I often find myself implementing a
(lazy and bug-prone) function for this task.
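
The kind of function I mean looks roughly like the sketch below. The
name exact_int and the choice of ValueError are my own assumptions;
nothing like this exists in the standard library as far as I know, and
this version is not meant to cover every numeric type.

```python
import operator
from fractions import Fraction


def exact_int(obj):
    """Hypothetical helper: convert obj to int, refusing to change its value."""
    try:
        # Objects that are already integers (int, bool, anything
        # implementing __index__) pass straight through.
        return operator.index(obj)
    except TypeError:
        pass
    if isinstance(obj, str):
        # int() is already strict for strings: "2.5" raises ValueError.
        return int(obj)
    result = int(obj)   # may truncate, so compare with the original
    if result != obj:
        raise ValueError(f"{obj!r} is not integer-valued")
    return result


print(exact_int(3.0), exact_int(Fraction(6, 3)), exact_int("42"))
```

With this, exact_int(2.5) raises ValueError instead of silently
returning 2, and the caller then has to pick a rounding function
explicitly.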

>> [SNIP]
>>
>> This is precisely my point. I would prefer it if int(obj) would fail
>> on non-integers leaving me with the option of calling an appropriate
>> rounding function. After catching RoundError (or whatever) you would
>> know that you have a number type object that can be passed to round,
>> ceil, floor etc.
>
> Well, I guess that comes down to the fact that Python is mostly aimed at
> mathematically and numerically naive users who would be scared off at a
> plethora of rounding modes :-)

That's true but I think there are some conceptual points that any user
of any programming language should be forced to consider. Python can
make many common tasks easy for you but there simply is no "one way"
to round a number. Providing a particular rounding mode as a default
rounding mode leads to people not thinking properly about what
rounding mode they really need in a particular problem. This in turn
leads to bugs where the wrong rounding mode is used.
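
As a small illustration of how much the choice matters, the sketch
below applies the available functions to the same inputs. The helper
name all_roundings is made up for this example, and the round() results
assume Python 3, where ties are rounded to the nearest even integer.

```python
import math


def all_roundings(x):
    """Hypothetical helper: apply each float-to-int function to x."""
    return {
        "int": int(x),            # truncates toward zero
        "trunc": math.trunc(x),   # same as int() for floats
        "floor": math.floor(x),   # rounds toward negative infinity
        "ceil": math.ceil(x),     # rounds toward positive infinity
        "round": round(x),        # Python 3: ties go to the even neighbour
    }


for x in (2.5, -2.5):
    print(x, all_roundings(x))
```

For -2.5, floor gives -3 while the others give -2; for 2.5, ceil gives
3 while the others give 2. Code that reaches for int() by default is
picking one of these without saying so.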


Oscar

