[Tutor] Limitation of int() in converting strings
Steven D'Aprano
steve at pearwood.info
Tue Jan 1 06:07:38 CET 2013
On 23/12/12 04:38, Oscar Benjamin wrote:
> On 22 December 2012 01:34, Steven D'Aprano<steve at pearwood.info> wrote:
>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>
>>> I think it's unfortunate that Python's int() function combines two
>>> distinct behaviours in this way. In different situations int() is used
>>> to:
>>> 1) Coerce an object of some type other than int into an int without
>>> changing the value of the integer that the object represents.
>>
>> The second half of the sentence (starting from "without changing") is not
>> justified. You can't safely make that assumption. All you know is that
>> calling int() on an object is intended to convert the object to an int,
>> in whatever way is suitable for that object. In some cases, that will
>> be numerically exact (e.g. int("1234") will give 1234), in other cases it
>> will not be.
>
> If I was to rewrite that sentence I would replace the word 'integer'
> with 'number' but otherwise I'm happy with it. Your reference to
> "numerically exact" shows that you understood exactly what I meant.
Yes. And it is a demonstrable fact that int is *not* intended to coerce
objects to int "without changing the value of the number", because
changing the value of the number is precisely what int() does, in some
circumstances.
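
To make that concrete, here is a minimal sketch (assuming Python 3; the variable names are only for illustration) of int() being exact in one case and changing the value in another:

```python
# int() is exact for a string of digits, but discards the fraction of a float.
exact = int("1234")       # string of digits: value preserved
lossy_pos = int(2.9)      # fraction part dropped
lossy_neg = int(-2.9)     # truncates toward zero, not downward

print(exact, lossy_pos, lossy_neg)
```

Note that the negative case goes toward zero, so int() is not the same as math.floor().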
If you would like to argue that it would have been better if int did
not do this, then I might even agree with you. There is certainly
precedent: if I remember correctly, you cannot convert floating point
values to integers directly in Pascal, you first have to truncate them
to an integer-valued float, then convert.
# excuse my sloppy Pascal syntax, it has been a few years
var
i: integer;
x: real;
begin
i := integer(trunc(x));
end;
So I'm not entirely against the idea that Python should have had separate
int() and trunc() functions, with int raising an exception on (non-whole
number?) floats.
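
As a rough sketch of how that alternative design might behave, here is a hypothetical helper, strict_int (the name and the choice of ValueError are my assumptions, nothing like it exists in the standard library), that refuses non-whole floats and leaves truncation to math.trunc:

```python
import math

def strict_int(x):
    """Hypothetical: convert to int only when a float has no fraction part."""
    if isinstance(x, float) and not x.is_integer():
        # Also rejects nan and inf, for which is_integer() is False.
        raise ValueError("refusing to truncate non-whole float: %r" % (x,))
    return int(x)

print(strict_int(3.0))     # whole-valued float is accepted
print(math.trunc(3.5))     # explicit truncation is a separate call
try:
    strict_int(3.5)
except ValueError as err:
    print("rejected:", err)
```

With this split, the caller has to say which conversion is wanted instead of getting truncation silently.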
But back to Python as it actually is, rather than how it might have been.
There's no rule that int() must be numerically lossless. It is lossless
with strings, and refuses to convert strings-that-look-like-floats to ints.
And that makes sense: in an int, the "." character is just as illegal as
the characters "k" or "&" or "Ω". int() will raise on "123k456", so why
wouldn't it raise on "123.456"?
But that (good, conservative) design decision isn't required or enforced.
Hence my reply that you cannot safely make the assumption that int() on a
non-numeric type will be numerically exact.
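
A small sketch of both points, assuming Python 3. The helper try_int and the class Reading are hypothetical names invented for this example; Reading shows that a type is free to define a lossy __int__:

```python
def try_int(text):
    """Return int(text), or the exception class name if int() refuses."""
    try:
        return int(text)
    except ValueError as err:
        return type(err).__name__

class Reading:
    """Hypothetical type whose __int__ rounds to nearest (not exact)."""
    def __init__(self, value):
        self.value = value

    def __int__(self):
        return round(self.value)   # Python 3: round() returns an int here

for s in ["1234", " 42 ", "123k456", "123.456"]:
    print(repr(s), "->", try_int(s))
print(int(Reading(20.6)))
```

The strings containing "k" or "." are rejected the same way, while nothing stops Reading from returning a different number than the one it holds.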
>>> 2) Round an object with a non-integer value to an integer value.
>>
>>
>> int() does not perform rounding (except in the most generic sense that any
>> conversion from real-valued number to integer is "rounding"). That is what
>> the round() function does. int() performs truncating: it returns the
>> integer part of a numeric value, ignoring any fraction part:
>
> I was surprised by your objection to my use of the word "rounding"
> here. So I looked it up on Wikipedia:
> http://en.wikipedia.org/wiki/Rounding#Rounding_to_integer
>
> That section describes "round toward zero (or truncate..." which is
> essentially how I would have put it, and also how you put it below:
Well, yes. I explicitly referred to the generic sense where any conversion
from real-valued to whole number is "rounding". But I think that it is a
problematic, ambiguous term that needs qualification:
* sometimes truncation is explicitly included as a kind of rounding;
* sometimes truncation is used in opposition to rounding.
For example, I think that in everyday English, most people would be
surprised to hear you describe "rounding 9.9999999 to 9". In the
absence of an explicit rounding direction ("round down", "round up"),
some form of "round to nearest" is assumed in everyday English, and
as such is used in contrast to merely cutting off whatever fraction
part is there (truncation).
Hence the need for qualification.
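
The difference is easy to see side by side. A minimal sketch, assuming Python 3 (where round() with one argument returns an int):

```python
import math

x = 9.9999999
truncated = int(x)       # cut off the fraction part
nearest = round(x)       # round to nearest

y = -9.5
down = math.floor(y)     # toward negative infinity
up = math.ceil(y)        # toward positive infinity
toward_zero = int(y)     # same result as math.trunc(y)

print(truncated, nearest, down, up, toward_zero)
```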
>> So you shouldn't think of int(number) as "convert number to an int", since
>> that is ambiguous. There are at least six common ways to convert arbitrary
>> numbers to ints:
>
> This is precisely my point. I would prefer if int(obj) would fail
> on non-integers leaving me with the option of calling an appropriate
> rounding function. After catching RoundError (or whatever) you would
> know that you have a number type object that can be passed to round,
> ceil, floor etc.
Well, I guess that comes down to the fact that Python is mostly aimed at
mathematically and numerically naive users who would be scared off by a
plethora of rounding modes :-)
>> Python provides truncation via the int and math.trunc functions, floor and
>> ceiling via math.floor and math.ceil, and round to nearest via round.
>> In Python 2, ties are rounded up, which is biased; in Python 3, the
>> unbiased banker's rounding is used.
>
> I wasn't aware of this change. Thanks for that.
Actually, I appear to have been wrong: in Python 2, ties are rounded
away from zero rather than up. Positive arguments round up, negative
arguments round down:
py> round(1.5), round(2.5)
(2.0, 3.0)
py> round(-1.5), round(-2.5)
(-2.0, -3.0)
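
For comparison, a sketch of the Python 3 behaviour, plus one way to get the old ties-away-from-zero rule back using the decimal module. The helper name round_half_away is my own, and unlike Python 2's round() it returns an int rather than a float:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_away(x):
    """Round to nearest, ties away from zero (hypothetical helper)."""
    # Decimal's ROUND_HALF_UP sends ties away from zero, for either sign.
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Python 3: ties go to the nearest even integer (banker's rounding).
py3_ties = (round(1.5), round(2.5), round(-1.5), round(-2.5))

# Ties away from zero, as in the Python 2 session above.
away_ties = tuple(round_half_away(v) for v in (1.5, 2.5, -1.5, -2.5))

print(py3_ties)
print(away_ties)
```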
>> Instead, you should consider int(number) to be one of a pair of functions,
>> "return integer part", "return fraction part", where unfortunately the
>> second function isn't provided directly. In general though, you can get
>> the fractional part of a number with "x % 1". For floats, math.modf also
>> works.
>
> Assuming that you know you have an object that supports algebraic
> operations in a sensible way then this works, although the
> complementary function for "x % 1" would be "x // 1" or
> "math.floor(x)" rather than "int(x)".
Again, I was mistaken. x%1 is not suitable to get the fraction part of a
number in Python: it returns the wrong result for negative values. You need
math.modf:
py> x = -99.25
py> x % 1 # want -0.25
0.75
py> math.modf(x)
(-0.25, -99.0)
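
Putting the two pairings side by side, as a minimal sketch in Python 3 (the value -99.25 is exactly representable as a float, so the equalities below hold exactly):

```python
import math

x = -99.25

# Pairing 1: truncation with math.modf (both parts keep the sign of x).
frac, whole = math.modf(x)
trunc_pair_ok = (int(x) == whole) and (int(x) + frac == x)

# Pairing 2: floor with modulo (the remainder is never negative).
floor_part = x // 1
remainder = x % 1
floor_pair_ok = (floor_part == math.floor(x)) and (floor_part + remainder == x)

print(frac, whole, trunc_pair_ok)
print(remainder, floor_part, floor_pair_ok)
```

So "x % 1" is the complement of floor, as Oscar said, and math.modf is the complement of int().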
--
Steven