[Tutor] Limitation of int() in converting strings

Steven D'Aprano steve at pearwood.info
Thu Dec 27 18:49:54 CET 2012


On 23/12/12 04:57, Oscar Benjamin wrote:
> On 22 December 2012 02:06, Steven D'Aprano<steve at pearwood.info>  wrote:
>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>
>>> I have often found myself writing awkward functions to prevent a
>>> rounding error from occurring when coercing an object with int().
>>> Here's one:
>>>
>>> def make_int(obj):
>>>       '''Coerce str, float and int to int without rounding error
>>>       Accepts strings like '4.0' but not '4.1'
>>>       '''
>>>       fnum = float('%s' % obj)
>>>       inum = int(fnum)
>>>       assert inum == fnum
>>>       return inum
>>
>> Well, that function is dangerously wrong. In no particular order,
>> I can find four bugs and one design flaw.
>
> I expected someone to object to this function. I had hoped that they
> might also offer an improved version, though. I can't see a good way
> to do this without special casing the treatment of some or other type
> (the obvious one being str).

Why is that a problem?


I think this should do the trick. However, I am lazy and have not
tested it, so this is your opportunity to catch me writing buggy
code :-)


def make_int(obj):
     try:
         # Fail if obj is not numeric.
         obj + 0
     except TypeError:
         # For simplicity, I require that objects that convert to
         # ints always do so losslessly.
         try:
             return int(obj)
         except ValueError:
             obj = float(obj)
     # If we get here, obj is numeric. But is it an int?
     n = int(obj)  # This may fail if obj is a NAN or INF.
     if n == obj:
         return n
     raise ValueError('not an integer')
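
Untested, as I said, but here is the sort of behaviour I expect from it
(the function is repeated so the snippet stands on its own):

```python
# Expected behaviour of the make_int() above, repeated here so the
# snippet runs on its own. These are untested claims, so do try them.

def make_int(obj):
    try:
        # Fail if obj is not numeric.
        obj + 0
    except TypeError:
        # Objects that convert to ints must do so losslessly.
        try:
            return int(obj)
        except ValueError:
            obj = float(obj)
    # If we get here, obj is numeric. But is it an int?
    n = int(obj)  # This may fail if obj is a NAN or INF.
    if n == obj:
        return n
    raise ValueError('not an integer')

assert make_int("4.0") == 4            # float-like strings are accepted
assert make_int(4.0) == 4              # floats with no fraction likewise
assert make_int("10000000000000001") == 10**16 + 1  # big strings convert exactly
assert make_int(10**100) == 10**100    # and big ints round-trip unchanged
try:
    make_int("4.1")                    # rounding would be needed, so refuse
except ValueError:
    pass
```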




> Although you have listed 5 errors I would have written the same list
> as 2 errors:
> 1) You don't like my use of assert.

That's more than a mere personal preference. See below.


> 2) The function doesn't work for large numbers (bigger than around
> 100000000000000000).

It's not just that it "doesn't work": it fails in two distinct ways.
If you were writing regression tests for these bugs, you would need
*at least* two such tests:

- large strings convert exactly;
- for int n, make_int(n) always returns n.

If I were writing unit tests, I would ensure that I had a unit test for
each of the failures I showed.
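
In concrete terms, the tests might look something like this (your
original make_int is repeated so the snippet stands alone; each check
documents a distinct failure mode):

```python
# Distinct failure modes of the original make_int(), quoted at the
# top of this thread. Each would deserve its own regression test.

def make_int(obj):
    '''Original version, with its float rounding problem.'''
    fnum = float('%s' % obj)
    inum = int(fnum)
    assert inum == fnum
    return inum

# Failure 1: large strings do not convert exactly. The float
# conversion rounds 10**16 + 1 down to 10**16, losing the final digit.
s = "10000000000000001"
assert make_int(s) == int(s) - 1   # off by one: the bug in action

# Failure 2: for int n, make_int(n) does not always return n.
n = 10**17 + 1
assert make_int(n) == 10**17       # the +1 silently vanishes

# And for truly huge ints the conversion blows up entirely:
# float('%s' % 10**400) overflows to infinity, and int(inf) raises.
try:
    make_int(10**400)
except OverflowError:
    pass
```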



> I would also add:
> 3) It's ridiculous to convert types several times just to convert to
> an integer without rounding.

Perhaps. Even if that is the case, that's not a bug, merely a slightly
less efficient implementation.


> Whether or not assert is appropriate depends on the context (although
> I imagine that some people would disapprove of it always). I would say
> that if you are using assert then it should really be in a situation
> where you're not really looking to handle errors but just to abort the
> program and debug when something goes wrong. In that context I think
> that, far from being confusing, assert statements make it plainly
> clear what the programmer who wrote them was meaning to do.

And what is that? "I only sometimes want to handle errors, sometimes I
want errors to silently occur without warning"?

Asserts can be disabled by the person running your code. That alone means
that assert is *never* suitable for error checking, because you cannot be
sure if your error checking is taking place or not. It is as simple as
that.
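
You can see this directly by running the interpreter with the -O
switch (a quick sketch, using subprocess to launch child interpreters):

```python
# Demonstrate that the -O switch strips assert statements entirely.
import subprocess
import sys

code = "assert False, 'this check can be switched off'"

# Normally, the failing assert raises AssertionError and the
# interpreter exits with a non-zero status.
normal = subprocess.call([sys.executable, "-c", code])

# Under -O, the assert is compiled away: nothing runs, nothing fails.
optimized = subprocess.call([sys.executable, "-O", "-c", code])

assert normal != 0
assert optimized == 0
```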

So what is assert useful for?

- Asserts are really handy for testing in the interactive interpreter;
   assert is a lazy person's test, but when you're being quick and dirty,
   that's a feature, not a bug.

- Asserts are also useful for test suites, although less so because you
   cannot run your test suite with optimizations on.

- Asserts are good for checking the internal logic and/or state of your
   program. This is not error checking in the usual sense, since you are
   not checking that data is okay, but defensively checking that your
   code is okay.


What do I mean by that last one? If you've ever written defensive code
with a comment saying "This cannot ever happen", that is a good candidate
for an assertion. Good defensive technique is to be very cautious about
the assumptions you make: just because you think something cannot happen,
doesn't mean you are correct. So you test your own logic by checking
that the thing you think must be true is true, and raise an error if it
turns out you are wrong.

But it seems pretty wasteful and pointless to be checking something that
you know is always correct. Especially if those checks are expensive, you
might want to turn them off. Hence, you use assert, which can be turned
off. This is a trade-off, of course: you're trading a bit of extra speed
for a bit more risk of a silent failure. If you aren't confident
enough to make that trade-off, you are better off using an explicit,
non-assert check.

It's a subtle difference, and a matter of personal judgement where the
line between "internal logic" and "error checking" lies. But here's an
example of what I consider a check of internal logic, specifically
that numbers must be zero, positive or negative:


# you can assume that x is a numeric type like int, float, Decimal, etc.
if x == 0:
     handle_zero()
elif x > 0:
     handle_positive()
elif x < 0:
     handle_negative()
else:
     # This cannot ever happen, so we make an assertion.
     assert False, "x is neither less than, greater than, nor equal to zero"


I've deliberately given an example where the internal logic is actually
*wrong*. You might think that the assertion is pointless because the
logic is self-evidently correct, but without the final else clause, there
is a bug waiting to happen. With the assert, instead of the program silently
doing the wrong thing, it will loudly and immediately fail with an
AssertionError.

Bonus points to anyone who can tell me what is wrong with my logic that any
numeric type must be either less than, equal to, or greater than zero.



> <snip>
>> Lest you think that it is only humongous numbers where this is a
>> problem, it is not. A mere seventeen digits is enough:
>>
>> py>  s = "10000000000000001"
>> py>  make_int(s) - int(s)
>> -1L
>
> I think that most people would consider one hundred thousand million
> million and one to be a fairly big number.


That's because most people are neither mathematicians nor programmers.
That's less than 2**54, a mere seven bytes.
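
To put a number on it: a float carries a 53-bit significand, so exact
integer conversion is only guaranteed up to 2**53, which is about
9 * 10**15. A quick illustration:

```python
# Integers are exactly representable as floats only up to 2**53.
# Just past that, neighbouring integers collapse onto the same float.
n = 10**16 + 1                # the seventeen-digit number above
assert n.bit_length() == 54   # fits in seven bytes, as I said
assert float(n) == 10**16     # the trailing 1 is rounded away
assert int(float(str(n))) == n - 1   # exactly the off-by-one shown above
```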



-- 
Steven

