[Tutor] Limitation of int() in converting strings

Oscar Benjamin oscar.j.benjamin at gmail.com
Thu Dec 27 20:22:16 CET 2012


On 27 December 2012 17:49, Steven D'Aprano <steve at pearwood.info> wrote:
> On 23/12/12 04:57, Oscar Benjamin wrote:
>>
>> On 22 December 2012 02:06, Steven D'Aprano <steve at pearwood.info> wrote:
>>>
>>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>>
>>>> I have often found myself writing awkward functions to prevent a
>>>> rounding error from occurring when coercing an object with int().
>>>> Here's one:
>>>>
>>>> def make_int(obj):
>>>>     '''Coerce str, float and int to int without rounding error
>>>>     Accepts strings like '4.0' but not '4.1'
>>>>     '''
>>>>     fnum = float('%s' % obj)
>>>>     inum = int(fnum)
>>>>     assert inum == fnum
>>>>     return inum
>>>
>>>
>>> Well, that function is dangerously wrong. In no particular order,
>>> I can find four bugs and one design flaw.
>>
>>
>> I expected someone to object to this function. I had hoped that they
>> might also offer an improved version, though. I can't see a good way
>> to do this without special casing the treatment of some or other type
>> (the obvious one being str).
>
>
> Why is that a problem?
>
>
> I think this should do the trick. However, I am lazy and have not
> tested it, so this is your opportunity to catch me writing buggy
> code :-)
>
>
> def make_int(obj):
>     try:
>         # Fail if obj is not numeric.
>         obj + 0
>     except TypeError:
>         # For simplicity, I require that objects that convert to
>         # ints always do so losslessly.
>         try:
>             return int(obj)
>         except ValueError:
>             obj = float(obj)
>     # If we get here, obj is numeric. But is it an int?
>     n = int(obj)  # This may fail if obj is a NAN or INF.
>     if n == obj:
>         return n
>     raise ValueError('not an integer')

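This one has another problem with large numbers (also solved by using
Decimal instead of float). The fallback float() call rounds the string
to the nearest double before the integer check runs, so the fractional
part is silently lost: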

>>> make_int('100000000000000000.1')
100000000000000000

Otherwise the function is good and it demonstrates my original point
quite nicely: the function we've ended up with is pretty horrific for
such a simple operation. It's also not something that a novice
programmer could be expected to write or perhaps even to fully
understand.
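
For comparison, here is a minimal sketch of the Decimal-based approach
mentioned above. The name make_int_decimal is made up for this
illustration and the function is not from the thread. It assumes that
Decimal(obj) is an acceptable way to read every input, which holds for
int, float and numeric strings but not for arbitrary numeric types.

```python
from decimal import Decimal, InvalidOperation


def make_int_decimal(obj):
    """Convert obj to int, raising ValueError if any value would be lost.

    Hypothetical variant of make_int that goes through Decimal, so that
    long numeric strings are never rounded to a float first.
    """
    try:
        # Decimal() is exact for ints, floats and numeric strings.
        d = Decimal(obj)
    except (InvalidOperation, TypeError, ValueError):
        raise ValueError('cannot interpret %r as a number' % (obj,))
    if not d.is_finite():
        # NaN and infinities have no integer value.
        raise ValueError('not an integer: %r' % (obj,))
    n = int(d)  # truncates toward zero
    if n != d:
        raise ValueError('not an integer: %r' % (obj,))
    return n


print(make_int_decimal('4.0'))
try:
    make_int_decimal('100000000000000000.1')
except ValueError as exc:
    print('rejected:', exc)
```

The string that slipped through the float-based version is rejected
here because the comparison n != d is carried out exactly.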

In my ideal world the int() function would always raise an error for
non-integers. People would have to get used to calling trunc() in
place of int() but only in the (relatively few) places where they
actually wanted that behaviour. The resulting code would be more
explicit about when numeric values were being altered and what kind of
rounding is being used, both of which are good things.
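
As a rough sketch of that split, math.trunc() already exists for the
explicit truncation, and a strict conversion can be layered on top of
it. The helper name strict_int is hypothetical, and it only covers
numeric inputs (strings are out of scope here).

```python
import math


def strict_int(x):
    """Hypothetical strict conversion: refuse values that are not integral."""
    n = math.trunc(x)  # raises for NaN and infinities
    if n != x:
        raise ValueError('%r is not an integer' % (x,))
    return n


# Truncation is spelled out where it is actually wanted.
print(math.trunc(3.7))
print(math.trunc(-3.7))

# The strict conversion accepts 3.0 but refuses 3.7.
print(strict_int(3.0))
try:
    strict_int(3.7)
except ValueError as exc:
    print('rejected:', exc)
```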

At one point a similar (perhaps better) idea was discussed on python-dev:
http://mail.python.org/pipermail/python-dev/2008-January/076481.html
but it was rejected because of backwards compatibility concerns:
http://mail.python.org/pipermail/python-dev/2008-January/076552.html

<snip>
>
>> Whether or not assert is appropriate depends on the context (although
>> I imagine that some people would disapprove of it always). I would say
>> that if you are using assert then it should really be in a situation
>> where you're not really looking to handle errors but just to abort the
>> program and debug when something goes wrong. In that context I think
>> that, far from being confusing, assert statements make it plainly
>> clear what the programmer who wrote them was meaning to do.
>
> And what is that? "I only sometimes want to handle errors, sometimes I
> want errors to silently occur without warning"?
>
> Asserts can be disabled by the person running your code. That alone means
> that assert is *never* suitable for error checking, because you cannot be
> sure if your error checking is taking place or not. It is as simple as
> that.

Maybe no-one else will ever run your code. This is the case for much
of the code that I write.
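
The point about asserts being disabled can be seen without restarting
the interpreter. The sketch below is only an illustration: it uses the
optimize argument of the built-in compile() to stand in for running
Python with -O, and the names SOURCE and run are made up for the demo.

```python
SOURCE = "assert x > 0, 'x must be positive'\nreached_end = True\n"


def run(optimize):
    """Compile SOURCE at the given optimization level and run it with x = -1."""
    namespace = {'x': -1}
    code = compile(SOURCE, '<demo>', 'exec', optimize=optimize)
    try:
        exec(code, namespace)
    except AssertionError:
        return 'AssertionError'
    return 'no error'


print(run(0))  # asserts kept, as in a normal run
print(run(1))  # asserts stripped, as under python -O
```

At level 0 the check fires; at level 1 the same source runs to the end
with x = -1 and nothing complains.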

> So what is assert useful for?
>
> - Asserts are really handy for testing in the interactive interpreter;
>   assert is a lazy person's test, but when you're being quick and dirty,
>   that's a feature, not a bug.

This would be my number one reason for using an assert (probably also
the reason in that particular case).

>
> - Asserts are also useful for test suites, although less so because you
>   cannot run your test suite with optimizations on.

I've seen this done a few times, for example here:
https://github.com/sympy/sympy/blob/master/sympy/assumptions/tests/test_matrices.py
but I hadn't considered that particular problem. I think the reason
for using them in sympy is that py.test has a special handler for
pulling apart assert statements to show you the values in the
expression that failed.

> - Asserts are good for checking the internal logic and/or state of your
>   program. This is not error checking in the usual sense, since you are
>   not checking that data is okay, but defensively checking that your
>   code is okay.

This is always the case when I use an assert. I don't want to catch
the error and I also think the condition is, for some reason, always
true unless my own code has a bug somewhere.

<snip>
>
> It's a subtle difference, and a matter of personal judgement where the
> line between "internal logic" and "error checking" lies. But here's an
> example of what I consider a check of internal logic, specifically
> that numbers must be zero, positive or negative:
>
>
> # you can assume that x is a numeric type like int, float, Decimal, etc.
> if x == 0:
>     handle_zero()
> elif x > 0:
>     handle_positive()
> elif x < 0:
>     handle_negative()
> else:
>     # This cannot ever happen, so we make an assertion.
>     assert False, "x is neither less than, greater than, or equal to zero"

If I'm feeling really lazy I might just use a bare raise statement for
this kind of thing. Obviously that doesn't work if there is an
exception to be re-raised, though.
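
A small sketch of both situations, assuming Python 3 (where a bare
raise with no active exception gives a RuntimeError). The function
name unreachable_branch is made up for the illustration.

```python
def unreachable_branch():
    # Stand-in for the "this cannot happen" else clause.
    raise


# Outside any except block there is nothing to re-raise, so Python 3
# reports a RuntimeError instead.
try:
    unreachable_branch()
except RuntimeError as exc:
    outside = type(exc).__name__

# While an exception is being handled, the same bare raise re-raises
# that exception, which hides the fact that the branch was reached.
try:
    try:
        int('not a number')
    except ValueError:
        unreachable_branch()
except ValueError as exc:
    inside = type(exc).__name__

print(outside)
print(inside)
```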

>
>
> I've deliberately given an example where the internal logic is actually
> *wrong*. You might think that the assertion is pointless because the
> logic is self-evidently correct, but without the final else clause, there
> is a bug waiting to happen. With the assert, instead of the program silently
> doing the wrong thing, it will loudly and immediately fail with an
> AssertionError.
>
> Bonus points to anyone who can tell me what is wrong with my logic that any
> numeric type must be either less than, equal, or greater than zero.

A NaN ends up in that else branch, because every comparison with a NaN
(other than !=) is false, and I think it's a good example of where I
would use an assert. Generally I want to abort if I get a NaN. It would
be even better if I could tell Python to abort instantly when the NaN
is generated, but that's a separate issue.
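
Here is a short sketch of the NaN case. The function classify is a
hypothetical stand-in for the handle_zero/handle_positive/
handle_negative chain quoted above, and it raises AssertionError
explicitly so that it behaves the same whether or not asserts are
enabled.

```python
def classify(x):
    """Return which of the three 'obvious' cases x falls into."""
    if x == 0:
        return 'zero'
    elif x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        # Explicit equivalent of "assert False, ...".
        raise AssertionError(
            'x is neither less than, greater than, or equal to zero')


nan = float('nan')

# All three comparisons are false for a NaN.
print(nan == 0, nan > 0, nan < 0)

print(classify(-2.5))
try:
    classify(nan)
except AssertionError as exc:
    print('aborted:', exc)
```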


Oscar

