How to check if a string "is" an int?

Steven D'Aprano steve at REMOVETHIScyber.com.au
Wed Dec 21 15:41:34 CET 2005


On Wed, 21 Dec 2005 05:15:23 -0800, bonono wrote:

> 
> Steven D'Aprano wrote:
>> If you really wanted to waste CPU cycles, you could do this:
>>
>> s = "1579"
>> for c in s:
>>     if not c.isdigit():
>>         print "Not an integer string"
>>         break
>> else:
>>     # if we get here, we didn't break
>>     print "Integer %d" % int(s)
>>
>>
>> but notice that this is wasteful: first you walk the string, checking each
>> character, and then the int() function has to walk the string again,
>> checking each character for the second time.
>>
> Wasteful enough that there is a specific built-in function to do just
> this ?


Well, let's find out, shall we?


from time import time

# create a list of known int strings
L_good = [str(n) for n in range(1000000)]

# and a list of known non-int strings
L_bad = [s + "x" for s in L_good]

# now let's time how long it takes, comparing
# Look Before You Leap vs. Just Do It
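def timer_LBYL(L):
    t = time()
    # iterate over the list passed in, not the global L_good
    for s in L:
        if s.isdigit():
            n = int(s)
    return time() - t

def timer_JDI(L):
    t = time()
    # iterate over the list passed in, not the global L_good
    for s in L:
        try:
            n = int(s)
        except ValueError:
            pass
    return time() - t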

# and now test the two strategies 

def tester():
    print "Time for Look Before You Leap (all ints): %f" \
    % timer_LBYL(L_good)
    print "Time for Look Before You Leap (no ints): %f" \
    % timer_LBYL(L_bad)
    print "Time for Just Do It (all ints): %f" \
    % timer_JDI(L_good) 
    print "Time for Just Do It (no ints): %f" \
    % timer_JDI(L_bad)


And here are the results from three tests:

>>> tester()
Time for Look Before You Leap (all ints): 2.871363
Time for Look Before You Leap (no ints): 3.167513
Time for Just Do It (all ints): 2.575050
Time for Just Do It (no ints): 2.579374
>>> tester()
Time for Look Before You Leap (all ints): 2.903631
Time for Look Before You Leap (no ints): 3.272497
Time for Just Do It (all ints): 2.571025
Time for Just Do It (no ints): 2.571188
>>> tester()
Time for Look Before You Leap (all ints): 2.894780
Time for Look Before You Leap (no ints): 3.167017
Time for Just Do It (all ints): 2.822160
Time for Just Do It (no ints): 2.569494


There is a consistent pattern: Look Before You Leap is measurably slower
than using try...except, but both are within the same order of magnitude
speed-wise.
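
For anyone who wants to repeat the comparison, here is a minimal sketch
built on the standard timeit module instead of raw time() calls. The
names lbyl, jdi and best_time are made up for illustration, the list is
smaller than the one used above, and it is written to run under Python 3.
Treat the numbers it prints as indicative only; they will vary with
machine and interpreter version.

```python
import timeit

def lbyl(strings):
    """Look Before You Leap: check with isdigit(), then convert."""
    count = 0
    for s in strings:
        if s.isdigit():
            int(s)
            count += 1
    return count

def jdi(strings):
    """Just Do It: attempt the conversion and catch the failure."""
    count = 0
    for s in strings:
        try:
            int(s)
            count += 1
        except ValueError:
            pass
    return count

def best_time(func, strings, repeat=3):
    # Take the minimum of several runs; it is the least noisy estimate.
    return min(timeit.repeat(lambda: func(strings), number=1, repeat=repeat))

# Hypothetical test data, smaller than the million strings used above.
good = [str(n) for n in range(100000)]
bad = [s + "x" for s in good]

for name, func in (("LBYL", lbyl), ("JDI", jdi)):
    for label, data in (("all ints", good), ("no ints", bad)):
        print("%s (%s): %f" % (name, label, best_time(func, data)))
```

Both functions return the number of successful conversions, so it is easy
to confirm that each strategy really did process the list it was given.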

I wondered whether the speed difference would be different if the strings
themselves were very long. So I made some minor changes:

>>> L_good = ["1234567890"*200] * 2000
>>> L_bad = [s + "x" for s in L_good]
>>> tester()
Time for Look Before You Leap (all ints): 9.740390
Time for Look Before You Leap (no ints): 9.871122
Time for Just Do It (all ints): 9.865055
Time for Just Do It (no ints): 9.967314

Hmmm... why is converting now slower than checking+converting? That
doesn't make sense... except that the strings are so long that they
overflow ints, and get converted automatically to longs. Perhaps this test
exposes some accident of implementation.

So I changed the two timer functions to use long() instead of int(), and
got this:

>>> tester()
Time for Look Before You Leap (all ints): 9.591998
Time for Look Before You Leap (no ints): 9.866835
Time for Just Do It (all ints): 9.424702
Time for Just Do It (no ints): 9.416610

A small but consistent speed advantage to the try...except block.

Having said all that, the speed differences are absolutely trivial, less
than 0.1 microseconds per digit. Choosing one form or the other purely on
the basis of speed is premature optimization.

But the real advantage of the try...except form is that it generalises to
more complex kinds of data where there is no fast C code to check whether
the data can be converted. (Try re-running the above tests with
isdigit() re-written as a pure Python function.)
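
One way to try that experiment is sketched below. py_isdigit is a
hypothetical stand-in that only recognises the ASCII digits 0-9, so it is
not an exact replacement for str.isdigit(); lbyl_pure and jdi are likewise
illustrative names. It assumes Python 3, and the timings it prints are
whatever your own machine produces.

```python
import timeit

def py_isdigit(s):
    """Pure-Python stand-in for str.isdigit(), ASCII digits only."""
    if not s:
        return False
    for c in s:
        if c not in "0123456789":
            return False
    return True

def lbyl_pure(strings):
    # Check every string in Python first, then convert the ones that pass.
    return [int(s) for s in strings if py_isdigit(s)]

def jdi(strings):
    # Attempt every conversion and skip the ones that fail.
    result = []
    for s in strings:
        try:
            result.append(int(s))
        except ValueError:
            pass
    return result

data = [str(n) for n in range(50000)]
t_lbyl = min(timeit.repeat(lambda: lbyl_pure(data), number=1, repeat=3))
t_jdi = min(timeit.repeat(lambda: jdi(data), number=1, repeat=3))
print("LBYL with pure-Python check: %f" % t_lbyl)
print("Just Do It:                  %f" % t_jdi)
```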

In general, it is just as difficult to check whether something can be
converted as it is to actually try to convert it and see whether it fails,
especially in a language like Python where try...except blocks are so
cheap to use.
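
As a small illustration of how hard it is to get the check exactly right,
the sketch below compares isdigit() with simply attempting the conversion.
The helper to_int and the sample strings are my own choices for
illustration, and the behaviour shown assumes Python 3 strings.

```python
def to_int(s, default=None):
    """Return int(s), or default if s is not a valid integer string."""
    try:
        return int(s)
    except ValueError:
        return default

# int() accepts a sign and surrounding whitespace, which isdigit() rejects;
# isdigit() accepts SUPERSCRIPT TWO ("\u00b2"), which int() rejects.
samples = ["1579", "-5", " 42 ", "+7", "1579x", "", "\u00b2"]
for s in samples:
    print("%r: isdigit=%s, to_int=%r" % (s, s.isdigit(), to_int(s)))
```

The two approaches disagree on several of these inputs, so a
Look Before You Leap test that wants to match int() exactly ends up
re-implementing most of what int() already does.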



-- 
Steven.



