why the inconsistency?

Thu Sep 25 20:57:33 EDT 2003

[Christos "TZOTZIOY" Georgiou'
>>> If only math.log(2**64)/math.log(10) worked with longs without
>>> converting them into floats first... :(
>>
>> And after reading the FM, if only
>>
>> def count_digits(x):
>>     return int(math.log(x, 10))
>>
>> worked correctly for *all* large values, like 10**70...

[Michael Hudson]
> Um, it does :-)
>
> >>> float(40**200)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> OverflowError: long int too large to convert to float
> >>> math.log(40**200, 14)
> 279.56038792470861
>
> (10**70 is well within the range of an IEEE double).

You're not used to answering floating-point questions <wink>.  It's true
that, in recent Pythons, log(long) isn't limited to arguments that fit in a
machine double.  But Christos couldn't have been questioning that, since
10**70 wouldn't have given him an OverflowError even if Python were still so
limited (it used to be).

So the real complaint must be the result his function returns when fed
10**70.  The first thing to note is that count_digits(x) doesn't return the
number of decimal digits even for small x:

>>> count_digits(1)
0
>>> count_digits(9)
0
>>> count_digits(10)
1
>>>

It's returning what it was told to compute, though, so that's a bug in the
name of the function, or in its implementation.  10**70 has 71 digits in
reality, but from the short examples above we can infer that 70 was "the
expected" result.

So, is that what we get?  Can't tell from here:  Python is still calling the
platform C math functions, and their accuracy varies across platforms.  On
my platform right now,

>>> count_digits(10**70)
69
>>>

and that's because

>>> math.log(10**70, 10)
69.999999999999986
>>>

on this box.  Going on, and knowing that a C double has 53 bits of precision
on my box,

>>> math.frexp(_)
(0.54687499999999989, 7)
>>> hex(long(_[0] * 2**53))
'0x117FFFFFFFFFFFL'
>>>

So the result (69.999...) was the closest representable double strictly less
than 70, and that 1-bit rounding error ruins his day <wink>.

So it goes.  The math.log() trick can still be extremely useful in getting
the right answer with reasonable speed; we just have to recognize that f.p.
results may suffer small rounding errors, and compensate when they occur:

import math

def num_digits(x):
    """Number of digits in decimal representation of x.

    x must be an integer > 0.
    """

    # Get close fast.
    g = int(math.log(x, 10)) + 1
    # Refine.  Correct result must have
    # 10**(g-1) <= x < 10**g
    lower = 10**(g-1)
    if x < lower:
        print "guess too high"
        g -= 1
    elif x >= lower*10:
        print "guess too low"
        g += 1

    assert 10**(g-1) <= x < 10**g
    return g

Then, e.g.,

>>> num_digits(1)
1
>>> num_digits(9)
1
>>> num_digits(10)
2
>>> num_digits(10**70)
guess too low
71
>>> num_digits(10**70 - 1)
70
>>> num_digits(10**10000)
10001
>>> num_digits(10**10000 - 1)
guess too high
10000
>>>

all-obvious-to-the-most-casual-observer-ly y'rs  - tim