[Python-Dev] Re: Memory size overflows

Fri, 25 Oct 2002 13:07:14 -0400

I wrote:
> Shall I submit a patch?

Tim Peters wrote:
> Sure, but also submit your timing harness so that people can measure the
> effects cross-platform and cross-compiler.

Of course. I first wanted to see if there was any interest.

Integrated into Python, the speedup was much less: only 0.1%
overall for this million-multiply script:

 import time

 start = time.time()
 a = -1000
 while a <= 1000:
     b = 1000
     while b <= 1000:
         b += 1
     a += 1
 end = time.time()

 base = end - start

 start = time.time()
 a = -1000
 while a <= 1000:
     b = -1000
     while b <= 1000:
         b += 1
         c = a * b
     a += 1
 end = time.time()

 print end - start - base

The difference in speedup is an interesting data point on
Python overhead in general (it took just over 6 seconds to
do 1,000,000 multiplies on my system).

> Which platform?  Which compiler?  What was your test driver?  Was this
> timing the mult code in isolation, or timing Python-level multiplies?
> Claims of small speedups are notoriously platform- and test-dependent.  If
> it's a mixed bag across platforms, the risk of introducing a new bug would
> favor leaving things alone.  In the absence of a clear correctness proof, a
> Python simulation program demonstrating correctness exhaustively in small
> bases would also be helpful.

I doubt you're still interested, but I used Cygwin on a
P3/833 Thinkpad running Win2K (GCC compiler). My original
results were on just the multiply code in isolation--I
wanted to see if there was interest before continuing.

Since I have them, I attached my initial testbed (new.c
is the final version), and here is an exhaustive test
for the "cannot_overflow" check used in that version:

 from __future__ import generators

 def cannot_overflow(a,b,m):
     M_MASK = (1 << m) - 1
     HI_LO_BIT = 1 << ((m/2) - 1)
     OVL_IMPOSSIBLE_MASK = HI_LO_BIT - 1
     OVL_POSSIBLE_MASK = M_MASK ^ OVL_IMPOSSIBLE_MASK
     if a < 0:
         check = OVL_POSSIBLE_MASK
     else:
         check = 0
     if (a & OVL_POSSIBLE_MASK) == check:
         if b < 0:
             check = OVL_POSSIBLE_MASK
         else:
             check = 0
         if (b & OVL_POSSIBLE_MASK) == check:
             return True
     return False

 def overflowed(a,b,m):
     VALID_MASK = (2**(m - 1)) - 1
     INVALID_MASK = (2**(2*m) - 1) ^ VALID_MASK
     prod = a * b
     highbits = prod & INVALID_MASK
     return (highbits != 0) and (highbits != INVALID_MASK)

 def checkall(m):
     for i in range(-(2**(m-1)),2**(m-1)):
         for j in range(-(2**(m-1)),2**(m-1)):
             if cannot_overflow(i,j,m) and overflowed(i,j,m):
                 yield (i,j)

 if __name__ == "__main__":
     for m in range(2,11,2):
         print "M ==", m
         for i in checkall(m):
             print i
         print "END OF LIST"

You can compare the "cannot_overflow" code to the code that
would be added to int_mul. For the sake of minimizing impact,
I limited the changes to defining UL_OVERFLOW_POSSIBLE_MASK
as the equivalent of ~(2**(m/2-1) - 1) and this code, added
immediately after converting the arguments into longs:

        {
            unsigned long ma = a & UL_OVERFLOW_POSSIBLE_MASK;
            if (ma == ((a < 0) ? UL_OVERFLOW_POSSIBLE_MASK : 0))
            {
                unsigned long mb = b & UL_OVERFLOW_POSSIBLE_MASK;
                if (mb == ((b < 0) ? UL_OVERFLOW_POSSIBLE_MASK : 0))
                {
                    return PyInt_FromLong(a*b);
                }
            }
        }

This seems pretty safe, but the payout seems too small
to warrant taking this any further. Unless someone asks
me to do so, I'm planning to drop it.

-Jerry