[Python-Dev] Inplace operations for PyLong objects
Terry Reedy
tjreedy at udel.edu
Thu Aug 31 17:24:50 EDT 2017
On 8/31/2017 2:40 PM, Manciu, Catalin Gabriel wrote:
> Hi everyone,
>
> While looking over the PyLong source code in Objects/longobject.c I came
> across the fact that the PyLong object doesn't include implementations for
> basic inplace operations such as addition or multiplication:
>
> [...]
> long_long, /*nb_int*/
> 0, /*nb_reserved*/
> long_float, /*nb_float*/
> 0, /* nb_inplace_add */
> 0, /* nb_inplace_subtract */
> 0, /* nb_inplace_multiply */
> 0, /* nb_inplace_remainder */
> [...]
>
> While I understand that the immutable nature of this type of object justifies
> this approach, I wanted to experiment and see how much performance an inplace
> add would bring.
> My inplace add will revert to calling the default long_add function when:
> - the refcount of the first operand indicates that it's being shared
> or
> - that operand is one of the preallocated 'small ints'
> which should mitigate the effects of not conforming to the PyLong immutability
> specification.
> It also allocates a new PyLong _only_ in case of a potential overflow.
>
> The workload I used to evaluate this is a simple script that does a lot of
> inplace adding:
>
> import time
> import sys
>
> def write_progress(prev_percentage, value, limit):
>     percentage = (100 * value) // limit
>     if percentage != prev_percentage:
>         sys.stdout.write("%d%%\r" % (percentage))
>         sys.stdout.flush()
>     return percentage
>
> progress = -1
> the_value = 0
> the_increment = ((1 << 30) - 1)
> crt_iter = 0
> total_iters = 10 ** 9
>
> start = time.time()
>
> while crt_iter < total_iters:
>     the_value += the_increment
>     crt_iter += 1
>
>     progress = write_progress(progress, crt_iter, total_iters)
> end = time.time()
>
> print ("\n%.3fs" % (end - start))
> print ("the_value: %d" % (the_value))
>
> Running the baseline version outputs:
> ./python inplace.py
> 100%
> 356.633s
> the_value: 1073741823000000000
>
> Running the modified version outputs:
> ./python inplace.py
> 100%
> 308.606s
> the_value: 1073741823000000000
>
> In summary, the modified version reduced the run time by 13.47%.
> The CPython revision I'm using is 7f066844a79ea201a28b9555baf4bceded90484f
> from the master branch and I'm running on an i7-6700K CPU with Turbo Boost
> disabled (frequency is pinned at 4GHz).
>
> Do you think that such an optimization would be a good approach?

On my machine, the more realistic code, with an implicit C loop,

    the_value = sum(the_increment for i in range(total_iters))

gives the same value twice as fast as your explicit Python loop.
(I cut total_iters down to 10**7.)
You might check whether sum uses an in-place accumulator for ints.
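
For anyone who wants to repeat the comparison, here is a minimal sketch
using timeit. The function names explicit_loop, builtin_sum and report are
made up for illustration, and total_iters is cut much further (to 10**5) so
that it finishes quickly. The ratio you see will depend on your machine and
CPython build, so no timings are given here.

```python
import timeit

the_increment = (1 << 30) - 1
total_iters = 10 ** 5  # assumption: far smaller than in the thread, for a quick run


def explicit_loop():
    """Accumulate with += in a Python-level while loop."""
    the_value = 0
    crt_iter = 0
    while crt_iter < total_iters:
        the_value += the_increment
        crt_iter += 1
    return the_value


def builtin_sum():
    """Accumulate through sum(), which iterates in C."""
    return sum(the_increment for i in range(total_iters))


def report():
    """Print the best of three single runs for each variant."""
    for fn in (explicit_loop, builtin_sum):
        best = min(timeit.repeat(fn, number=1, repeat=3))
        print("%-14s %.4fs" % (fn.__name__, best))


if __name__ == "__main__":
    # Both variants must agree before the timings mean anything.
    assert explicit_loop() == builtin_sum()
    report()
```

Taking the minimum of several repeats is the usual way to reduce noise from
other processes; it does not replace pinning the CPU frequency as was done
for the original measurement.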
--
Terry Jan Reedy