[Python-Dev] Inplace operations for PyLong objects

Thu Aug 31 17:24:50 EDT 2017

On 8/31/2017 2:40 PM, Manciu, Catalin Gabriel wrote:
> Hi everyone,
> 
> While looking over the PyLong source code in Objects/longobject.c, I came
> across the fact that the PyLong object doesn't include implementations of
> basic in-place operations such as addition or multiplication:
> 
> [...]
>      long_long,                  /*nb_int*/
>      0,                          /*nb_reserved*/
>      long_float,                 /*nb_float*/
>      0,                          /* nb_inplace_add */
>      0,                          /* nb_inplace_subtract */
>      0,                          /* nb_inplace_multiply */
>      0,                          /* nb_inplace_remainder */
> [...]
> 
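For context, the empty `nb_inplace_add` slot is visible from Python as the absence of an `__iadd__` method on `int`. As a result, `x += y` falls back to ordinary addition and rebinds `x` to a new object, and any other reference to the old value is unaffected. That is the guarantee a mutating in-place add would have to preserve. A quick check, assuming a CPython 3 interpreter:

```python
# int leaves nb_inplace_add empty, which Python code sees as a missing __iadd__.
print(hasattr(int, "__iadd__"))  # False

x = 10 ** 20        # outside the small-int cache
alias = x           # a second reference to the same object
before = id(x)
x += 1              # no __iadd__, so this behaves like x = x + 1
print(id(x) == before)  # False: x is bound to a newly allocated int
print(alias)            # 100000000000000000000: the shared object is untouched
```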
> While I understand that the immutability of this type of object justifies
> this approach, I wanted to experiment and see how much faster an in-place
> add would be.
> My in-place add falls back to calling the regular long_add function when:
> 	- the refcount of the first operand indicates that it's being shared
> 	or
> 	- that operand is one of the preallocated 'small ints'
> which should mitigate the effects of not conforming to the PyLong immutability
> specification.
> It also allocates a new PyLong _only_ in case of a potential overflow.
> 
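The fallback rule described above can be modelled as a Python predicate that decides whether the left operand may be overwritten. This is a hypothetical sketch, not the patch itself: `ndigits` and `can_add_in_place` are invented names, `shared` stands in for the patch's refcount check, 30-bit PyLong digits and the default small-int range of -5..256 are assumed, and "potential overflow" is approximated as the result needing more digits than `a` currently occupies.

```python
# Hypothetical Python model of the proposed in-place add's decision rule.
# Assumptions: 30-bit PyLong digits and CPython's default small-int cache
# (-5..256). `shared` stands in for the patch's Py_REFCNT check.
PYLONG_SHIFT = 30
SMALL_INT_MIN, SMALL_INT_MAX = -5, 256


def ndigits(n: int) -> int:
    """30-bit digits needed for abs(n); counted as at least 1 here."""
    return max(1, (abs(n).bit_length() + PYLONG_SHIFT - 1) // PYLONG_SHIFT)


def can_add_in_place(a: int, b: int, shared: bool) -> bool:
    """Return True if the object holding `a` could be overwritten with a + b."""
    if shared:
        # Other references would observe the mutation: use long_add.
        return False
    if SMALL_INT_MIN <= a <= SMALL_INT_MAX:
        # Preallocated small ints are shared interpreter-wide.
        return False
    # Assumed overflow test: a result needing more digits than `a`
    # currently has would require allocating a new PyLong.
    return ndigits(a + b) <= ndigits(a)


increment = (1 << 30) - 1
print(can_add_in_place(10 ** 12, increment, shared=False))  # True
print(can_add_in_place(100, 1, shared=False))               # False: small int
print(can_add_in_place(10 ** 12, 1, shared=True))           # False: shared
print(can_add_in_place(increment, 1, shared=False))         # False: grows to 2 digits
```

In the benchmark, `the_value` starts at 0, so the first few additions take the small-int fallback before the in-place path can apply.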
> The workload I used to evaluate this is a simple script that performs many
> in-place additions:
> 
> 	import time
> 	import sys
> 
> 	def write_progress(prev_percentage, value, limit):
> 		percentage = (100 * value) // limit
> 		if percentage != prev_percentage:
> 			sys.stdout.write("%d%%\r" % (percentage))
> 			sys.stdout.flush()
> 		return percentage
> 
> 	progress = -1
> 	the_value = 0
> 	the_increment = ((1 << 30) - 1)
> 	crt_iter = 0
> 	total_iters = 10 ** 9
> 
> 	start = time.time()
> 
> 	while crt_iter < total_iters:
> 		the_value += the_increment
> 		crt_iter += 1
> 		
> 		progress = write_progress(progress, crt_iter, total_iters)
> 	end = time.time()
> 
> 	print ("\n%.3fs" % (end - start))
> 	print ("the_value: %d" % (the_value))
> 
> Running the baseline version outputs:
> ./python inplace.py
> 100%
> 356.633s
> the_value: 1073741823000000000
> 
> Running the modified version outputs:
> ./python inplace.py
> 100%
> 308.606s
> the_value: 1073741823000000000
> 
> In summary, the modified version reduced the run time by 13.47%.
> I'm using CPython revision 7f066844a79ea201a28b9555baf4bceded90484f
> from the master branch, running on an Intel Core i7-6700K CPU with Turbo Boost
> disabled (frequency pinned at 4 GHz).
> 
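The 13.47% figure is the relative reduction in wall-clock time. Expressed as a speedup ratio, the same two timings give roughly 15.6%. A quick check of both numbers, using only the timings reported above:

```python
baseline = 356.633  # seconds, unmodified build
modified = 308.606  # seconds, build with the in-place add

# Fraction of the baseline run time that was saved.
reduction = (baseline - modified) / baseline * 100
# How much faster the modified build runs, as a ratio minus one.
speedup = (baseline / modified - 1) * 100
print(f"time reduction: {reduction:.2f}%")  # 13.47%
print(f"speedup:        {speedup:.2f}%")    # 15.56%
```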
> Do you think such an optimization would be a good approach?

On my machine, the more realistic code, with an implicit C loop,
the_value = sum(the_increment for i in range(total_iters))
gives the same value twice as fast as your explicit Python loop
(I cut total_iters down to 10**7).
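The comparison can be reproduced at a smaller scale with `timeit`. This is a sketch assuming Python 3: the function names `explicit_loop` and `implicit_loop` are invented here, the progress reporting is left out so that only the additions are timed, and absolute numbers will differ from machine to machine.

```python
import timeit

the_increment = (1 << 30) - 1
total_iters = 10 ** 6  # scaled down from the original 10**9


def explicit_loop():
    """The original benchmark's while loop, without progress output."""
    the_value = 0
    crt_iter = 0
    while crt_iter < total_iters:
        the_value += the_increment
        crt_iter += 1
    return the_value


def implicit_loop():
    """The same total, with the loop driven by sum() in C."""
    return sum(the_increment for i in range(total_iters))


# Both versions must agree before their timings mean anything.
assert explicit_loop() == implicit_loop() == the_increment * total_iters

for fn in (explicit_loop, implicit_loop):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.3f}s per run")
```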

You might check whether sum uses an in-place accumulator for ints.
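One way to answer this is to read `builtin_sum` in Python/bltinmodule.c. It appears to keep the running total of exact ints in a C `long`, checking each addition for overflow, and only boxes the total back into a PyLong when an addition would overflow or a non-int item shows up. For the common case, that amounts to an in-place accumulator. The pure-Python model below mimics that control flow for ints only. It is a sketch, not CPython's code; it assumes a 64-bit C `long`, and `sum_model` is a hypothetical name.

```python
# Hypothetical pure-Python model of the exact-int fast path in CPython's
# builtin sum(); assumes a 64-bit C long. Not CPython's actual code.
LONG_MAX = 2 ** 63 - 1
LONG_MIN = -(2 ** 63)


def sum_model(iterable, start=0):
    it = iter(iterable)
    result = start
    if type(result) is int and LONG_MIN <= result <= LONG_MAX:
        acc = result  # plays the role of the C-level running total
        for item in it:
            if type(item) in (int, bool) and LONG_MIN <= item <= LONG_MAX:
                # Overflow test in the style of the C code: will acc + item fit?
                if acc >= 0:
                    fits = item <= LONG_MAX - acc
                else:
                    fits = item >= LONG_MIN - acc
                if fits:
                    acc += item  # no intermediate int objects in the C version
                    continue
            # Overflow or non-int item: box the total and add generically.
            result = acc + item
            break
        else:
            return acc  # iterator exhausted while still on the fast path
    # Generic path: each addition produces a new result object.
    for item in it:
        result = result + item
    return result


print(sum_model(range(10)))                    # 45
print(sum_model([LONG_MAX, 1, 2]) - LONG_MAX)  # 3, after leaving the fast path
```

Under the 64-bit assumption, the benchmark's final total of 1073741823000000000 is below LONG_MAX (9223372036854775807), so in this model the whole sum stays on the fast path.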

-- 
Terry Jan Reedy