[Python-ideas] Fast sum() for non-numbers

Sergey sergemp at mail.ru
Fri Jul 12 16:03:53 CEST 2013


On Jul 11, 2013 Ronald Oussoren wrote:

>> It depends on implementation details; it's possible to keep it
>> backward compatible. BTW, what C API do you expect to break?
> 
> If a tuple stores the values in a separate list the structure of a PyTupleObject
> changes. That structure is exposed to users of the C API, and hence changes
> shouldn't be made lightly.

That would effectively add one more field:
  void *_internal_list;
which is backward compatible.

(On the other hand it may be a good idea to break API compatibility
from time to time to make sure that it's not badly misused. :)

>> Yes, technically it's possible to implement tuple so that it would
>> realloc its internal list to save some RAM, but why? List does not
>> do that when you remove elements from it, so why should tuple do that?
> 
> Actually, list does resize when you remove items, but it only does so when
> the amount of free items gets too large (see list_resize in Objects/listobject.c).

Agreed. It does indeed reallocate lists sometimes. I was fooled
because I tried:
  a=[i for i in xrange(50000000)]
  while len(a) > 10: x = a.pop()
and the memory was only released when I closed the interpreter.
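
For what it's worth, the shrinking is visible from within Python:
sys.getsizeof() reports a list's allocated capacity, so a quick check
(smaller numbers to keep it fast) shows the buffer being reallocated
downward even though the process may not hand the pages back to the OS:

  import sys

  a = [i for i in range(1000000)]
  print(sys.getsizeof(a))  # reflects the full allocated capacity
  while len(a) > 10:
      a.pop()
  print(sys.getsizeof(a))  # much smaller: list_resize shrank the buffer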

(which raises a minor question: does it make sense to reallocate at
all if the memory is not returned to the system anyway?)

> Keeping memory blocks alive unnecessarily is bad because this can increase
> the amount of memory used by a script, without there being a clear reason for
> it when you inspect the Python code.  This has been one of the reasons for
> not making string slicing operations views on the entire string (that is, a method
> like aStr.partition could return objects that reference aStr for the character
> storage, which could save memory but at the significant risk of keeping too much
> memory alive when aStr is discarded earlier than one of the return values).

Well, depending on the actual implementation this can be fixed. E.g.
as long as we optimize for the case of two objects sharing the same
list, the list could store both sizes and reallocate when both are
much smaller than the allocated size. Or it could store all the sizes
and reallocate when all of them are small enough. Or it could use
lazy reallocation: just mark the list and let the surviving objects
reallocate it the next time they access it. Lots of options. :)
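
Here is a minimal pure-Python sketch of the "store all sizes" variant
(all names are hypothetical; a real implementation would live in C
next to listobject.c/tupleobject.c):

  class SharedStorage:
      """Hypothetical backing store shared by several tuple-like views."""
      def __init__(self, items):
          self._items = list(items)
          self._view_sizes = []  # logical length of each live view

      def register_view(self, size):
          self._view_sizes.append(size)
          return len(self._view_sizes) - 1  # id used to drop the view

      def drop_view(self, view_id):
          self._view_sizes[view_id] = 0
          self._maybe_shrink()

      def _maybe_shrink(self):
          # Reallocate only when every remaining view needs far less
          # than the allocated size ("all of them are small enough").
          longest = max(self._view_sizes) if self._view_sizes else 0
          if longest * 2 < len(self._items):
              del self._items[longest:]  # shrink the shared buffer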

>> On the other hand:
>>  a = (1,) * 1000
>>  b = a + (2,3)
>>  c = b + (4,5)
>> And you have 3 variables for the price of one. Lots of memory saved!
> 
> How do you do that? As far as I know Python isn't a quantum system,
> item 1000 of the hidden storage list can't be both 2 and 4 at the same time.

c == (1, ..., 1, 2, 3, 4, 5). Item 1000 is 2 in both b's view and
c's view, so nothing needs to be 2 and 4 at the same time. :)
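
Sketched in plain Python (SharedTuple is hypothetical; the real thing
would be done in C), the three tuples can share one buffer because
each of them only sees its own prefix:

  class SharedTuple:
      """Hypothetical COW tuple: a length plus a shared backing list."""
      def __init__(self, store, length):
          self._store, self._len = store, length

      def __add__(self, items):
          items = list(items)
          if self._len == len(self._store):  # nobody extended past us:
              self._store.extend(items)      # reuse the shared buffer
              return SharedTuple(self._store, len(self._store))
          # buffer already extended by someone else: make a real copy
          return SharedTuple(self._store[:self._len] + items,
                             self._len + len(items))

  a = SharedTuple([1] * 1000, 1000)
  b = a + (2, 3)  # reuses a's buffer, sees items 0..1001
  c = b + (4, 5)  # reuses it too, sees items 0..1003
  assert a._store is b._store is c._store  # one buffer, three tuples
  d = a + (9,)    # conflicts with b at item 1000, so this one copies

The last line is exactly the case quoted below: when the buffer
cannot be reused, __add__ falls back to a plain copy.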

>> Nothing incorrect here; of course __add__ should handle that, and
>> if it cannot reuse the list it would copy it. As it does now.
> 
> And then you introduce unpredictable performance for tuple addition,
> currently you can reason about the performance of code that does
> tuple addition and with this change you no longer can (it sometimes
> is fast, and sometimes it is slow).

I don't understand. Right now it is always slow, and it is already
unpredictable (it depends on the current heap state, the allocation
map of system RAM, etc.). You cannot predict anything today. It is
as unpredictable as appending an item to a list (which may or may
not need to realloc), or as almost every other operation in Python.
With the change it may become faster on average, but it will be no
more unpredictable than it is now.

E.g. you have no way to predict that the memory page you're about to
read has been swapped out by the system and won't come back for the
next few seconds because the disk is under heavy load.

> Anyways, I'm still +0 on using += in sum, and -1 on trying to special-case
> particular types.

This is not about special casing. What we are discussing now is a
copy-on-write optimisation for Python. Copy-on-write is known to
save memory and is widely used elsewhere (e.g. by the Linux kernel
when managing application memory pages). Is it a bad thing to have
such a general, well-known optimisation technique in Python?

This approach has other benefits. For example, lists and tuples
could share the allocation code, which would not only make lists
faster as well, but would also allow instant conversion of lists to
tuples and back.

What I'm afraid of is that, despite making many things faster, I may
face the argument "this also makes sum() of lists faster, and that
should not happen; sum() MUST be slow because that is how it must
be". So after all the sum discussions I don't have much desire to
even start working on an implementation.
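
For context, the speed difference this thread is about is easy to
demonstrate today: sum() copies the whole accumulator on every
addition, while an explicit in-place extend is linear:

  import timeit

  lists = [[i] for i in range(10000)]

  def sum_builtin():
      return sum(lists, [])  # O(n**2): each + copies the accumulator

  def sum_inplace():
      total = []
      for chunk in lists:    # O(n): += extends total in place
          total += chunk
      return total

  print(timeit.timeit(sum_builtin, number=5))  # orders of magnitude slower
  print(timeit.timeit(sum_inplace, number=5))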
