[Python-ideas] Fast sum() for non-numbers

Ronald Oussoren ronaldoussoren at mac.com
Thu Jul 11 08:15:06 CEST 2013


On 11 Jul, 2013, at 0:47, Sergey <sergemp at mail.ru> wrote:

> On Jul 9, 2013 Ronald Oussoren wrote:
> 
>>> For example, rewrite tuple to internally store its values in a
>>> list, and have `localcount` variable saying how many items from
>>> that list belong to this tuple. Then __add__ could extend that
>>> list and reuse it for new tuple.
>> 
>> That's not going to happen, not only breaks that backward compatibility
>> for users for the C API,
> 
> It depends on implementation details, it's possible to keep it
> backward compatible. BTW, what C API do you expect to break?

If a tuple stores the values in a separate list the structure of a PyTupleObject
changes. That structure is exposed to users of the C API, and hence changes
shouldn't be made lightly.

> 
>> it has nasty side effects and is incorrect.
>> Nasty side effect:
>> a = (1,)
>> b = (2,) * 1000
>> c = a + b
>> del b
>> del c
>> With the internal list 'a' keeps alive the extra storage used for 'c'.
> 
> Yes, technically it's possible to implement tuple so that it would
> realloc internal list to save some ram, but why? List does not do
> that when you remove elements from it, why should tuple do that?

Actually, list does resize when you remove items but only does so when
the amount of free items gets too large (see list_resize in Object/listobject.c)

Keeping memory blocks alive unnecessarily is bad because this can increase
the amount of memory used by a script, without their being a clear reason for
it when you inspect the python code.  This has been one of the reasons for
not making string slicing operations views on the entire string (that is, a method
like aStr.partition could return objects that reference aStr for the character
storage which could save memory but at the significant risk of keeping too much
memory alive when aStr is discarded earlier than one of the return values)

> 
> On the other hand:
>  a = (1,) * 1000
>  b = a + (2,3)
>  c = b + (4,5)
> And you have 3 variables for the price of one. Lot's of memory saved!

How do you do that? As for as I know Python isn't a quantum system, 
item 1000 of the hidden storage list can't be both 2 and 4 at the same time.

> 
>> Incorrect:
>> a = (1,)
>> b = a + (2,)
>> c = a + (3,)
>> Now 'b' and 'c' can't possibly both share storage with 'a'.
> 
> Nothing incorrect here, of course __add__ should handle that, and
> if it cannot reuse list it would copy it. As it does now.

And then you introduce unpredictable performance for tuple addition,
currently you can reason about the performance of code that does
tuple addition and with this change you no longer can (it sometimes
is fast, and sometimes it is slow).  That's a major gotcha that will
cause confusion (string addition also has this feature, and that
optimization wouldn't have gotten in with the knowlegde we now have).

Anyways, I'm still +0 using += in sum, and -1 on trying to special case
particular types.

Ronald


More information about the Python-ideas mailing list