Interning own classes like strings for speed and size?
Stefan Behnel
stefan_ml at behnel.de
Tue Dec 28 09:39:32 EST 2010
Steven D'Aprano, 28.12.2010 15:11:
> On Tue, 28 Dec 2010 13:42:39 +0100, Ulrich Eckhardt wrote:
>
>> Steven D'Aprano wrote:
>>> >>> class InternedTuple(tuple):
>>> ...     _cache = {}
>>> ...     def __new__(cls, *args):
>>> ...         t = super().__new__(cls, *args)
>>> ...         return cls._cache.setdefault(t, t)
>>
>> That looks good. The only thing that first bothered me is that it
>> creates an object and then possibly discards it again. However, there is
>> no way around that, since at least the key to the dict must be created
>> for lookup. Since key and value are the same here, this is even for
>> free.
>>
>> What I also found was that with the above, I can't provide __eq__ and
>> __ne__ that just check for identity. If I do, the lookup in setdefault()
>> will never find an existing tuple and I will never save memory for a
>> single object.
>
> If all you want is to save memory, you don't need to change the __eq__
> method. But if you still want to, try this:
>
> # Untested
Yep, that's the problem. ;)
> class InternedTuple(tuple):
>     _cache = {}
>     def __new__(cls, *args):
>         t = super().__new__(cls, *args)
>         return cls._cache.setdefault(args, t)
>     def __eq__(self, other):
>         return self is other
>     def __ne__(self, other):
>         return self is not other
What Ulrich meant was: doing this will actually kill the caching, because
the comparison is first called when the tuple is looked up while being
added to the interning dict. Since the new tuple is, well, new, it will
not be equal (read: identical) to any cached tuple, so it always ends up
as a new entry regardless of its content.
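A minimal Python 3 sketch of the effect described above. It assumes the cache is keyed by the newly built tuple itself, as in the first InternedTuple quoted earlier; the constructor takes a single iterable (a simplification of the `*args` signature), and `IdentityTuple` is a hypothetical name used only for contrast. Note that a class defining `__eq__` without `__hash__` becomes unhashable in Python 3, so the sketch restores tuple's hash explicitly.

```python
class InternedTuple(tuple):
    """Value equality: the dict lookup finds an earlier equal tuple."""
    _cache = {}

    def __new__(cls, iterable=()):
        t = super().__new__(cls, iterable)
        # setdefault() returns the cached instance if an equal key exists,
        # otherwise it stores t and returns it.
        return cls._cache.setdefault(t, t)


class IdentityTuple(tuple):
    """Hypothetical variant: same cache scheme, but == means 'is'."""
    _cache = {}

    def __new__(cls, iterable=()):
        t = super().__new__(cls, iterable)
        # The lookup compares the cached key with the brand-new t;
        # identity-based __eq__ always says "not equal", so t is stored too.
        return cls._cache.setdefault(t, t)

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        return self is not other

    # Defining __eq__ sets __hash__ to None; reuse tuple's content-based
    # hash so instances can still be used as dict keys.
    __hash__ = tuple.__hash__


a, b = InternedTuple((1, 2, 3)), InternedTuple((1, 2, 3))
print(a is b, len(InternedTuple._cache))   # True 1

c, d = IdentityTuple((1, 2, 3)), IdentityTuple((1, 2, 3))
print(c is d, len(IdentityTuple._cache))   # False 2
```

With value equality the second construction gets the cached object back; with identity equality the hashes still match, but the equality check fails, so every construction adds another entry and nothing is shared.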
Stefan
More information about the Python-list mailing list