Interning own classes like strings for speed and size?

Daniel Fetchinson fetchinson at googlemail.com
Mon Dec 27 06:45:42 EST 2010


> I'm trying to solve a computational problem and of course speed and size is
> important there. Apart from picking the right algorithm, I came across an
> idea that could help speed up things and keep memory requirements down. What
> I have is regions described by min and max coordinates. At first, I just
> modeled these as a simple class containing two values for each axis.
>
> In a second step, I derived this class from tuple instead of object. Some
> code then moved from __init__ to __new__ and some code that modified these
> objects had to be changed to replace them instead. The upside to this is
> that they can be used as keys in sets and dicts, which isn't the case for
> mutable types[1].
>
> What I'm now considering is to only allow a single instance of these objects
> for each set of values, similar to interned strings. What I would gain is
> that I could safely compare objects for identity instead of equality. What
> I'm not yet sure is how much overhead that would give me and/or how to keep
> it low. The idea is to store each instance in a set and after creating a new
> object I would first look up an equal object in the global set and return
> that instead, otherwise add the new one.
>
> The problem I foresee is that if I define equality as identity, this lookup
> when creating will never eliminate duplicates. If I only fall back to
> equality comparison for non-identical objects, I would probably sacrifice
> most of the gain. If I build a dict mapping between the values and the
> actual objects, I would have doubled the required memory and uselessly store
> the same values twice there.
>
> Am I looking in the wrong direction? Is there some better approach? Please
> don't tell me to use C, as I'm specifically interested in learning Python,
> I'm pretty sure I could have solved the problem quickly in C++ otherwise.
> Other suggestions?
>
> Cheers!
>
> Uli
>
>
> [1] Somebody correct me if I'm wrong, but I believe I could have defined a
> hashing function for the type and thus allowed its use in a set or dict,
> right? However, goofing up because you accidentally modified an object and
> changed its hash value is something I don't want to risk anyway.

I believe what you are looking for is (some variant of) the singleton pattern:

http://en.wikipedia.org/wiki/Singleton_pattern

How it's done in python see http://www.google.com/search?q=python+singleton

Cheers,
Daniel

-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown



More information about the Python-list mailing list