[pypy-dev] Alternative algorithm for attribute lookup

Antonio Cuni anto.cuni at gmail.com
Tue Jul 3 15:56:51 CEST 2007


Hi all,

I spent the last few days slowly experimenting with
alternative/faster ways to do attribute lookup.

For simplicity, I didn't touch the interpreter; instead, I wrote a few
RPython programs that do only the last part of the attribute lookup,
i.e. accessing the underlying RPython dictionary. Moreover, I played
only with dicts of strings, since this is how objects are implemented
when multidict is enabled.


My first experiment was with precomputed hashes for keys: I introduced
a new method get_fast on dictionaries, which accepts both the key and
its hash value, so it does not need to compute the hash itself.  Of
course this is just a quick hack, because programs using get_fast work
only when translated and can't run on top of CPython, but it was the
simplest way to experiment.
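
Just to show the idea concretely, here is a pure-Python sketch of the
difference between get and get_fast (this is not the actual patch; the
toy class below is invented for this mail)::

     class SimpleStrDict(object):
         """Toy open-addressed string dict, only meant to show the
         get/get_fast difference (no resizing, no deletion)."""

         def __init__(self, size=64):
             self.size = size                    # must be a power of two
             self.keys = [None] * size
             self.values = [None] * size

         def setitem(self, key, value):
             i = hash(key) & (self.size - 1)
             while self.keys[i] is not None and self.keys[i] != key:
                 i = (i + 1) & (self.size - 1)   # linear probing
             self.keys[i] = key
             self.values[i] = value

         def get(self, key):
             # pays for hash(key) on every call
             return self.get_fast(key, hash(key))

         def get_fast(self, key, h):
             # the caller already knows h, so we skip the hashing step
             # and probe the table directly
             i = h & (self.size - 1)
             while self.keys[i] is not None:
                 if self.keys[i] == key:
                     return self.values[i]
                 i = (i + 1) & (self.size - 1)
             return None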

You can find both the get_fast patch and the benchmark here:
http://codespeak.net/svn/user/antocuni/lookup-hack/get_fast.patch
http://codespeak.net/svn/user/antocuni/lookup-hack/targethashcache.py
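
The shape of the measurement is roughly the following (again just a
sketch reusing the toy class above; the real loop is in
targethashcache.py)::

     import time

     def measure(d, key, n):
         h = hash(key)                  # computed once, outside the loop
         start = time.time()
         for i in range(n):
             d.get(key)                 # recomputes hash(key) every time
         t_get = time.time() - start
         start = time.time()
         for i in range(n):
             d.get_fast(key, h)         # reuses the precomputed hash
         t_fast = time.time() - start
         return t_get, t_fast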

The interesting thing is the result of the benchmark: since RPython
strings already cache their hash, I didn't expect get_fast to be much
faster than get, but the benchmark says the opposite; on my machine:

antocuni at anto lookup-hack $ ./targethashcache-c
50000000 iterations
get:        3.820000 seconds
get_fast:   2.520000 seconds

get_fast:   1.515873 times faster

It is 50% faster. I really don't know why; maybe there is something
wrong in the benchmark, or maybe something is wrong with the hash
cache of RPython strings.


The second and more interesting experiment is a bit more involved.
The idea is that for most Python objects, we can guess a set of
"likely attributes" at compile time:

   - for modules, by inspecting their global namespaces;

   - for classes, by inspecting the namespace of a class definition;

   - for instances, by looking for a LOAD_FAST 0 followed by a STORE_ATTR x
     inside the methods of the class (see the sketch right after this list).
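
As a rough illustration of the last heuristic, collecting the likely
instance attributes could look like the sketch below (written with
CPython's dis module only to show the idea, not the way it would be
done inside PyPy itself)::

     import dis

     def likely_instance_attributes(cls):
         """Collect the attributes that the methods of cls store on
         'self', i.e. a LOAD_FAST of argument 0 immediately followed
         by a STORE_ATTR."""
         attrs = set()
         for name, value in vars(cls).items():
             func = getattr(value, '__func__', value)
             code = getattr(func, '__code__', None)
             if code is None:
                 continue                      # not a plain method
             instrs = list(dis.get_instructions(code))
             for prev, cur in zip(instrs, instrs[1:]):
                 if (prev.opname == 'LOAD_FAST' and prev.arg == 0
                         and cur.opname == 'STORE_ATTR'):
                     attrs.add(cur.argval)     # the attribute name
         return attrs

For a typical class this picks up the attributes assigned to self in
__init__ and in the other methods.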

Note that we don't pretend to catch all possible attributes, just the
most likely ones. The assumption is that "non-likely" attributes are
very uncommon, and that for most objects the set of attributes at
runtime is a subset of the likely attributes.

The idea is to consistently speed up all accesses to likely
attributes, at the cost of a slight slowdown when accessing the
non-likely ones.

Once we have collected the set of likely attributes, we compute a
perfect hash function for this set; for my experiment, I used a
hacked version of this algorithm:
http://www.amk.ca/files/python/perfect_hash.txt

The perfect hash function looks like this, where N and G vary from
one set of likely attributes to another::

     def perfect_hash(N, G, key):
         h = hash(key)       # the normal string hash
         i = h & (N-1)       # N is a power of two
         j = i+1             # so G needs N+1 entries
         res = G[i] + G[j]
         res = res & (N-1)   # final index, in range(N)
         return res
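
Just to give an idea of where N and G come from, they can be built by
brute force for a small set of keys (this is only an illustration
reusing the perfect_hash above; the real construction is the
graph-based one from the recipe)::

     import random

     def build_tables(keys):
         """Find (N, G) such that perfect_hash maps each key in 'keys'
         to a distinct index; brute force, but good enough for a
         handful of likely attributes."""
         # smallest power-of-two N for which the first-level index
         # i = hash(key) & (N-1) is already collision-free
         N = 1
         while True:
             N *= 2
             slots = set(hash(key) & (N-1) for key in keys)
             if len(slots) == len(keys):
                 break
         # then look for a G (N+1 entries, since j can be N) giving
         # distinct final indices
         while True:
             G = [random.randrange(N) for _ in range(N+1)]
             indexes = set(perfect_hash(N, G, key) for key in keys)
             if len(indexes) == len(keys):
                 return N, G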

Moreover, since h depends only on the key, we could also precompute
it at compile time, as we did for get_fast::

     def perfect_hash(N, G, key, h): ...

Finally, we insert a fast path for the common case in the lookup code::

     def lookup(d, key, h): # h is the precomputed hash
         index = perfect_hash(d.N, d.G, key, h)
         entry = d.entries[index]
         if entry.is_valid() and entry.key is key:
             return entry.value    # fast path: likely attribute, no collision
         else:
             # slow path: do a full lookup as usual

Note that in the fast path we use 'is' instead of == to compare the
keys; since the W_Strings storing the names of the attributes are
interned, this should be enough. Note also that G and N are stored on
the dict, so we can have different perfect hash functions for
different sets of likely attributes, as we need.

If my assumption about likely attributes holds, most LOAD_ATTRs will
follow the fast path, resulting in a considerable speedup: if an
object only uses likely attributes, there will never be a collision.
Most importantly, this does not break anything: when the fast path is
not taken (for example when we do {get,set}attr(obj, 'attr')), the
full lookup still works.
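
To make the interplay between the fast path and the full lookup
concrete, here is how such a dict could be organized (again only a
pure-Python sketch, reusing perfect_hash and build_tables from above;
for simplicity it recomputes hash(key) instead of taking the
precomputed h, and it relies on the keys being interned, as the
attribute-name W_Strings are)::

     class PerfectDictSketch(object):
         """One slot per likely attribute, placed at its perfect-hash
         index, plus an ordinary dict catching everything else."""

         def __init__(self, likely_keys):
             self.N, self.G = build_tables(likely_keys)
             # entries[i] is a [key, value, valid] triple for the
             # likely attribute whose perfect hash is i, or None
             self.entries = [None] * self.N
             for key in likely_keys:
                 index = perfect_hash(self.N, self.G, key)
                 self.entries[index] = [key, None, False]
             self.fallback = {}

         def setitem(self, key, value):
             entry = self.entries[perfect_hash(self.N, self.G, key)]
             if entry is not None and entry[0] is key:
                 entry[1] = value
                 entry[2] = True
             else:
                 self.fallback[key] = value     # non-likely attribute

         def getitem(self, key):
             entry = self.entries[perfect_hash(self.N, self.G, key)]
             if entry is not None and entry[2] and entry[0] is key:
                 return entry[1]                # fast path
             return self.fallback.get(key)      # slow path: full lookup

     # identifier-like literals are interned by CPython, so 'is' works
     d = PerfectDictSketch(['x', 'y', 'counter'])
     d.setitem('x', 42)              # likely attribute: fast path
     d.setitem('whatever', None)     # non-likely: goes to the fallback
     assert d.getitem('x') == 42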

You can find the benchmark here (this doesn't need get_fast.patch):
http://codespeak.net/svn/user/antocuni/lookup-hack/targetperfectdict.py

For the test, I implemented perfect_dict as an RPython class, so it
might be slower than what we could get by implementing it as a real
RPython type in the style of r_dict; in particular, RPython does
index checking when calculating G[i] and G[j], which is not necessary
here.

Although perfect_dict is not as fast as possible, the result of the
benchmark is very interesting:

antocuni at anto lookup-hack $ ./targetperfectdict-c
50000000 iterations
get:        3.740000 seconds
pdict:      1.450000 seconds

pdict:      2.579310 times faster


What do you think about this idea? Is there any obvious bug or wrong
assumption that I'm missing? Do you think it might be worth
implementing? (It would be a nice sprint topic.)

ciao Anto


