[pypy-dev] a possible leak in the object namespace...

Carl Friedrich Bolz cfbolz at gmx.de
Mon Nov 29 19:46:11 CET 2010


Hi Alex,

On 11/29/2010 03:04 PM, Alex A. Naanou wrote:
> With the release of version 1.4, I decided to test these usecases out
> and benchmark them on PyPy and 15 minutes later I got results that
> were surprising to say the least...
>
> Expectations:
> 1) the normal/native namespace should have been a bit faster than the
> hooked object on the first run. Both cases should have leveled to
> about the same performance after the JIT finished its job +/- a
> constant.
> 2) all times should have been near constant.
>
> What I got per point:
> 1) the object with native dict was slower by about three orders of
> magnitude than the object with a hooked namespace.
> 2) sequential write benchmark runs on the normal object did not level
> out, as they did with the hook, rather, they exhibited exponential
> times (!!)

Don't do that then :-).


> For details and actual test code see the attached file.

The code you are trying is essentially this:

import time

def test(obj, count=10000):
    t0 = time.time()
    for i in xrange(count):
        # a new, differently named attribute on every iteration
        setattr(obj, 'a' + str(i), i)
    t1 = time.time()
    # return total time and time per write
    return t1 - t0, (t1 - t0) / count

This does not work very well with the non-overridden dict, because we
don't optimize for this case at all. Your benchmark

  a) uses lots of attributes, which we expect to be rare;
  b) accesses them with setattr, which is a lot slower than accessing a
     fixed attribute name;
  c) accesses a different attribute in every loop iteration, which means
     the compiler has to produce a separate bit of code for every attribute.

Read this for some hints as to why this is the case:

http://morepypy.blogspot.com/2010/11/efficiently-implementing-python-objects.html
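The post describes "maps" (also known as hidden classes): instances that received the same attributes in the same order share one map, which records where each attribute lives in a compact storage list. Below is a deliberately simplified, hypothetical model of that idea. The names `Map`, `Instance` and `with_attribute` are made up for illustration, and copying the whole index dict on every transition is an assumption of this sketch, not necessarily what PyPy does.

```python
# Hypothetical, simplified model of "maps" (hidden classes).
# Not PyPy's actual implementation; for illustration only.

class Map(object):
    def __init__(self, indexes):
        # attribute name -> position in the instance's storage list
        self.indexes = indexes
        # cached transitions: attribute name -> successor Map
        self.transitions = {}

    def lookup(self, name):
        return self.indexes.get(name, -1)

    def with_attribute(self, name):
        # Reuse the transition if another instance already took this path.
        if name not in self.transitions:
            new_indexes = dict(self.indexes)  # copies len(self.indexes) entries
            new_indexes[name] = len(self.indexes)
            self.transitions[name] = Map(new_indexes)
        return self.transitions[name]


EMPTY_MAP = Map({})


class Instance(object):
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def read(self, name):
        index = self.map.lookup(name)
        if index == -1:
            raise AttributeError(name)
        return self.storage[index]

    def write(self, name, value):
        index = self.map.lookup(name)
        if index != -1:
            # Existing attribute: the map stays the same.
            self.storage[index] = value
            return
        # New attribute: move to a successor map and grow the storage.
        self.map = self.map.with_attribute(name)
        self.storage.append(value)
```

In this model, writing 'a0' ... 'a(n-1)' walks the object through n distinct maps, and creating the i-th map copies i entries, so the setup work alone grows with the square of the number of attributes. A tracing JIT that specializes code on the map would also see a different map on every iteration, which is point c) above.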

This is fixable in theory with enough work, but I am not sure that this
is a common or useful use case. If you really need to do this, just use
a normal dictionary. Or show me some real-world code that does this, and
I might think about the case some more.

Anyway, the timing behavior of the above loop is merely quadratic in the 
number of attributes, not exponential :-).
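To make the distinction concrete, assume (as a simplification) that adding the i-th new attribute costs time proportional to i, with some constant c:

```latex
T(n) = \sum_{i=1}^{n} c\,i = c\,\frac{n(n+1)}{2} \in O(n^2),
\qquad
\frac{T(2n)}{T(n)} = \frac{2(2n+1)}{n+1} \to 4 .
```

So doubling `count` roughly quadruples the total time and doubles the time per write, whereas exponential growth, T(n) proportional to 2^n, would double the total with every single extra attribute.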

Cheers,

Carl Friedrich


