[pypy-dev] Python vs pypy: interesting performance difference [dict.setdefault]

David Naylor naylor.b.david at gmail.com
Fri Aug 12 17:49:54 CEST 2011


On Friday, 12 August 2011 14:51:36 Antonio Cuni wrote:
> Hello David,
> 
> On 10/08/11 21:27, David Naylor wrote:
> > Hi,
> > 
> > I needed to create a cache of date and time objects and I wondered what
> > was the best way to handle the cache.  For comparison I put together
> > the following test:
> [cut]
> 
> > PyPy displays a significant slowdown in the defaultdict function, otherwise
> > it displays its usual speedup.  To check the cause I replaced
> > i.date() with i.day and found no major difference in times.  It appears
> > dict.setdefault (or its interaction with the JIT) is causing a slowdown.
> 
> I don't think that setdefault is the culprit here, as shown by this
> benchmark:

I made a mistake in the script: I intended to use the day number as the key 
(as in the other tests).  Changing the critical line to 
"b.append(cache.setdefault(i.day, i.date()))" gives results comparable to 
Python's (factoring in the speedup).  

<snip/>

> as you can see, in PyPy there is almost no difference between using a
> try/except or using setdefault.

I would argue that with an expensive value operation, try/except will be 
much faster than setdefault whenever there are cache hits, because the 
default argument to setdefault is evaluated on every call, hit or miss.  
Using my fixed script I get:

keydict: [0.3298788070678711, 0.28450703620910645, 0.28931379318237305]
defaultdict: [0.47180604934692383, 0.4183311462402344, 0.4172670841217041]
 
indicating that setdefault is about 40% slower (it takes roughly 1.4 times 
as long), which I attribute to the cost of i.date().  
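
To illustrate the point about the value expression, here is a minimal sketch 
(not the original test script, which was cut from this thread).  The helper 
names expensive_value, with_setdefault and with_try_except are hypothetical.  
The sketch counts how often the value expression runs rather than timing it:

```python
from datetime import datetime, timedelta


def expensive_value(dt, calls):
    """Hypothetical stand-in for i.date(); records every evaluation."""
    calls.append(1)
    return dt.date()


def with_setdefault(items):
    cache, calls = {}, []
    # The second argument is evaluated before setdefault runs,
    # so expensive_value is called on cache hits as well as misses.
    out = [cache.setdefault(i.day, expensive_value(i, calls)) for i in items]
    return out, len(calls)


def with_try_except(items):
    cache, calls, out = {}, [], []
    for i in items:
        try:
            out.append(cache[i.day])
        except KeyError:
            # Only a cache miss pays for the value expression.
            value = cache[i.day] = expensive_value(i, calls)
            out.append(value)
    return out, len(calls)


# 240 hourly timestamps covering 10 distinct days of one month.
start = datetime(2011, 8, 1)
items = [start + timedelta(hours=h) for h in range(24 * 10)]

setdefault_out, setdefault_calls = with_setdefault(items)
try_out, try_calls = with_try_except(items)
print(setdefault_calls, try_calls)
```

Both functions return the same list, but the setdefault version evaluates the 
value once per item while the try/except version evaluates it once per 
distinct key.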

> I had a quick look at the code (which in PyPy is written at applevel) and
> it does a lot of nonsense.  In particular, __hash__ calls __getstate which
> formats a dynamically created string, just to call hash() on it.  I
> suppose that this code can (and should) be optimized a lot.  I may try to
> look at it but it's unclear when, since I'm about to go on vacation.

Would it not be a simple matter of changing the __(get|set)state methods to 
use a tuple or even an int (long)?  
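
A rough sketch of the tuple idea, using a hypothetical SimpleDate class rather 
than PyPy's actual datetime code (which is not shown in this thread).  The 
_state method here is an assumption made for illustration only:

```python
class SimpleDate:
    """Hypothetical stand-in for a date type; not PyPy's datetime.date."""

    __slots__ = ("year", "month", "day")

    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    def _state(self):
        # A tuple of ints: no string has to be formatted to hash it.
        return (self.year, self.month, self.day)

    def __eq__(self, other):
        if not isinstance(other, SimpleDate):
            return NotImplemented
        return self._state() == other._state()

    def __hash__(self):
        # Equal dates have equal state tuples, hence equal hashes.
        return hash(self._state())


a = SimpleDate(2011, 8, 12)
b = SimpleDate(2011, 8, 12)
cache = {a: "cached"}
print(cache[b])
```

Whether the real state format can change freely is a separate question, 
since anything else that consumes the state (pickling, for example) would 
need to keep working.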