On Friday, 12 August 2011 14:51:36 Antonio Cuni wrote:
Hello David,
On 10/08/11 21:27, David Naylor wrote:
Hi,
I needed to create a cache of date and time objects and I wondered what was the best way to handle the cache. For comparison I put together the following test: [cut]
PyPy shows a significant slowdown in the defaultdict function but otherwise shows its usual speedup. To check the cause I replaced i.date() with i.day and found no major difference in times. It appears dict.setdefault (or its interaction with the JIT) is causing the slowdown.
I don't think that setdefault is the culprit here, as shown by this benchmark:
I made a mistake in the script: I intended to use the day number as the key (as in the other tests). Changing the critical line to "b.append(cache.setdefault(i.day, i.date()))" gives results comparable to Python (factoring in the speedup). <snip/>
as you can see, in PyPy there is almost no difference between using a try/except or using setdefault.
I would argue that with an expensive value operation the try/except will be much faster than setdefault (whenever there are cache hits), because setdefault evaluates its default argument on every call, hit or miss. Using my fixed script I get:
keydict: [0.3298788070678711, 0.28450703620910645, 0.28931379318237305]
defaultdict: [0.47180604934692383, 0.4183311462402344, 0.4172670841217041]
indicating setdefault is about 40% slower (1.4 times slower), which I attribute to the cost of i.date().
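To illustrate the point about the value being computed on every call, here is a minimal sketch. It is not the benchmark script from this thread (which was cut). CountingDate, fill_with_setdefault and fill_with_try_except are hypothetical names, and the timestamps are made up. It only counts how often the stand-in for i.date() runs in each variant.

```python
from datetime import datetime, timedelta


class CountingDate:
    """Hypothetical stand-in for i.date() that counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, stamp):
        self.calls += 1
        return stamp.date()


def fill_with_setdefault(stamps, make_date):
    cache = {}
    out = []
    for i in stamps:
        # make_date(i) is evaluated before setdefault runs, hit or miss.
        out.append(cache.setdefault(i.day, make_date(i)))
    return out


def fill_with_try_except(stamps, make_date):
    cache = {}
    out = []
    for i in stamps:
        try:
            out.append(cache[i.day])
        except KeyError:
            # Only a cache miss pays for make_date(i).
            value = cache[i.day] = make_date(i)
            out.append(value)
    return out


# Made-up data: 40 timestamps, four per day over ten days in August 2011.
stamps = [datetime(2011, 8, 1) + timedelta(hours=6 * n) for n in range(40)]

eager = CountingDate()
lazy = CountingDate()
same = fill_with_setdefault(stamps, eager) == fill_with_try_except(stamps, lazy)
print(same, eager.calls, lazy.calls)
```

Both variants return the same list, but the setdefault version calls the value function once per timestamp (40 times here), while the try/except version calls it once per distinct day (10 times). How much that costs in wall-clock time depends on the interpreter and on how expensive the value really is.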
I had a quick look at the code (which in PyPy is written at applevel) and it does a lot of nonsense. In particular, __hash__ calls __getstate which formats a dynamically created string, just to call hash() on it. I suppose that this code can (and should) be optimized a lot. I may try to look at it but it's unclear when, since I'm about to go on vacation.
Would it not be a simple matter of changing the __(get|set)state method to use a tuple or even an int(long)?
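A rough sketch of what that change could look like, assuming the hash only needs to depend on the date fields. This is not PyPy's actual datetime code: StringHashedDate, TupleHashedDate and _state are hypothetical names used purely to contrast hashing a formatted string with hashing a tuple.

```python
class StringHashedDate:
    """Hypothetical class: hashes a string that is formatted on every call."""

    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    def _state(self):
        # A new string is built each time the state is requested.
        return "%04d-%02d-%02d" % (self.year, self.month, self.day)

    def __eq__(self, other):
        return isinstance(other, type(self)) and self._state() == other._state()

    def __hash__(self):
        return hash(self._state())


class TupleHashedDate:
    """Hypothetical class: hashes a plain tuple of the fields instead."""

    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    def _state(self):
        # No string formatting, just a tuple of ints.
        return (self.year, self.month, self.day)

    def __eq__(self, other):
        return isinstance(other, type(self)) and self._state() == other._state()

    def __hash__(self):
        return hash(self._state())


# Both variants behave the same way as dictionary keys.
by_string = {StringHashedDate(2011, 8, 12): "friday"}
by_tuple = {TupleHashedDate(2011, 8, 12): "friday"}
print(by_string[StringHashedDate(2011, 8, 12)])
print(by_tuple[TupleHashedDate(2011, 8, 12)])
```

If the real state is also used for pickling (an assumption, not checked here), it may be less invasive to leave __getstate/__setstate alone and only make __hash__ use a tuple of the fields.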