Hello David,

On 10/08/11 21:27, David Naylor wrote:
Hi,
I needed to create a cache of date and time objects and wondered what the best way to handle the cache would be. For comparison I put together the following test:
[cut]
PyPy shows a significant slowdown in the defaultdict function, but otherwise its usual speedup. To check what the cause is, I replaced i.date() with i.day and found no major difference in times. It appears that dict.setdefault (or its interaction with the JIT) is causing the slowdown.
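If I understand correctly, the cache does something along these lines (the code below is only my guess at its shape, since the actual test is cut above):

    def cached_date(dt, _cache={}):
        # intern date objects: equal dates map to one shared instance
        d = dt.date()
        return _cache.setdefault(d, d)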
I don't think that setdefault is the culprit here, as shown by this benchmark:

    @bench.bench
    def setdef():
        d = {}
        for i in range(10000000):
            d.setdefault(i, i)
        return d

    @bench.bench
    def tryexcept():
        d = {}
        for i in range(10000000):
            try:
                d[i]
            except KeyError:
                d[i] = i
        return d

    setdef()
    tryexcept()

    $ python dictbench.py
    setdef: 2.03 seconds
    tryexcept: 8.54 seconds

    $ pypy-c dictbench.py
    setdef: 1.31 seconds
    tryexcept: 1.37 seconds

As you can see, in PyPy there is almost no difference between using try/except and using setdefault.

What is very slow on PyPy seems to be hashing datetime objects:

    import datetime

    @bench.bench
    def hashdate():
        res = 0
        for i in range(100000):
            now = datetime.datetime.now()
            res ^= hash(now)
        return res

    hashdate()

    $ pypy-c dictbench.py
    hashdate: 0.83 seconds

    $ python dictbench.py
    hashdate: 0.22 seconds

I had a quick look at the code (which in PyPy is written at app-level) and it does a lot of unnecessary work. In particular, __hash__ calls __getstate(), which formats a dynamically created string just to call hash() on it. I suppose that this code can (and should) be optimized a lot. I may try to look at it, but it's unclear when, since I'm about to go on vacation.

ciao,
Anto
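PS: the bench module used above is not part of the standard library; a minimal stand-in that just times one call and prints the result could look like this (saved as bench.py):

    import time

    def bench(func):
        # run the function once and print "name: N.NN seconds"
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            print("%s: %.2f seconds" % (func.__name__, time.time() - start))
            return result
        return wrapper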
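PPS: just to sketch the kind of optimization I have in mind (an illustration only, not the actual PyPy code): instead of building a string via __getstate() and hashing that, __hash__ could hash the already-available fields directly, e.g.

    import datetime

    class FastHashDatetime(datetime.datetime):
        # naive datetimes only: tz-aware ones would first have to be
        # normalized to UTC so that equal objects keep equal hashes
        def __hash__(self):
            return hash((self.year, self.month, self.day,
                         self.hour, self.minute, self.second,
                         self.microsecond))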