
Hello David,
On 10/08/11 21:27, David Naylor wrote:
Hi,
I needed to create a cache of date and time objects and I wondered what was the best way to handle the cache. For comparison I put together the following test:
[cut]
PyPy displays a significant slowdown in the defaultdict function, but otherwise shows its usual speedup. To check what the cause is I replaced i.date() with i.day and found no major difference in times. It appears dict.setdefault (or its interaction with the JIT) is causing the slowdown.
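If I understand correctly, the kind of cache you describe looks more or less like this (this is just my guess at the shape of your test, not your actual code; the names are mine):

import datetime

def build_date_cache(timestamps):
    # hypothetical sketch: map each datetime to its date(),
    # creating each entry at most once via setdefault
    cache = {}
    for ts in timestamps:
        cache.setdefault(ts, ts.date())
    return cache

Note that every setdefault call has to hash the datetime key, which turns out to matter below.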
I don't think that setdefault is the culprit here, as shown by this benchmark:
@bench.bench
def setdef():
    d = {}
    for i in range(10000000):
        d.setdefault(i, i)
    return d
@bench.bench
def tryexcept():
    d = {}
    for i in range(10000000):
        try:
            d[i]
        except KeyError:
            d[i] = i
    return d
setdef()
tryexcept()
$ python dictbench.py
setdef: 2.03 seconds
tryexcept: 8.54 seconds
$ pypy-c dictbench.py
setdef: 1.31 seconds
tryexcept: 1.37 seconds
As you can see, on PyPy there is almost no difference between using try/except and using setdefault.
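(By the way, the @bench.bench decorator above is just a tiny timing helper of mine, not something from the stdlib; it is roughly like this -- a sketch, not necessarily the exact code:)

# bench.py -- minimal sketch of the timing helper used above
import time

def bench(fn):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fn(*args, **kwargs)
        print('%s: %.2f seconds' % (fn.__name__, time.time() - start))
        return result
    return wrapper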
What is very slow on PyPy seems to be hashing datetime objects:
import datetime
@bench.bench
def hashdate():
    res = 0
    for i in range(100000):
        now = datetime.datetime.now()
        res ^= hash(now)
    return res
hashdate()
$ pypy-c dictbench.py
hashdate: 0.83 seconds
$ python dictbench.py
hashdate: 0.22 seconds
I had a quick look at the code (which in PyPy is written at applevel) and it does a lot of nonsense. In particular, __hash__ calls __getstate(), which formats a dynamically created string just to call hash() on it. I suppose this code can (and should) be optimized a lot. I may try to look at it, but it's unclear when, since I'm about to go on vacation.
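To give an idea of what I mean, here is a simplified illustration of the shape of that code (not the actual applevel implementation) next to what a cheaper __hash__ could look like:

class SlowHashDate(object):
    # mimics the pattern: build a throwaway string from the fields,
    # then hash the string
    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    def __eq__(self, other):
        return (self.year, self.month, self.day) == \
               (other.year, other.month, other.day)

    def __getstate(self):
        yhi, ylo = divmod(self.year, 256)
        return '%c%c%c%c' % (yhi, ylo, self.month, self.day)

    def __hash__(self):
        return hash(self.__getstate())


class FastHashDate(SlowHashDate):
    def __hash__(self):
        # hashing a tuple of the fields skips the string formatting
        return hash((self.year, self.month, self.day))

Hashing a tuple of the fields (or some precomputed integer) should be much cheaper than formatting a new string on every call; that is more or less the kind of optimization I have in mind.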
ciao,
Anto