[pypy-dev] Python vs pypy: interesting performance difference [dict.setdefault]
Antonio Cuni
anto.cuni at gmail.com
Fri Aug 12 14:51:36 CEST 2011
Hello David,
On 10/08/11 21:27, David Naylor wrote:
> Hi,
>
> I needed to create a cache of date and time objects and I wondered what was the best way to handle the cache. For comparison I put together
> the following test:
>
[cut]
> Pypy displays significant slowdown in the defaultdict function, otherwise displays its usual speedup. To check what is the cause I replaced i.date()
> with i.day and found no major difference in times. It appears dict.setdefault (or its interaction with jit) is causing a slowdown.
I don't think that setdefault is the culprit here, as shown by this benchmark:
@bench.bench
def setdef():
    d = {}
    for i in range(10000000):
        d.setdefault(i, i)
    return d

@bench.bench
def tryexcept():
    d = {}
    for i in range(10000000):
        try:
            d[i]
        except KeyError:
            d[i] = i
    return d

setdef()
tryexcept()
$ python dictbench.py
setdef: 2.03 seconds
tryexcept: 8.54 seconds
tmp $ pypy-c dictbench.py
setdef: 1.31 seconds
tryexcept: 1.37 seconds
As you can see, in PyPy there is almost no difference between using
try/except and using setdefault.
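The bench module used above is not shown in this message. A minimal sketch of what such a decorator might look like, assuming it simply times one call and prints "name: N seconds" (the implementation below is a hypothetical stand-in, not the actual helper, and the loop size is a parameter here so the example runs quickly):

```python
import functools
import time


def bench(func):
    """Hypothetical stand-in for bench.bench: time one call and print it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("%s: %.2f seconds" % (func.__name__, elapsed))
        return result
    return wrapper


@bench
def setdef(n=1000):
    # Insert each key only if it is missing, using dict.setdefault.
    d = {}
    for i in range(n):
        d.setdefault(i, i)
    return d


@bench
def tryexcept(n=1000):
    # Same effect, but via a failed lookup and a KeyError handler.
    d = {}
    for i in range(n):
        try:
            d[i]
        except KeyError:
            d[i] = i
    return d
```

Both functions build the same dictionary, so any timing difference comes from the lookup strategy alone.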
What is very slow on PyPy seems to be hashing datetime objects:
import datetime

@bench.bench
def hashdate():
    res = 0
    for i in range(100000):
        now = datetime.datetime.now()
        res ^= hash(now)
    return res

hashdate()
$ pypy-c dictbench.py
hashdate: 0.83 seconds
$ python dictbench.py
hashdate: 0.22 seconds
I had a quick look at the code (which in PyPy is written at applevel) and it
does a lot of nonsense. In particular, __hash__ calls __getstate, which
formats a dynamically created string just to call hash() on it. I suppose
that this code can (and should) be optimized a lot. I may try to look at it,
but it's unclear when, since I'm about to go on vacation.
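To illustrate the kind of change I mean, here is a rough sketch that is not taken from PyPy's datetime.py. The Stamp classes and their fields are made up for the example, and time zones are ignored entirely. The first class hashes by formatting a string on every call; the second hashes a tuple of the same fields and skips the formatting step:

```python
class StampViaString(object):
    """Illustrative only: builds a string each time it is hashed."""

    def __init__(self, year, month, day, hour=0, minute=0, second=0,
                 microsecond=0):
        self._state = (year, month, day, hour, minute, second, microsecond)

    def _fields(self):
        return self._state

    def __eq__(self, other):
        if not isinstance(other, StampViaString):
            return NotImplemented
        return self._fields() == other._fields()

    def __hash__(self):
        # Formats a new string on every call, then hashes that string.
        return hash("%04d-%02d-%02d %02d:%02d:%02d.%06d" % self._fields())


class StampViaTuple(StampViaString):
    """Illustrative only: hashes the field tuple directly."""

    def __hash__(self):
        # No string formatting; equal objects still get equal hashes.
        return hash(self._fields())
```

Either way, objects that compare equal hash equally, which is all that dict and set require. Whether the tuple version is actually faster on a given interpreter would have to be measured.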
ciao,
Anto