[pypy-dev] Poor performance with custom bytecode
Timothy Baldridge
tbaldridge at gmail.com
Fri Feb 17 14:03:22 CET 2012
Last night I was finally able to implement enough of core.clj
to run a basic factorial program in clojure-py
(https://github.com/halgari/clojure-py). In this project we use
byteplay to generate bytecode for Clojure routines, and run the entire
language on the Python VM. In general, I've been very impressed with
the performance of PyPy, but this factorial program takes about twice as
long to complete as the same routine running on CPython. Perhaps
someone can give me some pointers? The core of the whole test is the
factorial function:
(ns clojure.examples.factorial)

(defn fact [x]
  (loop [n x f 1]
    (if (= n 1)
      f
      (recur (dec n) (* f n)))))

(defn test [times]
  (loop [rem times]
    (if (> rem 0)
      (do (fact 20000)
          (print rem)
          (recur (dec rem))))))

(test 20)
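For a baseline, a rough hand translation of the same benchmark into plain Python could be run under both CPython and PyPy. This is only a sketch for comparison, not code generated by clojure-py, and the name `run_benchmark` (standing in for the Clojure `test` function) is hypothetical. Since `(fact 20000)` works with very large integers, big-integer multiplication on each VM is part of what this measures.

```python
import time


def fact(x):
    # Mirrors (loop [n x f 1] (if (= n 1) f (recur (dec n) (* f n)))).
    # Like the Clojure version, it never terminates for x < 1.
    n, f = x, 1
    while n != 1:
        n, f = n - 1, f * n
    return f


def run_benchmark(times):
    # Hypothetical stand-in for the Clojure `test` function:
    # compute (fact 20000) and print the countdown.
    rem = times
    while rem > 0:
        fact(20000)
        print(rem)
        rem -= 1


if __name__ == "__main__":
    start = time.time()
    run_benchmark(20)
    print("elapsed: %.2f s" % (time.time() - start))
```

Running the script once with `python` and once with `pypy` would show whether the 2x gap also appears without the clojure-py calling convention.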
It all seems to match normal Python bytecode, with one exception: the
implementation of *, = and >. In Clojure these functions can take 0 to
n arguments and perform different logic depending on how many they
receive. Our solution is to pack the arguments into a tuple, __argsv__,
call its __len__ method, and jump to a different code block based on
the length of that tuple. Now, I know this code may not be super fast,
but it seems to work fine in CPython. Could this be the pain point for
us in PyPy? Any thoughts/ideas?
clojure.examples.factorial=> (dis.dis *)
  0           0 LOAD_FAST                0 (__argsv__)
              3 LOAD_ATTR                0 (__len__)
              6 CALL_FUNCTION            0
              9 LOAD_CONST               1 (0)
             12 COMPARE_OP               2 (==)
             15 POP_JUMP_IF_FALSE       22
             18 LOAD_CONST               2 (1)
             21 RETURN_VALUE
        >>   22 LOAD_FAST                0 (__argsv__)
             25 LOAD_ATTR                0 (__len__)
             28 CALL_FUNCTION            0
             31 LOAD_CONST               2 (1)
             34 COMPARE_OP               2 (==)
             37 POP_JUMP_IF_FALSE       54
             40 LOAD_FAST                0 (__argsv__)
             43 LOAD_CONST               1 (0)
             46 BINARY_SUBSCR
             47 STORE_FAST               1 (x)
             50 LOAD_FAST                1 (x)
             53 RETURN_VALUE
        >>   54 LOAD_FAST                0 (__argsv__)
             57 LOAD_ATTR                0 (__len__)
             60 CALL_FUNCTION            0
             63 LOAD_CONST               3 (2)
             66 COMPARE_OP               2 (==)
             69 POP_JUMP_IF_FALSE      100
             72 LOAD_FAST                0 (__argsv__)
             75 LOAD_CONST               1 (0)
             78 BINARY_SUBSCR
             79 STORE_FAST               1 (x)
             82 LOAD_FAST                0 (__argsv__)
             85 LOAD_CONST               2 (1)
             88 BINARY_SUBSCR
             89 STORE_FAST               2 (y)
1010          92 LOAD_FAST                1 (x)
             95 LOAD_FAST                2 (y)
             98 BINARY_MULTIPLY
             99 RETURN_VALUE
        >>  100 LOAD_FAST                0 (__argsv__)
            103 LOAD_ATTR                0 (__len__)
            106 CALL_FUNCTION            0
            109 LOAD_CONST               3 (2)
            112 COMPARE_OP               5 (>=)
            115 POP_JUMP_IF_FALSE      173
            118 LOAD_FAST                0 (__argsv__)
            121 LOAD_CONST               1 (0)
            124 BINARY_SUBSCR
            125 STORE_FAST               1 (x)
            128 LOAD_FAST                0 (__argsv__)
            131 LOAD_CONST               2 (1)
            134 BINARY_SUBSCR
            135 STORE_FAST               2 (y)
            138 LOAD_FAST                0 (__argsv__)
            141 LOAD_CONST               3 (2)
            144 SLICE+1
            145 STORE_FAST               3 (more)
1012         148 LOAD_GLOBAL              1 (reduce1)
            151 LOAD_GLOBAL              2 (*)
            154 LOAD_GLOBAL              2 (*)
            157 LOAD_FAST                1 (x)
            160 LOAD_FAST                2 (y)
            163 CALL_FUNCTION            2
            166 LOAD_FAST                3 (more)
            169 CALL_FUNCTION            3
            172 RETURN_VALUE
        >>  173 LOAD_CONST               4 (<type 'exceptions.Exception'>)
            176 CALL_FUNCTION            0
            179 RAISE_VARARGS            1
None
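For readers who find the bytecode hard to follow, the listing above corresponds roughly to the Python function below. This is a hand reconstruction, not clojure-py source: `star` is a hypothetical name (since `*` is not a valid Python identifier), and `functools.reduce` is an assumed stand-in for clojure-py's `reduce1`, whose definition is not shown here.

```python
import functools


def star(*__argsv__):
    # Hypothetical Python rendering of clojure-py's variadic `*`,
    # reconstructed by hand from the disassembly above.
    if __argsv__.__len__() == 0:      # offsets 0-21: (*) returns 1
        return 1
    if __argsv__.__len__() == 1:      # offsets 22-53: (* x) returns x
        x = __argsv__[0]
        return x
    if __argsv__.__len__() == 2:      # offsets 54-99: (* x y)
        x = __argsv__[0]
        y = __argsv__[1]
        return x * y                  # BINARY_MULTIPLY
    if __argsv__.__len__() >= 2:      # offsets 100-172: (* x y & more)
        x = __argsv__[0]
        y = __argsv__[1]
        more = __argsv__[2:]          # SLICE+1
        # (reduce1 * (* x y) more); functools.reduce is an assumed
        # stand-in for clojure-py's reduce1.
        return functools.reduce(star, more, star(x, y))
    # Offset 173: raise Exception(). Unreachable, because the >= 2
    # test above already catches every remaining arity.
    raise Exception()
```

Read this way, each two-argument call such as `(* f n)` in the factorial loop packs its arguments into a tuple, then performs three `__len__` calls, three comparisons and two tuple subscripts before it reaches the actual multiplication.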
Thanks,
Timothy Baldridge
--
“One of the main causes of the fall of the Roman Empire was
that–lacking zero–they had no way to indicate successful termination
of their C programs.”
(Robert Firth)