
"CT" == Christian Tismer <tismer@tismer.com> writes:

CT> Summary: We had two effects here.  Effect 1: Wasting time with
CT> extra errors in instance creation.  Effect 2: Loss of locality
CT> due to code size increase.

CT> Solution to 1 is Jeremy's patch.  Solution to 2 could be a
CT> little renaming of the one or the other module, in order to get
CT> the default link order to support locality better.

CT> Now everything is clear to me.  My first attempts with reordering
CT> could not reveal the loss with the instance stuff.

CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to
CT> get related code ordered better.

I reach a different conclusion.  The performance difference between
1.5.2 and 1.6, measured with pystone and pybench, is so small that
effects like the order in which the compiler assembles the code make
a difference.  I don't think we should make any non-trivial effort to
improve performance based on this kind of voodoo.

I also question the claim that the two effects here explain the
performance difference between 1.5.2 and 1.6.  Rather, they explain
the performance difference of pystone and pybench running on
different versions of the interpreter.  Saying that pystone is the
same speed is a far cry from saying that Python is the same speed!
Remember that performance on a benchmark is just that: performance on
a benchmark.  (It's like the old joke about a person's IQ: it is a
very good indicator of how well they did on the IQ test.)

I think we could use better benchmarks of two sorts.

The pybench microbenchmarks are quite helpful individually, though
the overall number isn't particularly meaningful.  However, these
benchmarks are sometimes a little too big to be useful.  For example,
the instance creation effect was tracked down by running this code:

    class Foo:
        pass

    for i in range(big_num):
        Foo()

The pybench test "CreateInstance" does all sorts of other stuff.  It
tests creation with and without an __init__ method.  It tests
instance deallocation (because all the created objects need to be
deallocated, too).  It also tests attribute assignment, since many of
the __init__ methods make assignments.

What would be better (and I'm not sure what priority should be placed
on doing it) is a set of nano-benchmarks that each limit themselves
to a single feature or a small set of features.  Guido suggested
having a hierarchy, so that there are multiple nano-benchmarks for
instance creation, each identifying a particular effect, and a
micro-benchmark that is the aggregate of all these nano-benchmarks.
(A rough sketch of what that might look like is at the end of this
message.)

We could also use some better large benchmarks.  Using pystone is
pretty crude, because it doesn't necessarily measure the performance
of things we care about.  It would be better to have a collection of
5-10 apps that each do something we care about -- munging text files
or XML data, creating lots of objects, etc.  For example, I used the
compiler package (in nondist/src/Compiler) to compile itself.  Based
on that benchmark, an interpreter built from the current CVS tree is
still 9-11% slower than 1.5.
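
For concreteness, here is a rough sketch of what a pair of
instance-creation nano-benchmarks and their aggregate might look
like.  The harness and the function names below are hypothetical,
just an illustration of the idea, not actual pybench code:

    import time

    def nano_create_no_init(n=100000):
        # Bare instance creation: no __init__, no attribute work.
        class Foo: pass
        start = time.time()
        for i in range(n):
            Foo()
        return time.time() - start

    def nano_create_with_init(n=100000):
        # Creation where __init__ runs and assigns one attribute.
        class Bar:
            def __init__(self):
                self.x = 1
        start = time.time()
        for i in range(n):
            Bar()
        return time.time() - start

    def micro_create_instance():
        # The micro-benchmark is just the aggregate of its
        # nano-benchmarks; the individual numbers are what let you
        # pin down a particular effect.
        no_init = nano_create_no_init()
        with_init = nano_create_with_init()
        return no_init, with_init, no_init + with_init

Jeremy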