Just a few points and then I will get off python-dev. :)

First of all, I don't think it is very meaningful to use leaking
applications to do timing comparisons. The collector has to be quite
careful when freeing structures containing reference cycles. However,
using something other than pystone is definitely a good idea. Here are
the pybench results for the latest patch:

PYBENCH 0.6

Benchmark: b_nogc (rounds=4, warp=30)

Tests:                     per run    per oper.   diff *
------------------------------------------------------------------------
    BuiltinFunctionCalls:   286.25 ms     3.37 us   -4.22%
     BuiltinMethodLookup:   367.50 ms     1.05 us   -4.55%
           ConcatStrings:   368.37 ms     3.68 us  +42.50%
         CreateInstances:   477.87 ms    17.07 us   -4.45%
 CreateStringsWithConcat:   322.12 ms     2.42 us  +10.27%
            DictCreation:   382.75 ms     3.83 us   +1.29%
                ForLoops:   536.88 ms    80.73 us   +0.99%
              IfThenElse:   432.75 ms     0.96 us   -3.21%
             ListSlicing:   236.87 ms   104.12 us  +11.73%
          NestedForLoops:   320.00 ms     1.28 us   +0.16%
    NormalClassAttribute:   386.50 ms     0.97 us   +0.00%
     PythonFunctionCalls:   477.87 ms     4.34 us   -3.51%
       PythonMethodCalls:   379.12 ms     7.59 us  -12.67%
               Recursion:   280.75 ms    33.70 us   -0.88%
            SecondImport:   206.62 ms    12.41 us   -5.76%
     SecondPackageImport:   216.75 ms    13.02 us   -4.41%
   SecondSubmoduleImport:   279.25 ms    16.77 us   -2.57%
 SimpleComplexArithmetic:   351.00 ms     2.39 us   +0.72%
  SimpleDictManipulation:   320.75 ms     1.60 us   -2.99%
   SimpleFloatArithmetic:   361.50 ms     0.99 us  -20.37%
SimpleIntFloatArithmetic:   336.00 ms     0.76 us   +0.04%
 SimpleIntegerArithmetic:   328.38 ms     0.75 us   -0.76%
  SimpleListManipulation:   312.88 ms     1.74 us   -1.61%
    SimpleLongArithmetic:   308.75 ms     2.81 us   +9.88%
              SmallLists:   470.13 ms     2.77 us   -5.05%
             SmallTuples:   374.62 ms     2.34 us  -15.74%
   SpecialClassAttribute:   384.00 ms     0.96 us   -1.88%
SpecialInstanceAttribute:   446.38 ms     1.12 us   -2.75%
           StringSlicing:   315.50 ms     2.70 us  +16.58%
               TryExcept:   585.37 ms     0.59 us   -1.70%
          TryRaiseExcept:   312.75 ms    31.28 us   -5.30%
            TupleSlicing:   299.38 ms     4.39 us  +12.18%
------------------------------------------------------------------------
      Average round time: 13615.00 ms               -1.13%

My AMD-K6-II processor is a pretty quirky beast, so I don't think you
can conclude too much from those results.

Here are the median timings from running Jeremy's compiler on its own
source:

$ time python compile.py `find . -name '*.py'`

Python 1.6 without GC:

    real    0m16.926s
    user    0m16.810s
    sys     0m0.110s

Python 1.6 with GC:

    real    0m21.593s
    user    0m21.470s
    sys     0m0.080s

Python 1.6 with GC, collection disabled (i.e. gc.set_threshold(0)):

    real    0m18.441s
    user    0m18.220s
    sys     0m0.220s

We can tune the collection frequency all we want, but we won't do any
better than the last numbers. Those numbers reflect the cost of keeping
track of the objects and the increase in object size.

On a related note, I would like to clean up the PyGC_{NEW, VAR_NEW}
macros, but I can't figure out a way to transform this code into a
macro:

    op = PyObject_MALLOC(sizeof(PyGCInfo) + _PyObject_SIZE(tp));
    if (op)
        op = PyGC_OBJ((PyGCInfo *)op);

If C's || operator worked like Python's "or", I could do something
like:

    #define PyGC_OBJ_SAFE(g) ((PyGCInfo *)(((g) || -1) + 1))

Any ideas? Using an inline function in the header file would be nice,
but of course it is not portable. GCC has statement expressions, but
again they are not portable.

  Neil
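One portable way to fold that NULL check into a single expression is C's
conditional operator rather than ||. The following is only a sketch of the
idea, not code from the patch: the PyGCInfo layout and the
PyObject_MALLOC/PyGC_OBJ definitions are stand-ins so it compiles on its
own, and the assumed meaning of PyGC_OBJ is simply "skip over the header".

    #include <stdlib.h>

    /* Stand-in for the patch's GC header; the real layout may differ. */
    typedef struct _gcinfo {
        struct _gcinfo *gc_next;
        struct _gcinfo *gc_prev;
    } PyGCInfo;

    /* Stand-ins for the CPython/patch macros. */
    #define PyObject_MALLOC(n)  malloc(n)
    #define PyGC_OBJ(g)         ((void *)((PyGCInfo *)(g) + 1))

    /* Conditional-operator version of "if (op) op = PyGC_OBJ(op);".
       Note that (g) is evaluated twice, so it must be a plain variable,
       never the PyObject_MALLOC() call itself. */
    #define PyGC_OBJ_SAFE(g)  ((g) == NULL ? NULL : PyGC_OBJ((PyGCInfo *)(g)))

    void *
    example_alloc(size_t instance_size)
    {
        void *op = PyObject_MALLOC(sizeof(PyGCInfo) + instance_size);
        return PyGC_OBJ_SAFE(op);   /* NULL propagates; otherwise skip header */
    }

An ordinary extern helper function defined in a .c file (rather than an
inline function in the header) would avoid the double-evaluation hazard
entirely, at the cost of one function call per allocation.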
Neil Schemenauer wrote:
> On a related note, I would like to clean up the PyGC_{NEW, VAR_NEW}
> macros, but I can't figure out a way to transform this code into a
> macro:
>
>     op = PyObject_MALLOC(sizeof(PyGCInfo) + _PyObject_SIZE(tp));
>     if (op)
>         op = PyGC_OBJ((PyGCInfo *)op);
The correct thing to do, IMO, is to reflect the sizeof(PyGCInfo) memory
increment in the tp_basicsize slot of the type object. Thus, we can
reuse the current PyObject_New/NEW code, which boils down to
PyObject_MALLOC(_PyObject_SIZE(tp)). There is no need for
PyGC_NEW/NEW_VAR. This would imply some additional changes to
PyObject_Init when the GC_FLAG is set for a given typeobj.

I had a closer look at the patch today. It looks good, but I think it
needs some more work for a smooth integration with the existing APIs.
There's room for optimisations (there are lots of function calls in
there, so I am not surprised about the performance hit), but they'll
come later on, once the big chunks of the patch fit in their places.

One thing that bothered me is that the current GC object allocation
code won't fly with C++, where the object storage is allocated (and its
size is computed) automatically. There is no PyObject_New, and a C++
object constructor is supposed to begin with PyObject_Init.

Another thing is the function names: PyGC_NEW, PyGC_Info, PyGC_Insert,
etc. It would be better if they reflected the fact that we're talking
about GC on PyObjects, and not about GC in general (on arbitrary memory
chunks). I would suggest renaming them along these lines:

    PyGC_Info   -> PyGC_ObjectHead
    PyGC_Insert -> PyGC_ObjectInit
    PyGC_Remove -> PyGC_ObjectFini
    ...

All this needs some more thought though...

--
Vladimir MARANGOZOV                 | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
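A minimal sketch of that suggestion, assuming tp_basicsize already
includes the GC header for GC-aware types, a hypothetical
Py_TPFLAGS_IS_GC_OBJECT bit in tp_flags, and the PyGC_ObjectInit name
proposed above; the actual patch may place and detect the header
differently, so this should not be read as the real implementation.

    #include "Python.h"

    /* Hypothetical type flag; the real name and bit would live in object.h. */
    #define Py_TPFLAGS_IS_GC_OBJECT  (1L << 14)

    /* From the GC patch, using the renaming suggested above. */
    extern void PyGC_ObjectInit(PyObject *);

    /* Sketch of a modified PyObject_Init: because tp_basicsize already
       accounts for the GC header, the generic PyObject_New/PyObject_MALLOC
       path needs no special case, and the GC bookkeeping moves here. */
    PyObject *
    PyObject_Init(PyObject *op, PyTypeObject *tp)
    {
        if (op == NULL)
            return PyErr_NoMemory();
        op->ob_type = tp;
        _Py_NewReference(op);
        if (tp->tp_flags & Py_TPFLAGS_IS_GC_OBJECT)
            PyGC_ObjectInit(op);    /* link the new object into the GC list */
        return op;
    }

This keeps C++ extension constructors working as well, since they call
PyObject_Init directly and never go through a PyGC_NEW-style allocator.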