[New-bugs-announce] [issue36044] PROFILE_TASK for PGO build is not a good workload

Tue Feb 19 15:39:50 EST 2019

New submission from Neil Schemenauer <nas-python at arctrix.com>:

I was doing some 'perf' profiling runs of Python.  I decided to try running PROFILE_TASK to see what the profile looks like.  I was surprised that gc_collect dominated runtime:

  Children      Self  Symbol
+   93.93%     6.00%  [.] _PyEval_EvalFrameDefault                
+   76.19%     0.12%  [.] function_code_fastcall                  
+   70.65%     0.31%  [.] _PyMethodDef_RawFastCallKeywords        
+   63.24%     0.13%  [.] _PyCFunction_FastCallKeywords           
+   58.67%     0.36%  [.] _PyEval_EvalCodeWithName                
+   57.45%    23.84%  [.] collect                                 
+   52.89%     0.00%  [.] gc_collect                              
+   52.10%     0.08%  [.] _PyFunction_FastCallDict                
+   41.99%     0.02%  [.] _PyObject_Call_Prepend                  
+   36.37%     0.18%  [.] _PyFunction_FastCallKeywords            
+   20.94%     0.07%  [.] _PyObject_FastCallDict                  
+   19.64%     0.00%  [.] PyObject_Call                           
+   17.74%     0.11%  [.] _PyObject_FastCallKeywords              
+   12.45%     0.00%  [.] slot_tp_call                            
+   12.27%     4.05%  [.] dict_traverse                           
+   11.45%    11.04%  [.] visit_reachable                         
+   11.18%    10.76%  [.] visit_decref                            
+    9.65%     0.11%  [.] type_call                               
+    8.80%     0.83%  [.] func_traverse                           
+    7.78%     0.08%  [.] _PyMethodDescr_FastCallKeywords     

Part of the problem is that we run full cyclic GC for every test.  I.e. cleanup_test_droppings() calls gc.collect().  Maybe we could make these calls conditional on the --pgo flag of regtest.  Or, maybe we need to re-evaluate if running the unit test suite is the best way to generate PGO trace data.

Based on a tiny bit of further investigation, it looks like gc_collect() is getting called quite a lot of times, in addition to cleanup_test_droppings().  Maybe there is some low-hanging fruit here for optimization.  Full GC is pretty expensive.

----------
messages: 336018
nosy: nascheme
priority: normal
severity: normal
status: open
title: PROFILE_TASK for PGO build is not a good workload
type: performance

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36044>
_______________________________________