Hi everyone, I'm in the middle of developing a fancy heap profiler for Python (for those times when tracemalloc isn't enough), and in the process, thought of a small change in the interpreter which could have a lot of uses. I call it "scope painting" because it lets you paint a label on an interpreter scope. *Conceptual TL;DR: *Add a function which attaches a string label to the current interpreter frame. (This would go away when you either explicitly cleared it, or the frame ended, i.e. the current function returned) You can then use it for: - *Debugging: *Print this value out, if set, in the traceback formatters. This can be used e.g. in a server, by putting a unique request identifier as the label for the top-level handler method; then if something in the particular query triggers a crash, it's *much* easier to see what caused it. (When I was at Google, we did something very similar to this for the servers in search. It completely transformed the way we detected "queries of death," and made hunting down data-dependent bugs an order of magnitude easier.) - *CPU and heap profiling: *If a profiler or tracer stores this information when it's grabbing the stack trace, this can be used for two different kinds of analysis, depending on user needs: - *Splitting: *Consider two stack traces different if they differ in any of the labels. This lets you separate flows which seem to be going through the same part of the logic, but are actually doing it on very distinct paths. - *Joining: *Group stack traces by the label; this lets you identify "call with this label value and its descendants," which lets you very easily establish a user-defined aggregation for total CPU or heap usage. - *Network request tracing: *If you were feeling really fancy, you could attach this to a cross-network-request trace ID and propagate the same value of this across multiple servers in a complicated request chain. Then you could perform all of the above joinings, plus similar joinings from cross-server profiling systems, as well. *Implementation: *Pretty simple: add a new field PyObject *f_label to PyFrameObject which contains an optional Unicode string; the frame owns a reference. Add get and set methods, exposing those in the Python API, and probably also a simple context manager to set and restore the value of this field because that's going to be the 90% way it's used. (Based on experience with doing similar things in C++) Modify the traceback, profile, cProfile, and tracemalloc libraries (and maybe others?) to use this value as well. What do people think? Yonatan
participants (3)
-
Andrew Barnert
-
Ronald Oussoren
-
Yonatan Zunger