On Wed, Feb 2, 2022 at 2:48 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
I'm planning on moving us to a simpler, more efficient alternative to _Py_IDENTIFIER(), but want to see if there are any objections first before moving ahead. Also see https://bugs.python.org/issue46541.
_Py_IDENTIFIER() was added in 2011 to replace several internal string object caches and to support cleaning up the cached objects during finalization. A number of "private" functions (each with a _Py_Identifier param) were added at that time, mostly corresponding to existing functions that take PyObject* or char*. Note that at present there are several hundred uses of _Py_IDENTIFIER(), including a number of duplicates.
My plan is to replace our use of _Py_IDENTIFIER() with statically initialized string objects (as fields under _PyRuntimeState). That involves the following:
* add a PyUnicodeObject field (not a pointer) to _PyRuntimeState for each string that currently uses _Py_IDENTIFIER() (or _Py_static_string())
* statically initialize each object as part of the initializer for _PyRuntimeState
* add a macro to look up a given global string
* update each location that currently uses _Py_IDENTIFIER() to use the new macro instead
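To make the four steps above concrete, here is a minimal sketch in plain C. It is not the code from the PR: demo_str, demo_runtime_state, DEMO_STR_INIT and DEMO_GET_STR are hypothetical stand-ins for PyUnicodeObject, _PyRuntimeState, the static initializer and the lookup macro, and the real object layout is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a statically allocated string object.
   A real PyUnicodeObject carries a header, hash, flags, and so on. */
typedef struct {
    size_t length;
    const char *data;
} demo_str;

/* Hypothetical stand-in for _PyRuntimeState: one embedded field
   (not a pointer) per global string. */
typedef struct {
    struct {
        demo_str name;
        demo_str dict;
    } strings;
} demo_runtime_state;

/* Builds the static initializer for one string from a C literal. */
#define DEMO_STR_INIT(LITERAL) { sizeof(LITERAL) - 1, LITERAL }

/* Every string is initialized as part of the runtime state's own
   initializer, so the objects live in the static data section. */
static demo_runtime_state demo_runtime = {
    .strings = {
        .name = DEMO_STR_INIT("__name__"),
        .dict = DEMO_STR_INIT("__dict__"),
    },
};

/* Lookup macro: expands to a fixed address. There is no allocation
   and no failure path to check. */
#define DEMO_GET_STR(NAME) (&demo_runtime.strings.NAME)

/* Small helper used only to inspect the sketch. */
static int demo_str_equals(const demo_str *s, const char *text)
{
    assert(s != NULL && text != NULL);
    return s->length == strlen(text)
        && memcmp(s->data, text, s->length) == 0;
}
```

A call site that previously declared an identifier and passed it to one of the private functions would, in this sketch, just write DEMO_GET_STR(name) wherever a string object is expected.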
Pros:
* reduces indirection (and extra calls) for C-API functions that need the strings (making the code a little easier to understand and speeding it up)
* the objects are referenced from a fixed address in the static data section instead of the heap (speeding things up and allowing the C compiler to optimize better)
* there is no lazy allocation (or lookup, etc.) so there are fewer possible failures when the objects get used (thus less error return checking)
* saves memory (a little, at least)
* if needed, the approach for per-interpreter is simpler
* helps us get rid of several hundred static variables throughout the code base
* allows us to get rid of _Py_IDENTIFIER() and a bunch of related C-API functions
* "deep frozen" modules can use the global strings
* commonly-used strings could be pre-allocated by adding _PyRuntimeState fields for them
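The "no lazy allocation" point can be illustrated with a small contrast, again as a sketch rather than CPython code. lazy_id and lazy_get are hypothetical names that imitate a lazily created, cached string, and static_get imitates the static alternative. The lazy path can fail on first use and so forces every caller to check for NULL, while the static path cannot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazy identifier: the cached copy is created on first use. */
typedef struct {
    const char *literal;
    char *cached;      /* NULL until the first successful lookup */
} lazy_id;

/* Lazy lookup: may allocate, so it may fail and return NULL.
   Every caller has to handle that error return. */
static const char *lazy_get(lazy_id *id)
{
    assert(id != NULL);
    if (id->cached == NULL) {
        size_t n = strlen(id->literal) + 1;
        id->cached = malloc(n);
        if (id->cached == NULL) {
            return NULL;   /* allocation failed */
        }
        memcpy(id->cached, id->literal, n);
    }
    return id->cached;
}

/* Static alternative: the data already exists at a fixed address,
   so the lookup has no failure mode. */
static const char static_name[] = "__name__";

static const char *static_get(void)
{
    return static_name;
}
```

In this sketch the second call to lazy_get() returns the cached pointer, which mirrors how a cached identifier behaves after its first use. static_get() returns the same address on every call without ever touching the heap.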
Cons:
* a little less convenient: adding a global string requires modifying a separate file from the one where you actually want to use the string
* strings can get "orphaned" (I'm planning on checking in CI)
* some strings may never get used for any given ./python invocation (not that big a difference though)
I have a PR up (https://github.com/python/cpython/pull/30928) that adds the global strings and replaces use of _Py_IDENTIFIER() in our code base, except for in non-builtin stdlib extension modules. (Those will be handled separately if we proceed.) The PR also adds a CI check for "orphaned" strings. It leaves _Py_IDENTIFIER() for now, but disallows any Py_BUILD_CORE code from using it.
With that change I'm seeing a 1% improvement in performance (see https://github.com/faster-cpython/ideas/issues/230).
I'd also like to actually get rid of _Py_IDENTIFIER(), along with other related APIs, including ~14 (private) C-API functions. Dropping all that helps reduce maintenance costs. However, at least one PyPI project (blender) is using _Py_IDENTIFIER(). So, before we could get rid of it, we'd first have to deal with that project (and any others).
Datapoint: an internal code search turns up blender, multidict, and typed_ast as open source users of _Py_IDENTIFIER. These are easy to clean up via PRs. There are a couple of internal uses as well, all of which are similarly easy to address and are only in code that is expected to need API cleanup tweaks between CPython versions. Overall, I think addressing the broader strategy question among the performance-focused folks is worthwhile though.
-gps
To sum up, I wanted to see if there are any objections before I start merging anything. Thanks!
-eric