On Wed, Feb 2, 2022 at 2:48 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
I'm planning on moving us to a simpler, more efficient alternative to
_Py_IDENTIFIER(), but want to see if there are any objections first
before moving ahead.  Also see https://bugs.python.org/issue46541.

_Py_IDENTIFIER() was added in 2011 to replace several internal string
object caches and to support cleaning up the cached objects during
finalization.  A number of "private" functions (each with a
_Py_Identifier param) were added at that time, mostly corresponding to
existing functions that take PyObject* or char*.  Note that at present
there are several hundred uses of _Py_IDENTIFIER(), including a number
of duplicates.

My plan is to replace our use of _Py_IDENTIFIER() with statically
initialized string objects (as fields under _PyRuntimeState).  That
involves the following:

* add a PyUnicodeObject field (not a pointer) to _PyRuntimeState for
each string that currently uses _Py_IDENTIFIER() (or
_Py_static_string())
* statically initialize each object as part of the initializer for
_PyRuntimeState
* add a macro to look up a given global string
* update each location that currently uses _Py_IDENTIFIER() to use the
new macro instead

Pros:

* reduces indirection (and extra calls) for C-API functions that need
the strings (making the code a little easier to understand and
speeding it up)
* the objects are referenced from a fixed address in the static data
section instead of the heap (speeding things up and allowing the C
compiler to optimize better)
* there is no lazy allocation (or lookup, etc.) so there are fewer
possible failures when the objects get used (thus less error return
checking)
* saves memory (at little, at least)
* if needed, the approach for per-interpreter is simpler
* helps us get rid of several hundred static variables throughout the code base
* allows us to get rid of _Py_IDENTIFIER() and a bunch of related
C-API functions
* "deep frozen" modules can use the global strings
* commonly-used strings could be pre-allocated by adding
_PyRuntimeState fields for them

Cons:

* a little less convenient: adding a global string requires modifying
a separate file from the one where you actually want to use the string
* strings can get "orphaned" (I'm planning on checking in CI)
* some strings may never get used for any given ./python invocation
(not that big a difference though)

I have a PR up (https://github.com/python/cpython/pull/30928) that
adds the global strings and replaces use of _Py_IDENTIFIER() in our
code base, except for in non-builtin stdlib extension modules.  (Those
will be handled separately if we proceed.)  The PR also adds a CI
check for "orphaned" strings.  It leaves _Py_IDENTIFIER() for now, but
disallows any Py_BUILD_CORE code from using it.

With that change I'm seeing a 1% improvement in performance (see
https://github.com/faster-cpython/ideas/issues/230).

I'd also like to actually get rid of _Py_IDENTIFIER(), along with
other related API including ~14 (private) C-API functions.  Dropping
all that helps reduce maintenance costs.  However, at least one PyPI
project (blender) is using _Py_IDENTIFIER().  So, before we could get
rid of it, we'd first have to deal with that project (and any others).

datapoint: an internal code search turns up blender, multidict, and typed_ast as open source users of _Py_IDENTIFIER .  Easy to clean up as PRs.  There are a couple of internal uses as well, all of which are similarly easy to address and are only in code that is expected to need API cleanup tweaks between CPython versions.

Overall I think addressing the broader strategy question among the performance focused folks is worthwhile though.

-gps
 

To sum up, I wanted to see if there are any objections before I start
merging anything.  Thanks!

-eric
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DNMZAMB4M6RVR76RDZMUK2WRLI6KAAYS/
Code of Conduct: http://python.org/psf/codeofconduct/