[New-bugs-announce] [issue44222] Improving _removeHandlerRef for a very long _handlerList

Yonatan Goldschmidt report at bugs.python.org
Sun May 23 21:12:25 EDT 2021


New submission from Yonatan Goldschmidt <yon.goldschmidt at gmail.com>:

We have an application which creates 10,000s of logging.Logger & logging.Handler objects. I encountered a major slowdown during GCs which happen to collect dead Handler objects, calling logging._removeHandlerRef() along the way.
That function looks like:

    def _removeHandlerRef(wr):
        acquire, release, handlers = _acquireLock, _releaseLock, _handlerList
        if acquire and release and handlers:
            acquire()
            try:
                if wr in handlers:
                    handlers.remove(wr)
            finally:
                release()

and is called by a weakref, attached to each Handler created in _addHandlerRef().

The slowdown occurs in the "in" operator, and the remove() call. These operations on very long lists are expensive. My suggestion is that, without modifying the observed behavior of this function, it can be changed to perform only one lookup in the list, by simply calling remove() and ignoring the ValueError if it gets raised.

Attached is a small script which demonstrates the issue. It creates N1 Handler objects, then deletes the strong reference to the last N2 handler objects, and triggers a GC, measuring the time it takes. Additionally, we can measure its entire execution (since all remaining Handler objects are cleaned up on exit, and eventually _removeHandlerRef() is called for each, and it drastically slows down the interpreter shutdown).

Running the demo with N1=30k and N2=2k:

    root at 8aafe9d1308b:/# python -V
    Python 3.9.2
    root at 8aafe9d1308b:/# time python demo.py 30000 2000
    len(logging._handlerList)=30001
    len(logging._handlerList)=28001
    gc.collect() took 1.552838139992673s

    real    0m12.626s
    user    0m12.618s
    sys 0m0.008s

then applying the improving patch...

    root at 8aafe9d1308b:/# (cd /usr/local/lib/python3.9/ && patch -p2 < /patch )
    patching file logging/__init__.py
    Hunk #1 succeeded at 826 (offset -19 lines).

and trying again:

    root at 8aafe9d1308b:/# time python demo.py 30000 2000
    len(logging._handlerList)=30001
    len(logging._handlerList)=28001
    gc.collect() took 0.8691686679958366s

    real    0m7.134s
    user    0m7.130s
    sys 0m0.004s
    root at 8aafe9d1308b:/#

Almost a 2x speedup.

I also tried removing the acquire/release locks (now that the remove operation is atomic...). I'm not sure it maintains the semantics, and it didn't make too much of a positive effect on the run-time anyway, so I didn't continue looking into it.

----------
components: Library (Lib)
files: demo.py
messages: 394229
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: Improving _removeHandlerRef for a very long _handlerList
type: performance
versions: Python 3.11
Added file: https://bugs.python.org/file50060/demo.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44222>
_______________________________________


More information about the New-bugs-announce mailing list