[Python-Dev] Have a big machine and spare time? Here's a possible Python bug.
solipsis at pitrou.net
Sun May 26 05:24:15 EDT 2019
On Fri, 24 May 2019 14:23:21 +0200
Thomas Wouters <thomas at python.org> wrote:
> On Thu, May 23, 2019 at 5:15 PM Steve Dower <steve.dower at python.org> wrote:
> > On 23May2019 0542, Inada Naoki wrote:
> > > 1. perf shows 95% of CPU time is eaten by _PyObject_Free, not kernel
> > > space.
> > > 2. This loop is clearly hot:
> > >
> > > https://github.com/python/cpython/blob/51aa35e9e17eef60d04add9619fe2a7eb938358c/Objects/obmalloc.c#L1816-L1819
> > >
> > > I attached to the process with gdb and confirmed that many arenas
> > > have the same nfreepools.
> > It's relatively easy to test replacing our custom allocators with the
> > system ones, yes? Can we try those to see whether they have the same
> > characteristic?
> > Given the relative amount of investment over the last 19 years, I
> > wouldn't be surprised if most system ones are at least as good for our
> > needs now. Certainly Windows HeapAlloc has had serious improvements in
> > that time to help with fragmentation and small allocations.
> FYI, and I've mentioned this at PyCon to a few people (might've been you,
> Steve, I don't remember) -- but at Google we've experimented with disabling
> obmalloc when using TCMalloc (a faster and thread-aware malloc, which makes
> a huge difference within Google when dealing with multi-threaded C++
> libraries), both using the usual Python benchmarks and real-world code with
> real-world workloads (a core part of YouTube, for example), all on Linux.
> There's still a significant benefit to using obmalloc when using glibc's
> malloc, and also a noticeable benefit when using TCMalloc. There are
> certainly cases where it doesn't matter much, and there may even be cases
> where the overhead of obmalloc isn't worth it, but I do believe it's still
> a firm net benefit.
Interesting that a simple, 20-year-old allocator (obmalloc) is able to do
better than the sophisticated TCMalloc.
(well, of course, obmalloc doesn't have to worry about concurrent
scenarios, which explains some of the simplicity)
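
For anyone who wants to try the allocator swap discussed above, here is a
rough sketch, not a proper benchmark. It relies on the PYTHONMALLOC
environment variable (available since Python 3.6), where "pymalloc" selects
obmalloc and "malloc" selects the C library's allocator. The workload string
and the helper name run_with_allocator are made up for illustration, and the
timings include interpreter startup, so only large differences mean anything.

```python
import os
import subprocess
import sys
import time

# Hypothetical workload: allocate and free many small objects.
WORKLOAD = "data = [[i] for i in range(200_000)]; del data"


def run_with_allocator(allocator, code=WORKLOAD):
    """Run `code` in a fresh interpreter with PYTHONMALLOC set.

    Returns wall-clock seconds, including interpreter startup.
    """
    env = dict(os.environ, PYTHONMALLOC=allocator)
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "pymalloc" is obmalloc; "malloc" bypasses it for all domains.
    for name in ("pymalloc", "malloc"):
        print(f"{name:10s} {run_with_allocator(name):.3f}s")
```

To compare against TCMalloc or another replacement malloc, one would
additionally have to preload or link that library; that part is outside
this sketch.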