[Python-Dev] Investigating Python memory footprint of one real Web application

Nathaniel Smith njs at pobox.com
Tue Jan 24 13:21:45 EST 2017


On Jan 24, 2017 3:35 AM, "Thomas Wouters" <thomas at python.org> wrote:



On Fri, Jan 20, 2017 at 1:40 PM, Christian Heimes <christian at python.org>
wrote:

> On 2017-01-20 13:15, INADA Naoki wrote:
> >>
> >> "this script counts static memory usage. It doesn’t care about dynamic
> >> memory usage of processing real request"
> >>
> >> You may be trying to optimize something which is only a very small
> >> fraction of your actual memory footprint.  That said, the marshal
> >> module could certainly try to intern some tuples and other immutable
> >> structures.
> >>
> >
> > Yes.  I hadn't thought the static memory footprint was so important.
> >
> > But Instagram tried to increase the CoW efficiency of their prefork
> > application, and had some success with both memory usage and CPU throughput.
> > I was surprised by that, because prefork only shares the static memory footprint.
> >
> > Maybe sharing some of the tuples held by code objects would improve cache
> > efficiency.
> > I'll try running pyperformance with the marshal patch.
>
> IIRC Thomas Wouters (?) has been working on a patch to move the ref
> counter out of the PyObject struct and into a dedicated memory area. He
> proposed the idea to improve cache affinity, reduce cache evictions and
> make CoW more efficient. Modern ccNUMA machines with multiple processors
> in particular could benefit from the improvement, but so could
> single-processor, multi-core machines.
>
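A minimal Python-level sketch of the tuple-interning idea raised above, assuming a hypothetical `intern_tuple` helper applied to values after unmarshalling. It is not how the marshal module behaves today and not a proposed CPython API. Each `marshal.loads()` call builds fresh tuple objects, so equal constants loaded from different modules end up as separate copies; the helper hands back one shared instance per distinct value. The key is type-aware because plain equality would merge `(1, 2)` with `(1.0, 2.0)`, or `0.0` with `-0.0`.

```python
import marshal

# Hypothetical cache: one canonical tuple per type-aware key.
_interned: dict = {}


def _const_key(obj):
    """Build a key that only matches constants of the same type and value.

    Plain equality would merge (1, 2) with (1.0, 2.0) or (True,) with (1,),
    and 0.0 with -0.0, which would silently change program behaviour.
    """
    if type(obj) is tuple:
        return (tuple, tuple(_const_key(item) for item in obj))
    if isinstance(obj, (float, complex)):
        # repr() distinguishes 0.0 from -0.0
        return (type(obj), repr(obj))
    return (type(obj), obj)


def intern_tuple(t: tuple) -> tuple:
    """Return one shared instance for every tuple with the same typed contents."""
    try:
        key = _const_key(t)
        hash(key)
    except TypeError:
        # Unhashable contents (e.g. a list inside the tuple): leave unshared
        return t
    return _interned.setdefault(key, t)


a = marshal.loads(marshal.dumps((1, 2, 3)))
b = marshal.loads(marshal.dumps((1, 2, 3)))
print(a == b)                                             # True: equal values
print(intern_tuple(a) is intern_tuple(b))                 # True: one shared copy
print(intern_tuple((1.0, 2.0)) is intern_tuple((1, 2)))   # False: types differ
```

The type-aware key only covers the constant types that commonly appear in `co_consts` (ints, floats, strings, bytes, nested tuples); other types fall back to plain equality, so a real implementation inside marshal would need to decide carefully which objects count as "the same" constant.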

FWIW, I have a working patch for that (against trunk a few months back,
even though the original idea was for the gilectomy branch), moving just
the refcount and not PyGC_HEAD. Performance-wise, in the benchmarks it's a
small but consistent loss (2-5% on a noisy machine, as measured by
python-benchmarks, not perf), and it breaks the ABI as well as any code
that dereferences PyObject.ob_refcnt directly (the field was repurposed and
renamed, and exposed as a const* to avoid direct assignment). It also
exposes the API awkwardness that CPython doesn't *require* objects to go
through a specific mechanism for object initialisation, even though nearly
all extension modules do so. (That same API awkwardness made life a little
harder when experimenting with BDW GC :P.) I don't believe external
refcounts can be made the default without a carefully designed new set
of PyObject API calls and the deprecation of the old ones.
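The external-refcount layout described above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the patch itself: the hypothetical `RefcountTable` keeps every count in one packed array, and each hypothetical `ToyObject` stores only an index into it. Increfs and decrefs then write to the table's pages rather than to the pages holding the objects, which is the CoW benefit being discussed. The model also shows the initialisation problem: an object that never calls `allocate()` has no slot, so every object must go through one creation path.

```python
from array import array


class RefcountTable:
    """Toy side table of reference counts (hypothetical, not CPython's layout)."""

    def __init__(self) -> None:
        self._counts = array("q")    # one signed 64-bit counter per slot, contiguous
        self._free: list[int] = []   # slots released by objects that died

    def allocate(self) -> int:
        """Reserve a slot holding refcount 1 and return its index."""
        if self._free:
            slot = self._free.pop()
            self._counts[slot] = 1
        else:
            slot = len(self._counts)
            self._counts.append(1)
        return slot

    def incref(self, slot: int) -> None:
        self._counts[slot] += 1

    def decref(self, slot: int) -> bool:
        """Decrement; return True if the count reached zero and the slot was freed."""
        self._counts[slot] -= 1
        if self._counts[slot] == 0:
            self._free.append(slot)
            return True
        return False

    def count(self, slot: int) -> int:
        return self._counts[slot]


class ToyObject:
    """Object whose header holds a slot index instead of the count itself."""

    __slots__ = ("payload", "rc_slot")

    def __init__(self, table: RefcountTable, payload: object) -> None:
        self.payload = payload
        self.rc_slot = table.allocate()  # mandatory: no slot means no refcount


table = RefcountTable()
obj = ToyObject(table, "data")
table.incref(obj.rc_slot)        # only table._counts is written, not obj
print(table.count(obj.rc_slot))  # 2
print(table.decref(obj.rc_slot), table.decref(obj.rc_slot))  # False True
```

An `array` is used so the counters really are contiguous; a plain Python list would store pointers to separate int objects and defeat the point of the layout.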


The thing I found most surprising about the Instagram blog post was that,
contrary to common wisdom, refcnt updates per se had essentially no effect on the
amount of memory shared between CoW processes, and the problems were all
due to the cycle collector. (Though I guess it's still possible that part
of the problems caused by the cycle collector are due to it touching
ob_refcnt.)
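The distinction above turns on which objects a write actually touches. The short CPython-specific check below uses the real `sys.getrefcount` and `gc.is_tracked` functions to show that taking a reference writes to an object's refcount even though the program never mutates the object, while only container objects such as lists are tracked by the cycle collector; strings and ints carry no GC metadata at all. The helper names are hypothetical, and exact refcount values vary between CPython versions, so only the direction of the change is checked.

```python
import gc
import sys


def refcount_write_on_alias(obj) -> int:
    """Return how much binding one extra name changes obj's refcount.

    CPython-specific: a new strong reference increments ob_refcnt, i.e. it
    writes to the object's header even if the object itself never changes.
    """
    before = sys.getrefcount(obj)
    alias = obj  # new strong reference held by a local variable
    after = sys.getrefcount(obj)
    del alias
    return after - before


def tracked_by_cycle_gc(obj) -> bool:
    """True if the cycle collector currently tracks obj (and visits its GC header)."""
    return gc.is_tracked(obj)


payload = [1, 2, 3]  # a fresh list, so not an immortal object such as None
print(refcount_write_on_alias(payload) > 0)  # True: the header was written
print(tracked_by_cycle_gc(payload))          # True: lists are containers
print(tracked_by_cycle_gc("some text"))      # False: str is not a GC type
print(tracked_by_cycle_gc(12345))            # False: neither is int
```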

It's also promising, though, because the GC metadata is much less exposed to
extension modules than PyObject_HEAD is, and the access patterns are
presumably (?) much more bursty. It'd be really interesting to see how
things performed if just PyGC_HEAD, but *not* ob_refcnt, were packed into a
dedicated region.
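A back-of-the-envelope model of why packing PyGC_HEAD into its own region could help: after fork(), every page that receives a write is copied, so what matters is how many distinct pages a collection pass writes to. The sketch assumes 4 KiB pages, 10,000 contiguous 64-byte objects and 16-byte GC headers, and counts only GC-header writes; these sizes are illustrative, not CPython's actual object or PyGC_Head sizes, and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def pages_dirtied(write_offsets, page_size=PAGE_SIZE):
    """Count distinct pages containing at least one written byte offset.

    After fork(), each such page is copied on its first write.
    """
    return len({offset // page_size for offset in write_offsets})


def gc_pass_inline(n_objects, object_size):
    """Offsets written when each GC header sits at the start of its object."""
    return [i * object_size for i in range(n_objects)]


def gc_pass_side_table(n_objects, header_size):
    """Offsets written when all GC headers are packed into one dedicated region."""
    return [i * header_size for i in range(n_objects)]


n, obj_size, hdr_size = 10_000, 64, 16  # hypothetical sizes
print(pages_dirtied(gc_pass_inline(n, obj_size)))      # 157
print(pages_dirtied(gc_pass_side_table(n, hdr_size)))  # 40
```

With the headers interleaved, one pass dirties every page of the object heap (157 pages here); packed together, the same headers fit in 40 pages and the object pages themselves stay shared. Refcount writes are deliberately left out of the model, matching the PyGC_HEAD-only experiment suggested above.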

-n

