[Python-Dev] Investigating Python memory footprint of one real Web application

INADA Naoki songofacandy at gmail.com
Fri Jan 20 07:15:42 EST 2017


>
> "this script counts static memory usage. It doesn’t care about dynamic
> memory usage of processing real request"
>
> You may be trying to optimize something which is only a very small
> fraction of your actual memory footprint.  That said, the marshal
> module could certainly try to intern some tuples and other immutable
> structures.
>

Yes.  I hadn't thought the static memory footprint was so important.

But Instagram tried to increase the CoW efficiency of their prefork
application, and saw some improvement in memory usage and CPU throughput.
I was surprised by that, because prefork only shares the static memory
footprint.

Maybe sharing some of the tuples that code objects hold would improve
cache efficiency.  I'll try running pyperformance with the marshal patch.
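
To make the tuple-sharing idea concrete, here is a minimal sketch in pure
Python.  It is not the marshal patch itself: `intern_name_tuples` and `pool`
are hypothetical names, and it only shares `co_names` tuples (tuples of
strings), since sharing `co_consts` by equality would wrongly merge values
such as 1 and 1.0.  It assumes Python 3.8+ for `code.replace()`.

```python
import types


def intern_name_tuples(code, pool):
    """Return a copy of `code` whose co_names tuples are shared through `pool`.

    `pool` maps a tuple to the first equal tuple seen, so code objects that
    use the same global/attribute names end up pointing at one tuple object.
    """
    names = pool.setdefault(code.co_names, code.co_names)
    # Recurse into nested code objects (functions, classes, comprehensions).
    consts = tuple(
        intern_name_tuples(c, pool) if isinstance(c, types.CodeType) else c
        for c in code.co_consts
    )
    return code.replace(co_names=names, co_consts=consts)


SOURCE = """
def f(x):
    return len(str(x))

def g(y):
    return len(str(y)) + 1
"""

pool = {}
module_code = intern_name_tuples(compile(SOURCE, "<demo>", "exec"), pool)
namespace = {}
exec(module_code, namespace)

f_names = namespace["f"].__code__.co_names
g_names = namespace["g"].__code__.co_names
print(f_names is g_names)
```

A real implementation would do this inside marshal while loading, so that
the pool is shared across all imported modules.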


>> * Most large strings are docstrings.  Is it worth adding an option to
>> trim docstrings without disabling asserts?
>
> Perhaps docstrings may be compressed and then lazily decompressed when
> accessed for the first time.  lz4 and zstd are good modern candidates
> for that.  zstd also has a dictionary mode that helps for small data
> (*).  See https://facebook.github.io/zstd/
>
> (*) Even a 200-bytes docstring can be compressed this way:
>
> >>> data = os.times.__doc__.encode()
> >>> len(data)
> 211
> >>> len(lz4.compress(data))
> 200
> >>> c = zstd.ZstdCompressor()
> >>> len(c.compress(data))
> 156
> >>> c = zstd.ZstdCompressor(dict_data=dict_data)
> >>> len(c.compress(data))
> 104
>
> `dict_data` here is some 16KB dictionary I've trained on some Python
> docstrings.  That 16KB dictionary could be computed while building
> Python (or hand-generated from time to time, since it's unlikely to
> change a lot) and put in a static array somewhere:
>

Interesting.  I noticed zstd has been added to Mercurial (current RC version).
But zstd (and brotli) are new projects.  I'll keep an eye on them.
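
For reference, the compress-then-lazily-decompress idea can be sketched with
only the standard library.  This swaps in zlib's preset dictionary (`zdict`)
for zstd's dictionary mode, so the ratios will differ from the numbers quoted
above.  `ZDICT`, `LazyDoc`, `compress_doc` and `decompress_doc` are
hypothetical names, and the dictionary content is just a hand-written guess
at common docstring phrases, not a trained dictionary.

```python
import zlib

# Hypothetical preset dictionary: phrases assumed to be common in docstrings.
ZDICT = (
    b"Return a new object. If the argument is given, it must be a "
    b"string or None. Raise ValueError if the value is not found. "
    b"The default is "
)


def compress_doc(text, zdict=ZDICT):
    """Compress a docstring using a shared preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(text.encode()) + c.flush()


def decompress_doc(blob, zdict=ZDICT):
    """Inverse of compress_doc(); must use the same dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return (d.decompress(blob) + d.flush()).decode()


class LazyDoc:
    """Hold a compressed docstring and decompress it on first access."""

    def __init__(self, text):
        self._blob = compress_doc(text)
        self._text = None

    def get(self):
        if self._text is None:
            self._text = decompress_doc(self._blob)
            self._blob = None  # the compressed copy is no longer needed
        return self._text


doc = "Return the value. Raise ValueError if the value is not found."
lazy = LazyDoc(doc)
print(lazy.get() == doc)
```

Caching the decompressed text keeps repeated access cheap, at the cost of
losing the saving for any docstring that is actually read.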

>
> A similar strategy may be used for annotations and other
> rarely-accessed metadata.
>
> Another possibility, but probably much more costly in terms of initial
> development and maintenance, is to put the docstrings (+ annotations,
> etc.) in a separate file that's lazily read.
>
> I think optimizing the footprint for everyone is much better than
> adding command-line options to disable some specific metadata.
>

I see.  Although the -OO option exists, I can't strip only SQLAlchemy's
docstrings.  I need to check that none of the dependency libraries require
__doc__ before using -OO in production.
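
To illustrate the difference, here is a minimal sketch that drops docstrings
with the ast module while keeping asserts, compared with compiling at
optimize=2 (the equivalent of -OO), which removes both.  `strip_docstrings`
is a hypothetical helper, not an existing CPython option; it assumes
Python 3.8+ where string literals are `ast.Constant` nodes.

```python
import ast


def strip_docstrings(source):
    """Parse `source` and remove module, class and function docstrings."""
    tree = ast.parse(source)
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if not isinstance(node, owners):
            continue
        body = node.body
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            # Keep the body non-empty so the tree still compiles.
            node.body = body[1:] or [ast.Pass()]
    ast.fix_missing_locations(tree)
    return tree


SOURCE = '''
def check(x):
    """Return x after validating it."""
    assert x > 0, "x must be positive"
    return x
'''

# Docstrings stripped by hand, asserts kept (optimize=0).
ns_stripped = {}
exec(compile(strip_docstrings(SOURCE), "<demo>", "exec", optimize=0),
     ns_stripped)

# Same source compiled as with -OO: docstrings and asserts both removed.
ns_oo = {}
exec(compile(SOURCE, "<demo>", "exec", optimize=2), ns_oo)

print(ns_stripped["check"].__doc__, ns_oo["check"].__doc__)
```

The same transformation could be limited to selected packages, which is
what -OO cannot do today.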

We have almost one year before 3.7beta1.  We can find and implement a
better way.

> Regards
>
> Antoine.
>

