[Python-Dev] Investigating Python memory footprint of one real Web application
INADA Naoki
songofacandy at gmail.com
Fri Jan 20 07:15:42 EST 2017
>
> "this script counts static memory usage. It doesn’t care about dynamic
> memory usage of processing real request"
>
> You may be trying to optimize something which is only a very small
> fraction of your actual memory footprint. That said, the marshal
> module could certainly try to intern some tuples and other immutable
> structures.
>
Yes. I hadn't thought the static memory footprint was so important.
But Instagram tried to increase the CoW efficiency of their prefork application,
and had some success with both memory usage and CPU throughput.
I was surprised by that, because prefork only shares the static memory footprint.
Maybe sharing some of the tuples that code objects hold would increase cache efficiency.
I'll try running pyperformance with the marshal patch.
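
To make the tuple-sharing idea concrete, here is a minimal sketch. It is not
the marshal patch itself: share_names and pool are hypothetical names, and it
assumes Python 3.8+ for code.replace(). It only shares co_names, which holds
nothing but strings, because tuples of constants such as (1,) and (1.0,)
compare equal and would need type-aware keys.

```python
import types

def share_names(code, pool):
    """Return a copy of `code` whose co_names tuples are shared through `pool`.

    `pool` maps a tuple of names to the first tuple object seen with that
    value, so equal tuples from different modules end up as one object.
    """
    # Recurse into nested code objects (functions, classes, lambdas).
    consts = tuple(
        share_names(c, pool) if isinstance(c, types.CodeType) else c
        for c in code.co_consts
    )
    names = pool.setdefault(code.co_names, code.co_names)
    return code.replace(co_consts=consts, co_names=names)

def first_function(code):
    """Return the first nested code object found in co_consts."""
    return next(c for c in code.co_consts if isinstance(c, types.CodeType))

# Two "modules" compiled separately, as they would be when loaded from
# two different .pyc files.
SRC = "def f(items):\n    return sorted(set(items))\n"
pool = {}
mod_a = share_names(compile(SRC, "a.py", "exec"), pool)
mod_b = share_names(compile(SRC, "b.py", "exec"), pool)

func_a = first_function(mod_a)
func_b = first_function(mod_b)
shared = func_a.co_names is func_b.co_names
```

Whether this kind of sharing gives a measurable benefit is exactly what the
pyperformance run would have to show.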
>> * Most large strings are docstrings. Is it worth adding an option
>> to trim docstrings without disabling asserts?
>
> Perhaps docstrings may be compressed and then lazily decompressed when
> accessed for the first time. lz4 and zstd are good modern candidates
> for that. zstd also has a dictionary mode that helps for small data
> (*). See https://facebook.github.io/zstd/
>
> (*) Even a 200-byte docstring can be compressed this way:
>
> >>> data = os.times.__doc__.encode()
> >>> len(data)
> 211
> >>> len(lz4.compress(data))
> 200
> >>> c = zstd.ZstdCompressor()
> >>> len(c.compress(data))
> 156
> >>> c = zstd.ZstdCompressor(dict_data=dict_data)
> >>> len(c.compress(data))
> 104
>
> `dict_data` here is some 16KB dictionary I've trained on some Python
> docstrings. That 16KB dictionary could be computed while building
> Python (or hand-generated from time to time, since it's unlikely to
> change a lot) and put in a static array somewhere:
>
Interesting. I noticed that zstd has been added to Mercurial (in the current RC version).
But zstd (and brotli) are new projects. I'll keep an eye on them.
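
The two ideas quoted above (a shared dictionary for small inputs, and
decompression on first access) can be sketched with the standard library
alone. This is only an illustration: it uses zlib's preset dictionary (zdict)
as a stand-in for zstd's dictionary mode, DOC_DICT is a tiny hand-written
placeholder rather than a trained 16KB dictionary, and LazyDoc, pack and
unpack are hypothetical names.

```python
import zlib

# Hypothetical placeholder for a trained dictionary: phrases that tend to
# appear in many docstrings.
DOC_DICT = (
    b"Return a new object. Raise ValueError if the argument is not valid. "
    b"Return None if the key is not found. The default value is "
)

def pack(text, zdict=None):
    """Compress `text`, optionally priming the compressor with `zdict`."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(text.encode()) + comp.flush()

def unpack(blob, zdict=None):
    """Inverse of pack(); must be given the same `zdict`."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return (decomp.decompress(blob) + decomp.flush()).decode()

class LazyDoc:
    """Keep text compressed until it is first read, then cache it."""

    def __init__(self, text):
        self._blob = pack(text, DOC_DICT)
        self._text = None

    def __str__(self):
        if self._text is None:
            self._text = unpack(self._blob, DOC_DICT)
            self._blob = None  # the compressed copy is no longer needed
        return self._text

sample = "Return a new list. Raise ValueError if the argument is not valid."
plain_size = len(pack(sample))
dict_size = len(pack(sample, DOC_DICT))
doc = LazyDoc(sample)
```

Comparing plain_size and dict_size shows the effect of the dictionary on a
short input; the real numbers for zstd would of course differ.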
>
> A similar strategy may be used for annotations and other
> rarely-accessed metadata.
>
> Another possibility, but probably much more costly in terms of initial
> development and maintenance, is to put the docstrings (+ annotations,
> etc.) in a separate file that's lazily read.
>
> I think optimizing the footprint for everyone is much better than
> adding command-line options to disable some specific metadata.
>
I see. Although the -OO option exists, I can't strip only SQLAlchemy's docstrings.
Before using -OO in production, I would need to check that none of the
dependency libraries require __doc__.
We have almost one year before 3.7beta1. We can find and implement a better way.
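
As a small illustration of why -OO is all-or-nothing: compile() with
optimize=2 applies the same optimization level as -OO, and it drops both
docstrings and asserts from everything it compiles. The source string and the
load() helper below are made up for the example.

```python
SOURCE = '''
def area(width, height):
    """Return the area of a rectangle."""
    assert width >= 0 and height >= 0
    return width * height
'''

def load(source, optimize):
    """Compile `source` at the given optimization level and return area()."""
    namespace = {}
    code = compile(source, "<example>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["area"]

normal = load(SOURCE, optimize=0)    # docstrings and asserts kept
stripped = load(SOURCE, optimize=2)  # same level as running python -OO

try:
    normal(-1, 2)
    assert_kept = False
except AssertionError:
    assert_kept = True
```

Here stripped.__doc__ is None and stripped(-1, 2) no longer raises, so a
library that reads __doc__ at runtime, or relies on an assert, behaves
differently under -OO.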
> Regards
>
> Antoine.
>