[Python-Dev] Opcode cache in ceval loop

Yury Selivanov yselivanov.ml at gmail.com
Tue Feb 2 14:23:10 EST 2016



On 2016-02-02 1:45 PM, Serhiy Storchaka wrote:
> On 02.02.16 19:45, Yury Selivanov wrote:
>> On 2016-02-02 12:41 PM, Serhiy Storchaka wrote:
>>> On 01.02.16 21:10, Yury Selivanov wrote:
>>>> To measure the max/average memory impact, I tuned my code to optimize
>>>> *every* code object on *first* run.  Then I ran the entire Python test
>>>> suite.  Python test suite + standard library both contain around 72395
>>>> code objects, which required 20Mb of memory for caches.  The test
>>>> process consumed around 400Mb of memory.  Thus, in the absolute
>>>> worst-case scenario, the overhead is about 5%.
>>>
>>> The test process consumes that much memory because a few tests create
>>> huge objects. If we exclude those tests (note that tests requiring more
>>> than 1Gb are already excluded by default) and tests that create a
>>> number of threads (threads consume much memory too), the rest of the
>>> tests need less than 100Mb of memory. The absolute required minimum is
>>> about 25Mb. Thus, in the absolute worst-case scenario, the overhead is
>>> about 100%.
>> Can you give me the exact configuration of tests (command line to run)
>> that would only consume 25Mb?
>
> I don't remember exactly which tests consume the most memory, but the 
> following tests fail when run with less than 30Mb of memory:
>
> test___all__ test_asynchat test_asyncio test_bz2 test_capi 
> test_concurrent_futures test_ctypes test_decimal test_descr 
> test_distutils test_docxmlrpc test_eintr test_email test_fork1 
> test_fstring test_ftplib test_functools test_gc test_gdb test_hashlib 
> test_httplib test_httpservers test_idle test_imaplib test_import 
> test_importlib test_io test_itertools test_json test_lib2to3 test_list 
> test_logging test_longexp test_lzma test_mmap 
> test_multiprocessing_fork test_multiprocessing_forkserver 
> test_multiprocessing_main_handling test_multiprocessing_spawn test_os 
> test_pickle test_poplib test_pydoc test_queue test_regrtest 
> test_resource test_robotparser test_shutil test_smtplib test_socket 
> test_sqlite test_ssl test_subprocess test_tarfile test_tcl test_thread 
> test_threaded_import test_threadedtempfile test_threading 
> test_threading_local test_threadsignals test_tix test_tk test_tools 
> test_ttk_guionly test_ttk_textonly test_tuple test_unicode 
> test_urllib2_localnet test_wait3 test_wait4 test_xmlrpc test_zipfile 
> test_zlib

Alright, I modified the code to optimize ALL code objects, and ran unit 
tests with the above tests excluded:

-- Max process mem (ru_maxrss)     = 131858432
-- Opcode cache number of objects  = 42109
-- Opcode cache total extra mem    = 10901106

And asyncio tests:

-- Max process mem (ru_maxrss)     = 57081856
-- Opcode cache number of objects  = 4656
-- Opcode cache total extra mem    = 1766681
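For reference, the ru_maxrss figure reported above can also be read from Python itself via the standard resource module (the actual patch gathers its stats in C; this is only an illustrative sketch, and note the platform-dependent units):

```python
import resource
import sys

# ru_maxrss is the peak resident set size of the process so far.
# On Linux it is reported in kilobytes, on macOS in bytes.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
scale = 1 if sys.platform == "darwin" else 1024
print(f"-- Max process mem (ru_maxrss)     = {peak * scale}")
```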

So the absolute worst-case overhead for a small asyncio program is about 
3%, and for the unit tests (with the above list excluded) about 8%.
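Those percentages follow directly from the numbers above; a quick sanity check:

```python
# Overhead = opcode cache extra memory / peak process memory (ru_maxrss).
unittest_overhead = 10901106 / 131858432   # unit test run
asyncio_overhead = 1766681 / 57081856      # asyncio test run

print(f"unit tests: {unittest_overhead:.1%}")  # ~8.3%
print(f"asyncio:    {asyncio_overhead:.1%}")   # ~3.1%
```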

I think it would be very hard to find a real-life program that consists 
of only code objects and nothing else (no data to work with or process, 
no objects with dicts, no threads, basically nothing), because only for 
such a program would you see a 100% memory overhead for the bytecode 
cache (when all code objects are optimized).

FWIW, here are stats for asyncio with only hot objects being optimized:

-- Max process mem (ru_maxrss)     = 54775808
-- Opcode cache number of objects  = 121
-- Opcode cache total extra mem    = 43521
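With only hot code objects optimized, the same calculation shows the cache overhead becomes negligible:

```python
# Overhead for asyncio tests when only hot code objects are optimized.
hot_overhead = 43521 / 54775808
print(f"asyncio, hot objects only: {hot_overhead:.2%}")  # ~0.08%
```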

Yury

