[pypy-svn] r58057 - pypy/build/doc

hpk at codespeak.net hpk at codespeak.net
Thu Sep 11 10:32:06 CEST 2008

Author: hpk
Date: Thu Sep 11 10:32:06 2008
New Revision: 58057

adding much more info, explanations, links, handling one XXX 

Modified: pypy/build/doc/benchmark_memory.txt
--- pypy/build/doc/benchmark_memory.txt	(original)
+++ pypy/build/doc/benchmark_memory.txt	Thu Sep 11 10:32:06 2008
@@ -1,5 +1,5 @@
-XXX draft doc 
+XXX draft doc, needs discussion, review
 what we want to measure
@@ -10,7 +10,8 @@
 * measure CPU usage of benchmark
-* measure one and multiple interpreter instances at a given time
+* measure for 1/2/N interpreters running at the same time
+  and maybe also for forked processes. 
 Benchmark targets
@@ -37,12 +38,11 @@
 * CPython 2.5  
-* pypy-c --opt=3
 * pypy-c --opt=mem
-XXX instead of trying various things, we need to make opt=mem good. opt=3 is
-    good for comparison
-* pypy-c various options/GCs 
+* pypy-c --opt=3 (for comparison purposes)
+We want to select and optimize good underlying settings for 
+PyPy's choices regarding "--opt=mem". 
 notes about memory-saving GCs
@@ -59,35 +59,108 @@
-measuring memory on Linux
+Understanding linux /proc/pid/smaps info
-Unless we can use some external tool we
-can parse info located in /proc/PID/smaps
-(beginning from linux 2.6.16) and contains
-Size, Rss, Shared and Private. 
-Note that all addresses are virtual, which means
-that having the same address in two processes 
-doesn't mean it's the same memory.
-Size: total (virtual) size of memory, possibly irrelevant
-RSS (Resident Set Size): indicates physically used RAM 
-Shared: memory shared with other processes. Note that this is simply a counter
-how many processes reference it. Memory can move private -> shared in case
-some other process will load the same library or so.
-Private: private memory owned by a process.
+The most detailed info apparently is provided 
+by /proc/PID/smaps, starting from linux 2.6.14. 
+Here is an example output of running
+"python2.5" on a linux 2.6.24 ubuntu machine::
+XXX please review, correct, complete so we get to a shared good understanding
+08048000-08140000 r-xp 00000000 08:01 89921      /usr/bin/python2.5
+Size:                992 kB
+Rss:                 768 kB
+Shared_Clean:        764 kB
+Shared_Dirty:          0 kB
+Private_Clean:         4 kB
+Private_Dirty:         0 kB
+Referenced:          768 kB
+The first line indicates that the /usr/bin/python2.5 file is 
+mapped as Read/eXecute into the given process and is 
+seen at address 08048000 by the process. 
+Virtual memory size is 992kB, of which 768 kB are actually 
+mapped into RAM (Rss = Resident Set Size) - the rest of the
+file has not been accessed yet and is thus not mapped.  
+764 kB are shared (Shared_Clean) - so if there are other python processes they 
+will get their mapping but no additional RAM will be
+used for these 764 kBs.  "clean" means that these pages can easily 
+get swapped out by dropping them and - upon access - retrieving them
+from the file.  XXX what does Private_Clean / 4kB mean exactly here? 
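The Size field follows directly from the address range on the first line of a mapping. As a minimal sketch (the helper name `mapping_size_kb` is hypothetical and not part of any tool mentioned here), the header line quoted above can be checked like this:

```python
def mapping_size_kb(header_line):
    """Return (size in kB, permission string) for an smaps header line."""
    fields = header_line.split()
    # The first field is "start-end" in hexadecimal virtual addresses.
    start, end = (int(part, 16) for part in fields[0].split("-"))
    # The second field is the permission string, e.g. "r-xp".
    return (end - start) // 1024, fields[1]


header = "08048000-08140000 r-xp 00000000 08:01 89921      /usr/bin/python2.5"
size_kb, perms = mapping_size_kb(header)
print(size_kb, perms)  # 992 r-xp
```

The result matches the "Size: 992 kB" line of the example: Size describes the virtual address range, while Rss only counts the pages that are actually resident.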
+Let's look at a mapping that is more indicative 
+of the per-process "incremental" RAM usage::
+08165000-081e0000 rw-p 08165000 00:00 0          [heap]
+Size:                492 kB
+Rss:                 452 kB
+Shared_Clean:          0 kB
+Shared_Dirty:          0 kB
+Private_Clean:         0 kB
+Private_Dirty:       452 kB
+Referenced:          452 kB
+Here we have a read-write anonymous mapping holding objects
+allocated on the heap. It uses 492 kB of virtual address
+space, of which 452 kB are actually mapped into physical 
+RAM.  Dirty means that these pages have been modified. 
+Whether pages are "dirty" or "clean" matters for swapping 
+but is not too relevant for us when measuring 
+memory footprint.  
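To turn such per-mapping entries into one per-process figure, the fields can be summed over all mappings. The following is a sketch only: `summarize_smaps` is a hypothetical helper, and the sample text is the two mappings quoted above rather than a complete smaps file.

```python
import re

# Matches lines such as "Rss:                 768 kB"; header lines and
# non-kB fields do not match and are skipped.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum every "<name>: <n> kB" field over all mappings in the text."""
    totals = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            name, value = match.group(1), int(match.group(2))
            totals[name] = totals.get(name, 0) + value
    return totals


SAMPLE = """\
08048000-08140000 r-xp 00000000 08:01 89921      /usr/bin/python2.5
Size:                992 kB
Rss:                 768 kB
Shared_Clean:        764 kB
Shared_Dirty:          0 kB
Private_Clean:         4 kB
Private_Dirty:         0 kB
Referenced:          768 kB
08165000-081e0000 rw-p 08165000 00:00 0          [heap]
Size:                492 kB
Rss:                 452 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       452 kB
Referenced:          452 kB
"""

totals = summarize_smaps(SAMPLE)
private_kb = totals["Private_Clean"] + totals["Private_Dirty"]
shared_kb = totals["Shared_Clean"] + totals["Shared_Dirty"]
print(totals["Rss"], private_kb, shared_kb)  # 1220 456 764
```

For a live process the same function could be fed the contents of /proc/PID/smaps. The private sum is the part that grows with each additional interpreter.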
+Of course, there are many more mappings, also for
+the stack.  Doing:: 
+    >>> l = ["xasd"] * 1000000
+I get this additional mapping::
+b7890000-b7c61000 rw-p b7890000 00:00 0
+Size:               3908 kB
+Rss:                3908 kB
+Shared_Clean:          0 kB
+Shared_Dirty:          0 kB
+Private_Clean:         0 kB
+Private_Dirty:      3908 kB
+Referenced:         3908 kB
+We have an anonymous read-write mapping, and the 3908 kB
+for the list (one million references to the same string object) 
+are mapped into physical RAM. 
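The size of this mapping can be estimated by hand. The sketch below assumes a 32-bit build (4-byte pointers, which the 8-digit addresses suggest) and 4 kB pages. Both values are assumptions and are not stated in the text above.

```python
POINTER_SIZE = 4   # bytes per reference; assumption: 32-bit build
PAGE_SIZE = 4096   # bytes per page; assumption: common Linux page size

n_items = 1000000
# The list stores one reference per item; all of them point at the
# same "xasd" string object, so the string itself is negligible.
payload_bytes = n_items * POINTER_SIZE
# Mappings are made of whole pages, so round up to the next page.
pages = (payload_bytes + PAGE_SIZE - 1) // PAGE_SIZE
mapping_kb = pages * PAGE_SIZE // 1024
print(payload_bytes, pages, mapping_kb)  # 4000000 977 3908
```

Under these assumptions the pointer array alone accounts for the 3908 kB seen in the mapping.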
+For some more information here is a link 
+which also points to the mem_usage.py tool that presents 
+a process's mappings in a somewhat nicer summarized format. 
+Tool to measure Python interpreter memory footprint
+We need a tool that can invoke python apps and benchmarks 
+and measure memory footprint - producing data that can
+be parsed back and used for producing graphs, tables etc. 
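One possible shape for such a tool is sketched below. It is not an existing tool: the names `private_kb`, `measure` and `to_record` are hypothetical, and polling /proc/PID/smaps at a fixed interval is only one possible sampling strategy, which can miss short peaks. It starts a command, samples the private memory of the child while it runs, and emits one JSON record per run.

```python
import json
import subprocess
import sys
import time


def private_kb(smaps_text):
    """Sum Private_Clean and Private_Dirty (in kB) over all mappings."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            total += int(line.split()[1])
    return total


def measure(argv, interval=0.05):
    """Run argv and sample the child's private memory until it exits."""
    proc = subprocess.Popen(argv)
    peak = 0
    while proc.poll() is None:
        try:
            with open("/proc/%d/smaps" % proc.pid) as f:
                peak = max(peak, private_kb(f.read()))
        except (IOError, OSError):
            break  # process already gone, or no /proc on this system
        time.sleep(interval)
    proc.wait()
    return {"argv": argv, "peak_private_kb": peak,
            "returncode": proc.returncode}


def to_record(result):
    """Serialize one measurement as a single JSON line."""
    return json.dumps(result, sort_keys=True)


if __name__ == "__main__":
    # Example: python measure.py python2.5 some_benchmark.py
    print(to_record(measure(sys.argv[1:])))
```

One JSON line per run keeps the output easy to parse back for graphs and tables, and the same driver can be pointed at CPython or at a pypy-c binary.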
+See also tools for maemo: 
+notes on scenarios from fijal which need to be checked if they
+are sufficiently reflected in the above sections. 
-Probable measurment scenarios:
-1. Measure private memory owned by process when no other interpreter's
+1. Measure private memory when no other interpreter's
    process exist.
-2. The same in case there is other one.
+2. Measure private memory with another interpreter running at the same time. 
 3. Measure amount of private memory owned by process that was forked from
    other interpreter process.
 4. Measure total RAM usage/RSS size in case process is running vs
    total RAM usage when process is not running.
