[pypy-commit] extradoc extradoc: remove rss comparison and try with some page privatisation statistics

Tue May 27 14:34:24 CEST 2014

Author: Remi Meier <remi.meier at inf.ethz.ch>
Branch: extradoc
Changeset: r5274:477a6b8da6f8
Date: 2014-05-27 14:35 +0200
http://bitbucket.org/pypy/extradoc/changeset/477a6b8da6f8/

Log:	remove rss comparison and try with some page privatisation
	statistics

diff --git a/talk/dls2014/paper/paper.tex b/talk/dls2014/paper/paper.tex
--- a/talk/dls2014/paper/paper.tex
+++ b/talk/dls2014/paper/paper.tex
@@ -1009,36 +1009,27 @@
 need the total amount of memory required by old objects multiplied
 by $N+1$ (incl. the sharing segment). Pages get re-shared during
 major collections if possible.
-\remi{maybe collect some statistics about pages privatised per segment}
 
-\remi{The following discussion about richards mem usage does not
-say that much... Also, RSS is not a good measure but it's hard to
-get something better.}
-In figure \ref{fig:richards_mem} we look at the memory usage of
-one of our benchmarks called Richards\footnote{OS kernel simulation
-benchmark}. The \emph{Resident Set Size} (RSS) shows the physical memory
-assigned to the process. From it, we see that the process' memory
-usage does not explode during the benchmark but actually stays pretty
-much the same after start-up. Since it is the job of the OS to map
-physical memory, this RSS number should be seen as a maximum. It is
-possible that some of the memory is not required any more but still
-assigned to our process.
-
-The \emph{GC managed memory} counts all memory used in the old object
-space including the memory required for private pages. The sharp drops
-in memory usage come from major collections that free old objects and
-re-share pages. Again the overall memory usage stays the same and
-we see that in this benchmark we have around 1 major collection every
-second.
+In figure \ref{fig:richards_mem} we look at the memory usage of one of
+our benchmarks called Richards\footnote{OS kernel simulation
+benchmark}. The \emph{GC managed memory} counts all memory used in the
+old object space, including the memory required for private pages. The
+sharp drops in memory usage come from major collections that free old
+objects and re-share pages; in this benchmark there is roughly one
+major collection per second. The \emph{page privatisation}, i.e.\ the
+percentage of used pages with at least one private copy, follows the
+same pattern: it drops at every major collection because the
+collection re-shares the pages. Between major collections, page
+privatisation peaks at around $20\%$. Since a single write privatises
+a whole page, this means that up to $\sim 20\%$ of the pages holding
+old objects get modified between collections in this benchmark; the
+fraction of modified objects may be lower.
 
 For PyPy-STM the average memory requirement is 29~MiB and there are
 $\sim 11$ major collections during the runtime. Normal PyPy with a GIL
 grows its memory up to just 7~MiB and does not do a single major
-collection in that time.
-
-We are missing a memory optimisation to store small objects in a more
-compact way, which is done by a normal PyPy not using STM.
-Additionally, since normal PyPy uses a GIL, it does not need to
+collection in that time. Compared to normal PyPy, PyPy-STM lacks a
+memory optimisation that stores small objects more compactly.
+Additionally, since normal PyPy uses a GIL, it does not need to
 duplicate per-thread data structures such as the Nursery. Together
 with the additional memory requirements of STM explained above, these
 two factors account for the difference.
@@ -1048,13 +1039,14 @@
 \begin{figure}[h]
   \centering
   \includegraphics[width=1\columnwidth]{plots/richards_mem.pdf}
-  \caption{Actual memory managed by the GC and resident set size
+  \caption{Actual memory managed by the GC and the page privatisation
     over time in the Richards benchmark\label{fig:richards_mem}}
 \end{figure}
 
 
 \subsection{Overhead Breakdown}
 
+\remi{do it on a non-jit build (see reason above)}
 \remi{gs:segment prefix overhead is virtually none (maybe instruction cache)}
 \remi{update numbers in pypy/TODO}
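To make the page privatisation statistic and the $N+1$ memory bound from the paper text concrete, here is a minimal sketch under stated assumptions. It models the old object space as a list with one entry per used page, each entry holding the number of segments that have a private copy of that page. `PAGE_SIZE`, the list model, and the function names `page_privatisation`, `old_space_bytes` and `worst_case_bytes` are all hypothetical and are not PyPy-STM's actual data structures or API.

```python
# Hypothetical model of the privatisation statistic; NOT PyPy-STM's real
# data structures. private_copies[i] = number of segments holding a
# private copy of used page i (0 means the page is fully shared).

PAGE_SIZE = 4096  # assumed page size in bytes


def page_privatisation(private_copies):
    """Percentage of used pages that have at least one private copy."""
    if not private_copies:
        return 0.0
    privatised = sum(1 for n in private_copies if n >= 1)
    return 100.0 * privatised / len(private_copies)


def old_space_bytes(private_copies, page_size=PAGE_SIZE):
    """Old-object space: one shared copy per used page plus one page per private copy."""
    return page_size * (len(private_copies) + sum(private_copies))


def worst_case_bytes(num_pages, num_segments, page_size=PAGE_SIZE):
    """Bound from the text: every page privatised in all N segments gives N+1 copies."""
    return page_size * num_pages * (num_segments + 1)


if __name__ == "__main__":
    pages = [0, 0, 0, 0, 1, 0, 0, 0, 0, 2]  # 10 used pages, 2 of them privatised
    print(page_privatisation(pages))  # 20.0
    print(old_space_bytes(pages))  # 53248
    print(worst_case_bytes(len(pages), num_segments=4))  # 204800
```

The sketch counts pages, not objects: one write to a single object privatises its whole page, which is why a page privatisation of 20% only bounds how many pages were touched, not how many objects were modified.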
 
diff --git a/talk/dls2014/paper/plots/plot_richards_mem.py b/talk/dls2014/paper/plots/plot_richards_mem.py
--- a/talk/dls2014/paper/plots/plot_richards_mem.py
+++ b/talk/dls2014/paper/plots/plot_richards_mem.py
@@ -32,21 +32,26 @@
         if not first_time:
             first_time = float(time)
         xs.append(float(time) - first_time)
-        real_mem, max_rss = mems.split("/")
+        real_mem, max_rss, page_util = mems.split("/")
         y1s.append(int(real_mem) / 1024. / 1024)
+        y2s.append(float(page_util) * 100)
 
-x2s = range(12)
-y2s = [152304, 180060, 180428,
-       180448, 180460, 180696,
-       180124, 180552, 180584,
-       180588, 180544, 180252]
-y2s = map(lambda x: x / 1024., y2s)
+# RSS:
+# x2s = range(12)
+# y2s = [152304, 180060, 180428,
+#        180448, 180460, 180696,
+#        180124, 180552, 180584,
+#        180588, 180544, 180252]
+# y2s = map(lambda x: x / 1024., y2s)
 
 
-def plot_mems(ax):
-    ax.plot(xs, y1s, '-o', label="GC managed memory",
-            ms=2)
-    ax.plot(x2s, y2s, '-x', label="Resident Set Size (RSS)")
+def plot_mems(ax, ax2):
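+    # sample averages: GC managed memory [MiB], page privatisation [%]
+    print sum(y1s) / len(xs)
+    print sum(y2s) / len(xs)
+    # memory on the left axis, page privatisation on the twin right axis
+    a, = ax.plot(xs, y1s, 'b-')
+    b, = ax2.plot(xs, y2s, 'r-')
+    # one legend for both axes, drawn on the primary axis
+    return ax.legend((a, b),
+                     ('GC managed memory', 'Page privatisation'))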
 
 
 def main():
@@ -57,12 +62,17 @@
 
     ax = fig.add_subplot(111)
 
-    plot_mems(ax)
-
-    ax.set_ylabel("Memory [MiB]")
+    ax.set_ylabel("Memory [MiB]", color='b')
     ax.set_xlabel("Runtime [s]")
     ax.set_xlim(-0.5, 11.5)
-    ax.set_ylim(0, 200)
+    ax.set_ylim(0, 50)
+
+    ax2 = ax.twinx()
+    ax2.set_ylim(0, 100)
+    ax2.set_ylabel(r"\% of pages with $\geq 1$ private copy",
+                   color='r')
+    legend = plot_mems(ax, ax2)
+
 
     #axs[0].set_ylim(0, len(x))
     #ax.set_yticks([r+0.5 for r in range(len(logs))])
@@ -74,7 +84,6 @@
     # major_formatter = matplotlib.ticker.FuncFormatter(label_format)
     # axs[0].xaxis.set_major_formatter(major_formatter)
 
-    legend = ax.legend(loc=5)
     #ax.set_title("Memory Usage in Richards")
 
     plt.draw()
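The per-sample parsing and the averages printed by `plot_mems()` can be sketched in isolation as follows. This is a minimal, stand-alone sketch that assumes each sample arrives as a `(timestamp, "real_mem/max_rss/page_util")` string pair, with `real_mem` in bytes and `page_util` as a fraction in $[0, 1]$ (inferred from the `/ 1024. / 1024` and `* 100` conversions in the script). The names `parse_samples` and `average` are hypothetical helpers, not part of the script.

```python
# Stand-alone sketch of the parsing/averaging in plot_richards_mem.py.
# Assumed input: (timestamp, "real_mem/max_rss/page_util") string pairs,
# real_mem in bytes, page_util a fraction in [0, 1].


def parse_samples(samples):
    """Return (xs, mem_mib, priv_pct), analogous to xs, y1s, y2s in the script."""
    xs, mem_mib, priv_pct = [], [], []
    first_time = None
    for time, mems in samples:
        t = float(time)
        if first_time is None:  # unlike `if not first_time`, also works for t == 0.0
            first_time = t
        xs.append(t - first_time)
        real_mem, _max_rss, page_util = mems.split("/")
        mem_mib.append(int(real_mem) / 1024.0 / 1024)  # bytes -> MiB
        priv_pct.append(float(page_util) * 100)  # fraction -> percent
    return xs, mem_mib, priv_pct


def average(values):
    """Unweighted mean over samples, like sum(y1s) / len(xs) in plot_mems()."""
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    xs, mem, pct = parse_samples([("100.0", "31457280/0/0.25"),
                                  ("100.5", "20971520/0/0.5")])
    print(xs, mem, pct)  # [0.0, 0.5] [30.0, 20.0] [25.0, 50.0]
    print(average(mem), average(pct))  # 25.0 37.5
```

Checking `first_time is None` instead of `not first_time` is a deliberate choice: with the truthiness test, a first timestamp of `0.0` would be overwritten by the next sample. Note also that the average is a plain mean over samples, not a time-weighted one, so it matches the paper's "average memory" only if samples are taken at regular intervals.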
diff --git a/talk/dls2014/paper/plots/richards_mem.pdf b/talk/dls2014/paper/plots/richards_mem.pdf
index 138f5784e44512fef2911dbf52122be8a88d49bd..17e9e4ae371aeb31f4e7578a35b7ab0ebb9a775c
GIT binary patch

[cut]