[pypy-commit] extradoc extradoc: some tweaks and several notes

Thu May 15 12:39:06 CEST 2014

Author: Carl Friedrich Bolz <cfbolz at gmx.de>
Branch: extradoc
Changeset: r5249:6f0c28385e2e
Date: 2014-05-15 12:33 +0200
http://bitbucket.org/pypy/extradoc/changeset/6f0c28385e2e/

Log:	some tweaks and several notes

diff --git a/talk/dls2014/report/report.tex b/talk/dls2014/report/report.tex
--- a/talk/dls2014/report/report.tex
+++ b/talk/dls2014/report/report.tex
@@ -38,6 +38,7 @@
   }%
 }
 \newcommand\remi[1]{\mynote{Remi}{#1}}
+\newcommand\cfbolz[1]{\mynote{cfbolz}{#1}}
 
 % Title.
 % ------
@@ -145,6 +146,8 @@
 mechanism that avoids several of the problems of locks as they are
 used now.
 
+\cfbolz{the above is good, here is something missing: problems with current STM approaches, outlining the intuition behind the new one}
+
 Our contributions include:
 \begin{itemize}[noitemsep]
 \item We introduce a new software transactional memory (STM) system
@@ -177,13 +180,16 @@
 If we start multiple such transactions in multiple threads, the TM
 system guarantees that the outcome of running the transactions is
 \emph{serializable}. Meaning, the outcome is equal to some sequential
-execution of these transactions. Overall, this is exactly what a
-single global lock guarantees while still allowing the TM system to
+execution of these transactions. This means that the approach provides the same
+semantics as using the GIL
+while still allowing the TM system to
 run transactions in parallel as an optimization.
 
 
 \subsection{Python}
 
+\cfbolz{a pypy introduction needs to go somewhere, a paragraph or so. maybe in the evaluation section}
+
 We implement and evaluate our system for the Python language. For the
 actual implementation, we chose the PyPy interpreter because replacing
 the GIL there with a TM system is just a matter of adding a new
@@ -219,10 +225,12 @@
 
 \subsection{Synchronization}
 
+\cfbolz{citation again needed for the whole subsection}
+
 It is well known that using locks to synchronize multiple threads is
 hard. They are non-composable, have overhead, may deadlock, limit
 scalability, and overall add a lot of complexity. For a better
-parallel programming model for dynamic languages, we want to add
+parallel programming model for dynamic languages, we want to implement
 another, well-known synchronization mechanism: \emph{atomic blocks}.
 
 Atomic blocks are composable, deadlock-free, higher-level and expose
@@ -245,9 +253,13 @@
 should clarify the general semantics using commonly used terms from
 the literature.
 
+\cfbolz{there is an overview paragraph of the idea missing, maybe in the introduction}
+
+\cfbolz{this all feels very much dumping details, needs more overview. why is this info important? the subsubsections don't have any connections}
 
 \subsubsection{Conflict Handling}
 
+
 Our conflict detection works with \emph{object
   granularity}. Conceptually, it is based on \emph{read} and
 \emph{write sets} of transactions.  Two transactions conflict if they
@@ -276,6 +288,8 @@
 the isolation provides full \emph{opacity} to always guarantee a consistent
 read set.
 
+\cfbolz{this paragraph is hard to understand without giving an example (eg console printing) when it is useful}
+
 We support the notion of \emph{inevitable transactions} that are always
 guaranteed to commit. There is always at most one such transaction
 running in the system. We use this kind of transaction to provide
@@ -321,7 +335,7 @@
 threads.
 
 To get references to objects that are valid in all threads, we will
-use the object's offset inside the segment. Since all segments are
+use \cfbolz{use for what?} the object's offset inside the segment. Since all segments are
 copies of each other, the \emph{Segment Offset (SO)} will point to the
 private version of an object in all threads/segments. To then
 translate this SO to a real virtual memory address when used inside a
@@ -329,6 +343,8 @@
 SO. The result of this operation is called a \emph{Linear Address
   (LA)}. This is illustrated in Figure \ref{fig:Segment-Addressing}.
 
+\cfbolz{here it needs to say that this is x86 specific}
+
 To make this address translation efficient, we use the segment
 register $\%gs$. When this register points to a thread's segment start
 address, we can instruct the CPU to perform the above translation from
@@ -444,7 +460,7 @@
 \item [{Read~Barrier:}] Adds the object to the read set of the current
   transaction. Since our two-step address translation automatically
   resolves the reference to the private version of the object on every
-  access anyway, this is not the job of the read barrier anymore.
+  access anyway, the read barrier does not need to do address translation anymore.
 \item [{Write~Barrier:}] Adds the object to the read and write set of
   the current transaction and checks if all pages of the object are
   private, doing COW otherwise.\\
@@ -461,7 +477,7 @@
 
 \subsubsection{Atomicity: Commit \& Abort}
 
-To provide atomicity for a transaction, we want to make changes
+To provide atomicity for a transaction, we want to make changes globally
 visible on commit. We also need to be able to completely abort a
 transaction without a trace, like it never happened.
 \begin{description}
@@ -473,13 +489,15 @@
   transaction waiting or aborting.\\
   We then push all changes of modified objects in private pages to all
   the pages in other segments, including the sharing-segment (segment
-  0).
+  0). \cfbolz{can it really happen that you push pages to other segments? I thought it's always just back to the sharing segment}
 \item [{Abort:}] On abort the transaction will forget about all the
   changes it has done. All objects in the write set are reset by
   copying their previous version from the sharing-segment into the
   private pages of the aborting transaction.
+  \cfbolz{why doing any copying? aren't the pages re-shared instead?}
 \end{description}
 
+\cfbolz{random question: did we investigate the extra memory requirements? we should characterize memory overhead somewhere, eg at least one byte per object for the read markers}
 
 \subsubsection{Summary}
 
@@ -583,7 +601,7 @@
   first generation.
 \item [{Old~object~space:}] These pages are the ones that are really
   shared between segments. They mostly contain old objects but also
-  some young ones that were too big to allocate in the nursery.
+  some young ones that were too big to be allocated in the nursery.
 \end{description}
 
 
@@ -613,8 +631,7 @@
 
 Therefore, a thread may be assigned to different segments each time it
 starts a transaction. Although, we try to assign it the same segment
-again if possible. And a maximum of $N$ transactions may run in
-parallel.
+again if possible.
 
 
 
@@ -624,6 +641,8 @@
 Garbage collection plays a big role in our TM system. The GC is
 generational and has two generations.
 
+\cfbolz{maybe use "young" and "old" generation, if there are only two}
+
 The \textbf{first generation}, where objects are considered to be
 \emph{young} and reside in the \emph{Nursery}, is collected by
 \emph{minor collections}. These collections move the surviving objects
@@ -667,7 +686,7 @@
 
 The point of the read barrier is to add the object to the read set of
 the transaction. This information is needed to detect conflicts
-between transactions. Usually, it also resolves an object reference to
+between transactions. In other STM systems, it also resolves an object reference to
 a private copy, but since the CPU performs our address translation on
 every object access efficiently, we do not need to do that in our
 barrier.
@@ -797,7 +816,7 @@
 set (\lstinline!modified_old_objects!)  and check the corresponding
 \lstinline!read_markers!  in other threads/segments. If we detect a
 read-write conflict, we do contention management to either abort us or
-the other transaction, or to simply wait a bit.
+the other transaction, or to simply wait a bit. \cfbolz{why does waiting help?}
 
 After verifying that there are no conflicts anymore, we copy all our
 changes done to the objects in the write set to all other segments,