[pypy-commit] extradoc extradoc: paper updates

Raemi noreply at buildbot.pypy.org
Thu May 15 16:48:28 CEST 2014


Author: Remi Meier <remi.meier at inf.ethz.ch>
Branch: extradoc
Changeset: r5251:46bc3a8000ea
Date: 2014-05-15 16:48 +0200
http://bitbucket.org/pypy/extradoc/changeset/46bc3a8000ea/

Log:	paper updates

diff --git a/talk/dls2014/paper/paper.tex b/talk/dls2014/paper/paper.tex
--- a/talk/dls2014/paper/paper.tex
+++ b/talk/dls2014/paper/paper.tex
@@ -142,7 +142,7 @@
 good performance that enables new applications. However, a parallel
 programming model was not part of the design of those languages. Thus,
 the reference implementations of e.g. Python and Ruby use a single,
-global interpreter lock (GIL) to serialize the execution of code in
+global interpreter lock (GIL) to serialise the execution of code in
 threads.
 
 While this GIL prevents any parallelism from occurring, it also
@@ -151,18 +151,23 @@
 in-between such instructions, it provides perfect isolation and
 atomicity between multiple threads for a series of
 instructions. Another technology that can provide the same guarantees
-is transactional memory (TM).
+is transactional memory (TM). \remi{cite our position paper}
 
 There have been several attempts at replacing the GIL with TM. Using
 transactions to enclose multiple bytecode instructions, we can get the
 very same semantics as the GIL while possibly executing several
 transactions in parallel. Furthermore, by exposing these
 interpreter-level transactions to the application in the form of
-\emph{atomic blocks}, we give dynamic languages a new synchronization
+\emph{atomic blocks}, we give dynamic languages a new synchronisation
 mechanism that avoids several of the problems of locks as they are
 used now.
 
-\cfbolz{the above is good, here is something missing: problems with current STM approaches, outlining the intuition behind the new one}
+\remi{cite and extract from (our pos. paper):}
+TM systems can be broadly categorised as hardware based (HTM),
+software based (STM), or hybrid systems (HyTM). HTM systems are limited
+by hardware constraints, while STM systems suffer from high overhead.
+In this paper, we describe how we manage to lower the overhead of our
+STM system so that it can be seen as a viable replacement for the GIL.
 
 Our contributions include:
 \begin{itemize}[noitemsep]
@@ -173,7 +178,7 @@
 \item This new STM system is used to replace the GIL in Python and is
   then evaluated extensively.
 \item We introduce atomic blocks to the Python language to provide a
-  backwards compatible, composable synchronization mechanism for
+  backwards compatible, composable synchronisation mechanism for
   threads.
 \end{itemize}
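The atomic blocks mentioned in the contributions can be sketched as a context manager. This is a toy model only: a real TM-based implementation runs the block as a transaction, whereas a single global lock merely reproduces the same serialisable semantics (the `atomic` class and lock name are hypothetical, for illustration).

```python
import threading

# Toy sketch of an atomic block as a context manager. A single global
# lock reproduces the GIL-like serialisable semantics that a TM system
# provides while actually running blocks in parallel (hypothetical).
_big_lock = threading.RLock()

class atomic:
    def __enter__(self):
        _big_lock.acquire()

    def __exit__(self, *exc_info):
        _big_lock.release()
        return False

counter = 0

def increment():
    global counter
    with atomic():
        counter += 1   # atomic w.r.t. all other atomic blocks

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 4
```

Unlike locks, such blocks compose: nesting two atomic blocks cannot deadlock in a TM implementation, since there is no lock-acquisition order to violate.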
 
@@ -187,19 +192,19 @@
 Transactional memory (TM) is a concurrency control mechanism that
 comes from database systems. Using transactions, we can group a series
 of instructions performing operations on memory and make them happen
-atomically and in complete isolations from other
+atomically and in complete isolation from other
 transactions. \emph{Atomicity} means that all these instructions in
-the transaction and their effects seem to happen at one, undividable
+the transaction and their effects seem to happen at one, indivisible
 point in time. Other transactions never see the inconsistent state of
 a partially executed transaction; this property is called \emph{isolation}.
 
 If we start multiple such transactions in multiple threads, the TM
 system guarantees that the outcome of running the transactions is
-\emph{serializable}. Meaning, the outcome is equal to some sequential
+\emph{serialisable}: the outcome is equal to some sequential
 execution of these transactions. Thus, the approach provides the same
 semantics as using the GIL
 while still allowing the TM system to
-run transactions in parallel as an optimization.
+run transactions in parallel as an optimisation.
 
 
 \subsection{Python}
@@ -239,15 +244,15 @@
 spots. We will compare our work with Jython for evaluation.
 
 
-\subsection{Synchronization}
+\subsection{Synchronisation}
 
 \cfbolz{citation again needed for the whole subsection}
 
-It is well known that using locks to synchronize multiple threads is
+It is well known that using locks to synchronise multiple threads is
 hard. They are non-composable, have overhead, may deadlock, limit
 scalability, and overall add a lot of complexity. For a better
 parallel programming model for dynamic languages, we want to implement
-another, well-known synchronization mechanism: \emph{atomic blocks}.
+another, well-known synchronisation mechanism: \emph{atomic blocks}.
 
 Atomic blocks are composable, deadlock-free, higher-level and expose
 useful atomicity and isolation guarantees to the application for a
@@ -265,69 +270,61 @@
 
 \subsection{Transactional Memory Model}
 
-In this section, we describe the general model of our TM system. This
-should clarify the general semantics in commonly used terms from
-the literature.
+In this section, we characterise the model of our TM system and its
+guarantees as well as some of the design choices we made. This should
+clarify the general semantics in commonly used terms from the
+literature.\remi{cite Transactional Memory 2nd edition}
 
-\cfbolz{there is an overview paragraph of the idea missing, maybe in the introduction}
-
-\cfbolz{this all feels very much dumping details, needs more overview. why is this info important? the subsubsections don't have any connections}
+Our TM system is fully implemented in software. However, we exploit
+several advanced features of current CPUs, particularly \emph{memory
+segmentation, virtual memory,} and the 64-bit address space. Still,
+it cannot be classified as a hybrid TM system, since it currently
+makes no use of any HTM facilities of the CPU.
 
 \subsubsection{Conflict Handling}
 
+We implement an object-based TM system; it therefore makes sense to
+detect conflicts at \emph{object granularity}. With this choice, if two
+transactions access the same object and at least one access is a
+write, we count it as a conflict. Conceptually, it is based on
+\emph{read} and \emph{write sets} of transactions. Reading from an
+object adds the object to the read set, writing to it adds it to both
+sets. Two transactions conflict if they have accessed a common object
+that is in the write set of at least one of them.
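The read/write-set rule above can be modelled in a few lines. This is an illustrative sketch only; the class and function names are not taken from the actual system.

```python
# Toy model of object-granularity conflict detection via read and
# write sets (illustrative names, not from the real implementation).
class Transaction:
    def __init__(self):
        self.read_set = set()
        self.write_set = set()

    def read(self, obj):
        self.read_set.add(obj)

    def write(self, obj):
        # writing adds the object to both sets
        self.read_set.add(obj)
        self.write_set.add(obj)

def conflict(t1, t2):
    # conflict: a commonly accessed object is in the write set of
    # at least one of the two transactions
    return bool(t1.read_set & t2.write_set) or \
           bool(t2.read_set & t1.write_set)

t1, t2 = Transaction(), Transaction()
t1.read("x"); t2.read("x")        # read-read: no conflict
assert not conflict(t1, t2)
t2.write("x")                     # read-write on "x": conflict
assert conflict(t1, t2)
```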
 
-Our conflict detection works with \emph{object
-  granularity}. Conceptually, it is based on \emph{read} and
-\emph{write sets} of transactions.  Two transactions conflict if they
-have accessed a common object that is now in the write set of at least
-one of them.
-
-The \emph{concurrency control} works partly \emph{optimistically} for
-reading of objects, where conflicts caused by just reading an object
-in transactions are detected only when the transaction that writes the
-object actually commits. For write-write conflicts we are currently
+The detection, or \emph{concurrency control}, works partly
+\emph{optimistically} for reading objects: read-write conflicts
+between two transactions are detected, in both of them, only when
+the writing transaction commits. For write-write conflicts we are currently
 \emph{pessimistic}: Only one transaction may have a certain object in
 its write set at any point in time, others trying to write to it will
-have to wait or abort.
+have to wait or abort. This design decision still needs further
+evaluation.
 
-We use \emph{lazy version management} to ensure that modifications by
-a transaction are not visible to another transaction before the former
-commits.
-
-
-
+When a conflict is detected, we perform some simple contention
+management that generally prefers the older transaction to the younger.
 
 \subsubsection{Semantics}
 
-As required for TM systems, we guarantee complete \emph{isolation}
-and \emph{atomicity} for transactions at all times. Furthermore,
-the isolation provides full \emph{opacity} to always guarantee a consistent
-read set.
+As required for TM systems, we guarantee complete \emph{isolation} and
+\emph{atomicity} for transactions at all times. Our method of choice
+is \emph{lazy version management}. Modifications by a transaction are
+not visible to another transaction before the former commits.
+Furthermore, the isolation provides full \emph{opacity} to always
+guarantee a consistent read set even for non-committed transactions.
+\remi{cite On the Correctness of Transactional Memory}
 
-To support irreversible operations that cannot be undone when we abort
-a transaction (e.g. I/O, syscalls, and non-transactional code in
-general), we employ \emph{irrevocable} or \emph{inevitable
-transactions}. These transactions are always guaranteed to
-commit. There is always at most one such transaction running in the
-system, thus their execution is serialised. With this guarantee,
-providing \emph{strong isolation} and \emph{serializability} between
-non-transactional code is possible by making the current transaction
-inevitable right before running irreversible operations.
-
-
-\subsubsection{Contention Management}
-
-When a conflict is detected, we perform some simple contention
-management.  First, inevitable transactions always win. Second, the
-older transaction wins. Different schemes are possible.
-
-
-\subsubsection{Software Transactional Memory}
-
-Generally speaking, the system is fully implemented in
-software. However, we exploit some more advanced features of current
-CPUs, especially \emph{memory segmentation, virtual memory,} and the
-64-bit address space.
+To also support these properties for irreversible operations that
+cannot be undone when we abort a transaction (e.g. I/O, syscalls, and
+non-transactional code in general), we use \emph{irrevocable} or
+\emph{inevitable transactions}. These transactions are always
+guaranteed to commit, which is why they must always win a conflict
+with another, normal transaction. There is always
+at most one such transaction running in the system, thus their
+execution is serialised. With this guarantee, providing \emph{strong
+isolation} and \emph{serialisability} between non-transactional code
+is possible by making the current transaction inevitable right before
+running irreversible operations.
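The inevitability protocol described above can be sketched as follows. This is a hypothetical model, not the actual implementation: a lock serialises inevitable transactions, and a transaction acquires it right before an irreversible operation.

```python
import threading

# Sketch of inevitable transactions: at most one may run at a time
# (serialised here by a lock) and it is guaranteed to commit, so a
# transaction turns inevitable right before an irreversible operation
# such as I/O (hypothetical model, illustrative names).
_inevitable_lock = threading.Lock()

class Txn:
    def __init__(self):
        self.inevitable = False

    def become_inevitable(self):
        _inevitable_lock.acquire()   # only one inevitable txn at a time
        self.inevitable = True

    def commit(self):
        if self.inevitable:
            self.inevitable = False
            _inevitable_lock.release()

t = Txn()
t.become_inevitable()   # called right before e.g. a syscall
assert t.inevitable
t.commit()              # an inevitable transaction always commits
assert not t.inevitable
```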
 
 
 \subsection{Implementation}
@@ -420,7 +417,7 @@
 the segments $>0$ to the file pages of our sharing-segment. This is
 the fully-shared configuration.
 
-During runtime, we can then privatize single pages in segments $>0$
+During runtime, we can then privatise single pages in segments $>0$
 again by remapping single pages as seen in (III).
 
 Looking back at address translation for object references, we see now
@@ -428,7 +425,7 @@
 translated to different linear addresses in different threads by the
 CPU. Then, depending on the current mapping of virtual pages to file
 pages, these LAs can map to a single file page in the sharing-segment,
-or to privatized file pages in the corresponding segments. This
+or to privatised file pages in the corresponding segments. This
 mapping is also performed efficiently by the CPU and can easily be
 done on every access to an object.
 
@@ -444,7 +441,7 @@
     \par\end{centering}
 
     \protect\caption{Page Remapping: (I) after \texttt{mmap()}. (II) remap all pages to
-      segment 0, fully shared memory configuration. (III) privatize single
+      segment 0, fully shared memory configuration. (III) privatise single
       pages.\label{fig:Page-Remapping}}
 \end{figure}
 
@@ -459,7 +456,7 @@
 object without other threads seeing the changes immediately, we ensure
 that all pages belonging to the object are private to our segment.
 
-To detect when to privatize pages, we use write barriers before every
+To detect when to privatise pages, we use write barriers before every
 write. When the barrier detects that the object is not in a private
 page (or any pages that belong to the object), we remap and copy the
 pages to the thread's segment. From now on, the translation of
@@ -500,7 +497,7 @@
 visible on commit. We also need to be able to completely abort a
 transaction without a trace, like it never happened.
 \begin{description}
-\item [{Commit:}] If a transaction commits, we synchronize all threads
+\item [{Commit:}] If a transaction commits, we synchronise all threads
   so that all of them are waiting in a safe point. In the committing
   transaction, we go through all objects in the write set and check if
   another transaction in a different segment read the same object.
@@ -520,13 +517,13 @@
   resetting should be faster than re-sharing.
 \end{description}
 
-\cfbolz{random question: did we investigate the extra memory requirements? we should characterize memory overhead somewhere, eg at least one byte per object for the read markers}
+\cfbolz{random question: did we investigate the extra memory requirements? we should characterise memory overhead somewhere, eg at least one byte per object for the read markers}
 
 \subsubsection{Summary}
 
-We provide isolation between transactions by privatizing the pages of
+We provide isolation between transactions by privatising the pages of
 the segments belonging to the threads the transactions run in.  To
-detect when and which pages need privatization, we use write barriers
+detect when and which pages need privatisation, we use write barriers
 that trigger a COW of one or several pages. Conflicts, however, are
 detected on the level of objects; based on the concept of read and
 write sets. Barriers before reading and writing add objects to the
@@ -605,7 +602,7 @@
 of segments $>0$ map to the pages of the sharing-segment.
 
 However, the layout of a segment is not uniform and we actually
-privatize a few areas again right away. These areas are illustrated in
+privatise a few areas again right away. These areas are illustrated in
 Figure \ref{fig:Segment-Layout} and explained here:
 \begin{description}[noitemsep]
 \item [{NULL~page:}] This page is unmapped and will produce a
@@ -725,10 +722,10 @@
 transaction, which will be incremented on each commit.  Thereby, we
 can avoid resetting the bytes to \lstinline!false!  on commit and only
 need to do this every 255 transactions. The whole code for the barrier
-is easily optimizable for compilers as well as perfectly predictable
+is easily optimisable for compilers as well as perfectly predictable
 for CPUs:
 
-\begin{lstlisting}[basicstyle={\footnotesize\ttfamily},tabsize=4]
+\begin{lstlisting}
 void stm_read(SO):
     *(SO >> 4) = read_version
 \end{lstlisting}
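The versioned-marker trick behind this barrier can be modelled in Python. The sketch below is illustrative (a dictionary stands in for the marker byte at address \lstinline!SO >> 4!): marking means storing the current \lstinline!read_version!, so markers only need a full reset every 255 commits.

```python
# Model of the read barrier's versioned marker bytes (illustrative,
# not the real memory layout).
read_markers = {}     # stands in for the byte at address SO >> 4
read_version = 1

def stm_read(obj):
    read_markers[obj] = read_version

def was_read(obj):
    return read_markers.get(obj) == read_version

def start_next_transaction():
    global read_version
    read_version += 1
    if read_version > 255:        # overflow after 255 transactions:
        read_markers.clear()      # reset all marker bytes
        read_version = 1

stm_read("obj_a")
assert was_read("obj_a")
start_next_transaction()
assert not was_read("obj_a")      # stale marker from an older txn
```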
@@ -749,7 +746,7 @@
 objects. It is never set on freshly allocated objects that still
 reside in the nursery.
 
-\begin{lstlisting}[basicstyle={\footnotesize\ttfamily},tabsize=4]
+\begin{lstlisting}
 void stm_write(SO):
 	if SO->flags & WRITE_BARRIER:
 		write_slowpath(SO)
@@ -758,7 +755,7 @@
 
 The \textbf{slow path} is shown here:
 
-\begin{lstlisting}[basicstyle={\footnotesize\ttfamily},tabsize=4]
+\begin{lstlisting}
 void write_slowpath(SO):
 	// GC part:
 	list_append(to_trace, SO)
@@ -794,7 +791,7 @@
 to. The check for \lstinline!is_overflow_obj()!  tells us if the
 object was actually created in this transaction. In that case, we do
 not need to execute the following \emph{TM part}.  We especially do
-not need to privatize the page since no other transaction knows about
+not need to privatise the page since no other transaction knows about
 these ``old'' objects.
 
 For TM, we first perform a read barrier on the object. We then try to
@@ -805,7 +802,7 @@
 that will abort either us or the current owner of the object.  If we
 succeed in acquiring the lock using an atomic
 \lstinline!cmp_and_swap!, we need to add the object to the write set
-(a simple list called \lstinline!modified_old_objects!)  and privatize
+(a simple list called \lstinline!modified_old_objects!)  and privatise
 all pages belonging to it (copy-on-write).
 
 In all cases, we remove the \lstinline!WRITE_BARRIER!  flag from the
@@ -823,7 +820,7 @@
 over all objects in the write set (\lstinline!modified_old_objects!)
 and reset any modifications in our private pages by copying from the
 sharing-segment. What is left is to use \lstinline!longjmp()!  to jump
-back to the location initialized by a \lstinline!setjmp()!  in
+back to the location initialised by a \lstinline!setjmp()!  in
 \lstinline!stm_start_transaction()!.  Increasing the
 \lstinline!read_version!  is also done there.
 
@@ -832,7 +829,7 @@
 
 \subsubsection{Commit}
 
-Committing a transaction needs a bit more work. First, we synchronize
+Committing a transaction needs a bit more work. First, we synchronise
 all threads so that the committing one is the only one running and all
 the others are waiting in a safe point. We then go through the write
 set (\lstinline!modified_old_objects!)  and check the corresponding
@@ -853,18 +850,18 @@
 
 
 
-\subsubsection{Thread Synchronization}
+\subsubsection{Thread Synchronisation}
 
-A requirement for performing a commit is to synchronize all threads so
+A requirement for performing a commit is to synchronise all threads so
 that we can safely update objects in other segments. To make this
-synchronization fast and cheap, we do not want to insert an additional
-check regularly in order to see if synchronization is requested. We
+synchronisation fast and cheap, we do not want to regularly execute
+an additional check to see if synchronisation is requested. We
 use a trick relying on the fact that dynamic languages are usually
 very high-level and thus allocate a lot of objects very regularly.
 This is done through the function \lstinline!stm_allocate!  shown
 below:
 
-\begin{lstlisting}[basicstyle={\footnotesize\ttfamily},tabsize=4]
+\begin{lstlisting}
 object_t *stm_allocate(ssize_t size_rounded):
     result = nursery_current
 	nursery_current += size_rounded
@@ -882,14 +879,14 @@
 path of the function to possibly perform a minor collection in order
 to free up space in the nursery.
 
-If we want to synchronize all threads, we can rely on this check being
+If we want to synchronise all threads, we can rely on this check being
 performed regularly. We therefore set the
 \lstinline!nursery_end!  to $0$ in all segments that we want to
-synchronize. The mentioned check will then fail in those segments and
+synchronise. The mentioned check will then fail in those segments and
 call the slow path. In \lstinline!allocate_slowpath!  they can simply
 check for this condition and enter a safe point.
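The nursery trick can be modelled as follows. This is an illustrative sketch with hypothetical names: setting \lstinline!nursery_end! to 0 makes the allocation fast path overflow, diverting the thread into the slow path where it can enter a safe point.

```python
# Model of the nursery_end synchronisation trick (illustrative).
class Segment:
    def __init__(self, nursery_size=4096):
        self.nursery_current = 0
        self.nursery_end = nursery_size
        self.in_safe_point = False

def stm_allocate(seg, size_rounded):
    result = seg.nursery_current
    seg.nursery_current += size_rounded
    if seg.nursery_current > seg.nursery_end:
        return allocate_slowpath(seg, size_rounded)
    return result

def allocate_slowpath(seg, size_rounded):
    if seg.nursery_end == 0:       # synchronisation was requested
        seg.in_safe_point = True   # a real system would wait here
        return None
    # otherwise: perform a minor collection to free nursery space
    seg.nursery_current = 0
    return stm_allocate(seg, size_rounded)

seg = Segment()
assert stm_allocate(seg, 64) == 0      # fast path, check passes
seg.nursery_end = 0                    # request synchronisation
stm_allocate(seg, 64)                  # fast path fails -> slow path
assert seg.in_safe_point
```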
 
-For other synchronization requirements, for example:
+For other synchronisation requirements, for example:
 \begin{itemize}[noitemsep]
 \item waiting for a segment to be released,
 \item waiting for a transaction to abort or commit,

