[pypy-commit] extradoc extradoc: some tweaks and several notes
cfbolz
noreply at buildbot.pypy.org
Thu May 15 12:39:06 CEST 2014
Author: Carl Friedrich Bolz <cfbolz at gmx.de>
Branch: extradoc
Changeset: r5249:6f0c28385e2e
Date: 2014-05-15 12:33 +0200
http://bitbucket.org/pypy/extradoc/changeset/6f0c28385e2e/
Log: some tweaks and several notes
diff --git a/talk/dls2014/report/report.tex b/talk/dls2014/report/report.tex
--- a/talk/dls2014/report/report.tex
+++ b/talk/dls2014/report/report.tex
@@ -38,6 +38,7 @@
}%
}
\newcommand\remi[1]{\mynote{Remi}{#1}}
+\newcommand\cfbolz[1]{\mynote{cfbolz}{#1}}
% Title.
% ------
@@ -145,6 +146,8 @@
mechanism that avoids several of the problems of locks as they are
used now.
+\cfbolz{the above is good, but something is missing here: the problems with current STM approaches and an outline of the intuition behind the new one}
+
Our contributions include:
\begin{itemize}[noitemsep]
\item We introduce a new software transactional memory (STM) system
@@ -177,13 +180,16 @@
If we start multiple such transactions in multiple threads, the TM
system guarantees that the outcome of running the transactions is
\emph{serializable}. That is, the outcome is equal to some sequential
-execution of these transactions. Overall, this is exactly what a
-single global lock guarantees while still allowing the TM system to
+execution of these transactions. Hence, the approach provides the same
+semantics as using the GIL
+while still allowing the TM system to
run transactions in parallel as an optimization.
\subsection{Python}
+\cfbolz{a pypy introduction needs to go somewhere, a paragraph or so. maybe in the evaluation section}
+
We implement and evaluate our system for the Python language. For the
actual implementation, we chose the PyPy interpreter because replacing
the GIL there with a TM system is just a matter of adding a new
@@ -219,10 +225,12 @@
\subsection{Synchronization}
+\cfbolz{again, a citation is needed for the whole subsection}
+
It is well known that using locks to synchronize multiple threads is
hard. They are non-composable, have overhead, may deadlock, limit
scalability, and overall add a lot of complexity. For a better
-parallel programming model for dynamic languages, we want to add
+parallel programming model for dynamic languages, we want to implement
another, well-known synchronization mechanism: \emph{atomic blocks}.
Atomic blocks are composable, deadlock-free, higher-level and expose
@@ -245,9 +253,13 @@
should clarify the general semantics using commonly used terms from
the literature.
+\cfbolz{there is an overview paragraph of the idea missing, maybe in the introduction}
+
+\cfbolz{this all feels very much like dumping details and needs more overview. why is this info important? the subsubsections don't have any connections}
\subsubsection{Conflict Handling}
+
Our conflict detection works with \emph{object
granularity}. Conceptually, it is based on \emph{read} and
\emph{write sets} of transactions. Two transactions conflict if they
@@ -276,6 +288,8 @@
the isolation provides full \emph{opacity} to always guarantee a consistent
read set.
+\cfbolz{this paragraph is hard to understand without an example of when it is useful (e.g.\ console printing)}
+
We support the notion of \emph{inevitable transactions} that are always
guaranteed to commit. There is always at most one such transaction
running in the system. We use this kind of transaction to provide
@@ -321,7 +335,7 @@
threads.
To get references to objects that are valid in all threads, we will
-use the object's offset inside the segment. Since all segments are
+use \cfbolz{use for what?} the object's offset inside the segment. Since all segments are
copies of each other, the \emph{Segment Offset (SO)} will point to the
private version of an object in all threads/segments. To then
translate this SO to a real virtual memory address when used inside a
@@ -329,6 +343,8 @@
SO. The result of this operation is called a \emph{Linear Address
(LA)}. This is illustrated in Figure \ref{fig:Segment-Addressing}.
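To illustrate the SO-to-LA translation, here is a minimal C sketch; it is not the actual implementation. It assumes all segments are allocated as one contiguous region of \lstinline!NB_SEGMENTS! copies of \lstinline!SEGMENT_SIZE! bytes each, and the names \lstinline!segments_start!, \lstinline!init_segments()!, \lstinline!segment_base()! and \lstinline!so_to_la()! are hypothetical. An explicit base pointer stands in for the segment register, and the copies are plain separate memory with no page sharing or copy-on-write, so only the address arithmetic is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: NB_SEGMENTS copies of SEGMENT_SIZE bytes each. */
#define NB_SEGMENTS  4
#define SEGMENT_SIZE 4096

typedef uintptr_t segment_offset_t;  /* SO: identical in every segment */

static char *segments_start;         /* start address of segment 0 */

/* Allocate all segments as one contiguous, zeroed region.
   Returns nonzero on success. */
static int init_segments(void)
{
    segments_start = calloc(NB_SEGMENTS, SEGMENT_SIZE);
    return segments_start != NULL;
}

/* Start address of segment 'seg'; the real system keeps this in a
   segment register instead of computing it. */
static char *segment_base(int seg)
{
    assert(seg >= 0 && seg < NB_SEGMENTS);
    return segments_start + (size_t)seg * SEGMENT_SIZE;
}

/* Translate an SO into a linear address (LA) inside segment 'seg':
   LA = start of the segment + SO. */
static void *so_to_la(int seg, segment_offset_t so)
{
    assert(so < SEGMENT_SIZE);
    return segment_base(seg) + so;
}
```

With this sketch, two threads holding the same SO but running in segments 1 and 2 read and write two different private copies, because \lstinline!so_to_la(1, so)! and \lstinline!so_to_la(2, so)! differ by exactly one \lstinline!SEGMENT_SIZE!.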
+\cfbolz{here it needs to say that this is x86-specific}
+
To make this address translation efficient, we use the segment
register $\%gs$. When this register points to a thread's segment start
address, we can instruct the CPU to perform the above translation from
@@ -444,7 +460,7 @@
\item [{Read~Barrier:}] Adds the object to the read set of the current
transaction. Since our two-step address translation automatically
resolves the reference to the private version of the object on every
- access anyway, this is not the job of the read barrier anymore.
+ access anyway, the read barrier does not need to do address translation anymore.
\item [{Write~Barrier:}] Adds the object to the read and write set of
the current transaction and checks if all pages of the object are
private, doing COW otherwise.\\
@@ -461,7 +477,7 @@
\subsubsection{Atomicity: Commit \& Abort}
-To provide atomicity for a transaction, we want to make changes
+To provide atomicity for a transaction, we want to make changes globally
visible on commit. We also need to be able to completely abort a
transaction without a trace, like it never happened.
\begin{description}
@@ -473,13 +489,15 @@
transaction waiting or aborting.\\
We then push all changes of modified objects in private pages to all
the pages in other segments, including the sharing-segment (segment
- 0).
+ 0). \cfbolz{can it really happen that you push pages to other segments? I thought it's always just back to the sharing segment}
\item [{Abort:}] On abort the transaction will forget about all the
changes it has done. All objects in the write set are reset by
copying their previous version from the sharing-segment into the
private pages of the aborting transaction.
+ \cfbolz{why do any copying? aren't the pages re-shared instead?}
\end{description}
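The commit and abort steps can be illustrated with a toy C model. It is a sketch under strong simplifying assumptions, not our implementation: all names (\lstinline!objects!, \lstinline!read_markers!, \lstinline!written!, \lstinline!write_set!, \lstinline!tx_read()!, \lstinline!tx_write()!, \lstinline!tx_commit()!, \lstinline!tx_abort()!) are hypothetical, each object is a single \lstinline!int! identified by its index, read markers are plain flags, and pages and copy-on-write are not modelled. Following the description above, commit checks every object in the write set against the read markers of the other segments and then pushes the new values to all other segments, including the sharing-segment 0; abort restores the written objects from segment 0. On a conflict, this sketch simply aborts the committing transaction instead of doing contention management.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model with hypothetical names. Segment 0 is the sharing segment;
   segments 1..NB_SEGMENTS-1 each run one transaction. An object is a
   single int, identified by its index (standing in for its SO). */
#define NB_SEGMENTS 3
#define NB_OBJECTS  8

static int  objects[NB_SEGMENTS][NB_OBJECTS];      /* per-segment copies */
static bool read_markers[NB_SEGMENTS][NB_OBJECTS]; /* read sets          */
static bool written[NB_SEGMENTS][NB_OBJECTS];      /* write-set flags    */
static int  write_set[NB_SEGMENTS][NB_OBJECTS];    /* write-set contents */
static int  write_set_len[NB_SEGMENTS];

/* Read barrier: add obj to the read set, then read the private copy. */
static int tx_read(int seg, int obj)
{
    assert(obj >= 0 && obj < NB_OBJECTS);
    read_markers[seg][obj] = true;
    return objects[seg][obj];
}

/* Write barrier: add obj to the read and write sets, then modify the
   copy in this segment only. */
static void tx_write(int seg, int obj, int value)
{
    assert(obj >= 0 && obj < NB_OBJECTS);
    read_markers[seg][obj] = true;
    if (!written[seg][obj]) {          /* record each object only once */
        written[seg][obj] = true;
        write_set[seg][write_set_len[seg]++] = obj;
    }
    objects[seg][obj] = value;
}

/* Forget the read and write sets of segment seg. */
static void tx_reset(int seg)
{
    memset(read_markers[seg], 0, sizeof read_markers[seg]);
    memset(written[seg], 0, sizeof written[seg]);
    write_set_len[seg] = 0;
}

/* Abort: restore every written object from the sharing segment. */
static void tx_abort(int seg)
{
    for (int i = 0; i < write_set_len[seg]; i++) {
        int obj = write_set[seg][i];
        objects[seg][obj] = objects[0][obj];
    }
    tx_reset(seg);
}

/* Commit: look for read-write conflicts, then publish the changes.
   Returns false (after aborting) if another transaction has read an
   object that this transaction wrote. */
static bool tx_commit(int seg)
{
    for (int i = 0; i < write_set_len[seg]; i++) {
        int obj = write_set[seg][i];
        for (int other = 1; other < NB_SEGMENTS; other++) {
            if (other != seg && read_markers[other][obj]) {
                tx_abort(seg);
                return false;
            }
        }
    }
    for (int i = 0; i < write_set_len[seg]; i++) {
        int obj = write_set[seg][i];
        for (int s = 0; s < NB_SEGMENTS; s++)   /* includes segment 0 */
            objects[s][obj] = objects[seg][obj];
    }
    tx_reset(seg);
    return true;
}
```

For example, if the transaction in segment 2 has read object 0 and the transaction in segment 1 then writes it, \lstinline!tx_commit(1)! finds the read marker of segment 2, aborts, and segment 1 sees the last committed value again.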
+\cfbolz{random question: did we investigate the extra memory requirements? we should characterize the memory overhead somewhere, e.g.\ at least one byte per object for the read markers}
\subsubsection{Summary}
@@ -583,7 +601,7 @@
first generation.
\item [{Old~object~space:}] These pages are the ones that are really
shared between segments. They mostly contain old objects but also
- some young ones that were too big to allocate in the nursery.
+ some young ones that were too big to be allocated in the nursery.
\end{description}
@@ -613,8 +631,7 @@
Therefore, a thread may be assigned to different segments each time it
starts a transaction, although we try to assign it the same segment
-again if possible. And a maximum of $N$ transactions may run in
-parallel.
+again if possible.
@@ -624,6 +641,8 @@
Garbage collection plays a big role in our TM system. The GC is
generational and has two generations.
+\cfbolz{maybe use ``young'' and ``old'' generation, since there are only two}
+
The \textbf{first generation}, where objects are considered to be
\emph{young} and reside in the \emph{Nursery}, is collected by
\emph{minor collections}. These collections move the surviving objects
@@ -667,7 +686,7 @@
The point of the read barrier is to add the object to the read set of
the transaction. This information is needed to detect conflicts
-between transactions. Usually, it also resolves an object reference to
+between transactions. In other STM systems, it also resolves an object reference to
a private copy, but since the CPU performs our address translation on
every object access efficiently, we do not need to do that in our
barrier.
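Because the read barrier only has to record membership in the read set, it can be a single store. The following C sketch assumes, as one possible design not spelled out here, one marker byte per 16-byte slot of a segment (so objects are assumed to be at least 16 bytes apart) that stores the read version of the current transaction; \lstinline!struct segment!, \lstinline!start_transaction()!, \lstinline!read_barrier()! and \lstinline!was_read()! are hypothetical names. Changing the version at every transaction start invalidates all old markers without clearing the array, except when the 8-bit counter wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters: one marker byte per 16-byte slot of a segment. */
#define SEGMENT_SIZE 4096
#define MARKER_SHIFT 4
#define NB_MARKERS   (SEGMENT_SIZE >> MARKER_SHIFT)

typedef uintptr_t segment_offset_t;

struct segment {
    uint8_t read_markers[NB_MARKERS];
    uint8_t transaction_read_version;  /* changes at every transaction start */
};

/* Start a transaction: a new version makes all old markers stale at once,
   so the marker array only has to be cleared when the counter wraps.
   Version 0 means "never read"; call this before any barrier. */
static void start_transaction(struct segment *seg)
{
    seg->transaction_read_version++;
    if (seg->transaction_read_version == 0) {
        memset(seg->read_markers, 0, sizeof seg->read_markers);
        seg->transaction_read_version = 1;
    }
}

/* Read barrier: a single byte store, no address translation. */
static void read_barrier(struct segment *seg, segment_offset_t so)
{
    assert(so < SEGMENT_SIZE);
    seg->read_markers[so >> MARKER_SHIFT] = seg->transaction_read_version;
}

/* Used by conflict detection: is this object in the current read set? */
static bool was_read(const struct segment *seg, segment_offset_t so)
{
    assert(so < SEGMENT_SIZE);
    return seg->read_markers[so >> MARKER_SHIFT] == seg->transaction_read_version;
}
```

Conflict detection at commit time then only needs to call \lstinline!was_read()! on the other segments for each object in the write set, which is why the barrier itself can stay this small.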
@@ -797,7 +816,7 @@
set (\lstinline!modified_old_objects!) and check the corresponding
\lstinline!read_markers! in other threads/segments. If we detect a
read-write conflict, we do contention management to either abort us or
-the other transaction, or to simply wait a bit.
+the other transaction, or to simply wait a bit. \cfbolz{why does waiting help?}
After verifying that there are no conflicts anymore, we copy all our
changes done to the objects in the write set to all other segments,