[pypy-commit] extradoc extradoc: Finish the slides
arigo
noreply at buildbot.pypy.org
Wed Jul 23 08:02:34 CEST 2014
Author: Armin Rigo <arigo at tunes.org>
Branch: extradoc
Changeset: r5372:3c5ba3e46ec5
Date: 2014-07-23 08:02 +0200
http://bitbucket.org/pypy/extradoc/changeset/3c5ba3e46ec5/
Log: Finish the slides
diff --git a/talk/ep2014/stm/talk.html b/talk/ep2014/stm/talk.html
--- a/talk/ep2014/stm/talk.html
+++ b/talk/ep2014/stm/talk.html
@@ -502,40 +502,64 @@
</li>
</ul>
</div>
-<div class="slide" id="big-point">
-<h1>Big Point</h1>
+<div class="slide" id="pypy-stm">
+<h1>PyPy-STM</h1>
<ul class="simple">
-<li>application-level locks still needed...</li>
+<li>implementation of a specially-tailored STM ("hard" part):<ul>
+<li>a reusable C library</li>
+<li>called STMGC-C7</li>
+</ul>
+</li>
+<li>used in PyPy to replace the GIL ("easy" part)</li>
+<li>could also be used in CPython<ul>
+<li>but refcounting needs replacing</li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="slide" id="how-does-it-work">
+<h1>How does it work?</h1>
+<object data="fig4.svg" type="image/svg+xml">
+fig4.svg</object>
+</div>
+<div class="slide" id="demo">
+<h1>Demo</h1>
+<ul class="simple">
+<li>counting primes</li>
+</ul>
+</div>
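The counting-primes demo itself isn't part of this diff; a minimal sketch of what such a demo might look like (thread count and ranges are invented for illustration -- under the GIL the threads run serially, under pypy-stm they can run in parallel):

```python
import threading

def is_prime(n):
    # trial division, good enough for a demo
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def count_primes(start, stop, results, index):
    # each thread counts primes in its own disjoint range
    results[index] = sum(1 for n in range(start, stop) if is_prime(n))

N_THREADS = 2
results = [0] * N_THREADS
threads = [
    threading.Thread(target=count_primes,
                     args=(i * 5000, (i + 1) * 5000, results, i))
    for i in range(N_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(results)   # primes below 10000
```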
+<div class="slide" id="long-transactions">
+<h1>Long Transactions</h1>
+<ul class="simple">
+<li>threads and application-level locks still needed...</li>
<li>but <em>can be very coarse:</em><ul>
-<li>even two big transactions can optimistically run in parallel</li>
+<li>two transactions can optimistically run in parallel</li>
<li>even if they both <em>acquire and release the same lock</em></li>
+<li>internally, transaction lengths are driven by the locks we acquire</li>
</ul>
</li>
</ul>
</div>
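A hedged sketch of the point above: two threads that repeatedly acquire and release the *same* coarse lock. Under the GIL this serializes; pypy-stm can still run the lock-protected bodies as optimistic parallel transactions when, as here, they touch mostly disjoint data. (The workload is invented for illustration.)

```python
import threading

counts = {"a": 0, "b": 0}
big_lock = threading.Lock()    # one coarse lock around everything

def work(key, n):
    for _ in range(n):
        with big_lock:         # both threads take the same lock
            counts[key] += 1   # ...but touch different keys

t1 = threading.Thread(target=work, args=("a", 1000))
t2 = threading.Thread(target=work, args=("b", 1000))
t1.start(); t2.start()
t1.join(); t2.join()
```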
<div class="slide" id="id2">
-<h1>Big Point</h1>
+<h1>Long Transactions</h1>
<object data="fig4.svg" type="image/svg+xml">
fig4.svg</object>
</div>
-<div class="slide" id="demo-1">
-<h1>Demo 1</h1>
+<div class="slide" id="id3">
+<h1>Demo</h1>
<ul class="simple">
-<li>"Twisted apps made parallel out of the box"</li>
<li>Bottle web server</li>
</ul>
</div>
-<div class="slide" id="pypy-stm">
-<h1>PyPy-STM</h1>
+<div class="slide" id="pypy-stm-programming-model">
+<h1>PyPy-STM Programming Model</h1>
<ul class="simple">
-<li>implementation of a specially-tailored STM:<ul>
-<li>a reusable C library</li>
-<li>called STMGC-C7</li>
-</ul>
-</li>
-<li>used in PyPy to replace the GIL</li>
-<li>could also be used in CPython<ul>
-<li>but refcounting needs replacing</li>
+<li>threads-and-locks, fully compatible with the GIL</li>
+<li>this is not "everybody should use careful explicit threading
+with all the locking issues"</li>
+<li>instead, PyPy-STM pushes forward:<ul>
+<li>use a thread pool library</li>
+<li>coarse locking, inside that library only</li>
</ul>
</li>
</ul>
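A sketch of the programming model above: application code submits independent tasks to a pool and holds no locks itself; the coarse locking lives inside the pool library. (`ThreadPoolExecutor` stands in here for whatever pool library is used -- an assumption, not the talk's example.)

```python
from concurrent.futures import ThreadPoolExecutor

def handle(item):
    # pure application logic: no explicit locks in user code
    return item * item

with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(handle, range(10)))
```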
@@ -546,7 +570,7 @@
<li>current status:<ul>
<li>basics work</li>
<li>best case 25-40% overhead (much better than originally planned)</li>
-<li>parallelizing user locks not done yet</li>
+<li>parallelizing user locks not done yet (see "with atomic")</li>
<li>tons of things to improve</li>
<li>tons of things to improve</li>
<li>tons of things to improve</li>
@@ -558,52 +582,113 @@
</li>
</ul>
</div>
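A hedged sketch of the "with atomic" construct mentioned above. On pypy-stm it comes from `__pypy__.thread` and turns the whole block into a single transaction; the fallback below is a plain re-entrant lock (our assumption for portability) so the snippet runs on any Python.

```python
import threading

try:
    from __pypy__.thread import atomic   # pypy-stm only
except ImportError:
    atomic = threading.RLock()           # fallback: one global lock

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        with atomic:                     # the whole block is atomic
            counter += 1

t1 = threading.Thread(target=bump, args=(1000,))
t2 = threading.Thread(target=bump, args=(1000,))
t1.start(); t2.start()
t1.join(); t2.join()
# no lost updates: counter ends at 2000
```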
-<div class="slide" id="demo-2">
-<h1>Demo 2</h1>
+<div class="slide" id="summary-benefits">
+<h1>Summary: Benefits</h1>
<ul class="simple">
-<li>counting primes</li>
-</ul>
-</div>
-<div class="slide" id="benefits">
-<h1>Benefits</h1>
-<ul class="simple">
-<li>Keep locks coarse-grained</li>
<li>Potential to enable parallelism:<ul>
-<li>in CPU-bound multithreaded programs</li>
+<li>in any CPU-bound multithreaded program</li>
<li>or as a replacement of <tt class="docutils literal">multiprocessing</tt></li>
<li>but also in existing applications not written for that</li>
<li>as long as they do multiple things that are "often independent"</li>
</ul>
</li>
+<li>Keep locks coarse-grained</li>
</ul>
</div>
-<div class="slide" id="issues">
-<h1>Issues</h1>
+<div class="slide" id="summary-issues">
+<h1>Summary: Issues</h1>
<ul class="simple">
-<li>Performance hit: 25-40% everywhere (may be ok)</li>
<li>Keep locks coarse-grained:<ul>
<li>but in case of systematic conflicts, performance is bad again</li>
<li>need to track and fix them</li>
-<li>need tool support (debugger/profiler)</li>
+<li>need tooling to support this (debugger/profiler)</li>
</ul>
</li>
+<li>Performance hit: 25-40% over a plain PyPy-JIT (may be ok)</li>
</ul>
</div>
-<div class="slide" id="summary">
-<h1>Summary</h1>
+<div class="slide" id="summary-pypy-stm">
+<h1>Summary: PyPy-STM</h1>
<ul class="simple">
-<li>Transactional Memory is still too researchy for production</li>
-<li>But it has the potential to enable "easier parallelism"</li>
+<li>Not production-ready</li>
+<li>But it has the potential to enable "easier parallelism for everybody"</li>
<li>Still alpha but slowly getting there!<ul>
<li>see <a class="reference external" href="http://morepypy.blogspot.com/">http://morepypy.blogspot.com/</a></li>
</ul>
</li>
+<li>Crowdfunding!<ul>
+<li>see <a class="reference external" href="http://pypy.org/">http://pypy.org/</a></li>
+</ul>
+</li>
</ul>
</div>
<div class="slide" id="part-2-under-the-hood">
<h1>Part 2 - Under The Hood</h1>
<p><strong>STMGC-C7</strong></p>
</div>
+<div class="slide" id="overview">
+<h1>Overview</h1>
+<ul class="simple">
+<li>Say we want to run N = 2 threads</li>
+<li>We reserve twice the memory</li>
+<li>Thread 1 reads/writes "memory segment" 1</li>
+<li>Thread 2 reads/writes "memory segment" 2</li>
+<li>Upon commit, we (try to) copy the changes to the other segment</li>
+</ul>
+</div>
+<div class="slide" id="trick-1">
+<h1>Trick #1</h1>
+<ul class="simple">
+<li>Objects contain pointers to each other</li>
+<li>These pointers are relative instead of absolute:<ul>
+<li>accessed as if they were "thread-local data"</li>
+<li>the x86 has a zero-cost way to do that (<tt class="docutils literal">%fs</tt>, <tt class="docutils literal">%gs</tt>)</li>
+<li>supported in clang (not gcc so far)</li>
+</ul>
+</li>
+</ul>
+</div>
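A toy model of segment-relative pointers: a "pointer" is just an offset, resolved against whichever segment the current thread owns. (The real trick uses the x86 `%gs`/`%fs` segment registers so the indirection costs nothing; this Python version is purely illustrative.)

```python
segments = [dict(), dict()]    # one "memory segment" per thread

HEAD = 0x100                   # the same offset in every segment
segments[0][HEAD] = "object as seen by thread 0"
segments[1][HEAD] = "object as seen by thread 1"

def deref(segment_id, offset):
    # same relative pointer, different absolute target per thread
    return segments[segment_id][offset]

a = deref(0, HEAD)
b = deref(1, HEAD)
```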
+<div class="slide" id="trick-2">
+<h1>Trick #2</h1>
+<ul class="simple">
+<li>With Trick #1, most objects are exactly identical in all N segments:<ul>
+<li>so we share the memory</li>
+<li><tt class="docutils literal">mmap() MAP_SHARED</tt></li>
+<li>actual memory usage is multiplied by much less than N</li>
+</ul>
+</li>
+<li>Newly allocated objects are directly in shared pages:<ul>
+<li>we don't actually need to copy <em>all new objects</em> at commit,
+but only the few <em>old objects</em> modified</li>
+</ul>
+</li>
+</ul>
+</div>
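The sharing trick can be demonstrated with two `MAP_SHARED` mappings of the same pages (POSIX-only sketch): a write through one view is immediately visible through the other. Illustrative only -- STMGC-C7 maps the segments themselves, not a temporary file.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)
# two views of the same physical pages
view1 = mmap.mmap(fd, 4096, flags=mmap.MAP_SHARED)
view2 = mmap.mmap(fd, 4096, flags=mmap.MAP_SHARED)

view1[0:5] = b"hello"          # write through the first view
shared = bytes(view2[0:5])     # read it back through the second

view1.close()
view2.close()
os.close(fd)
os.unlink(path)
```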
+<div class="slide" id="barriers">
+<h1>Barriers</h1>
+<ul class="simple">
+<li>Need to record all reads and writes done by a transaction</li>
+<li>Extremely cheap way to do that:<ul>
+<li><em>Read:</em> set a flag in thread-local memory (one byte)</li>
+<li><em>Write</em> into a newly allocated object: nothing to do</li>
+<li><em>Write</em> into an old object: add the object to a list</li>
+</ul>
+</li>
+<li>Commit: check whether any object in that list has its
+read flag set in some other thread</li>
+</ul>
+</div>
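The barriers and the commit-time conflict check above can be modeled in a few lines of Python (a toy, hypothetical model; the real STMGC-C7 does this in C on raw memory with one-byte flags):

```python
class Transaction:
    def __init__(self, heap):
        self.heap = heap         # shared committed objects
        self.local = {}          # this segment's private copies
        self.read_set = set()    # read barrier: one flag per object
        self.write_set = set()   # old objects modified

    def read(self, key):
        self.read_set.add(key)   # "set a flag" (one byte in reality)
        return self.local.get(key, self.heap[key])

    def write(self, key, value):
        self.write_set.add(key)  # "add the object to a list"
        self.local[key] = value

    def commit(self, others):
        # conflict check: did another transaction read an object
        # that we modified?
        for other in others:
            if self.write_set & other.read_set:
                return False     # abort; the caller would retry
        for key in self.write_set:
            self.heap[key] = self.local[key]
        return True

heap = {"x": 0, "y": 0}
t1, t2 = Transaction(heap), Transaction(heap)
t1.write("x", t1.read("x") + 1)  # t1 touches only x
t2.write("y", t2.read("y") + 1)  # t2 touches only y
ok1 = t1.commit([t2])            # disjoint objects:
ok2 = t2.commit([t1])            # both commits succeed

t3, t4 = Transaction(heap), Transaction(heap)
t4.read("x")                     # t4's read flag on x is set
t3.write("x", 42)
conflict = not t3.commit([t4])   # t3 wrote what t4 read: abort
```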
+<div class="slide" id="id4">
+<h1>...</h1>
+</div>
+<div class="slide" id="thank-you">
+<h1>Thank You</h1>
+<ul class="simple">
+<li><a class="reference external" href="http://morepypy.blogspot.com/">http://morepypy.blogspot.com/</a></li>
+<li><a class="reference external" href="http://pypy.org/">http://pypy.org/</a></li>
+<li>irc: <tt class="docutils literal">#pypy</tt> on freenode.net</li>
+</ul>
+</div>
</div>
</body>
</html>
diff --git a/talk/ep2014/stm/talk.rst b/talk/ep2014/stm/talk.rst
--- a/talk/ep2014/stm/talk.rst
+++ b/talk/ep2014/stm/talk.rst
@@ -153,8 +153,8 @@
- but refcounting needs replacing
-Commits
----------
+How does it work?
+-----------------
.. image:: fig4.svg
@@ -165,17 +165,19 @@
* counting primes
-Big Point
+Long Transactions
----------------------------
-* application-level locks still needed...
+* threads and application-level locks still needed...
* but *can be very coarse:*
- - even two big transactions can optimistically run in parallel
+ - two transactions can optimistically run in parallel
- even if they both *acquire and release the same lock*
+  - internally, transaction lengths are driven by the locks we acquire
+
Long Transactions
-----------------
@@ -211,7 +213,7 @@
- basics work
- best case 25-40% overhead (much better than originally planned)
- - parallelizing user locks not done yet (see ``with atomic``)
+ - parallelizing user locks not done yet (see "with atomic")
- tons of things to improve
- tons of things to improve
- tons of things to improve
@@ -224,8 +226,6 @@
Summary: Benefits
-----------------
-* Keep locks coarse-grained
-
* Potential to enable parallelism:
- in any CPU-bound multithreaded program
@@ -236,6 +236,8 @@
- as long as they do multiple things that are "often independent"
+* Keep locks coarse-grained
+
Summary: Issues
---------------
@@ -248,7 +250,7 @@
- need tooling to support this (debugger/profiler)
-* Performance hit: 25-40% everywhere (may be ok)
+* Performance hit: 25-40% over a plain PyPy-JIT (may be ok)
Summary: PyPy-STM
@@ -256,12 +258,16 @@
* Not production-ready
-* But it has the potential to enable "easier parallelism"
+* But it has the potential to enable "easier parallelism for everybody"
* Still alpha but slowly getting there!
- see http://morepypy.blogspot.com/
+* Crowdfunding!
+
+ - see http://pypy.org/
+
Part 2 - Under The Hood
-----------------------
@@ -272,7 +278,7 @@
Overview
--------
-* Say we want to run two threads
+* Say we want to run N = 2 threads
* We reserve twice the memory
@@ -290,16 +296,56 @@
* These pointers are relative instead of absolute:
- -
+ - accessed as if they were "thread-local data"
+ - the x86 has a zero-cost way to do that (``%fs``, ``%gs``)
-Trick #1
+ - supported in clang (not gcc so far)
+
+
+Trick #2
--------
-* Most objects are the same in all segments:
+* With Trick #1, most objects are exactly identical in all N segments:
- so we share the memory
- - ``mmap() MAP_SHARED`` trickery
+ - ``mmap() MAP_SHARED``
+ - actual memory usage is multiplied by much less than N
+* Newly allocated objects are directly in shared pages:
+
+ - we don't actually need to copy *all new objects* at commit,
+ but only the few *old objects* modified
+
+
+Barriers
+--------
+
+* Need to record all reads and writes done by a transaction
+
+* Extremely cheap way to do that:
+
+ - *Read:* set a flag in thread-local memory (one byte)
+
+ - *Write* into a newly allocated object: nothing to do
+
+ - *Write* into an old object: add the object to a list
+
+* Commit: check whether any object in that list has its
+  read flag set in some other thread
+
+
+...
+-------------------
+
+
+Thank You
+---------
+
+* http://morepypy.blogspot.com/
+
+* http://pypy.org/
+
+* irc: ``#pypy`` on freenode.net