[pypy-commit] extradoc extradoc: English language cleanups.
noreply at buildbot.pypy.org
Tue Jan 10 19:22:58 CET 2012
Date: 2012-01-10 13:22 -0500
Log: English language cleanups.
diff --git a/blog/draft/laplace.rst b/blog/draft/laplace.rst
@@ -4,9 +4,10 @@
We're excited to let you know about some of the great progress we've made on
-NumPyPy -- both completeness and performance. Here we'll mostly talk about the
-performance side and how far we have come so far. **Word of warning:** this
-work isn't done - we're maybe half way to where we want to be and there are
+NumPyPy: both completeness and performance. In this blog entry we will
+mostly talk about performance and how much progress we have made so far.
+**Word of warning:** this
+work isn't done -- we're maybe halfway to where we want to be and there are
many trivial and not so trivial optimizations to be written. (For example, we
haven't even started to implement important optimizations, like vectorization.)
@@ -27,10 +28,10 @@
Numerically the algorithms used are identical, however exact data layout in
memory differs between them.
-**A note about all the benchmarks:** they were each run once, but the
+**A note about all the benchmarks:** they were each run once, but the
performance is very stable across runs.
-Starting with the C version, it implements a dead simple laplace transform
+Starting with the C version, it implements a trivial laplace transform
using two loops and double-reference memory (array of ``int*``). The double
reference does not matter for performance and the two algorithms are
implemented in ``inline-laplace.c`` and ``laplace.c``. They were both compiled
@@ -55,13 +56,14 @@
| inline_slow python | 278 | 23.7 |
-An important thing to notice here is that the data dependency in the inline
-version causes a huge slowdown for the C versions. This is already not too bad
-for us though, the braindead Python version takes longer and PyPy is not able
-to take advantage of the knowledge that the data is independent, but it is in
-the same ballpark as the C versions - **15% - 170%** slower, but the algorithm
-you choose matters more than the language. By comparison, the slow versions
-take about **5.75s** each on CPython 2.6 per iteration, and by estimating,
+An important thing to notice is that the data dependency of the inline
+version causes a huge slowdown for the C versions. This is not a severe
+disadvantage for us though -- the brain-dead Python version takes longer
+and PyPy is not able to take advantage of the knowledge that the data is
+independent. The results are in the same ballpark as the C versions --
+**15% - 170%** slower, but the algorithm
+one chooses matters more than the language. By comparison, the slow versions
+take about **5.75s** per iteration each on CPython 2.6 and, by estimation,
are about **200x** slower than the PyPy equivalent, if I had the patience to
measure the full run.
@@ -78,7 +80,7 @@
We need 3 arrays here - one is an intermediate (PyPy only needs one, for all of
those subexpressions), one is a copy for computing the error, and one is the
-result. This works automatically, since in NumPy ``+`` or ``*`` creates an
+result. This works automatically because in NumPy ``+`` or ``*`` creates an
intermediate, while NumPyPy avoids allocating the intermediate if possible.
``numeric_2_time_step`` works in pretty much the same way::
@@ -90,7 +92,7 @@
except the copy is now explicit rather than implicit.
-``numeric_3_time_step`` does the same thing, but notices you don't have to copy
+``numeric_3_time_step`` does the same thing, but notices one doesn't have to copy
the entire array, it's enough to copy the border pieces and fill rest with
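The explicit-copy pattern that ``numeric_2_time_step`` uses boils down to
something like the following sketch (a hedged reconstruction with made-up
names, not the benchmark's exact code)::

```python
import numpy as np

def explicit_copy_step(u, dx2, dy2, dnr_inv):
    # Copy the whole array up front, update the interior with one
    # vectorized slice expression, and use the copy to measure how much
    # the grid changed. NumPy evaluates the right-hand side before the
    # slice assignment, so the update reads the pre-update values.
    old = u.copy()
    u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1]) * dy2 +
                     (u[1:-1, 0:-2] + u[1:-1, 2:]) * dx2) * dnr_inv
    return np.sqrt(((u - old) ** 2).sum())
```

On CPython the slice expression on the right allocates several intermediate
arrays; NumPyPy fuses them away where possible, as described above.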
@@ -104,12 +106,12 @@
(src[1:-1,0:-2] + src[1:-1, 2:])*dx2)*dnr_inv
``numeric_4_time_step`` is the one that tries hardest to resemble the C version.
-Instead of doing an array copy, it actually notices that you can alternate
+Instead of doing an array copy, it actually notices that one can alternate
between two arrays. This is exactly what the C version does. The
``remove_invalidates`` call is a PyPy specific hack - we hope to remove this
-call in the near future, but in short it promises "I don't have any unbuilt
-intermediates that depend on the value of the argument", which means you don't
-have to compute sub-expressions you're not actually using::
+call in the near future, but, in short, it promises "I don't have any unbuilt
+intermediates that depend on the value of the argument", which means one doesn't
+have to compute sub-expressions one is not actually using::
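The two-array alternation can be sketched as follows (``remove_invalidates``
omitted; names are illustrative, not the benchmark's actual code)::

```python
import numpy as np

def alternating_step(src, dst, dx2, dy2, dnr_inv):
    # Read only from src, write only into dst -- the same trick the C
    # version uses, so no intermediate array is allocated per step.
    dst[1:-1, 1:-1] = ((src[0:-2, 1:-1] + src[2:, 1:-1]) * dy2 +
                       (src[1:-1, 0:-2] + src[1:-1, 2:]) * dx2) * dnr_inv

u = np.zeros((5, 5))
u[0, :] = 1.0
v = u.copy()          # boundaries match; interiors get overwritten
for _ in range(10):
    alternating_step(u, v, 1.0, 1.0, 0.25)
    u, v = v, u       # alternate between the two buffers
```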
@@ -120,7 +122,7 @@
This one is the most comparable to the C version.
-``numeric_5_time_step`` does the same thing, but notices you don't have to copy
+``numeric_5_time_step`` does the same thing, but notices one doesn't have to copy
the entire array, it's enough to just copy the edges. This is an optimization
that was not done in the C version::
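Copying just the edges instead of the whole array looks roughly like this
sketch (made-up names)::

```python
import numpy as np

def copy_edges(src, dst):
    # The interior of dst is fully overwritten by the update anyway,
    # so only the four border rows/columns of src need carrying over.
    dst[0, :] = src[0, :]
    dst[-1, :] = src[-1, :]
    dst[:, 0] = src[:, 0]
    dst[:, -1] = src[:, -1]
```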
@@ -158,9 +160,9 @@
the C version (or as fast as we'd like them to be), but we're already much
faster than NumPy on CPython, almost always by more than 2x on this relatively
real-world example. This is not the end though, in fact it's hardly the
-beginning: as we continue work, we hope to make even much better use of the
+beginning! As we continue this work, we hope to make even better use of the
high level information that we have. Looking at the generated assembler by
-gcc in this example it's pretty clear we can outperform it, thanks to better
+gcc in this example, it's pretty clear we can outperform it, thanks to better
aliasing information and hence better possibilities for vectorization.