On Sat, Dec 27, 2014 at 01:28:14AM +0100, Andrew Barnert wrote:
On Dec 26, 2014, at 23:05, David Mertz wrote:
On Fri, Dec 26, 2014 at 1:39 PM, Antoine Pitrou wrote:
On Fri, 26 Dec 2014 13:11:19 -0700 David Mertz wrote:
I think the 5-6 year estimate is pessimistic. Take a look at http://en.wikipedia.org/wiki/Xeon_Phi for some background.
"""Intel Many Integrated Core Architecture or Intel MIC (pronounced Mick or Mike[1]) is a *coprocessor* computer architecture"""
Enough said. It's not a general-purpose chip. It's meant as a competitor against the computational use of GPU, not against traditional general-purpose CPUs.
Yes and no:
The cores of Intel MIC are based on a modified version of P54C design, used in the original Pentium. The basis of the Intel MIC architecture is to leverage x86 legacy by creating a x86-compatible multiprocessor architecture that can utilize existing parallelization software tools. Programming tools include OpenMP, OpenCL, Cilk/Cilk Plus and specialised versions of Intel's Fortran, C++ and math libraries.
x86 is pretty general purpose, though yes, it's also meant to compete with GPUs. But there are many projects--including Numba--that use GPUs for "general computation" (or at least to offload much of the computation). The distinctions seem to be blurring, in my mind.
But indeed, as many people have observed, parallelization is usually non-trivial, and the presence of many cores is a far different thing from their efficient utilization.
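(An illustrative aside, not from the thread: the gap between "many cores present" and "cores efficiently utilized" is usually quantified with Amdahl's law -- a quick sketch of how little a large serial fraction leaves on the table:)

```python
# Amdahl's law: theoretical speedup on n cores when only a
# fraction p of the work is parallelizable. Illustrates why
# adding cores is a far different thing from utilizing them.
def amdahl_speedup(p, n):
    """Speedup with parallel fraction p (0..1) on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

for cores in (2, 8, 64, 128):
    # Even at 95% parallelizable, 128 cores yield under 17.5x.
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

With p = 0.95 the ceiling is 1/(1-p) = 20x no matter how many cores you add, which is one reason the extra cores go idle in practice.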
I think what we're eventually going to see is that optimized, explicit parallelism is very hard, but general-purpose implicit parallelism is pretty easy if you're willing to accept a lot of overhead. When people start writing a lot of code that takes 4x as much CPU but can run on 64 cores instead of 2 and work with a dumb ring cache instead of full coherence, that's when people will start selling 128-core laptops. And it's not going to be new application programming techniques that make that happen, it's going to be things like language-level STM, implicit parallelism libraries, kernel schedulers that can migrate low-utilization processes into low-power auxiliary cores, etc.
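(A minimal sketch of that "accept overhead for easy parallelism" trade, using only the stdlib -- nothing here is PyParallel- or STM-specific, and `busy_sum` is just hypothetical placeholder work:)

```python
from concurrent.futures import ProcessPoolExecutor

def busy_sum(n):
    # Deliberately dumb CPU-bound work. Spawning workers and
    # pickling arguments/results costs extra CPU per task --
    # that's the overhead being traded for use of all cores.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [200_000] * 8  # eight independent tasks
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(busy_sum, chunks))
    print(len(results))
```

The point isn't that this is fast per-task; it's that the programmer writes nothing harder than `pool.map` and the runtime absorbs the coordination cost.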
I disagree. PyParallel works fine with existing programming techniques.

Just took a screen share of a load test between a normal Python 3.3 release build and the debugged-up-the-wazoo, flaky PyParallel 0.1-ish, and it undeniably crushes the competition. (Then crashes, 'cause you can't have it all.)

https://www.youtube.com/watch?v=JHaIaOyfldo

Keep in mind that's a full debug build. Not only that, I've butchered every PyObject and added something like six more 8-byte pointers to it, coupled with excessive memory guard tests at every opportunity that result in a few thousand hash tables being probed to check for pointer address membership. The thing is slooooooww.

And even with all that in place, check out the results.

Python 3.3:

    Running 10s test @ http://192.168.1.15:8000/index.html
      8 threads and 64 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    13.69ms   11.59ms  27.93ms   52.76%
        Req/Sec   222.14    234.53     1.60k    86.91%
      Latency Distribution
         50%    5.67ms
         75%   26.75ms
         90%   27.36ms
         99%   27.93ms
      16448 requests in 10.00s, 141.13MB read
      Socket errors: connect 0, read 7, write 0, timeout 0
    Requests/sec:   1644.66
    Transfer/sec:     14.11MB

PyParallel v0.1, exploiting all cores:

    Running 10s test @ http://192.168.1.15:8080/index.html
      8 threads and 8 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     2.32ms    2.29ms  27.57ms   92.89%
        Req/Sec   540.82    154.01     0.89k    75.34%
      Latency Distribution
         50%    1.68ms
         75%    2.00ms
         90%    3.57ms
         99%   11.26ms
      40828 requests in 10.00s, 350.47MB read
    Requests/sec:   4082.66
    Transfer/sec:     35.05MB

~2.5x improvement even with all its warts. And it's still not even close to being loaded enough -- only about 35% of a gigabit link used and about half core utilization. No reason it couldn't do 100,000 requests/sec.

Recent thread on python-ideas with a bit more information:
https://mail.python.org/pipermail/python-ideas/2014-November/030196.html

Core concepts:
https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploite...

Trent.