[Python-ideas] Type Hinting - Performance booster ?

Sat Dec 27 11:36:11 CET 2014

On Dec 27, 2014, at 8:34, Trent Nelson <trent at snakebite.org> wrote:

> On Sat, Dec 27, 2014 at 01:28:14AM +0100, Andrew Barnert wrote:
>> On Dec 26, 2014, at 23:05, David Mertz <mertz at gnosis.cx> wrote:
>> 
>>> On Fri, Dec 26, 2014 at 1:39 PM, Antoine Pitrou
>>> <solipsis at pitrou.net> wrote:
>>>> On Fri, 26 Dec 2014 13:11:19 -0700 David Mertz <mertz at gnosis.cx>
>>>> wrote:
>>>>> I think the 5-6 year estimate is pessimistic.  Take a look at
>>>>> http://en.wikipedia.org/wiki/Xeon_Phi for some background.
>>>> 
>>>> """Intel Many Integrated Core Architecture or Intel MIC (pronounced
>>>> Mick or Mike[1]) is a *coprocessor* computer architecture"""
>>>> 
>>>> Enough said. It's not a general-purpose chip. It's meant as a
>>>> competitor against the computational use of GPU, not against
>>>> traditional general-purpose CPUs.
>>> 
>>> Yes and no:
>>> 
>>> The cores of Intel MIC are based on a modified version of P54C
>>> design, used in the original Pentium. The basis of the Intel MIC
>>> architecture is to leverage x86 legacy by creating a x86-compatible
>>> multiprocessor architecture that can utilize existing
>>> parallelization software tools. Programming tools include OpenMP,
>>> OpenCL, Cilk/Cilk Plus and specialised versions of Intel's Fortran,
>>> C++ and math libraries.
>>> 
>>> x86 is pretty general purpose, but also yes it's meant to compete
>>> with GPUs too.  But also, there are many projects--including
>>> Numba--that utilize GPUs for "general computation" (or at least to
>>> offload much of the computation).  The distinctions seem to be
>>> blurring in my mind.
>>> 
>>> But indeed, as many people have observed, parallelization is usually
>>> non-trivial, and the presence of many cores is a far different thing
>>> from their efficient utilization.
>> 
>> I think what we're eventually going to see is that optimized, explicit
>> parallelism is very hard, but general-purpose implicit parallelism is
>> pretty easy if you're willing to accept a lot of overhead. When people
>> start writing a lot of code that takes 4x as much CPU but can run on
>> 64 cores instead of 2 and work with a dumb ring cache instead of full
>> coherence, that's when people will start selling 128-core laptops. And
>> it's not going to be new application programming techniques that make
>> that happen, it's going to be things like language-level STM, implicit
>> parallelism libraries, kernel schedulers that can migrate
>> low-utilization processes into low-power auxiliary cores, etc.
> 
> I disagree.  PyParallel works fine with existing programming techniques:

Then what are you disagreeing with? My whole point is that it's not going to be new application programming techniques that make parallelism accessible.

> Just took a screen share of a load test between normal Python 3.3
> release build, and the debugged-up-the-wazzo flaky PyParallel 0.1-ish,
> and it undeniably crushes the competition.  (Then crashes, 'cause you
> can't have it all.)
> 
>        https://www.youtube.com/watch?v=JHaIaOyfldo
> 
> Keep in mind that's a full debug build, but not only that, I've
> butchered every PyObject and added like, 6 more 8-byte pointers to it;
> coupled with excessive memory guard tests at every opportunity that
> result in a few thousand hash tables being probed to check for ptr
> address membership.
> 
> The thing is slooooooww.  And even with all that in place, check out the
> results:

Sure, sloooooww code that's 8x as parallel runs 2.5x as fast. What's held things back for so long is the silly constraint that code has to be almost as fast on 1- or 2-core machines and also scale to 8-core machines. Now that mainstream machines have 2 to 8 cores instead of 1 to 2, while the baseline you have to nearly match is still sequential code, things are starting to change. Even when things like PyParallel or PyPy's STM aren't optimized at all, they're already winning.
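
A rough back-of-the-envelope sketch of that arithmetic, for anyone who wants to play with the numbers. It assumes perfect scaling across cores, which real workloads won't get, and the function names are made up for illustration; nothing here is measured from PyParallel or PyPy:

```python
def net_speedup(overhead, cores, baseline_cores=1):
    """Speedup of the parallel version over the baseline.

    overhead: how many times more total CPU work the parallel version
              burns compared with the baseline (e.g. 4 means "4x the CPU").
    Assumes the work divides evenly across cores (idealized).
    """
    if overhead <= 0 or cores <= 0 or baseline_cores <= 0:
        raise ValueError("overhead and core counts must be positive")
    return cores / (overhead * baseline_cores)


def implied_overhead(cores, speedup, baseline_cores=1):
    """Overhead factor implied by an observed speedup, same idealized model."""
    if cores <= 0 or speedup <= 0 or baseline_cores <= 0:
        raise ValueError("core counts and speedup must be positive")
    return cores / (speedup * baseline_cores)


# "8x as parallel runs 2.5x as fast": under this model the slow build
# would be doing roughly this many times the work per request.
overhead_8_cores = implied_overhead(cores=8, speedup=2.5)

# "4x as much CPU but can run on 64 cores instead of 2"
speedup_64_vs_2 = net_speedup(overhead=4, cores=64, baseline_cores=2)

print(overhead_8_cores, speedup_64_vs_2)
```

The point of the model is just that the break-even core count equals the overhead factor: anything past that is a win, however unoptimized the runtime is.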