[pypy-dev] support for 64-bit processors and eliminating global state

Wed Sep 30 20:19:06 CEST 2009

Maciej Fijalkowski wrote:
> Hi Leonardo. I think you're not reading this mail in detail, let me explain.
>
> On Wed, Sep 30, 2009 at 11:16 AM, Leonardo Santagada
> <santagada at gmail.com> wrote:
> It's not that many hours needed to have 64bit JIT as far as I know.
> I did a lot of refactoring recently so it should be much easier.
> Also we don't have a 64bit buildbot, which means 64bit support might
> rot over time; we don't know, and it's not officially supported.
>   
It's great to hear that you are already working toward 64-bit support.
Most modern laptops have 64-bit capable chips; my two-year-old Centrino
Duo does.  A dual-boot setup (e.g. with Ubuntu for x86_64) can do the
trick for an inexpensive development environment.  That's not the same
as a buildbot though.

Some of the original 64-bit processors are nearing retirement age 
though, so perhaps some kind soul may see this note and volunteer an old 
system that has been replaced to support pypy-dev.  Just sayin' it's 
important.  Glad you seem to think so too.
>> The GIL in pypy is only there because no one proposed anything to
>> change that, pypy already does not depend on reference counting but
>> can use a garbage collector so it is probably way easier to change
>> than CPython.
>>     
>
> It's true that we don't have a good story here and we need one. Something
> a'la Jython would work (unlike in CPython), but it's work.
>   
The last time I looked, Hoard didn't support x86_64, although as I
recall it worked fairly efficiently in threaded environments.  Having a
separate arena for each thread (or each virtual processor) avoids a lot
of locking for frequent, small allocations in a VM.  That may mean
factoring out allocation so that callers use something like
myalloc(pool, size) rather than just malloc(size).  I read that pypy
was trying to factor out the GC code to support multiple back-ends.
An API that supports multiple concurrent allocator pools can be useful
in that regard.
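
To make the myalloc(pool, size) idea concrete, here is a minimal sketch
of a bump-pointer pool.  Everything in it (struct Pool, pool_init,
pool_destroy, the rounding policy) is my own illustrative assumption,
not code from pypy or Hoard; it only shows that an allocator which
touches nothing but its pool argument needs no lock when each thread
owns its own pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread pool: a single arena with a bump pointer. */
struct Pool {
    unsigned char *base;   /* start of the arena, NULL if unusable */
    size_t size;           /* total bytes in the arena */
    size_t used;           /* bytes handed out so far */
};

/* Back the pool with one malloc'd arena.  Returns 0 on success, -1 on failure. */
int pool_init(struct Pool *pool, size_t size)
{
    assert(pool != NULL);
    pool->base = malloc(size);
    pool->size = pool->base ? size : 0;
    pool->used = 0;
    return pool->base ? 0 : -1;
}

/* Allocate from the caller's pool.  Only *pool is read or written, so no
 * lock is needed as long as each pool is used by a single thread. */
void *myalloc(struct Pool *pool, size_t size)
{
    const size_t align = _Alignof(max_align_t);   /* C11 */
    size_t rounded;
    void *p;

    assert(pool != NULL);
    if (pool->base == NULL || size > SIZE_MAX - (align - 1))
        return NULL;                    /* no arena, or rounding would overflow */
    rounded = (size + align - 1) & ~(align - 1);
    if (rounded > pool->size - pool->used)
        return NULL;                    /* arena exhausted */
    p = pool->base + pool->used;
    pool->used += rounded;
    return p;
}

/* Release the whole arena at once; individual blocks are never freed. */
void pool_destroy(struct Pool *pool)
{
    assert(pool != NULL);
    free(pool->base);
    pool->base = NULL;
    pool->size = 0;
    pool->used = 0;
}
```

In a threaded VM each thread would call pool_init() on its own struct
Pool at start-up and pass that pool to every myalloc() call.  A real
allocator would also need per-object freeing or GC integration, which
this sketch leaves out.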

Similarly, a JIT can be modularized so as not to depend on globals, but 
have a JitContext structure:

    jit_xxx(struct JitContext *jc, ...)

That allows JIT compilation to proceed in multiple threads at once.  I
looked at libjit and it didn't have that structure, meaning that JIT
processing of functions was a potential bottleneck.  I haven't dug deep
enough into pypy yet to know whether or not that is the case for you folks.
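
As a rough illustration of that calling convention, here is a sketch in
which all emitter state lives in the context.  The field names, the
buffer size and the jit_init/jit_emit_* functions are hypothetical,
made up for this example; they are not taken from libjit or pypy, and
the "instruction" is just a toy byte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JIT_BUF_SIZE 256   /* arbitrary size for the sketch */

/* Hypothetical context: everything that might otherwise be a global. */
struct JitContext {
    unsigned char code[JIT_BUF_SIZE];  /* bytes emitted so far */
    size_t len;                        /* number of valid bytes in code[] */
    int error;                         /* set once the buffer overflows */
};

/* Reset a context to an empty, error-free state. */
void jit_init(struct JitContext *jc)
{
    assert(jc != NULL);
    memset(jc, 0, sizeof *jc);
}

/* Append one byte.  Only *jc is touched, so threads that each own a
 * context can emit code concurrently without any locking. */
void jit_emit_byte(struct JitContext *jc, unsigned char b)
{
    assert(jc != NULL);
    if (jc->len >= JIT_BUF_SIZE) {
        jc->error = 1;
        return;
    }
    jc->code[jc->len++] = b;
}

/* Emit a toy "opcode + 32-bit little-endian immediate" sequence. */
void jit_emit_load_imm(struct JitContext *jc, unsigned char opcode, uint32_t value)
{
    int i;

    jit_emit_byte(jc, opcode);
    for (i = 0; i < 4; i++)
        jit_emit_byte(jc, (unsigned char)((value >> (8 * i)) & 0xFFu));
}
```

Because no function reads or writes anything outside its jc argument,
two threads holding separate contexts cannot interfere with each other,
which is the property the jit_xxx(struct JitContext *jc, ...) signature
is meant to guarantee.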

In fact, I'd like to encourage the use of a global-less coding style for 
the sake of improved parallelization.  Every global is another reason 
for a GIL.
>> I haven't read the paper but pypy does already have a JIT, maybe if
>> you are interested in it you can read more on the pypy blog http://morepypy.blogspot.com/
>> . Probably someone with more experience with both pypy and the JIT is
>> going to answer this email so I will not try to explain it in here.
>>
>>     
I'm trying to get the authors to post the paper since it has already 
been presented.  When they do I'll forward a link.
>
> Note that's not precisely what Jeff wants. General purpose JIT is nice, but
> it's rather hard to imagine how it'll generate efficient CUDA code
> automatically, without hints from the user. Since PyPy actually has a
> jit-generator, it should be far easier to implement this in PyPy than
> somewhere else (you can write code that is "interpreter" and JIT will be
> automatically created for it), however it's still work to get a nice
> parallelizable (or parallelizing?) framework.
>   
Yes.  The SEJITS approach can be used even with a Python that doesn't
have a JIT, as long as it has a suitable foreign function interface.  The
trick is to interpose in the AST processing to recognize and handle
"selective" patterns in the tree.  The current system actually generates
C code on the fly, then compiles it and links it in with FFI hooks so
that subsequent calls can access it more directly.  This is obviously
only worth doing when the native code is substantially faster and/or
will be called often enough to repay the cost of compiling it.

If I had the cash on hand I would gladly support your work with a
donation.  Unfortunately I have neither sufficient personal resources
nor access to corporate funds.  (As a research group, we get our funds
from outside donations; we don't dole them out!)  I think it's a great
project though, and if cheerleading counts, you definitely have my
support in that regard.