[pypy-dev] PyParallel-style threads
Maciej Fijalkowski
fijall at gmail.com
Mon Jun 20 13:16:34 EDT 2016
No, you misunderstood me:
if you want to use multiple processes, you're not going to start a new one
per thing to do. You'll have a process pool and use that. Also, if you
don't use multiprocessing, you don't use pickling, you use something
sane for communication. PyParallel essentially allows read-only
access to the global state, but read-only is ill-defined and
ill-enforced (especially in the case of cpy extensions) in Python. So what
do you get as opposed to multiple processes?
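
A minimal sketch of the process-pool idea above, assuming a POSIX system
where os.fork is available: workers are forked once and then reused, and
requests travel over pipes as newline-delimited JSON instead of pickles.
The names start_pool, call and stop_pool, and the JSON framing itself, are
made up for illustration; they are not part of PyPy or PyParallel.

```python
import json
import os


def worker_loop(request_fd, response_fd):
    """Answer requests until the parent closes the request pipe."""
    with os.fdopen(request_fd, "r") as requests, \
            os.fdopen(response_fd, "w") as responses:
        for line in requests:
            numbers = json.loads(line)
            reply = {"pid": os.getpid(), "sum": sum(numbers)}
            responses.write(json.dumps(reply) + "\n")
            responses.flush()


def start_pool(size):
    """Fork `size` long-lived workers (hypothetical helper)."""
    workers = []
    for _ in range(size):
        req_r, req_w = os.pipe()
        resp_r, resp_w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process
            # Drop parent-side pipe ends, including those inherited from
            # workers forked earlier, so EOF is delivered correctly.
            for _pid, to_worker, from_worker in workers:
                to_worker.close()
                from_worker.close()
            os.close(req_w)
            os.close(resp_r)
            try:
                worker_loop(req_r, resp_w)
            finally:
                os._exit(0)
        os.close(req_r)
        os.close(resp_w)
        workers.append((pid, os.fdopen(req_w, "w"), os.fdopen(resp_r, "r")))
    return workers


def call(worker, numbers):
    """Send one request to an already running worker and wait for the reply."""
    _pid, to_worker, from_worker = worker
    to_worker.write(json.dumps(numbers) + "\n")
    to_worker.flush()
    return json.loads(from_worker.readline())


def stop_pool(workers):
    """Close the pipes (EOF ends each worker loop) and reap the children."""
    for pid, to_worker, from_worker in workers:
        to_worker.close()
        from_worker.close()
        os.waitpid(pid, 0)


if __name__ == "__main__":
    pool = start_pool(2)
    replies = [call(pool[i % 2], [i, i + 1]) for i in range(4)]
    stop_pool(pool)
    print([r["sum"] for r in replies])       # [1, 3, 5, 7]
    print(len({r["pid"] for r in replies}))  # 2
```

Four requests are served by the same two processes, so process startup is
paid once per worker rather than once per request. JSON is only one possible
wire format; any simple, explicit protocol would do.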
On Mon, Jun 20, 2016 at 6:42 PM, Omer Katz <omer.drow at gmail.com> wrote:
> Let's review what forking does in Python from a 10,000ft view:
> 1) It pickles the current state of the process.
> 2) Starts a new Python process
> 3) Unpickles the current state of the process
> There are a lot more memory allocations when forking compared to starting a
> new thread. That makes forking unsuitable for small workloads.
> I'm guessing that PyPy does not save the traces/optimized ASM of the forked
> process in the parent process, so each time you start a new process you have
> to trace again. That makes small workloads even less suitable, and even
> large processing batches will need to be traced again.
>
> In the case of pre-forking servers, each PyPy instance has to trace and
> optimize the same code for no reason. Threads would allow us to reduce
> warmup time for this case. They would also consume less memory.
>
> On Mon, Jun 20, 2016 at 17:47, Maciej Fijalkowski
> <fijall at gmail.com> wrote:
>>
>> so quick question - what's the win compared to multiple processes?
>>
>> On Mon, Jun 20, 2016 at 8:51 AM, Omer Katz <omer.drow at gmail.com> wrote:
>> > Hi all,
>> > There was an experiment based on CPython's code called PyParallel that
>> > allows running threads in parallel without STM and modifying source code
>> > of both Python and C extensions. The only limitation is that they
>> > disallow mutation of global state in parallel context.
>> > I briefly mentioned it before on PyPy's freenode channel.
>> > I'd like to discuss why the approach is useful, how it can benefit PyPy
>> > users and how it can be implemented.
>> > Allowing code to run in parallel without mutating global state can help
>> > servers use each thread to handle a request. It can also allow logging
>> > in parallel or sending an HTTP request (or an AMQP message) without
>> > sharing the response with the main thread. This is useful in some cases,
>> > and since PyParallel managed to keep the same semantics, it shouldn't
>> > break CPyExt.
>> > If we keep to the following rules:
>> >
>> > No global state mutation is allowed
>> > No new keywords or code modifications required
>> > No CPyExt code is allowed (for now)
>> >
>> > I believe that users can somewhat benefit from this implementation if
>> > done correctly.
>> > As for implementation, if we can trace the code running in the thread
>> > and ensure it's not mutating global state and that CPyExt is never used
>> > during the thread's course, we can simply release the GIL when such a
>> > thread is run.
>> > That requires less knowledge than using STM and fewer code modifications.
>> > However, I think that attempting to do so will introduce the same issue
>> > with caching traces (Armin, am I correct here?).
>> >
>> > As for CPyExt, we could copy the same code modifications that
>> > PyParallel did, but I suspect that it will be so slow that the benefit
>> > of running in parallel will be completely lost for all cases but very
>> > long threads.
>> >
>> > Is what I'm suggesting even possible? How challenging will it be?
>> >
>> > Thanks,
>> > Omer Katz.
>> >
>> > _______________________________________________
>> > pypy-dev mailing list
>> > pypy-dev at python.org
>> > https://mail.python.org/mailman/listinfo/pypy-dev
>> >