[pypy-dev] PyParallel-style threads

Mon Jun 20 12:42:09 EDT 2016

Let's review what handing work to a new process does in Python (for
example through multiprocessing) from a 10,000ft view:
1) It pickles the state the new process needs.
2) It starts a new Python process.
3) The new process unpickles that state.
(A plain os.fork() copies the address space copy-on-write instead of
pickling, but the extra process still costs more to create than a thread.)
There are a lot more memory allocations when forking compared to starting
a new thread. That makes forking unsuitable for small workloads.
I'm guessing that PyPy does not save the trace/optimized ASM of the forked
process in the parent process, so each time you start a new process you have
to trace again. That makes small workloads even less suitable, and even
large processing batches will need to be traced again.
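
The cost gap between starting a process and starting a thread is easy to
measure. Here is a rough sketch, not a benchmark: it times one no-op thread
against one fresh interpreter launched with subprocess. Launching a fresh
interpreter is my stand-in for "a new Python process" and is not the same
thing as os.fork(); the function names are made up for this example.

```python
import subprocess
import sys
import threading
import time


def time_thread_start():
    """Time starting and joining one thread that does nothing."""
    start = time.perf_counter()
    worker = threading.Thread(target=lambda: None)
    worker.start()
    worker.join()
    return time.perf_counter() - start


def time_process_start():
    """Time launching a fresh interpreter that does nothing and exits.

    Assumption: a new interpreter stands in for "a new Python process";
    a copy-on-write os.fork() would be cheaper than this.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"thread:  {time_thread_start() * 1e3:.3f} ms")
    print(f"process: {time_process_start() * 1e3:.3f} ms")
```

On PyPy the new process also starts with a cold JIT, which this sketch does
not capture because the child runs no real code.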

In the case of pre-forking servers, each PyPy instance has to trace and
optimize the same code for no good reason. Threads would allow us to
reduce warmup time for this case. They would also consume less memory.

On Mon, Jun 20, 2016 at 17:47, Maciej Fijalkowski <fijall at gmail.com> wrote:

> so quick question - what's the win compared to multiple processes?
>
> On Mon, Jun 20, 2016 at 8:51 AM, Omer Katz <omer.drow at gmail.com> wrote:
> > Hi all,
> > There was an experiment based on CPython's code called PyParallel that
> > allows running threads in parallel without STM and without modifying
> > the source code of either Python or C extensions. The only limitation
> > is that they disallow mutation of global state in parallel context.
> > I briefly mentioned it before on PyPy's freenode channel.
> > I'd like to discuss why the approach is useful, how it can benefit PyPy
> > users and how it can be implemented.
> > Allowing code to run in parallel without mutating global state can help
> > servers use each thread to handle a request. It can also allow logging
> > in parallel or sending an HTTP request (or an AMQP message) without
> > sharing the response with the main thread. This is useful in some cases
> > and since PyParallel managed to keep the same semantics it shouldn't
> > break CPyExt.
> > If we keep to the following rules:
> >
> > No global state mutation is allowed
> > No new keywords or code modifications required
> > No CPyExt code is allowed (for now)
> >
> > I believe that users can somewhat benefit from this implementation if
> > done correctly.
> > As for implementation, if we can trace the code running in the thread
> > and ensure it's not mutating global state and that CPyExt is never used
> > during the thread's course, we can simply release the GIL when such a
> > thread is run.
> > That requires less knowledge than using STM and fewer code
> > modifications.
> > However I think that attempting to do so will introduce the same issue
> > with caching traces (Armin am I correct here?).
> >
> > As for CPyExt, we could copy the same code modifications that PyParallel
> > made, but I suspect that it will be so slow that the benefit of running
> > in parallel will be completely lost for all cases but very long threads.
> >
> > Is what I'm suggesting even possible? How challenging will it be?
> >
> > Thanks,
> > Omer Katz.
> >
> > _______________________________________________
> > pypy-dev mailing list
> > pypy-dev at python.org
> > https://mail.python.org/mailman/listinfo/pypy-dev
> >
>
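
To make the "ensure it's not mutating global state" idea from the quoted
proposal a bit more concrete, here is a very rough sketch of the weakest
possible version of that check. It uses the standard dis module to look for
bytecode that rebinds or deletes a global name. This is not how PyParallel
or PyPy's tracer work; it is only an illustration, and the helper and
handler names are made up. It misses mutation through method calls or
attribute stores on global objects (for example some_global_list.append(x))
and it does not look into nested functions or callees.

```python
import dis

# Opcodes that rebind or delete a module-level name.
GLOBAL_WRITE_OPS = {"STORE_GLOBAL", "DELETE_GLOBAL"}


def rebinds_globals(func):
    """Return True if func's own bytecode writes to a global name.

    Hypothetical helper: only the function's top-level code object is
    inspected, so nested functions and called functions are not checked.
    """
    return any(
        instr.opname in GLOBAL_WRITE_OPS
        for instr in dis.get_instructions(func)
    )


counter = 0


def pure_handler(request):
    # Touches only its argument, so it would pass the check.
    return request.upper()


def counting_handler(request):
    # Rebinds a module-level name, so it would be rejected.
    global counter
    counter += 1
    return request.upper()
```

A tracing JIT sees the operations that actually execute, so a real
implementation could catch far more than this static check does.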