On Thu, May 7, 2020 at 2:50 AM Emily Bowman <silverbacknet@gmail.com> wrote:
While large object copies are fairly fast -- I wouldn't say trivial, since a gigabyte copy introduces noticeable lag once you're processing enough of them -- the flip side of having large objects is that you want to avoid making so many copies that you run into memory pressure and the dreaded swapping. A multiprocessing engine that's fully parallel, where every fork takes a chunk of data and does everything needed to it, won't gain much from zero-copy as long as memory limits aren't hit. But a processing pipeline would involve many copies, especially if a central dispatch thread passes data from stage to stage. That matters when stages can slow down or stall at any time, especially in low-latency applications like video conferencing, where dispatch needs the flexibility to skip stages or add extra workers to shove a frame out the door; signaling separate processes to tell them to do that only adds latency and overhead.
Not that I'm recommending anyone go out and write a pure-Python videoconferencing app right now, but it's a use case I'm familiar with. (I use Python to test new ideas before converting them into C++.)
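As a minimal sketch of the zero-copy hand-off described above: the dispatcher allocates one large block via the stdlib's multiprocessing.shared_memory and passes only its *name* from stage to stage, so no stage ever copies the payload. The 64 MiB frame size, the stage() function, and the three-stage loop are illustrative assumptions, not anything from this thread.

from multiprocessing import Process, shared_memory

FRAME_SIZE = 64 * 1024 * 1024  # 64 MiB stand-in for one "large object"

def stage(shm_name):
    # Attach to the existing block by name; this maps the same memory,
    # so nothing like a gigabyte memcpy ever happens.
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        shm.buf[0] = (shm.buf[0] + 1) % 256  # mutate the frame in place
    finally:
        shm.close()  # detach only; the dispatcher still owns the block

if __name__ == "__main__":
    # Dispatcher: allocate the frame once, then hand around only its name.
    frame = shared_memory.SharedMemory(create=True, size=FRAME_SIZE)
    try:
        frame.buf[0] = 0
        for _ in range(3):  # three pipeline "stages" touch the same memory
            worker = Process(target=stage, args=(frame.name,))
            worker.start()
            worker.join()
        print("first byte after 3 stages:", frame.buf[0])  # -> 3
    finally:
        frame.close()
        frame.unlink()  # release the block once the pipeline is done

The point is that frame.name, a short string, is all that crosses the process boundary; whether the stages run serially or as a pool of extra workers, they all mutate the same buffer instead of shipping copies through the dispatcher.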
Thanks for the insight, Emily (and everyone else). It's really helpful to get so many different expert perspectives on the matter. I'm definitely not an expert on big-data/high-performance use cases, so I rely on folks like Nathaniel, Travis Oliphant, and yourself. The more, the better. :)

Again, thanks!

-eric