[Python-ideas] solving multi-core Python

Trent Nelson trent at snakebite.org
Thu Jun 25 08:50:52 CEST 2015


On Wed, Jun 24, 2015 at 04:55:31PM -0700, Nathaniel Smith wrote:
> On Wed, Jun 24, 2015 at 3:10 PM, Devin Jeanpierre
> <jeanpierreda at gmail.com> wrote:
> > So there are two reasons I can think of to use threads for CPU parallelism:
> >
> > - My thing does a lot of parallel work, and so I want to save on
> > memory by sharing an address space
> >
> > This only becomes an especially pressing concern if you start running
> > tens of thousands of workers or more. Fork also allows this.
> 
> Not necessarily true... e.g., see two threads from yesterday (!) on
> the pandas mailing list, from users who want to perform queries
> against a large data structure shared between threads/processes:
> 
> https://groups.google.com/d/msg/pydata/Emkkk9S9rUk/eh0nfiGR7O0J
> https://groups.google.com/forum/#!topic/pydata/wOwe21I65-I
> ("Are we just screwed on windows?")

    Ironically (not knowing anything about Pandas' implementation
    details other than... "Cython... and NumPy"), there should be
    no difference between making a Pandas DataFrame available to
    PyParallel and making a NumPy ndarray or a Cythonized C struct
    (like datrie) available.

    The situation Ryan describes is exactly the one that
    PyParallel excels at: large reference data structures
    accessible in parallel contexts.

        Trent.

