[Python-ideas] solving multi-core Python
Trent Nelson
trent at snakebite.org
Thu Jun 25 08:50:52 CEST 2015
On Wed, Jun 24, 2015 at 04:55:31PM -0700, Nathaniel Smith wrote:
> On Wed, Jun 24, 2015 at 3:10 PM, Devin Jeanpierre
> <jeanpierreda at gmail.com> wrote:
> > So there's two reasons I can think of to use threads for CPU parallelism:
> >
> > - My thing does a lot of parallel work, and so I want to save on
> > memory by sharing an address space
> >
> > This only becomes an especially pressing concern if you start running
> > tens of thousands or more of workers. Fork also allows this.
>
> Not necessarily true... e.g., see two threads from yesterday (!) on
> the pandas mailing list, from users who want to perform queries
> against a large data structure shared between threads/processes:
>
> https://groups.google.com/d/msg/pydata/Emkkk9S9rUk/eh0nfiGR7O0J
> https://groups.google.com/forum/#!topic/pydata/wOwe21I65-I
> ("Are we just screwed on windows?")

Ironically (and without knowing anything about Pandas' implementation
details beyond "Cython... and NumPy"), there should be no difference
between making a Pandas DataFrame available to PyParallel and doing
the same for a NumPy ndarray or a Cythonized C struct (like datrie).

The situation Ryan describes is exactly what PyParallel excels at:
large reference data structures that are accessible from parallel
contexts.

Trent.