[Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

Nathaniel Smith njs at pobox.com
Fri Apr 11 19:16:04 EDT 2014

On Sat, Apr 12, 2014 at 12:07 AM, Sturla Molden <sturla.molden at gmail.com> wrote:
> On 12/04/14 00:39, Nathaniel Smith wrote:
>> The spawn mode is fine and all, but (a) the presence of something in
>> 3.4 helps only a minority of users, (b) "spawn" is not a full
>> replacement for fork;
> It basically does the same as on Windows. If you want portability to
> Windows, you must abide by these restrictions anyway.

Yes, but "sorry Unix guys, we've decided to take away this nice
feature from you because it doesn't work on Windows" is a really
terrible argument. If it can't be made to work, then fine, but fork
safety is just not *that* much to ask.

>> with large read-mostly data sets it can be a
>> *huge* win to load them into the parent process and then let them be
>> COW-inherited by forked children.
> The thing is that Python reference counting breaks COW fork. This has been
> discussed several times on the Python-dev list. What happens is that as
> soon as the child process updates a refcount, the OS copies the page.
> And because of how Python behaves, this copying of COW-marked pages
> quickly gets excessive. Effectively the performance of os.fork in Python
> will be close to a non-COW fork. A suggested solution is to move the
> refcounts out of the PyObject struct, and perhaps keep them in a
> dedicated heap. But doing so will be unfriendly to cache.
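
The refcount effect described in the quoted text can be seen from Python itself. This is only a small sketch, assuming CPython, where the count is stored inside the object; the function name refcount_delta is made up for illustration. Taking a second reference to an object, without modifying it, still changes the stored count, and that write is what makes the OS copy the page in a forked child.

```python
import sys

def refcount_delta():
    """Return how much the refcount changes when an object is merely aliased."""
    payload = ["read-only", "data"]      # any ordinary heap-allocated object
    before = sys.getrefcount(payload)
    alias = payload                      # no mutation, just a second reference
    after = sys.getrefcount(payload)
    return after - before

if __name__ == "__main__":
    # On CPython the count goes up by one even though the list was only read.
    print(refcount_delta())
```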

Yes, it's limited, but again this is not a reason to break it in the
cases where it *does* work. The case where I ran into this was loading
a big language model using SRILM:
This produces a single Python object that references an opaque,
tens-of-gigabytes mess of C++ objects. For this case explicit shared
mem is useless, but fork worked brilliantly.
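
A rough sketch of the pattern, not the actual SRILM code: load_model and score_in_child are hypothetical names, and a small dict stands in for the opaque C++ model. The parent loads the data once, and a forked child reads it straight from inherited memory, with nothing pickled or reloaded. This assumes a Unix system, since os.fork does not exist on Windows.

```python
import os

def load_model():
    # Stand-in for an expensive load; the real object was tens of gigabytes.
    return {"the": 0.05, "cat": 0.001}

def score_in_child(model, word):
    """Fork a child that looks up `word` in the inherited model."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()                      # Unix only
    if pid == 0:
        # Child: `model` is already in memory via copy-on-write pages.
        try:
            os.close(read_fd)
            os.write(write_fd, repr(model.get(word, 0.0)).encode())
        finally:
            os._exit(0)                  # never fall through into parent code
    # Parent: close the write end so the read sees EOF, then collect the result.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return float(data.decode())

if __name__ == "__main__":
    model = load_model()                 # loaded once, in the parent
    print(score_in_child(model, "the"))
```

Only the small result crosses the pipe; the model itself is never serialized, which is the property that explicit shared memory cannot give for an opaque C++ object graph.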


Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
