On Mon, Aug 11, 2014 at 10:46 PM, Matti Picus email@example.com wrote:
Hi Nathaniel. Thanks for your prompt reply. I think numpy is a wonderful project, and you all do a great job moving it forward. If you ask what my vision for maturing numpy would be, I would like to see the linalg matrix-operation functionality grouped into a Python-level package, exactly the opposite of tying linalg more tightly into the core of numpy.
As I understood it (though I admit Chuck was pretty terse, maybe he'll correct me :-)), what he was proposing was basically just a build system reorganization -- it's much easier to call between C functions that are in the same Python module than C functions that are in different modules, so we end up with lots of boilerplate gunk for the latter. I don't think it would involve any tighter coupling than we already have in practice.
The orthogonality would allow groups like PyOpenCL to reuse the matrix operations on data located outside the CPU's RAM, just to give one example, and would make it easier for non-numpy developers to create a complete replacement for lapack with other implementations.
I guess I don't really understand what you're suggesting. If we have a separate package that is the same as current np.linalg, then how does that allow PyOpenCL to suddenly run the np.linalg code on the GPU? What kind of re-use are you envisioning? The important kind of re-use that comes to mind for me is that I should be able to write code that can accept either a RAM matrix or a GPU matrix and works the same. But the key feature to enable this is that there should be a single API that works on both types of objects -- e.g. np.dot(a, b) should work even if a, b are on the GPU. But this is exactly what __numpy_ufunc__ is designed to enable, and that has nothing to do with splitting linalg off into a separate package...
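To make the dispatch idea concrete, here is a toy sketch of the kind of override __numpy_ufunc__ enables, written with the `__array_ufunc__` spelling that NumPy eventually shipped (1.13+); `FakeGPUArray` is a made-up stand-in for an array type living off the CPU's RAM:

```python
import numpy as np

class FakeGPUArray:
    """Hypothetical stand-in for an array living on a device."""
    def __init__(self, data):
        self.data = np.asarray(data)  # pretend this buffer is on the GPU

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap our type, run the ufunc on the underlying buffers, and
        # re-wrap -- a real GPU array would launch a device kernel instead.
        args = [x.data if isinstance(x, FakeGPUArray) else x for x in inputs]
        return FakeGPUArray(getattr(ufunc, method)(*args, **kwargs))

a = FakeGPUArray([1.0, 2.0])
b = FakeGPUArray([3.0, 4.0])
c = np.add(a, b)  # numpy defers to FakeGPUArray.__array_ufunc__
```

The point is that np.add (and any other ufunc) works on the foreign type without numpy knowing anything about it, which is orthogonal to where the linalg code lives.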
And of course if someone has a better idea about how to implement lapack, then they should do that work in the numpy repo so everyone can benefit, not go off and reimplement their own version from scratch that no-one will use :-).
Much of the linalg package would of course be implemented in C or Fortran, but the interface to ndarray would use the well-established idea of contiguous matrices with shapes, strides, and a single memory store, supporting only numeric dtypes.
It's actually possible today for third-party users to add support for third-party dtypes to most linalg operations, b/c most linalg operations are implemented using the numpy ufunc machinery.
I suggested cffi since it provides a convenient and efficient interface to ndarray. Thus Python could remain a thin wrapper over the calls out to C-based libraries, much like lapack_lite does today, but at the Python level rather than the C-API level. Yes, a Python-based interface would slow the code down a bit, but I would argue that
- the current state of lapack_litemodule.c and umath_linalg.c.src, with its myriad of compile-time macros and complex code paths, scares people away from contributing to the ongoing maintenance of the library while tying the code very closely to the lapack routines, and
I agree that simple is better than complex, but I don't see how moving those macros and code paths into a separate package decreases complexity. If anything it would increase complexity, because now we have two repos instead of one, two release schedules instead of one, and n^2 combinations of (linalg version, numpy version) to test against.
- matrices larger than 3x3 or so should be spending most of the computation time in the underlying lapack/blas library regardless of whether the interface is Python-based or C-API-based.

Matti
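As a rough sketch of the "thin Python wrapper over a C library" idea: even the stdlib ctypes can hand an ndarray's buffer to C code without copying (cffi's ffi.from_buffer plays the same role); the pointer write below is a stand-in for a call into a lapack/blas routine:

```python
import ctypes
import numpy as np

a = np.ascontiguousarray(np.arange(4.0))
# Expose the raw buffer to C-level code, zero-copy.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
ptr[0] = 42.0   # a C routine (e.g. a blas call) would write here
print(a[0])     # 42.0 -- the ndarray sees the in-place change
```

Since the heavy lifting happens behind that pointer, the Python-level call overhead is paid once per operation, not per element.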
On 10/08/2014 8:00 PM, firstname.lastname@example.org wrote:
Date: Sat, 9 Aug 2014 21:11:19 +0100
From: Nathaniel Smith email@example.com
Subject: Re: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
To: Discussion of Numerical Python firstname.lastname@example.org
On Sat, Aug 9, 2014 at 8:35 PM, Matti Picus email@example.com wrote:
Hi. I am working on numpy in pypy. It would be much more challenging for me if you merged more code into the core of numpy,
I can definitely see how numpy changes cause trouble for you, and sympathize. But can you elaborate on what kind of changes would make your life easier *that also* help make numpy proper better in their own right? Because unfortunately, I don't see how we can reasonably pass up on improvements to numpy if the only justification is to make numpypy's life easier. (I'd also love to see pypy become usable for general numerical work, but not only is it not there now, I don't see how numpypy will ultimately get us there even if we do help it along -- almost none of the ecosystem can get by on numpy's python-level APIs alone.) But obviously if there are changes that are mutually beneficial, well then, that's a lot easier to justify :-)