
Hi,
I have a project that includes a cython script which in turn does some direct access to a couple of cblas functions. This is necessary, since some matrix multiplications need to be done inside a tight loop that gets called thousands of times. Speedup wrt calling scipy.linalg.blas.cblas routines is 10x to 20x.
Now, all this is very nice on linux, where the setup script can ensure that the cython code gets linked with the atlas dynamic library, which is the same library that numpy and scipy link to on this platform.
However, I now have trouble providing easy ways to use my project on windows. All the free windows distros for scientific python that I have looked at (python(x,y) and winpython) seem to repackage the windows version of numpy/scipy as it is built on the numpy/scipy development sites. These appear to statically link atlas inside some pyd files. So I get no atlas to link against, and I have to ship an additional pre-built atlas with my project.
All this seems somewhat inconvenient.
In the end, when my code runs, due to static linking I get 3 replicas of 2 slightly different atlas libs in memory: one coming with _dotblas.pyd in numpy, another one with cblas.pyd or fblas.pyd in scipy, and the last one shipped with my code.
Would it be possible to have a win distro of scipy which provides some pre-built atlas dlls, and to have numpy and scipy dynamically link to them? This would save memory and also provide a decent blas to link to for things done in cython. But I believe there must be some problem, since the scipy site says
"IMPORTANT: NumPy and SciPy in Windows can currently only make use of CBLAS and LAPACK as static libraries - DLLs are not supported."
Can someone please explain why, or link to an explanation?
Unfortunately, not having a good, prebuilt and cheap blas implementation on windows strikes me as a severe limitation, since you lose the ability to prototype in python/scipy and then move the major bottlenecks to C or Cython to achieve speed.
Many thanks in advance!

I have no answer to the question, but I was curious as to why directly calling the cblas would be 10x-20x slower in the first place. That seems surprising, although I'm just learning about python numerics.
On Mon, Feb 18, 2013 at 7:38 AM, Sergio Callegari <sergio.callegari@gmail.com> wrote:
[clip]
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On 02/18/2013 05:26 PM, rif wrote:
I have no answer to the question, but I was curious as to why directly calling the cblas would be 10x-20x slower in the first place. That seems surprising, although I'm just learning about python numerics.
The statement was that directly (on the Cython level) calling cblas is 10x-20x slower than going through the (slow) SciPy wrapper routines. That makes a lot of sense if the matrices are small enough.
Dag Sverre

On 02/18/2013 05:28 PM, Dag Sverre Seljebotn wrote:
On 02/18/2013 05:26 PM, rif wrote:
I have no answer to the question, but I was curious as to why directly calling the cblas would be 10x-20x slower in the first place. That seems surprising, although I'm just learning about python numerics.
The statement was that directly (on the Cython level) calling cblas is 10x-20x slower than going through the (slow) SciPy wrapper routines. That makes a lot of sense if the matrices are small enough.
Argh. I meant:
The statement was that directly (on the Cython level) calling cblas is 10x-20x **faster** than going through the (slow) SciPy wrapper routines. That makes a lot of sense if the matrices are small enough.
Dag Sverre

But I'd hope that the overhead for going through the wrappers is constant, rather than dependent on the size, so that for large matrices you'd get essentially equivalent performance?
On Mon, Feb 18, 2013 at 8:28 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 02/18/2013 05:26 PM, rif wrote:
I have no answer to the question, but I was curious as to why directly calling the cblas would be 10x-20x slower in the first place. That seems surprising, although I'm just learning about python numerics.
The statement was that directly (on the Cython level) calling cblas is 10x-20x slower than going through the (slow) SciPy wrapper routines. That makes a lot of sense if the matrices are small enough.
Dag Sverre

On 02/18/2013 05:29 PM, rif wrote:
But I'd hope that the overhead for going through the wrappers is constant, rather than dependent on the size, so that for large matrices you'd get essentially equivalent performance?
That is correct.
Ah, so then the quality of the BLAS matters much less in this situation.
But if you have a code that is used with either many small or fewer large matrices, then a compiled loop over a good BLAS is a good compromise without splitting up the code paths.
DS
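The constant-overhead argument can be made concrete with a toy cost model. This is only a sketch, not a benchmark: the two per-call overheads and the flop rate are made-up numbers chosen for illustration, and the names `call_time` and `slowdown` are hypothetical.

```python
# Toy cost model: time per matrix product = fixed call overhead + arithmetic.
# All three constants are assumptions for illustration, not measurements.

WRAPPER_OVERHEAD_S = 5e-6   # assumed cost of going through a Python-level wrapper
DIRECT_OVERHEAD_S = 5e-8    # assumed cost of a direct C-level call
FLOPS_PER_S = 1e10          # assumed sustained rate of the BLAS kernel


def call_time(n, overhead_s):
    """Estimated time for one n-by-n matrix product (about 2*n**3 flops)."""
    return overhead_s + 2.0 * n ** 3 / FLOPS_PER_S


def slowdown(n):
    """How many times slower the wrapped call is than the direct call."""
    return call_time(n, WRAPPER_OVERHEAD_S) / call_time(n, DIRECT_OVERHEAD_S)


if __name__ == "__main__":
    for n in (2, 8, 32, 128, 512):
        print("n={:4d}  wrapped/direct = {:8.3f}".format(n, slowdown(n)))
```

Under these assumptions the ratio is large for tiny matrices, where the fixed overhead dominates, and approaches 1 as n grows and the arithmetic term takes over. That is the behaviour described in this exchange.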

18.02.2013 19:20, Dag Sverre Seljebotn wrote:
On 02/18/2013 05:29 PM, rif wrote:
But I'd hope that the overhead for going through the wrappers is constant, rather than dependent on the size, so that for large matrices you'd get essentially equivalent performance?
That is correct.
Ah, so then the quality of the BLAS matters much less in this situation.
But if you have a code that is used with either many small or fewer large matrices, then a compiled loop over a good BLAS is a good compromise without splitting up the code paths.
I'm open to suggestions on providing low-level Cython interface to BLAS and LAPACK in scipy.linalg. I think this is possible with Cython --- we already have scipy.interpolate talking to scipy.spatial, so why not also 3rd party modules.
Pull requests are accepted --- there are several interesting Cython BLAS/LAPACK projects though.
--
Pauli Virtanen

On 02/18/2013 06:48 PM, Pauli Virtanen wrote:
I'm open to suggestions on providing low-level Cython interface to BLAS and LAPACK in scipy.linalg. I think this is possible with Cython --- we already have scipy.interpolate talking to scipy.spatial, so why not also 3rd party modules.
Pull requests are accepted --- there are several interesting Cython BLAS/LAPACK projects though.
I think there should be a new project, pylapack or similar, for this, outside of NumPy and SciPy. NumPy and SciPy could try to import it, and if found, fetch a function pointer table. (If not found, just stay with what has been working for a decade.)
The main motivation would be to decouple building NumPy from linking with BLAS and have that all happen at run-time. But a Cython interface would follow naturally too.
I've wanted to start on this for some time but the Hashdist visions got bigger and my PhD more difficult...
As for the interesting Cython BLAS/LAPACK projects, the ones I've seen (Tokyo, my own work on SciPy for .NET) aren't templated enough for my taste. I'd start with writing a YAML file describing BLAS/LAPACK, then generate the Cython code and wrappers from that, since the APIs are so regular.
Dag Sverre
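The "describe the API once, then generate the declarations" idea can be sketched in a few lines. This is only an illustration under stated assumptions: a plain Python dict stands in for the YAML file so that the example needs no third-party parser, only three level-1 CBLAS routines are covered, and the names `TYPES`, `ROUTINES` and `render_pxd` are hypothetical and not taken from any existing project.

```python
# Sketch: render Cython extern declarations from a small declarative table.
# A dict stands in for the YAML description; a real table would be far larger.

# BLAS type prefixes and the C scalar type each one stands for.
TYPES = {"s": "float", "d": "double"}

# Routine templates; "{T}" is replaced by the scalar type for each prefix.
# The argument lists follow the standard CBLAS level-1 interface.
ROUTINES = {
    "scal": ("void", ["int n", "{T} alpha", "{T} *x", "int incx"]),
    "axpy": ("void", ["int n", "{T} alpha", "const {T} *x", "int incx",
                      "{T} *y", "int incy"]),
    "dot": ("{T}", ["int n", "const {T} *x", "int incx",
                    "const {T} *y", "int incy"]),
}


def render_pxd(header="cblas.h"):
    """Return the text of a .pxd file declaring every routine/type pair."""
    lines = ['cdef extern from "{}":'.format(header)]
    for name, (ret, args) in ROUTINES.items():
        for prefix, ctype in TYPES.items():
            ret_type = ret.format(T=ctype)
            arg_list = ", ".join(arg.format(T=ctype) for arg in args)
            lines.append("    {} cblas_{}{}({})".format(
                ret_type, prefix, name, arg_list))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_pxd(), end="")
```

Because the routines differ only in their type prefix, one template per routine yields both the single and double precision declarations. The same table could in principle drive generation of the Python-level wrappers as well.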

18.02.2013 20:41, Dag Sverre Seljebotn wrote: [clip]
I think there should be a new project, pylapack or similar, for this, outside of NumPy and SciPy. NumPy and SciPy could try to import it, and if found, fetch a function pointer table. (If not found, just stay with what has been working for a decade.)
The main motivation would be to decouple building NumPy from linking with BLAS and have that all happen at run-time. But a Cython interface would follow naturally too.
The main motivation for sticking it into Scipy would be a bit different --- since the build and distribution infra is in place for Scipy, putting it in scipy.linalg makes it more easily available for a larger number of people than some random 3rd-party module.
We already ship low-level f2py bindings, so I don't see a reason for not shipping Cython ones too.
--
Pauli Virtanen

On 02/18/2013 09:23 PM, Pauli Virtanen wrote:
The main motivation for sticking it into Scipy would be a bit different --- since the build and distribution infra is in place for Scipy, putting it in scipy.linalg makes it more easily available for a larger number of people than some random 3-rd party module.
Right. In my case it's rather the case that the build and distribution infra is *not* in place for my needs :-). Yes, it was definitely from the POV of power-users on HPC clusters who want to tinker with the build, not as wide reach as possible.
We already ship low-level f2py bindings, so I don't see a reason for not shipping Cython ones too.
Well, in that case (and for the information of anybody else interested, Pauli already knows this), the fwrap-generated files from the SciPy .NET port may be a good starting point:
https://github.com/enthought/scipy-refactor/blob/refactor/scipy/linalg/flapa...
Dag Sverre

On 18.02.2013 21:23, Pauli Virtanen wrote:
The main motivation for sticking it into Scipy would be a bit different --- since the build and distribution infra is in place for Scipy, putting it in scipy.linalg makes it more easily available for a larger number of people than some random 3-rd party module.
We already ship low-level f2py bindings, so I don't see a reason for not shipping Cython ones too.
I find Dag's approach more appealing.
SciPy can be problematic (windows 64-bit) and if one could offer access to the linear algebra functions without needing SciPy I would certainly prefer it.
Armando

18.02.2013 23:29, V. Armando Sole wrote: [clip]
I find Dag's approach more appealing.
SciPy can be problematic (windows 64-bit) and if one could offer access to the linear algebra functions without needing SciPy I would certainly prefer it.
Well, the two approaches are not exclusive. Moreover, there already exist Cython wrappers for BLAS that you can just take and use.
Windows 64-bit is probably problematic for everyone who wants to provide binaries --- I don't think there's a big difference in difficulty in making binaries for a light Cython wrapper to BLAS/LAPACK vs. providing the whole of Scipy :)
--
Pauli Virtanen

On 18.02.2013 22:47, Pauli Virtanen wrote:
Well, the two approaches are not exclusive. Moreover, there already exist Cython wrappers for BLAS that you can just take and use.
Please correct me if I am wrong. I assume those wrappers force you to provide the shared libraries, so the problem is still there. If not, I would really be interested in getting one of those wrappers :-)
It is really nice to provide extensions receiving the pointer to the function to be used, even under Linux: the extension does not need to be compiled each time the user changes/updates shared libraries. It is really nice when you find your C extension is slow, discover ATLAS is not installed, install it, and your extension becomes very fast without needing to recompile.
Windows 64-bit is probably problematic for everyone who wants to provide binaries --- I don't think there's a big difference in difficulty in making binaries for a light Cython wrapper to BLAS/LAPACK vs. providing the whole of Scipy :)
I have an Intel Fortran compiler license just to be able to provide windows 64-bit frozen binaries and extension modules :-) but that is not enough:
- If I provide the MKL DLLs, a person willing to redistribute the module also needs an MKL license.
- If I do not provide the MKL DLLs, the extension module is useless.
For the time being the best solution I have found is to use pointers to the wrapped functions in SciPy: the extension module uses whatever library is installed on the target system and I do not need to provide the shared libraries. It is just a pity that, although the libraries are in numpy, one cannot access them there, while one can do it in SciPy.
Therefore I found Dag's approach quite nice: numpy and SciPy using the linear algebra functions via a third package providing all the needed pointers (or at least having that package available in first instance).
Best regards,
Armando
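The general idea of resolving the BLAS routine at run time, rather than at build time, can be illustrated with the standard library alone. This sketch uses `ctypes` and is not the SciPy function-pointer mechanism mentioned in this message. The candidate library names are guesses that depend entirely on the target system, and `load_ddot` and `pure_python_ddot` are hypothetical names.

```python
import ctypes
import ctypes.util


def pure_python_ddot(x, y):
    """Fallback used when no usable CBLAS shared library is found."""
    return sum(a * b for a, b in zip(x, y))


def load_ddot(candidates=("openblas", "cblas", "blas")):
    """Return a dot-product callable backed by the first usable library.

    `candidates` are base names handed to ctypes.util.find_library; they are
    assumptions about what might be installed, not a definitive list.
    """
    for name in candidates:
        path = ctypes.util.find_library(name)
        if path is None:
            continue
        try:
            lib = ctypes.CDLL(path)
            c_ddot = lib.cblas_ddot  # AttributeError if the symbol is absent
        except (OSError, AttributeError):
            continue
        # double cblas_ddot(int n, const double *x, int incx,
        #                   const double *y, int incy)
        c_ddot.restype = ctypes.c_double
        c_ddot.argtypes = [
            ctypes.c_int, ctypes.POINTER(ctypes.c_double), ctypes.c_int,
            ctypes.POINTER(ctypes.c_double), ctypes.c_int,
        ]

        def ddot(x, y, _f=c_ddot):
            if len(x) != len(y):
                raise ValueError("vectors must have the same length")
            n = len(x)
            xs = (ctypes.c_double * n)(*x)
            ys = (ctypes.c_double * n)(*y)
            return _f(n, xs, 1, ys, 1)

        return ddot
    return pure_python_ddot
```

Calling `load_ddot()` once at import time and storing the result gives the behaviour described here: installing or upgrading a BLAS changes what the code runs against without any recompilation, and a machine with no BLAS at all still works, only more slowly.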

On Mon, Feb 18, 2013 at 9:28 AM, Dag Sverre Seljebotn < d.s.seljebotn@astro.uio.no> wrote:
On 02/18/2013 05:26 PM, rif wrote:
I have no answer to the question, but I was curious as to why directly calling the cblas would be 10x-20x slower in the first place. That seems surprising, although I'm just learning about python numerics.
The statement was that directly (on the Cython level) calling cblas is 10x-20x slower than going through the (slow) SciPy wrapper routines. That makes a lot of sense if the matrices are small enough.
For really small matrices, not using blas at all provides another speedup.
Chuck
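For fixed, very small sizes, "not using blas at all" typically means writing the product out by hand so that there is no library call and no loop. The sketch below shows the shape of that technique for the 2x2 case. It is written in plain Python only to stay self-contained, so it demonstrates the unrolling, not the speedup, which would come from doing the same thing with typed variables in Cython or C. The name `matmul2x2` and the flat row-major layout are assumptions made for the example.

```python
def matmul2x2(a, b):
    """Product of two 2x2 matrices stored as flat row-major 4-tuples."""
    a00, a01, a10, a11 = a
    b00, b01, b10, b11 = b
    # Every entry is written out explicitly: no loops, no library call.
    return (
        a00 * b00 + a01 * b10, a00 * b01 + a01 * b11,
        a10 * b00 + a11 * b10, a10 * b01 + a11 * b11,
    )


if __name__ == "__main__":
    print(matmul2x2((1, 2, 3, 4), (5, 6, 7, 8)))
```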

Dag Sverre Seljebotn <d.s.seljebotn <at> astro.uio.no> writes:
On 02/18/2013 05:26 PM, rif wrote:
I have no answer to the question, but I was curious as to why directly calling the cblas would be 10x-20x slower in the first place. That seems surprising, although I'm just learning about python numerics.
The statement was that directly (on the Cython level) calling cblas is 10x-20x slower than going through the (slow) SciPy wrapper routines. That makes a lot of sense if the matrices are small enough.
Dag Sverre
Sorry for expressing myself badly.
I need to call cblas directly from cython, because it is faster.
I use matrix multiplication in a tight loop.
Let the speed with the standard dot be 100,
Speed using the scipy.linalg.blas routines is 200
And speed calling directly atlas from cython is 2000
Which is reasonable, since this avoids any type checking.
The point is that I need to ship an extra atlas lib to do so in windows, notwithstanding the fact that numpy/scipy incorporate atlas in the windows build.
I was wondering if there is a way to build numpy/scipy with atlas dynamically linked into it, in order to be able to share the atlas libs between my code and scipy.

On 02/20/2013 10:18 AM, Sergio wrote:
Soory for expressing myself badly.
I need to call cblas directly from cython, because it is faster.
I use matrix multiplication in a tight loop.
Let the speed with the standard dot be 100,
Speed using the scipy.linalg.blas routines is 200
And speed calling directly atlas from cython is 2000
Which is reasonable, since this avoids any type checking.
The point is that I need to ship an extra atlas lib to do so in windows, notwithstanding the fact that numpy/scipy incorporate atlas in the windows build.
I was wondering if there is a way to build numpy/scipy with atlas dynamically linked into it, in order to be able to share the atlas libs between my code and scipy.
You could also look into OpenBLAS, which is easier to build and generally faster than ATLAS. (But alas, not supported by NumPy/SciPy AFAIK.)
Dag Sverre

Hi,
We also have the same problem for Theano. Having one reusable BLAS on Windows would be useful to many projects. Also, if possible, try to make it accessible from C and C++ too, not just Cython.
Fred
On Feb 20, 2013 5:15 AM, "Dag Sverre Seljebotn" <d.s.seljebotn@astro.uio.no> wrote:
You could also look into OpenBLAS, which is easier to build and generally faster than ATLAS. (But alas, not supported by NumPy/SciPy AFAIK.)
Dag Sverre

You could also look into OpenBLAS, which is easier to build and generally faster than ATLAS. (But alas, not supported by NumPy/SciPy AFAIK.)
Hi,
maybe it is not officially supported, so the integration into numpy is a bit tricky (after many attempts I succeeded by exporting the BLAS and LAPACK environment variables prior to running setup.py, if I remember correctly), but IMHE one can use OpenBLAS (or, in my case, its older version GotoBLAS) with (sci|num)py without problems. So I recommend using Open/GotoBLAS too.
Best, Matyas Novak
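The environment-variable approach mentioned above could be sketched as follows. This is only an illustration under assumptions: `blas_build_env` is a hypothetical helper, the library path is made up, and pointing both BLAS and LAPACK at the same OpenBLAS file is a guess at a workable setup, not something stated in this thread. The sketch only prepares the environment and does not run any build.

```python
import os


def blas_build_env(blas_lib, lapack_lib, base_env=None):
    """Return a copy of the environment with BLAS and LAPACK set.

    The current process environment is left untouched, so the
    variables only affect a child process started with this mapping.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["BLAS"] = blas_lib
    env["LAPACK"] = lapack_lib
    return env


# Hypothetical location of a locally built OpenBLAS library.
OPENBLAS = "/opt/openblas/lib/libopenblas.a"
env = blas_build_env(OPENBLAS, OPENBLAS, base_env={"PATH": "/usr/bin"})
```

The resulting mapping would then be handed to the build, for instance via `subprocess.check_call([sys.executable, "setup.py", "build"], env=env)` run from the numpy source tree.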

On Thu, Feb 21, 2013 at 4:16 AM, Matyáš Novák <logik@centrum.cz> wrote:
You could also look into OpenBLAS, which is easier to build and generally faster than ATLAS. (But alas, not supported by NumPy/SciPy AFAIK.)
It looks like OpenBLAS is BSD-licensed, and thus compatible with numpy/scipy. Is there a reason (other than someone having to do the work) it could not be used as the "standard" BLAS for numpy?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov

On 02/22/2013 05:52 PM, Chris Barker - NOAA Federal wrote:
It looks like OpenBLAS is BSD-licensed, and thus compatible with numpy/scipy.
Is there a reason (other than someone having to do the work) it could not be used as the "standard" BLAS for numpy?
This was discussed some weeks ago (I think the thread title contains openblas). IIRC it was just that somebody needs to do the work, but don't take my word for it.
Dag Sverre

On 22 Feb 2013 16:53, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
It looks like OpenBLAS is BSD-licensed, and thus compatible with numpy/scipy.
Is there a reason (other than someone having to do the work) it could not be used as the "standard" BLAS for numpy?
No reason, and it actually works quite nicely. Bento supports it, at least on Mac/Linux.
David

I just read a web page on how to embed Python in an application [1]. It explains that the symbols can be kept exported even if the BLAS library is statically linked into scipy. This makes me think we could just change how we compile the libs that link with BLAS, and then be able to reuse them for other projects!
But I haven't played much with this type of thing. Does someone have more information? Do you think it would be useful?
Fred
[1] http://docs.python.org/2/extending/embedding.html#linking-requirements
On Fri, Feb 22, 2013 at 3:38 PM, David Cournapeau <cournape@gmail.com> wrote:
no reason, and it actually works quite nicely. Bento supports it, at least on Mac/linux.
David
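Whether a given extension module really keeps its BLAS symbols exported can be probed from Python with ctypes. The sketch below is only an assumption-laden illustration: `exports_symbol` and `library_exports` are hypothetical helper names, and the self-check runs against the interpreter's own C API rather than against a scipy binary, so that it works without scipy installed.

```python
import ctypes


def exports_symbol(library, name):
    """Return True if `name` can be resolved in an already loaded library."""
    try:
        getattr(library, name)
    except AttributeError:
        return False
    return True


def library_exports(path, names):
    """Load the shared library at `path` and report which names it exports."""
    library = ctypes.CDLL(path)
    return {name: exports_symbol(library, name) for name in names}


# Self-check against the running interpreter, whose C API is normally exported.
has_api = exports_symbol(ctypes.pythonapi, "Py_IsInitialized")
has_bogus = exports_symbol(ctypes.pythonapi, "no_such_symbol_for_this_sketch")
```

For the case discussed here one would call something like `library_exports("path/to/fblas.pyd", ["dgemm_"])`, where both the path and the Fortran-style symbol name are assumptions to be adapted to the actual build.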

Hi Sergio,
I faced a similar problem one year ago. I solved it by writing a C function that receives a pointer to the relevant linear algebra routine I needed.
Numpy does not offer direct access to the underlying library functions, but scipy does:
    from scipy.linalg.blas import fblas
    dgemm = fblas.dgemm._cpointer
    sgemm = fblas.sgemm._cpointer
So I wrote a small extension receiving the data to operate on and the relevant pointer.
The drawback of the approach is the dependency on scipy, but it works nicely.
Armando
On 18/02/2013 16:38, Sergio Callegari wrote:
Hi,
I have a project that includes a cython script which in turn does some direct access to a couple of cblas functions. This is necessary, since some matrix multiplications need to be done inside a tight loop that gets called thousands of times. Speedup wrt calling scipy.linalg.blas.cblas routines is 10x to 20x.

On 22.02.2013 19:54, Sergio Callegari wrote:
    from scipy.linalg.blas import fblas
    dgemm = fblas.dgemm._cpointer
    sgemm = fblas.sgemm._cpointer
OK, but this gives me a PyCObject. How do I make it a function pointer of the correct type in cython?
In Cython I do not know. I coded it directly in C. In my code I receive the pointer in input2. The relevant part is:

    PyObject *input1;
    PyObject *input2 = NULL; /* pointer to dgemm function */
    void *gemm_pointer = NULL;

    /** statements **/
    if (!PyArg_ParseTuple(args, "OO", &input1, &input2))
        return NULL;

    if (input2 != NULL) {
    #if PY_MAJOR_VERSION >= 3
        if (PyCapsule_CheckExact(input2))
            gemm_pointer = PyCapsule_GetPointer(input2, NULL);
    #else
        gemm_pointer = PyCObject_AsVoidPtr(input2);
    #endif
        if (gemm_pointer != NULL) {
            /* release the GIL */
            Py_BEGIN_ALLOW_THREADS
            /* your function call here */
            Py_END_ALLOW_THREADS
        }
    }

Best regards,
Armando
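For the question above about turning the wrapped pointer into a typed function pointer, the same unwrapping can be sketched without writing C, through ctypes. This is a minimal illustration and not the method used in this thread: `capsule_to_function`, `BINARY_FUNC` and `_stand_in` are hypothetical names, the capsule is built by hand around a ctypes callback so the example runs without scipy, and the two-argument prototype is only a placeholder (a real Fortran dgemm has a much longer, pointer-based argument list). It assumes CPython 3, where such objects are PyCapsules rather than PyCObjects.

```python
import ctypes

# Placeholder prototype: double f(double, double).
BINARY_FUNC = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)

# Declare the CPython capsule functions used below.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetName = ctypes.pythonapi.PyCapsule_GetName
PyCapsule_GetName.restype = ctypes.c_char_p
PyCapsule_GetName.argtypes = [ctypes.py_object]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]


def capsule_to_function(capsule, prototype):
    """Unwrap the raw address held by a capsule and give it a C signature."""
    name = PyCapsule_GetName(capsule)  # None when the capsule is unnamed
    address = PyCapsule_GetPointer(capsule, name)
    return prototype(address)


@BINARY_FUNC
def _stand_in(x, y):
    """Stand-in for a C routine; must stay referenced while it is in use."""
    return x * y


# Build an unnamed capsule around the stand-in's address, then unwrap it.
address = ctypes.cast(_stand_in, ctypes.c_void_p).value
capsule = PyCapsule_New(address, None, None)
func = capsule_to_function(capsule, BINARY_FUNC)
product = func(3.0, 4.0)
```

With scipy, the capsule would come from a `_cpointer` attribute instead of `PyCapsule_New`, and the prototype would have to match the real routine exactly. A wrong prototype is not detected and can crash the interpreter.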
participants (12)
- "V. Armando Solé"
- Charles R Harris
- Chris Barker - NOAA Federal
- Dag Sverre Seljebotn
- David Cournapeau
- Frédéric Bastien
- Matyáš Novák
- Pauli Virtanen
- rif
- Sergio
- Sergio Callegari
- V. Armando Sole