
Hello everyone,

I've done some work on a new version of Numexpr that fixes some of the limitations of the original virtual machine with regard to data types and operation/function count. Basically, I rewrote the Python and C sides to use 4-byte words, instead of null-terminated strings, for operations and passed types. This means the number of operations and types is no longer significantly limited. Francesc Alted suggested I come here and get some advice from the community.

I wrote a short proposal on the Wiki here:
https://github.com/pydata/numexpr/wiki/Numexpr-3.0-Branch-Overview

One can see my branch here:
https://github.com/robbmcleod/numexpr/tree/numexpr-3.0

Any comments would be welcome. Questions from my side for the group:

1) NumPy casting: I downloaded the NumPy source, and after browsing it, the best approach is probably to just use numpy.core.numerictypes.find_common_type?

2) Can anyone foresee any issues with casting built-in Python types (i.e. float and int) to their OS-dependent NumPy equivalents? NumPy already seems to do this.

3) Is anyone enabling the Intel VML library? There are a number of comments in the code that suggest it's not accelerating the code. It also seems to cause problems with bundling numexpr with cx_freeze.

4) I took a stab at converting from distutils to setuptools, but this seems challenging with NumPy as a dependency. Has anyone tried monkey-patching so that setup.py build_ext uses distutils and then passing the interpreter .pyd/.so as a data file, or some other such chicanery?

(I was going to ask about attaching a debugger, but I just noticed: https://wiki.python.org/moin/DebuggingWithGdb )

Ciao,
Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod@unibas.ch
robert.mcleod@bsse.ethz.ch
robbmcleod@gmail.com
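[For context on questions 1 and 2, NumPy's promotion machinery can be probed directly. This is a sketch using np.result_type, which applies NumPy's standard common-type resolution rules (find_common_type, mentioned above, is a related helper in numpy.core.numerictypes); it is not numexpr code:]

```python
import numpy as np

# Common-type resolution across mixed dtypes: int64 values cannot all be
# represented exactly in float32, so NumPy promotes the pair to float64.
print(np.result_type(np.float32, np.int64))   # float64

# Built-in Python floats map to the platform double (float64 on all
# common platforms), while Python ints map to a platform-dependent
# integer (e.g. int64 on 64-bit Linux, int32 on Windows) -- the OS
# dependence question 2 refers to.
print(np.array(2.5).dtype)                    # float64
print(np.array(2).dtype)                      # platform dependent
```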

On Sun, Feb 14, 2016 at 11:19 PM, Robert McLeod <robbmcleod@gmail.com> wrote:
Not sure what you mean, since numexpr already uses setuptools: https://github.com/pydata/numexpr/blob/master/setup.py#L22. What is the real goal you're trying to achieve?

This monkey-patching is a bad idea: https://github.com/robbmcleod/numexpr/blob/numexpr-3.0/setup.py#L19. Both setuptools and numpy.distutils already do that, and that's already one too many, so you definitely don't want to add a third place. You can use the -j (--parallel) flag to numpy.distutils instead; see http://docs.scipy.org/doc/numpy-dev/user/building.html#parallel-builds

Ralf

On Mon, Feb 15, 2016 at 7:28 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Dear Ralf,

Yes, this appears to be a bad idea. I was just considering whether I could use the more object-oriented approach I am familiar with from setuptools to easily build wheels for PyPI. Thanks for the comments and links; I didn't know I could parallelize the NumPy build.

Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod@unibas.ch
robert.mcleod@bsse.ethz.ch
robbmcleod@gmail.com

Dear Robert,

Thanks for your effort on improving numexpr. Indeed, vectorized math libraries (VML) can give a large boost in performance (~5x), except for a couple of basic operations (add, mul, div), which current compilers are able to vectorize automatically. With recent gcc even more functions are vectorized; see https://sourceware.org/glibc/wiki/libmvec But you need special flags depending on the platform (SSE, AVX present?); runtime detection of processor capabilities would be nice for distributing binaries.

Some time ago, after I lost access to Intel's MKL, I patched numexpr to use Accelerate/vecLib on OS X, which is preinstalled on every Mac; see the veclib_support branch at https://github.com/geggo/numexpr.git

Since you increased the opcode size, I could imagine reserving a bit to switch (at runtime) between internal functions and vectorized ones; that would be handy for tests and benchmarks.

Gregor

participants (3)
- Gregor Thalhammer
- Ralf Gommers
- Robert McLeod