Temporary arrays generated in expressions are expensive, as they imply
extra memory bandwidth, which is the bottleneck in most numpy operations.
r = a + b + c
first creates a temporary for b + c and then adds a to it.
This can be rewritten to be more efficient using in-place operations:
r = b + c
r += a
This saves some memory bandwidth and can speed up the operation by 50%
for very large arrays, or by even more if the in-place operation allows it
to be completed entirely in the CPU cache.
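For example (a minimal illustration of the rewrite, not from the original
post):

import numpy as np

a, b, c = (np.random.rand(1000000) for _ in range(3))

r = a + b + c   # allocates a temporary for b + c, then another array for the sum

r = b + c       # one allocation for the result ...
r += a          # ... then a second pass that reuses r as the destination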
The problem is that in-place operations are a lot less readable, so they
are often only used in well-optimized code. But due to Python's
reference-counting semantics we can actually do some of these conversions
automatically: if an operand in Python has a reference count of one, it
must be a temporary, so we can use it as the destination array. CPython
itself does this optimization for string concatenation.
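To see why a reference count of one identifies a temporary (an
illustration, not part of the patch):

import sys

x = [1, 2, 3]
print(sys.getrefcount(x))          # 2: the name x plus getrefcount's argument
print(sys.getrefcount([1, 2, 3]))  # 1: a temporary's only reference is the argument

In an expression like (b + c) + a the intermediate b + c has no name, so by
the time numpy's add is called on it the count is one and its buffer can be
reused for the output.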
In numpy we have the issue that we can be called from the C API directly,
where the reference count may be one for other reasons. To solve this we
can check the backtrace up to the Python frame-evaluation function: if
there are only numpy and Python functions between that and our entry point,
we should be able to elide the temporary.
This PR implements this:
It currently only supports Linux with glibc (which has reliable
backtraces via unwinding) and maybe macOS, depending on how good its
backtrace is. On Windows the backtrace APIs are different and I don't
know them, but in theory it could also be done there.
A problem is that checking the backtrace is quite expensive, so it should
only be enabled when the involved arrays are large enough for it to be
worthwhile. In my testing this threshold seems to be around 180-300 KiB
sized arrays, basically where they start spilling out of the CPU L2 cache.
I made a little crappy benchmark script to test this cutoff in this branch:
If you are interested you can run it with:
python setup.py build_ext -j 4 --inplace
ipython --profile=null check.ipy
At the end it will plot the ratio between elided and non-elided runtime.
It should get larger than one around 180 KiB on most CPUs.
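A crude stand-in for that script (my sketch, not the branch's check.ipy):
comparing the explicit in-place rewrite against the naive expression
approximates what elision buys, without needing the patched build:

import timeit
import numpy as np

def naive(a, b, c):
    return a + b + c

def rewritten(a, b, c):
    r = b + c
    r += a
    return r

for n in (4096, 25000, 250000, 2500000):
    a, b, c = (np.random.rand(n) for _ in range(3))
    t0 = timeit.timeit(lambda: naive(a, b, c), number=100)
    t1 = timeit.timeit(lambda: rewritten(a, b, c), number=100)
    print("%8.0f KiB  ratio %.2f" % (n * 8 / 1024.0, t0 / t1))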
If no one points out some flaw in the approach, I'm hoping to get this
into the next numpy version.
For those interested in continuing the __numpy_ufunc__ saga, there is a pull
request enabling it <https://github.com/numpy/numpy/pull/8247>. We will
likely want to make some changes up front before merging it, so some
discussion is in order.
I built an external C module (pygsl) using the MinGW-w64 gcc compiler from
MSYS2. This build required some changes to numpy.distutils to get
“python setup.py config”
“python setup.py build”
working. In this process I replaced two files in numpy.distutils from the numpy git repository:
- numpy/distutils/misc_util.py, version ec0e046 <https://github.com/numpy/numpy/commit/ec0e04694278ef9ea83537d308b07fc27c1b5…> of 14 Dec 2016
- numpy/distutils/mingw32ccompiler.py, version ec0e046 <https://github.com/numpy/numpy/commit/ec0e04694278ef9ea83537d308b07fc27c1b5…> of 14 Dec 2016
mingw32ccompiler.py had to be modified to get things working:
- a preprocessor had to be defined, as I am using setup.py config
- the runtime library search path had to be passed to the linker
- the include path of the vcruntime had to be added
I attached a patch reflecting the changes I had to make to
mingw32ccompiler.py. If this information is useful, I am happy to answer
questions.
PS Version infos:
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Help on module numpy.version in numpy:
full_version = '1.12.0'
git_revision = '561f1accf861ad8606ea2dd723d2be2b09a2dffa'
release = True
short_version = '1.12.0'
version = '1.12.0'
gcc.exe (Rev2, Built by MSYS2 project) 6.2.0
Announcing Numexpr 2.6.2
This is a maintenance release that fixes several issues, with special
emphasis on keeping compatibility with newer NumPy versions. Also,
initial support for POWER processors is here. Thanks to Oleksandr
Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and
Antonio Valentino for their nice contributions.
In case you want to know in more detail what has changed in this version,
have a look at the release notes.
Numexpr is a fast numerical expression evaluator for NumPy. With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.
It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors. Look here for some benchmarks of numexpr using MKL:
Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.
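A minimal usage example (mine, not part of the announcement):

import numpy as np
import numexpr as ne

a = np.arange(1000000.0)
b = np.arange(1000000.0)

# The expression is compiled once and evaluated in cache-sized chunks
# across all cores, avoiding numpy's full-size temporaries:
result = ne.evaluate("3*a + 4*b")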
Where can I find Numexpr?
The project is hosted at GitHub in:
You can get the packages from PyPI as well (but not for RC releases):
Share your experience
Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
What is the best way to make sure that a matrix inversion makes any
sense before performing it? I am currently struggling to understand some
results from matrix inversions in my work, and I would like to see if I
am dealing with an ill-conditioned problem. It is probably user error,
but I don't like having the possibility hanging over my head.
I naively put a call to np.linalg.cond into my code; all of my cores
went to 100% and a few minutes later I got a number. To be fair A is
6400 elements square, but this takes ~20x more time than the inversion.
This is not really practical for what I am doing, is there a better way?
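One common workaround (a sketch of mine, not from the thread; it assumes a
real-valued, square, non-singular A) is to estimate the 1-norm condition
number from an LU factorization instead of the SVD that np.linalg.cond
performs by default:

import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, onenormest

def cond1_estimate(A):
    # One O(n^3) LU factorization, then Higham's 1-norm estimator, which
    # needs only a handful of triangular solves instead of an explicit
    # inverse or an SVD.
    n = A.shape[0]
    lu, piv = lu_factor(A)
    inv_op = LinearOperator(
        (n, n),
        matvec=lambda x: lu_solve((lu, piv), x),
        rmatvec=lambda x: lu_solve((lu, piv), x, trans=1),
    )
    return np.linalg.norm(A, 1) * onenormest(inv_op)

The factorization can also be reused for the solve itself, so the estimate
comes almost for free compared to the inversion.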
This is partly in response to Ilhan Polat's post about introducing the
A\b operator to numpy. I also couldn't check the NumPy mailing list
archives to see if this has been asked before; the numpy-discussion
gmane link isn't working for me at all.
Thanks for your time,
I may be wrong, but I think that the result of the current implementation
is actually the expected one.
Using your example: the probabilities for items 1, 2 and 3 are 0.2, 0.4 and 0.4.
P([1,2]) = P(2nd=2 | 1st=1) P(1st=1) + P(2nd=1 | 1st=2) P(1st=2)
Now, P(1st=1) = 0.2 and P(1st=2) = 0.4. However:
P(2nd=2 | 1st=1) = 0.5 (2 and 3 have the same sampling probability)
P(2nd=1 | 1st=2) = 1/3 (1 and 3 have probabilities 0.2 and 0.4 that, once
normalised, translate into 1/3 and 2/3 respectively)
Therefore P([1,2]) = 0.2*0.5 + 0.4*(1/3) = 0.7/3 = 0.23333
Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.53333
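A quick Monte Carlo check (mine, not in the original message) of these
numbers:

import numpy as np
from collections import Counter

trials = 100000
counts = Counter()
for _ in range(trials):
    pick = np.random.choice([1, 2, 3], size=2, replace=False, p=[0.2, 0.4, 0.4])
    counts[tuple(sorted(pick))] += 1
for pair, n in sorted(counts.items()):
    print(pair, n / float(trials))   # approx. 0.233, 0.233, 0.533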
What am I missing?
2017-01-17 13:00 GMT+01:00 <numpy-discussion-request(a)scipy.org>:
> Hi, I'm looking for a way to find a random sample of C different items out
> of N items, with some desired probability Pi for each item i.
> I saw that numpy has a function that supposedly does this,
> numpy.random.choice (with replace=False and a probabilities array), but
> looking at the algorithm actually implemented, I am wondering in what sense
> are the probabilities Pi actually obeyed...
> To me, the code doesn't seem to be doing the right thing... Let me explain:
> Consider a simple numerical example: We have 3 items, and need to pick 2
> different ones randomly. Let's assume the desired probabilities for item 1,
> 2 and 3 are: 0.2, 0.4 and 0.4.
> Working out the equations there is exactly one solution here: The random
> outcome of numpy.random.choice in this case should be [1,2] at probability
> 0.2, [1,3] at probability 0.2, and [2,3] at probability 0.6. That is indeed
> a solution for the desired probabilities because it yields item 1 in
> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] =
> 0.2+0.6 = 0.8 = 2*P2, etc.
> However, the algorithm in numpy.random.choice's replace=False generates, if
> I understand correctly, different probabilities for the outcomes: I believe
> in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333,
> and [2,3] at probability 0.53333.
> My question is how does this result fit the desired probabilities?
> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333,
> then the expected number of "1" results we'll get per drawing is 0.23333 +
> 0.2333 = 0.46666, and similarly for "2" the expected number is 0.7666, and for
> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice as
> common as item 1, as we originally desired (we asked for probabilities
> 0.2, 0.4, 0.4 for the individual items!).
> Nadav Har'El
While building the latest scipy on top of numpy 1.11.3, I have noticed
crashes while running the scipy test suite, in scipy.special (e.g. in
the scipy.special hyp0f1 test). This only happens on Windows for Python 3.5
(where we use the MSVC 2015 compiler).
Applying some violence to distutils, I re-built numpy/scipy with debug
symbols, and the debugger claims that crashes happen inside scipy.special
ufunc cython code, when calling clog or csqrt. I first suspected a compiler
bug, but disabling those functions in numpy, to force using our own
versions in npymath, made the problem go away.
I am a bit suspicious about the whole thing, as neither conda's nor
Gohlke's wheels crashed. Has anybody else encountered this?
Is there a simple way to fill in the diagonal elements of an array other
than the main diagonal?
As far as I can see, the diagxxx functions that take an offset can only
read and not modify in place, while the functions for modifying don't take
an offset and only allow changing the main diagonal.
Use case: creating banded matrices (2-D arrays) similar to toeplitz.
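A sketch of a helper for this (hypothetical, not an existing numpy
function) using fancy indexing:

import numpy as np

def fill_diagonal_offset(a, val, offset=0):
    # Fill the offset-th diagonal of a 2-D array in place.
    rows, cols = a.shape
    if offset >= 0:
        n = min(rows, cols - offset)
        idx = np.arange(n)
        a[idx, idx + offset] = val
    else:
        n = min(rows + offset, cols)
        idx = np.arange(n)
        a[idx - offset, idx] = val

a = np.zeros((5, 5))
fill_diagonal_offset(a, 1.0, offset=1)    # first superdiagonal
fill_diagonal_offset(a, -1.0, offset=-2)  # second subdiagonal

For the Toeplitz case specifically, scipy.linalg.toeplitz builds the whole
matrix directly.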