Is there a C-API function for numpy which implements Python's
multidimensional indexing? Say, I have a 2d-array
PyArrayObject * M;
and an index i: how do I extract the i-th row M[i,:] or the i-th
column M[:,i]?
I am looking for a function that again returns a PyArrayObject * and
that is a view into M (no copied data; the result should be another
PyArrayObject whose data pointer and strides point to the correct memory
portion of M).
I searched the API documentation, Google and mailing lists for quite a
long time but didn't find anything. Can you help me?
I'm trying to do something that at first glance I think should be simple
but I can't quite figure out how to do it. The problem is as follows:
I have a 3D grid Values[Nx, Ny, Nz]
I want to slice Values at a 2D surface in the Z dimension specified by
Z_index[Nx, Ny] and return a 2D slice[Nx, Ny].
It is not as simple as Values[:,:,Z_index].
I tried this (coords here is the Z_index array, with shape (4, 5)):
>>> values.shape
(4, 5, 6)
>>> slice = values[:,:,coords]
>>> slice.shape
(4, 5, 4, 5)
>>> slice = np.take(values, coords, axis=2)
>>> slice.shape
(4, 5, 4, 5)
Obviously I could create an empty 2D slice and fill it point by point
with np.ndenumerate, selecting values[i, j, Z_index[i, j]], but that
just seems too inefficient and not very pythonic.
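For what it's worth, one way that avoids the explicit loop is integer-array (fancy) indexing with broadcastable row/column indices; a sketch with made-up shapes:

import numpy as np

Nx, Ny, Nz = 4, 5, 6
values = np.random.rand(Nx, Ny, Nz)
z_index = np.random.randint(0, Nz, size=(Nx, Ny))

# np.ogrid yields index arrays of shape (Nx, 1) and (1, Ny); broadcast
# against z_index they select values[i, j, z_index[i, j]] for every (i, j).
i, j = np.ogrid[:Nx, :Ny]
surface = values[i, j, z_index]    # result has shape (Nx, Ny)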
On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner <cmkleffner(a)gmail.com> wrote:
> a discussion about OpenBLAS on the octave maintainer list:
I'm getting the impression that OpenBLAS is both a tantalizing
opportunity and a practical thorn in the side for everyone -- Python,
Octave, Julia, R.
How crazy would it be to get together an organized effort to fix this
problem -- "OpenBLAS for everyone"? E.g., by collecting patches to fix
the bits we don't like (like unhelpful build system defaults),
applying more systematic QA, etc. Ideally this could be done upstream,
but if upstream is MIA or disagrees about OpenBLAS's goals, then it
could be maintained as a kind of "OpenBLAS++" that merges regularly
from upstream (compare to [1] for a successful project handled in
this way). If hardware for testing is a problem, then I suspect
NumFOCUS would be overjoyed to throw a few kilodollars at buying one
instance of each widely-distributed microarchitecture released in the
last few years as a test farm...
I think the goal is pretty clear: a modern optionally-multithreaded
BLAS under a BSD-like license with a priority on correctness,
out-of-the-box functionality (like runtime configurability and feature
detection), speed, and portability, in that order.
I unfortunately don't have the skills to actually lead such an effort
(I've never written a line of asm in my life...), but surely our
collective communities have people who do?
[1] http://www.eglibc.org/mission (a "friendly fork" of glibc holding
stuff that Ulrich Drepper got cranky about, which eventually was
merged back into glibc proper)
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
I have been playing with the idea of using Numpy's binary format as a
lightweight alternative to HDF5 (which I believe is the "right" way to go
if one does not have a problem with the dependency).
I am pretty happy with the npy format, but the npz format seems to be
broken as far as performance is concerned (or I am missing something
obvious!). The following ipython session illustrates the issue:
In : import numpy as np
In : x = np.linspace(1, 10, 50000000)
In : %time np.save("x.npy", x)
CPU times: user 40 ms, sys: 230 ms, total: 270 ms
Wall time: 488 ms
In : %time np.savez("x.npz", data = x)
CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
Wall time: 7.7 s
I can inspect the files to verify that they contain the same data, and I
can change the example, but this seems to always hold (I am running Arch
Linux, but I've done the test on other machines too): for bigger arrays,
the npz format seems to add an unbelievable amount of overhead.
Looking at Numpy's code, it looks like the real work is being done by
Python's zipfile module, and I suspect that all the extra time is spent
computing the crc32. Am I correct in my assumption (I am not familiar with
zipfile's internals)? Or perhaps I am doing something really dumb and there
is an easy way to speed things up?
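One way to test that assumption before digging into zipfile itself is to time zlib.crc32 on the raw bytes alone; a rough sketch (numbers will of course vary by machine):

import time
import zlib
import numpy as np

x = np.linspace(1, 10, 50000000)
buf = x.tobytes()    # note: this copies the ~400 MB of array data

t0 = time.time()
zlib.crc32(buf)
print("crc32 alone: %.2f s" % (time.time() - t0))

If the checksum accounts for only part of the 7 s gap, the rest presumably goes to zipfile's own copying and bookkeeping.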
Assuming that I am correct, my next question is: why compute the crc32 at
all? I mean, I know that it is part of what defines a "zip file", but is it
really necessary for an npz file to be a (compliant) zip file? If, for
example, I open the resulting npz file with a hex editor and insert a
bogus crc32, np.load will happily load the file anyway (Gnome's Archive
Manager will do the same). To me this suggests that the fact that npz
files are zip files is not that important.
Perhaps people think that the ability to browse arrays and extract
individual ones as they would with a regular zip file is really
important, but from the little documentation that I found, I got the
impression that npz files are zip files just because this was the easiest
way to have multiple arrays in the same file. But my main point is: it
should be fairly simple to make npz files much more efficient with simple
changes, like not computing checksums (or using a different, cheaper
algorithm).
Let me know what you think about this. I've searched around the internet,
and in places like Stack Overflow the standard answer seems to be: you
are doing it wrong, forget Numpy's format and start using hdf5! Please do
not give that answer. Like I said in the beginning, I am well aware of
hdf5 and I use it in my "production code" (in C++). But I believe that
there should be a lightweight alternative (right now, to use hdf5 I need
to have the C library, the C++ wrappers, and the h5py library installed
to play with the data from Python; that is a bit too heavy for my needs).
I really like Numpy's format (if anything, it makes me feel better
knowing that it is so easy to reverse engineer, while the hdf5 format is
very complicated), but the (apparent) poor performance of npz files is a
deal breaker.
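To make the point concrete, here is the kind of lightweight, checksum-free container I have in mind: just concatenating self-describing .npy blocks into a single file. (save_arrays/load_arrays are hypothetical names, not an existing numpy API.)

import numpy as np

def save_arrays(filename, arrays):
    # np.save appends one self-describing .npy block per call
    with open(filename, "wb") as f:
        for a in arrays:
            np.save(f, a)

def load_arrays(filename, n):
    # np.load on an open file object reads exactly one .npy block
    # and leaves the file position at the start of the next one
    with open(filename, "rb") as f:
        return [np.load(f) for _ in range(n)]

No names and no random access, but also no crc32 and no extra copies.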
I'm trying to compile 32-bit numpy on a 64-bit CentOS 6 system, but it fails with the message:
"Broken toolchain: cannot link a simple C program"
It gets the compile flags right, but not the linker:
C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -m32 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/opt/python/ia32/include/python2.7 -c'
gcc -pthread _configtest.o -o _configtest
_configtest.o: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
I'm building it using a 32-bit Python build that I compiled on the same system. I've tried
OPT="-m32" FOPT="-m32" python setup.py build
as well as
setarch x86_64 -B python setup.py build
but with the same results.
Someone worked around this by altering ccompiler.py, but I'm trying to find a cleaner solution.
Is there a standard way of doing this?
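In case it helps: the error shows that _configtest.o was compiled with -m32 while the link step was not, so one thing worth trying (I have not verified that numpy.distutils honors all of these) is forcing -m32 on the linker as well through the standard distutils environment variables:

CC="gcc -m32" CXX="g++ -m32" LDSHARED="gcc -m32 -shared" LDFLAGS="-m32" python setup.py build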
Announcing HDF5 for Python (h5py) 2.3.0
The h5py team is happy to announce the availability of h5py 2.3.0 (final).
Thanks to everyone who provided beta feedback!
The h5py package is a Pythonic interface to the HDF5 binary data format.
It lets you store huge amounts of numerical data, and easily manipulate
that data from NumPy. For example, you can slice into multi-terabyte
datasets stored on disk, as if they were real NumPy arrays. Thousands of
datasets can be stored in a single file, categorized and tagged however
you want.
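For those new to the package, a minimal usage sketch (the file and dataset names are just examples):

import h5py
import numpy as np

# create a file, store one dataset, then slice it back from disk
with h5py.File("example.hdf5", "w") as f:
    f.create_dataset("mydata", data=np.arange(1000000))

with h5py.File("example.hdf5", "r") as f:
    print(f["mydata"][10:20])    # reads only the requested slice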
This release introduces some important new features, including:
* Support for arbitrary vlen data (see the sketch after this list)
* Improved exception messages
* Improved setuptools support
* Multiple additions to the low-level API
* Improved support for MPI features
* Single-step build for HDF5 on Windows
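As an illustration of the vlen support mentioned above, a short sketch (file and dataset names are just examples):

import h5py
import numpy as np

with h5py.File("vlen_demo.hdf5", "w") as f:
    # a dataset of variable-length int32 sequences (one ragged row each)
    dt = h5py.special_dtype(vlen=np.dtype('int32'))
    dset = f.create_dataset("ragged", (3,), dtype=dt)
    dset[0] = np.array([1, 2, 3], dtype='int32')
    dset[1] = np.array([4, 5], dtype='int32')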
Major fixes since beta:
* LZF compression crash on Win64
* Unhelpful error message relating to chunked storage
* Import error for IPython completer on certain platforms
A complete description of changes is available online:
Where to get it
Downloads, documentation, and more are available at the h5py website:
The h5py package relies on third-party testing and contributions. For the
2.3 release, thanks especially to:
* Martin Teichmann
* Florian Rathgerber
* Pierre de Buyl
* Thomas Caswell
* Andy Salnikov
* Darren Dale
* Robert David Grant
* Toon Verstraelen
* Many others who contributed bug reports and testing
It's been a while since the last datetime and timezones discussion thread was visited (linked below):
It looks like the best approach to follow is the UTC-only approach from the linked thread, with an optional flag to indicate the timezone (to avoid confusing applications that don't expect any timezone info). Since this is slightly more useful than a purely naive datetime64 and would be open to extension if required, it is probably the best way to start improving the datetime64 library.
If we do wish to have full timezone support, it would very likely lead to performance drops (as reasoned in the thread), and we would need a dedicated, maintained tzinfo package, at which point it would make much more sense to just incorporate the pytz library. (I also don't have the expertise to implement this, so I would be unable to help resolve the current logjam.)
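For context, a small illustration of the behavior that motivates this: in current releases, datetime64 stores values as UTC but renders them in the local zone, so the same value prints differently on different machines.

import numpy as np

# parsed as UTC (the trailing Z)...
t = np.datetime64('2014-04-15T10:00Z')

# ...but str/repr apply the machine's local timezone offset, which is
# exactly the kind of surprise a UTC-only design with an optional
# timezone flag is meant to avoid.
print(t)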
I would like to start writing a NEP for this, followed by an implementation; however, I'm not sure what the format etc. is. Could someone direct me to a page where this information is provided?
Please let me know if there are any ideas, comments etc.