Is there a C-API function for numpy which implements Python's
multidimensional indexing? Say I have a 2d array
PyArrayObject *M;
and an index i: how do I extract the i-th row or column, M[i,:] or
M[:,i] respectively?
I am looking for a function which again yields a PyArrayObject * and
which is a view into M (no copied data; the result should be another
PyArrayObject whose data pointer and strides point to the correct memory
portion of M).
I searched the API documentation, Google, and the mailing lists for quite a
long time but didn't find anything. Can you help me?
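For reference, a minimal Python sketch of the view semantics being asked
for here; basic slicing already produces such views at the Python level,
and a C implementation would set up the same data-pointer offset and
strides (e.g. via PyArray_NewFromDescr):

import numpy as np

M = np.zeros((3, 4))

row = M[1, :]   # view: data pointer = M.data + 1*M.strides[0]
col = M[:, 2]   # view: data pointer = M.data + 2*M.strides[1]

assert row.strides == (M.strides[1],)   # the row keeps the column stride
assert col.strides == (M.strides[0],)   # the column keeps the row stride
assert np.may_share_memory(row, M) and np.may_share_memory(col, M)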
I'm trying to do something that at first glance I think should be simple
but I can't quite figure out how to do it. The problem is as follows:
I have a 3D grid Values[Nx, Ny, Nz]
I want to slice Values at a 2D surface in the Z dimension specified by
Z_index[Nx, Ny] and return a 2D slice[Nx, Ny].
It is not as simple as Values[:,:,Z_index].
I tried this:
>>> values.shape
(4, 5, 6)
>>> slice = values[:,:,coords]
>>> slice.shape
(4, 5, 4, 5)
>>> slice = np.take(values, coords, axis=2)
>>> slice.shape
(4, 5, 4, 5)
Obviously I could create an empty 2D slice and fill it point by point,
using np.ndenumerate to select values[i, j, Z_index[i, j]]. This just
seems too inefficient and not very pythonic.
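For the record, a sketch of the usual vectorized answer (reusing the names
from above): index the first two axes with broadcast index arrays so that
each (i, j) is paired with Z_index[i, j].

import numpy as np

Nx, Ny, Nz = 4, 5, 6
values = np.arange(Nx * Ny * Nz).reshape(Nx, Ny, Nz)
Z_index = np.random.randint(0, Nz, size=(Nx, Ny))

i, j = np.ogrid[:Nx, :Ny]          # open grid: shapes (Nx, 1) and (1, Ny)
slice2d = values[i, j, Z_index]    # broadcasts to a (Nx, Ny) result

assert slice2d.shape == (Nx, Ny)

The three index arrays broadcast against each other, so slice2d[i, j] ==
values[i, j, Z_index[i, j]] without any Python-level loop.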
I'm pleased to announce the availability of the first release candidate of
pandas 0.14.0. Please try this RC and report any issues on the pandas
issue tracker. We will be releasing officially in about 2 weeks or so.
This is a major release from 0.13.1 and includes a small number of API
changes, several new features, enhancements, and
performance improvements along with a large number of bug fixes.
- Officially support Python 3.4
- SQL interfaces updated to use sqlalchemy
- Display interface changes
- MultiIndexing Using Slicers
- Ability to join a singly-indexed DataFrame with a multi-indexed DataFrame
- More consistency in groupby results and more flexible groupby specifications
- Holiday calendars are now supported in CustomBusinessDay
- Several improvements in plotting functions, including: hexbin, area
and pie plots.
- Performance doc section on I/O operations
Since there are some significant changes in the default way DataFrames are
displayed, I have put up an issue looking for feedback.
Here are the full whatsnew and documentation links:
v0.14.0 Documentation Page <http://pandas-docs.github.io/pandas-docs-travis/>
Source tarballs and Windows builds are available here:
Pandas v0.14rc1 Release <https://github.com/pydata/pandas/releases>
A big thank you to everyone who contributed to this release!
On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner <cmkleffner(a)gmail.com> wrote:
> a discussion about OpenBLAS on the octave maintainer list:
I'm getting the impression that OpenBLAS is both a tantalizing
opportunity and a practical thorn in the side for everyone -- Python,
Octave, Julia, R.
How crazy would it be to get together an organized effort to fix this
problem -- "OpenBLAS for everyone"? E.g., by collecting patches to fix
the bits we don't like (like unhelpful build system defaults),
applying more systematic QA, etc. Ideally this could be done upstream,
but if upstream is MIA or disagrees about OpenBLAS's goals, then it
could be maintained as a kind of "OpenBLAS++" that merges regularly
from upstream (compare to [1] for successful projects handled in
this way). If hardware for testing is a problem, then I suspect
NumFOCUS would be overjoyed to throw a few kilodollars at buying one
instance of each widely-distributed microarchitecture released in the
last few years as a test farm...
I think the goal is pretty clear: a modern optionally-multithreaded
BLAS under a BSD-like license with a priority on correctness,
out-of-the-box functionality (like runtime configurability and feature
detection), speed, and portability, in that order.
I unfortunately don't have the skills to actually lead such an effort
(I've never written a line of asm in my life...), but surely our
collective communities have people who do?
[1] http://www.eglibc.org/mission (a "friendly fork" of glibc holding
stuff that Ulrich Drepper got cranky about, and which was eventually
merged back into glibc)
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
I have been playing with the idea of using Numpy's binary format as a
lightweight alternative to HDF5 (which I believe is the "right" way to do
it if one does not have a problem with the dependency).
I am pretty happy with the npy format, but the npz format seems to be
broken as far as performance is concerned (or I am missing something
obvious!). The
following ipython session illustrates the issue:
In : import numpy as np
In : x = np.linspace(1, 10, 50000000)
In : %time np.save("x.npy", x)
CPU times: user 40 ms, sys: 230 ms, total: 270 ms
Wall time: 488 ms
In : %time np.savez("x.npz", data = x)
CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
Wall time: 7.7 s
I can inspect the files to verify that they contain the same data, and I
can change the example, but this seems to always hold (I am running Arch
Linux, but I've done the test on other machines too): for bigger arrays,
the npz format seems to add an unbelievable amount of overhead.
Looking at Numpy's code, it looks like the real work is being done by
Python's zipfile module, and I suspect that all the extra time is spent
computing the crc32. Am I correct in my assumption (I am not familiar with
zipfile's internals)? Or perhaps I am doing something really dumb and there
is an easy way to speed things up?
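One quick way to test that suspicion, continuing the session above:
zipfile computes its checksums with zlib.crc32, which can be timed in
isolation (note that .tobytes() itself copies the ~400 MB buffer):

In : import zlib
In : %time zlib.crc32(x.tobytes())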
Assuming that I am correct, my next question is: why compute the crc32 at
all? I mean, I know that it is part of what defines a "zip file", but is it
really necessary for a npz file to be a (compliant) zip file? If, for
example, I open the resulting npz file with a hex editor and insert a
bogus crc32, np.load will happily load the file anyway (Gnome's Archive
Manager will do the same). To me this suggests that the fact that npz files
are zip files is not that important.
Perhaps, people think that the ability to browse arrays and extract
individual ones like they would do with a regular zip file is really
important, but reading the little documentation that I found, I got the
impression that npz files are zip files just because this was the easiest
way to have multiple arrays in the same file. But my main point is: it
should be fairly simple to make npz files much more efficient with simple
changes, like not computing checksums (or using a different, faster checksum algorithm).
Let me know what you think about this. I've searched around the internet,
and on places like Stackoverflow, it seems that the standard answer is: you
are doing it wrong, forget Numpy's format and start using hdf5! Please do
not give that answer. Like I said in the beginning, I am well aware of
hdf5 and I use it in my "production code" (in C++). But I believe that
there should be a lightweight alternative (right now, to use hdf5 I need to
have the C library, the C++ wrappers, and the h5py library installed to
play with the data using Python; that is a bit too heavy for my needs). I
really like Numpy's format (if anything, it makes me feel better knowing
that it is so easy to reverse engineer, while the hdf5 format is very
complicated), but the (apparent) poor performance of npz files is a deal
breaker.
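For what it's worth, a trivial workaround in the spirit of the above, if
browsing a single archive is not essential: write one plain .npy file per
array and skip the zip container (and its checksums) entirely. A sketch
(the helper names are made up):

import numpy as np

def save_arrays(prefix, **arrays):
    # one plain .npy per array: np.save speed, no zip/crc32 overhead
    for name, arr in arrays.items():
        np.save("%s_%s.npy" % (prefix, name), arr)

def load_arrays(prefix, names):
    # np.load could also be given mmap_mode here, which a zipped npz
    # cannot offer
    return dict((name, np.load("%s_%s.npy" % (prefix, name)))
                for name in names)

save_arrays("backup", data=np.linspace(1, 10, 1000))
x = load_arrays("backup", ["data"])["data"]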
I was just having a new look into the mess that is, imo, the support for automatic
line ending recognition in genfromtxt, and more generally, the Python file openers.
I am glad that at least reading gzip files is no longer entirely broken in Python 3, but
actually detecting “old Mac” style CR line endings in particular currently only works
for uncompressed and bzip2 files under 2.6/2.7.
This is largely because genfromtxt wants to open everything in binary mode,
which arguably makes no sense for ASCII text files with numbers. I think the
only reason this works in 2.x is that the ‘U’ reading mode overrides the ‘b’.
So on the Python side what actually works for automatic line ending detection is:
Python          2.6   2.7   3.2   3.3/3.4
uncompressed     U     U     t     t
gzip             E     N     E     t
bzip2            U     U     E     t*
lzma             -     -     -     t*

U - works with mode ‘rU’
E - mode ‘rU’ raises an error
N - mode ‘rU’ is accepted, but does not detect CR (‘\r’) line endings
    (actually I think ‘U’ is simply discarded internally by gzip.open() in 2.7.4+)
t - works with mode ‘rt’ (the default with plain open())
* - requires the ‘.open()’ rather than the ‘.XXXFile()’ constructor of bz2/lzma
Therefore I’d propose changes to extend universal newline recognition as far
as possible with the above openers.
There are some potential issues with this:
1. Switching to ‘rt’ mode for Python 3.x means that np.lib._datasource.open() does not
return byte strings by itself, so genfromtxt has to use asbytes() on the returned lines
(see the sketch after this list). Since this occurs in only two places, I don’t see a
major problem with this.
2. In the tests I had to work around the lack of fileobj support in bz2.BZ2File by using
os.system(‘bzip2 …’) on the temporary file, which might not work on all systems.
In particular I’d expect it to fail under Windows, but it’s not clear to me how far the entire
mkstemp thing works under Windows...
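A minimal sketch of the asbytes() pattern from point 1 (the file name is
hypothetical; asbytes comes from numpy.compat):

import numpy as np
from numpy.compat import asbytes

# text-mode opener: lines come back as str under Python 3
fhd = np.lib._datasource.open("example.txt", "rt")
for line in fhd:
    line = asbytes(line)  # genfromtxt keeps operating on byte strings
    # ... split and convert the byte string as before ...
fhd.close()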
As a final note, http://bugs.python.org/issue13989#msg153127 suggests a workaround
that might make this work with gzip.open() (and perhaps bz2?) on 3.2 as well.
I am not sure how high 3.2 support is ranking for the near future; for the moment I am not
strongly inclined to implement it…
Grateful for comments or tests (especially under Windows!) of the commit(s) above -
As one of the co-chairs in charge of organizing the birds-of-a-feather
sessions at the SciPy conference this year, I wanted to ask through the
NumPy list whether there is enough interest to hold a NumPy-centered BoF
this year. The BoF format would be up to those who lead the discussion; a
couple of ideas used in the past include picking out a few of the lead
devs to be on a panel for a Q&A type of session, or an open Q&A with an
audience-guided list of topics. I can help facilitate organizing
something, but we would really like to get something organized this year
(last year NumPy was the only major project that was not really
represented in the BoF sessions).
Kyle Mandli (and, via proxy, Matt McCormick)