[Python-ideas] 'where' statement in Python?
sturla at molden.no
Tue Jul 20 19:18:57 CEST 2010
On 20.07.2010 16:57, Andrey Popp wrote:
> a = (b, b) where b = 43
I am +1 for a where module and -1 for a where keyword, and here is why.
In MATLAB, we have the "find" function that serves the role of where. In
NumPy, we have a function numpy.where and also masked arrays.
The above statement with NumPy ndarrays would be:
idx, = np.where(b == 47)
a = (b[idx], b[idx])
or we could simply do this:
a = (b[b==47], b[b==47])
And if we look at this proposed expression,
mylist = [y for y in another_list if y < 5 where y == f(x)]
Using NumPy, we can proceed like this:
idx, = np.where(another_array == f(x))
mylist = [y for y in another_array[idx] if y < 5]
The intention is just as clear, and it avoids a new "where" keyword. It
is also similar to NumPy and Matlab. Not to mention that the "where
keyword" in the above expression could be replaced with an "and", so it
serves no real purpose:
mylist = [y for y in another_list if (y < 5 and y == f(x))]
Why have a where keyword here? It is just redundant.
So I'd rather speak of something useful instead: NumPy's "Fancy indexing".
"Fancy indexing" (NumPy jargon) will in this context mean that we allow
indexes to be an iterable, not just integers:
mylist[(1,2,3)] == mylist[1,2,3]
mylist[iterable] == [mylist[i] for i in iterable]
That is what NumPy and Matlab do, as well as Fortran 90 (and certain C++
libraries such as Blitz++). It has all the power of the "where keyword",
while being more flexible to use, and the intention is more explicit. It
is also well-tested syntax.
Thus with "fancy indexing":
alist[iterable] == [alist[i] for i in iterable]
That is what we really need!
Note that this is not a language syntax change; it is just a change in
how __setitem__ and __getitem__ work for certain container types. NumPy
already does this, so the syntax itself is completely valid Python. And
as for "where", it is just a function.
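To make the point about __getitem__ and __setitem__ concrete, here is a rough sketch of what such a container could look like. The name FancyList is made up for illustration and is not an existing standard library class; this only assumes that integers and slices keep their normal meaning and that any other iterable is treated as a list of indexes.

```python
from collections.abc import Iterable


class FancyList(list):
    """Hypothetical list subclass that also accepts an iterable of indexes."""

    def __getitem__(self, index):
        # Plain integers and slices keep the ordinary list behaviour.
        if isinstance(index, (int, slice)):
            return super().__getitem__(index)
        # Anything else iterable is read as a sequence of integer indexes.
        if isinstance(index, Iterable):
            return FancyList(list.__getitem__(self, i) for i in index)
        raise TypeError("index must be an int, a slice or an iterable of ints")

    def __setitem__(self, index, value):
        if isinstance(index, (int, slice)):
            super().__setitem__(index, value)
            return
        indexes = list(index)
        values = list(value)
        if len(indexes) != len(values):
            raise ValueError("need exactly one value per index")
        # Assign element by element, pairing each index with its value.
        for i, v in zip(indexes, values):
            list.__setitem__(self, i, v)


alist = FancyList("abcdef")
picked = alist[(1, 2, 3)]      # same call as alist[1, 2, 3]
alist[[0, 5]] = ["X", "Z"]     # assignment through an index list
```

Since Python already passes alist[1, 2, 3] to __getitem__ as the tuple (1, 2, 3), the two spellings above come out identical without any extra work.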
Andrey's proposed where keyword is a crippled tool in comparison. That
is, the real power of a list of indexers is that it can be obtained and
manipulated with any conceivable method, e.g. slicing. It also allows
numpy to have an "argsort" function, since an index list can be reused
on multiple arrays:
idx = np.argsort(array_a)
sorteda = array_a[idx]
sortedb = array_b[idx]
is the same as
tmp = sorted([(a, i) for i, a in enumerate(lista)])
sorteda = [a for a,i in tmp]
sortedb = [listb[i] for a,i in tmp]
Which is the more readable?
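For comparison, a pure-Python argsort is only a few lines. This is a sketch, not NumPy's implementation; the helper name argsort and the sample data are assumptions for illustration. The resulting index list is reused on both lists, as in the NumPy version.

```python
def argsort(values):
    """Return the indexes that would sort `values` (the sort is stable)."""
    return sorted(range(len(values)), key=values.__getitem__)


lista = [30, 10, 20]
listb = ["c", "a", "b"]

idx = argsort(lista)                 # index list, reusable on any list
sorteda = [lista[i] for i in idx]
sortedb = [listb[i] for i in idx]
```

With fancy indexing on lists, the last two lines would shrink to lista[idx] and listb[idx].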
Implementing a generic "where function" can be achieved with a lambda:
idx = where(lambda x: x == 47, alist)
or a list comprehension (this would be very similar to NumPy):
idx = where([x==47 for x in alist])
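Both spellings can be served by one small function. The following is a minimal sketch under my own assumptions about the signature (a predicate plus a sequence, or a single iterable of truth values); unlike numpy.where, which returns a tuple of arrays, it returns a plain list of indexes.

```python
def where(condition, items=None):
    """Return the indexes at which a condition holds.

    where(predicate, items) -> indexes i where predicate(items[i]) is true
    where(flags)            -> indexes i where flags[i] is true
    """
    if items is None:
        flags = condition                        # iterable of truth values
    else:
        flags = (condition(x) for x in items)    # apply the predicate lazily
    return [i for i, flag in enumerate(flags) if flag]


alist = [47, 3, 47, 8]
idx_lambda = where(lambda x: x == 47, alist)
idx_flags = where([x == 47 for x in alist])
```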
But to begin with, I think we should get NumPy style "fancy indexing" to
standard container types like list, tuple, string, bytes, bytearray,
array and deque. That would just be a handful of subclasses, and I think
they should (initially) be put in a standard library module, and
possibly replace the current containers in Python 4000.
But as for a where keyword: My opinion is a big -1, if I have the right
to vote. We should rather implement a where function and overload the
mentioned container types. The where function should go in the same module.
So all in all, I am +1 for a "where module" and -1 for a "where keyword".
P.S. I'll admit that dict and set might add to some confusion, since
"fancy indexing" would be ambiguous for them.
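A small illustration of the dict problem, with made-up data: a tuple is already a perfectly valid key, so the same subscript could mean either "one key" or "several keys".

```python
d = {(1, 2): "tuple key", 1: "one", 2: "two"}

# Today a tuple subscript is a single key lookup; d[1, 2] means the same.
single = d[(1, 2)]

# A fancy-indexing reading of d[(1, 2)] would instead mean this:
fancy = [d[k] for k in (1, 2)]
```

The two readings give different results for the same subscript, which is why I would leave dict out.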