[Python-ideas] 'where' statement in Python?

Tue Jul 20 19:18:57 CEST 2010

On 20.07.2010 16:57, Andrey Popp wrote:
>      a = (b, b) where b = 43
>

I am +1 for a where module and -1 for a where keyword, and here is the 
reason:

In MATLAB, the "find" function plays the role of "where". In NumPy, we 
have the function numpy.where, and also masked arrays.

The above statement with NumPy ndarrays would be:

    import numpy as np

    idx, = np.where(b == 43)
    a = (b[idx], b[idx])

or we could simply do this:

    a = (b[b == 43], b[b == 43])

And if we look at this proposed expression,

    mylist = [y for y in another_list if y < 5 where y == f(x)]

Using NumPy, we can proceed like this:

    idx, = np.where(another_array == f(x))
    mylist = [y for y in another_array[idx] if y < 5]

The intention is just as clear, and it avoids a new "where" keyword. It 
is also consistent with NumPy and MATLAB. Not to mention that the "where" 
keyword in the above expression could be replaced with an "and", so it 
serves no real purpose:

    mylist = [y for y in another_list if (y < 5 and y == f(x))]

Why have a where keyword here? It is just redundant.
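For plain Python lists the two spellings can be checked against each other. A minimal sketch, in which `f`, `x` and the sample data are hypothetical stand-ins (not part of the original proposal), comparing an index-list approach in the spirit of np.where with the single `and` filter:

```python
# Hypothetical stand-ins for f, x and another_list in the example above.
def f(x):
    return x % 4

x = 7
another_list = [3, 1, 3, 8, 3, 2]

# Two-step version, mirroring np.where: first collect the matching indices...
idx = [i for i, y in enumerate(another_list) if y == f(x)]
# ...then apply the remaining condition to the selected elements.
two_step = [another_list[i] for i in idx if another_list[i] < 5]

# One-step version: the extra condition is simply joined with "and".
one_step = [y for y in another_list if (y < 5 and y == f(x))]
```

Both lists come out the same, which is the point: the proposed "where" clause adds nothing an "and" cannot express.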

So I'd rather talk about something useful instead: NumPy's "fancy indexing".

"Fancy indexing" (NumPy jargon) will in this context mean that we allow 
indices to be an iterable, not just integers:

    mylist[(1,2,3)] == mylist[1,2,3]
    mylist[iterable] == [mylist[i] for i in iterable]

That is what NumPy and MATLAB do, as well as Fortran 90 (and certain C++ 
libraries such as Blitz++). It has all the power of the "where" keyword 
while being more flexible to use, and the intention is more explicit. It 
is also well-tested syntax.

Thus with "fancy indexing":

    alist[iterable] == [alist[i] for i in iterable]
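For reading (not assignment), today's standard library already gets close: operator.itemgetter accepts several indices and returns a tuple of the selected items. The sketch below, with an illustrative list and index tuple, compares it with the comprehension that defines fancy indexing above. Two caveats: with exactly one index it returns the bare item rather than a 1-tuple, and it cannot be called with an empty index list.

```python
import operator

alist = ["a", "b", "c", "d", "e"]
iterable = (4, 0, 2)

# The comprehension that defines "fancy indexing" above.
fancy = [alist[i] for i in iterable]

# itemgetter with several indices returns a tuple of the selected items.
picked = operator.itemgetter(*iterable)(alist)

# Caveat: with a single index, itemgetter returns the item itself.
single = operator.itemgetter(3)(alist)
```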

That is what we really need!

Note that this is not a language syntax change, it is just a change of 
how __setitem__ and __getitem__ work for certain container types. NumPy 
already does this, so the syntax itself is completely valid Python. And 
as for "where", it is just a function.
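As a rough illustration of changing __getitem__ and __setitem__ for a container type, here is a minimal sketch of a list subclass that also accepts an iterable of integer indices. The name FancyList, and the rule that any key which is neither an int nor a slice is treated as an index iterable, are assumptions made for this sketch, not part of the proposal:

```python
class FancyList(list):
    """A list that also accepts an iterable of integer indices.

    Hypothetical sketch of "fancy indexing" for a standard container.
    """

    def __getitem__(self, key):
        if isinstance(key, int):
            # Ordinary single-element access.
            return list.__getitem__(self, key)
        if isinstance(key, slice):
            # Ordinary slicing, but keep the result a FancyList.
            return FancyList(list.__getitem__(self, key))
        # Otherwise treat key as an iterable of indices: alist[iterable].
        return FancyList(list.__getitem__(self, i) for i in key)

    def __setitem__(self, key, value):
        if isinstance(key, (int, slice)):
            # Ordinary item or slice assignment.
            list.__setitem__(self, key, value)
            return
        # Fancy assignment: one value per index, applied in order.
        indices = list(key)
        values = list(value)
        if len(indices) != len(values):
            raise ValueError("fancy assignment needs one value per index")
        for i, v in zip(indices, values):
            list.__setitem__(self, i, v)


xs = FancyList([10, 20, 30, 40, 50])
print(xs[1, 3])        # [20, 40]
xs[[4, 0]] = [0, 1]
print(xs)              # [1, 20, 30, 40, 0]
```

Since `xs[1, 3]` passes the tuple `(1, 3)` as the key, no syntax change is involved. Whether NumPy-style boolean masks or scalar broadcasting on assignment should also be accepted is left open here.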

Andrey's proposed where keyword is a crippled tool in comparison. The 
real power of a list of indices is that it can be obtained and 
manipulated with any conceivable method, e.g. slicing. It also allows 
NumPy to have an "argsort" function, since an index list can be reused 
on multiple arrays:

    idx = np.argsort(array_a)
    sorteda = array_a[idx]
    sortedb = array_b[idx]

is the same as

    tmp = sorted([(a, i) for i, a in enumerate(lista)])
    sorteda = [a for a, i in tmp]
    sortedb = [listb[i] for a, i in tmp]

Which is the more readable?
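For plain lists, an argsort-style index list can also be had by sorting range(len(...)) with the sequence itself as the key; the resulting index list is then reused on both lists, as with np.argsort. A minimal sketch; the helper name argsort is hypothetical here, borrowed from NumPy's terminology:

```python
def argsort(seq):
    """Return the indices that would sort seq (stable, like sorted())."""
    return sorted(range(len(seq)), key=seq.__getitem__)


lista = [3.0, 1.0, 2.0]
listb = ["c", "a", "b"]

idx = argsort(lista)                 # [1, 2, 0]
sorteda = [lista[i] for i in idx]    # [1.0, 2.0, 3.0]
sortedb = [listb[i] for i in idx]    # ['a', 'b', 'c']
```

Because sorted() is stable, ties keep their original order, which matches the tie-breaking of the (a, i) tuple version.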

Implementing a generic "where function" can be achieved with a lambda:

    idx = where(lambda x: x == 47, alist)

or a list comprehension (this would be very similar to NumPy):

    idx = where([x==47 for x in alist])
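A minimal sketch of such a where function, accepting either a predicate plus a sequence or a single sequence of truth values. The signature is an assumption for illustration; it is loosely modelled on the one-argument form of numpy.where, which returns a tuple of index arrays rather than the plain list returned here:

```python
def where(cond, seq=None):
    """Return the indices selected by cond (hypothetical helper).

    where(predicate, seq) -> indices i for which predicate(seq[i]) is true
    where(flags)          -> indices i for which flags[i] is true
    """
    if seq is None:
        flags = cond                          # already a sequence of truth values
    else:
        flags = (cond(item) for item in seq)  # apply the predicate lazily
    return [i for i, flag in enumerate(flags) if flag]


alist = [47, 3, 47, 5]
idx = where(lambda x: x == 47, alist)   # [0, 2]
same = where([x == 47 for x in alist])  # [0, 2]
```

The result is an ordinary list of indices, so `[alist[i] for i in idx]` recovers the matching elements.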

But to begin with, I think we should add NumPy-style "fancy indexing" to 
standard container types like list, tuple, string, bytes, bytearray, 
array and deque. That would just be a handful of subclasses, and I think 
they should (initially) be put in a standard library module, and 
possibly replace the current containers in Python 4000.

But as for a where keyword: my opinion is a big -1, if I have the right 
to vote. We should rather implement a where function and overload 
indexing for the container types mentioned above. The where function 
should go in the same module.

So all in all, I am +1 for a "where module" and -1 for a "where keyword".

P.S. I'll admit that dict and set might add to some confusion, since 
"fancy indexing" would be ambiguous for them.
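The dict case is easy to demonstrate: a tuple is already a valid dict key, so d[1, 2] currently means "look up the key (1, 2)", and a fancy-indexing reading would silently change its meaning. For lists the same subscript currently raises TypeError, so there is no clash there. A short illustration with made-up data:

```python
d = {(1, 2): "tuple key", 1: "one", 2: "two"}

current = d[1, 2]                        # today the tuple (1, 2) is a single key
fancy_reading = [d[k] for k in (1, 2)]   # what fancy indexing would mean instead

nums = [10, 20, 30]
try:
    nums[1, 2]
    list_accepts_tuple = True
except TypeError:
    list_accepts_tuple = False           # lists reject tuple subscripts today
```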

Regards,
Sturla