On 20.07.2010 16:57, Andrey Popp wrote:

`a = (b, b) where b = 43`

I am +1 for a where module and -1 for a where keyword, and here is the reason:

In MATLAB, we have the "find" function that serves the role of where. In NumPy, we have a function numpy.where and also masked arrays.

The above statement with NumPy ndarrays would be:

idx, = np.where(b == 47)
a = (b[idx], b[idx])

or we could simply do this:

a = (b[b==47], b[b==47])

And if we look at this proposed expression,

mylist = [y for y in another_list if y < 5 where y == f(x)]

Using NumPy, we can proceed like this:

idx, = np.where(another_array == f(x))
mylist = [y for y in another_array[idx] if y < 5]

The intention is just as clear, and it avoids a new "where" keyword. It is also similar to NumPy and MATLAB. Not to mention that the "where" keyword in the above expression could be replaced with an "and", so it serves no real purpose:

mylist = [y for y in another_list if (y < 5 and y == f(x))]

Why have a where keyword here? It is just redundant.

So I'd rather speak of something useful instead: NumPy's "Fancy indexing".

"Fancy indexing" (NumPy jargon) will in this context mean that we allow an index to be an iterable of integers, not just a single integer:

mylist[(1,2,3)] == mylist[1,2,3]
mylist[iterable] == [mylist[i] for i in iterable]

That is what NumPy and MATLAB do, as well as Fortran 90 (and certain C++ libraries such as Blitz++). It has all the power of the "where" keyword while being more flexible to use, and the intention is more explicit. It is also well-tested syntax.

Thus with "fancy indexing":

alist[iterable] == [alist[i] for i in iterable]

That is what we really need!

Note that this is not a language syntax change, it is just a change in how __setitem__ and __getitem__ work for certain container types. NumPy already does this, so the syntax itself is completely valid Python. And as for "where", it is just a function.
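To make that concrete, here is a minimal sketch of a list subclass that accepts an iterable of integers as an index. The class name FancyList and its exact behaviour (for example, requiring one value per index on assignment) are assumptions for illustration only; nothing like this exists in the standard library.

```python
from collections.abc import Iterable


class FancyList(list):
    """Hypothetical list subclass with NumPy-style "fancy indexing"."""

    def __getitem__(self, index):
        # Plain integers and slices keep the normal list behaviour.
        if isinstance(index, (int, slice)):
            return super().__getitem__(index)
        # Any other iterable is read as a sequence of integer indexes.
        if isinstance(index, Iterable):
            return FancyList(list.__getitem__(self, i) for i in index)
        raise TypeError("index must be an int, a slice or an iterable of ints")

    def __setitem__(self, index, value):
        if isinstance(index, (int, slice)):
            super().__setitem__(index, value)
            return
        # Assumption: assignment needs exactly one value per index.
        indexes = list(index)
        values = list(value)
        if len(indexes) != len(values):
            raise ValueError("need exactly one value per index")
        for i, v in zip(indexes, values):
            list.__setitem__(self, i, v)


letters = FancyList("abcdef")
picked = letters[1, 2, 3]        # Python passes the tuple (1, 2, 3) to __getitem__
evens = letters[range(0, 6, 2)]  # any iterable of integers works
letters[0, 5] = ["A", "F"]       # fancy assignment through __setitem__
```

No new syntax is involved: letters[1, 2, 3] already reaches __getitem__ as a single tuple argument, so only the container decides what to do with it.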

Andrey's proposed where keyword is a crippled tool in comparison. That is, the real power of a list of indexes is that it can be obtained and manipulated with any conceivable method, e.g. slicing. It also allows NumPy to have an "argsort" function, since an index list can be reused on multiple arrays:

idx = np.argsort(array_a)
sorteda = array_a[idx]
sortedb = array_b[idx]

is the same as

tmp = sorted([(a, i) for i, a in enumerate(lista)])
sorteda = [a for a, i in tmp]
sortedb = [listb[i] for a, i in tmp]

Which is the more readable?
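For plain lists, the reusable index list can be sketched without NumPy. The helper name argsort is borrowed from NumPy, but this pure-Python version is only an illustration under that assumption, not NumPy's implementation.

```python
def argsort(values):
    """Return the indexes that would sort `values` (hypothetical helper)."""
    return sorted(range(len(values)), key=values.__getitem__)


lista = [30, 10, 20]
listb = ["c", "a", "b"]

idx = argsort(lista)               # computed once from lista
sorteda = [lista[i] for i in idx]  # reorder lista with the index list
sortedb = [listb[i] for i in idx]  # reuse the same index list on listb
```

With fancy indexing on lists, the last two lines would shrink to lista[idx] and listb[idx].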

Implementing a generic "where function" can be achieved with a lambda:

idx = where(lambda x: x == 47, alist)

or a list comprehension (this would be very similar to NumPy):

idx = where([x==47 for x in alist])
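A rough sketch of such a function, covering both call styles shown above. The single-function signature is an assumption made for illustration: with two arguments the first is treated as a predicate, with one argument it is treated as a sequence of truth values.

```python
def where(predicate_or_mask, items=None):
    """Return the indexes of matching elements (hypothetical helper)."""
    if items is None:
        # One argument: a sequence of truth values, as with numpy.where.
        return [i for i, flag in enumerate(predicate_or_mask) if flag]
    # Two arguments: call the predicate on every item.
    return [i for i, item in enumerate(items) if predicate_or_mask(item)]


alist = [47, 3, 47, 8]

idx_from_lambda = where(lambda x: x == 47, alist)
idx_from_mask = where([x == 47 for x in alist])
matches = [alist[i] for i in idx_from_mask]
```

Both calls produce the same index list, which can then be stored, sliced or reused on other sequences.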

But to begin with, I think we should get NumPy-style "fancy indexing" for standard container types like list, tuple, string, bytes, bytearray, array and deque. That would just be a handful of subclasses, and I think they should (initially) be put in a standard library module, and possibly replace the current containers in Python 4000.

But as for a where keyword: My opinion is a big -1, if I have the right to vote. We should rather implement a where function and overload the mentioned container types. The where function should go in the same module.

So all in all, I am +1 for a "where module" and -1 for a "where keyword".

P.S. I'll admit that dict and set might add some confusion, since "fancy indexing" would be ambiguous for them.
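The ambiguity for dict can be seen with a small example (the dictionary contents are made up for illustration): a tuple is already a legal key, so d[1, 2] cannot also mean "look up 1 and 2 separately".

```python
d = {1: "one", 2: "two", (1, 2): "pair"}

single_key = d[1, 2]            # today: one lookup with the tuple key (1, 2)
fancy = [d[k] for k in (1, 2)]  # what fancy indexing would have to mean
```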

Regards, Sturla