[changing the subject; was: 'where' statement in Python?]
I think this is an interesting idea (whether worth adding is a different
question). I think it would be confusing that
a[x] = (y,z)
does something entirely different when x is 1 or (1,2). If python *were* to
add something like this, I think perhaps a different syntax should be
considered:
a[[x]] = y
y = a[[x]]
which call __setitems__ and __getitems__ respectively. This makes it clear
that something different is going on and eliminates the ambiguity for dicts.
--- Bruce
http://www.vroospeak.com
http://google-gruyere.appspot.com
On Tue, Jul 20, 2010 at 10:18 AM, Sturla Molden
So I'd rather speak of something useful instead: NumPy's "Fancy indexing".
"Fancy indexing" (NumPy jargon) will in this context mean that we allow indexes to be an iterable, not just integers:
mylist[(1,2,3)] == mylist[1,2,3]
mylist[iterable] == [mylist[i] for i in iterable]
That is what NumPy and Matlab do, as well as Fortran 90 (and certain C++ libraries such as Blitz++). It has all the power of the "where keyword", while being more flexible to use and making the intention more explicit. It is also well-tested syntax.
Thus with "fancy indexing":
alist[iterable] == [alist[i] for i in iterable]
That is what we really need!
Note that this is not a language syntax change, it is just a change of how __setitem__ and __getitem__ work for certain container types. NumPy already does this, so the syntax itself is completely valid Python. And as for "where", it is just a function.
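As a rough illustration of the point that this is only a change in __getitem__, here is a minimal sketch of a list subclass that accepts an iterable of integer indexes. The class name FancyList is hypothetical and not part of any proposal in this thread; assignment through __setitem__ is left out to keep the sketch short.

```python
from collections.abc import Iterable


class FancyList(list):
    """Hypothetical list subclass: also accepts an iterable of int indexes."""

    def __getitem__(self, index):
        # Plain integers and slices keep their normal list behaviour.
        if isinstance(index, (int, slice)):
            return super().__getitem__(index)
        if isinstance(index, Iterable):
            # Bind the parent lookup once, then apply it to every index.
            get = super().__getitem__
            return FancyList(get(i) for i in index)
        raise TypeError("index must be an int, a slice, or an iterable of ints")


alist = FancyList([1, 2, 3, 4])
picked = alist[(1, 2, 1, 1, 3)]   # indexes may repeat and come in any order
single = alist[1]                 # ordinary indexing is unchanged
```

Because tuples and lists are both iterables, alist[(1, 2)] and alist[[1, 2]] behave the same way in this sketch.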
Andrey's proposed where keyword is a crippled tool in comparison. That is, the real power of a list of indexers is that it can be obtained and manipulated with any conceivable method, e.g. slicing. It also allows numpy to have an "argsort" function, since an index list can be reused on multiple arrays:
idx = np.argsort(array_a)
sorteda = array_a[idx]
sortedb = array_b[idx]
is the same as
tmp = sorted([(a,i) for i,a in enumerate(lista)])
sorteda = [a for a,i in tmp]
sortedb = [listb[i] for a,i in tmp]
Which is the more readable?
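For comparison, the same idea can be sketched with plain lists and no NumPy. The helper names argsort and take, and the sample data, are assumptions made for this illustration only; they are not existing standard library functions.

```python
def argsort(seq):
    """Return the indexes that would put seq in ascending order."""
    return sorted(range(len(seq)), key=seq.__getitem__)


def take(seq, indexes):
    """Apply a list of indexes to seq, like seq[indexes] with fancy indexing."""
    return [seq[i] for i in indexes]


array_a = [30, 10, 20]
array_b = ["c", "a", "b"]

idx = argsort(array_a)          # one index list ...
sorted_a = take(array_a, idx)   # ... reused on the first list
sorted_b = take(array_b, idx)   # ... and on the second list
```

The index list is computed once and can then be applied to as many parallel lists as needed.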
Implementing a generic "where function" can be achieved with a lambda:
idx = where(lambda x: x== 47, alist)
or a list comprehension (this would be very similar to NumPy):
idx = where([x==47 for x in alist])
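Both call styles above could be served by one small function. This is only a sketch under the assumption that where returns a plain list of integer indexes; the signature is made up for illustration and is not meant to match NumPy's where exactly.

```python
def where(condition, seq=None):
    """Return the indexes at which a condition holds.

    where(pred, seq)  -> indexes i for which pred(seq[i]) is true
    where(mask)       -> indexes i for which mask[i] is true
    """
    if seq is None:
        # Mask form: condition is an iterable of truth values.
        return [i for i, flag in enumerate(condition) if flag]
    # Predicate form: condition is a callable applied to each item.
    return [i for i, item in enumerate(seq) if condition(item)]


alist = [47, 3, 47, 8]
idx_pred = where(lambda x: x == 47, alist)
idx_mask = where([x == 47 for x in alist])
```

The resulting index list is what would then be handed to a fancy-indexing container.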
But to begin with, I think we should get NumPy style "fancy indexing" to standard container types like list, tuple, string, bytes, bytearray, array and deque. That would just be a handful of subclasses, and I think they should (initially) be put in a standard library module, and possibly replace the current containers in Python 4000.
But as for a where keyword: My opinion is a big -1, if I have the right to vote. We should rather implement a where function and overload the mentioned container types. The where function should go in the same module.
So all in all, I am +1 for a "where module" and -1 for a "where keyword".
P.S. I'll admit that dict and set might add to some confusion, since "fancy indexing" would be ambiguous for them.
Regards, Sturla
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
a[[x]] = y
y = a[[x]]
which call __setitems__ and __getitems__ respectively. This makes it clear that something different is going on and eliminates the ambiguity for dicts.
Or use the * operator, which is used to expand tuples for function calls:
a[*x] = y
y = a[*x]
analogous to foobar(*x). The intent would be the same.
S.
I'm not sure what this is about but do you mean something like this?
>>> l=[1,2,3,4]
>>> l[1:2] = ['a','b']
>>> l
[1, 'a', 'b', 3, 4]
On 07/20/2010 09:17 PM, Bruce Leban wrote:
I think this is an interesting idea (whether worth adding is a different question). I think it would be confusing that a[x] = (y,z) does something entirely different when x is 1 or (1,2). If python *were* to add something like this, I think perhaps a different syntax should be considered:
a[[x]] = y
y = a[[x]]
which call __setitems__ and __getitems__ respectively. This makes it clear that something different is going on and eliminates the ambiguity for dicts.
Den 21.07.2010 01:38, skrev Carl M. Johnson:
Does this need new syntax? Couldn’t it just be a method? Perhaps .where()? ;-)
It is just a library issue. And adding it would not break anything, because lists and tuples don't accept iterables as indexers now. The problem is the dict and the set, which can take tuples as index. A .where() method would work, if it e.g. took a predicate as argument. But we would still need to pass the return value (e.g. a tuple) to the [] operator. That is all legal syntax today (which is why NumPy can do this), but lists are implemented to only accept integers and slices in __setitem__ and __getitem__.
Sturla
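To make the dict ambiguity concrete, here is a small sketch (the dictionary contents are made up for illustration). A tuple subscript on a dict is already a lookup of one key, so it cannot also mean "look up each element in turn".

```python
d = {1: "one", 2: "two", (1, 2): "pair"}

# Today: the tuple (1, 2) is a single key.
as_key = d[(1, 2)]
same_key = d[1, 2]          # d[1, 2] passes the same tuple to __getitem__

# What a fancy-indexing reading of the same subscript would produce.
as_fancy = [d[k] for k in (1, 2)]
```

The two readings give different results for the same subscript, which is why dicts would need either a separate spelling or no fancy indexing at all.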
x = a[[y]]
would be approximately equivalent to
x = [a[i] for i in y]
and
a[[x]] = y
would be approximately equivalent to
for (i,j) in zip(x,y): a[i] = j
except that zip throws away excess values in the longer sequence and I think [[..]] would throw an exception.
--- Bruce
http://www.vroospeak.com
http://google-gruyere.appspot.com
On Tue, Jul 20, 2010 at 3:51 PM, Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
I'm not sure what this is about but do you mean something like this?
>>> l=[1,2,3,4]
>>> l[1:2] = ['a','b']
>>> l
[1, 'a', 'b', 3, 4]
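The approximate equivalences Bruce gives for a[[y]] and a[[x]] = y, including the idea that a length mismatch should raise instead of being silently truncated by zip, could be sketched as two plain functions. The names getitems and setitems only echo the proposed __getitems__ and __setitems__ hooks; no such hooks exist in Python, and the choice of ValueError is an assumption.

```python
def getitems(a, indexes):
    """Sketch of x = a[[y]]: collect a[i] for every i in indexes."""
    return [a[i] for i in indexes]


def setitems(a, indexes, values):
    """Sketch of a[[x]] = y: assign pairwise, refusing mismatched lengths."""
    indexes = list(indexes)
    values = list(values)
    # Unlike a bare zip(), do not silently drop the excess items.
    if len(indexes) != len(values):
        raise ValueError("need exactly one value per index")
    for i, j in zip(indexes, values):
        a[i] = j


a = [10, 20, 30, 40]
x = getitems(a, [3, 0])
setitems(a, [0, 2], ["p", "q"])
```

The length check runs before any assignment, so a failed call leaves the container untouched.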
On Tue, Jul 20, 2010 at 4:43 PM, Bruce Leban
x = a[[y]] would be approximately equivalent to x = [a[i] for i in y]
You realize that syntax /already/ has a valid meaning in Python, right? Namely, using a single-element list as a subscript:
>>> class Foo(object):
...     def __getitem__(self, index):
...         print("Subscript: %s" % (index,))
...
>>> a = Foo()
>>> y = 42
>>> x = a[[y]]
Subscript: [42] # hey, whaddya know!
Making this syntax do something else would lead to some surprising inconsistencies, to say the least; albeit I don't know how common it is to use lists as subscripts.
Cheers,
Chris
--
http://blog.rebertia.com
Den 21.07.2010 00:51, skrev Mathias Panzenböck:
I'm not sure what this is about but do you mean something like this?
>>> l=[1,2,3,4]
>>> l[1:2] = ['a','b']
>>> l
[1, 'a', 'b', 3, 4]
No, that is slicing. A fancy index is a more flexible slice, as it has no regular structure. It's just a list, tuple or array of indexes, in arbitrary order, possibly repeated. It would e.g. work like this:
>>> alist = [1,2,3,4]
>>> alist[(1,2,1,1,3)]
[2, 3, 2, 2, 4]
If you know SQL, it means that you can do with indexing what SQL can do with WHERE and JOIN. You can e.g. search a list in O(N) for indexes where a certain condition evaluates to True (cf. SQL WHERE), and then apply these indexes to any list (cf. SQL JOIN).
It is not just for queries, but also for things like sorting. It is what lets NumPy have an "argsort" function. It does not return a sorted array, but an array of indices which, when applied to the array, will return a sorted copy. These indices can in turn be applied to other arrays as well. Think about what happens when you sort the rows of an Excel spreadsheet by the values in a certain column: one column is sorted, and the other columns are reordered synchronously. That is the kind of thing that fancy indexing allows us to do rather easily.
Yes, there are other ways of doing this in Python now, but not as elegant, I think. And it is not a syntax change to Python (NumPy can do it), it is just a library issue. This is at least present in NumPy, MATLAB, C# and LINQ, SQL, Fortran 95 (in two ways), Scilab, Octave, and C++ (e.g. Blitz++). "Fancy indexing" is the name used for it in NumPy.
Sturla
participants (5)
- Bruce Leban
- Carl M. Johnson
- Chris Rebert
- Mathias Panzenböck
- Sturla Molden