Proposal : allowing grouping by relation
Hi,

I am using groupby (from itertools) to group objects by a key. It would be very useful for me to be able to group objects by the relation of two consecutive objects, or by an object's relation to the first object in the current group.

I think it should be done by adding a "relation" keyword to the function, which accepts a two-argument function that returns true or false.

It would also be useful to create a function that makes it easy to create relation functions (like attrgetter does for keys):

    ls = "aaabcdddefgjklm"
    groupby(ls, relation=difference(3, key=ord))
    # [['a', 'a', 'a', 'b', 'c', 'd', 'd', 'd', 'e', 'f', 'g'], ['j', 'k', 'l', 'm']]

I think in this case the function won't return a key-group tuple, but just a group iterable.

This is already very useful for me, to group event objects in a list if they are close enough in time.

I wrote most of what I had in mind here: https://github.com/tomirendo/Grouper
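For illustration, here is a minimal standalone sketch of the behaviour described above. It is not the code from the linked repository. The names `group_by_relation` and `to_first` are hypothetical, and `difference` is assumed to mean "the keyed values differ by less than the given limit", which is the reading that reproduces the output shown in the post.

```python
def difference(limit, key=lambda x: x):
    """Hypothetical helper: two items are related when their keyed
    values differ by less than `limit` (an assumed interpretation)."""
    def related(a, b):
        return abs(key(a) - key(b)) < limit
    return related


def group_by_relation(iterable, relation, to_first=False):
    """Yield lists of items. Each item must be related to the previous
    item, or to the first item of its group when `to_first` is true."""
    group = []
    for item in iterable:
        if group:
            anchor = group[0] if to_first else group[-1]
            if not relation(anchor, item):
                # The relation failed, so close the current group.
                yield group
                group = []
        group.append(item)
    if group:
        yield group


ls = "aaabcdddefgjklm"
consecutive = list(group_by_relation(ls, difference(3, key=ord)))
anchored = list(group_by_relation(ls, difference(3, key=ord), to_first=True))
```

With the consecutive relation the string splits only between 'g' and 'j'. Comparing against the first item of each group instead produces shorter groups, because every group is limited to a fixed window starting at its first item.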
Yotam Vaknin writes:
Hi,
I am using groupby (from itertools) to group objects by a key. It would be very useful for me to be able to group objects by the relation of two consecutive objects, or by an object's relation to the first object in the current group.
I don't think you need to extend groupby. You can just cache the object to compare to. How about a decorator like

    def relate_to_first(is_related):
        # _first caches [first item of the current group, group number];
        # the default list is created once per decorated function.
        def wrapped(this, _first=[]):
            if _first:
                if not is_related(this, _first[0]):
                    # Not related to the cached item: start a new group.
                    _first[0] = this
                    _first[1] += 1
            else:
                # Very first item seen: initialise the cache.
                _first.extend([this, 0])
            return _first[1]
        return wrapped

    @relate_to_first
    def some_relation(this, that):
        pass

and similarly for a decorator relate_to_last?

There are probably more elegant ways to do this, such as a class whose instances are callable. Such a class could also provide a reset method so you could reuse the relation.

Warning: that code is untested.
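The callable-class variant mentioned above might look roughly like this. It is only a sketch: the class name `RelateToFirst` and its attributes are made up for illustration, and the returned key is simply a run number that groupby can compare with ordinary integer equality.

```python
from itertools import groupby


class RelateToFirst:
    """Callable key: returns a run number that increases whenever an
    item is not related to the first item of the current run."""

    def __init__(self, is_related):
        self.is_related = is_related
        self.reset()

    def reset(self):
        """Forget all cached state so the instance can be reused."""
        self._started = False
        self._first = None
        self._index = 0

    def __call__(self, this):
        if not self._started:
            # First item ever seen: it anchors run number 0.
            self._started = True
            self._first = this
        elif not self.is_related(this, self._first):
            # Unrelated to the anchor: start a new run anchored here.
            self._first = this
            self._index += 1
        return self._index


close_to_first = RelateToFirst(lambda this, first: abs(this - first) <= 2)
runs = [list(g) for _, g in groupby([1, 2, 3, 4, 5, 9, 10, 20], key=close_to_first)]

close_to_first.reset()  # reuse the same relation on new data
runs_again = [list(g) for _, g in groupby([7, 8, 30], key=close_to_first)]
```

Because the state lives on the instance rather than in a default argument, calling `reset()` between uses is enough to start numbering from scratch.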
On Aug 23, 2014, at 12:23, Yotam Vaknin <tomirendo@gmail.com> wrote:
Hi,
I am using groupby (from itertools) to group objects by a key. It would be very useful for me to be able to group objects by the relation of two consecutive objects, or by an object's relation to the first object in the current group.
I think it should be done by adding a "relation" keyword to the function, which accepts a two-argument function that returns true or false.
This _should be_ easy to write as a wrapper around groupby with a key that checks your relation. But there's one problem: groupby checks the _first_ key in a group against each new key, instead of the most recent one.

I wrote a blog post last year about this (http://stupidpythonideas.blogpost.com/2014/01/grouping-into-runs-of-adjacent...). It turns out to be pretty easy if your relation is symmetric, but only one of the obvious ways to do it actually works.

Anyway, it might be worth changing groupby so it never compares x==y instead of y==x, and making the C implementation and the Python equivalent in the docs actually equivalent.

Beyond that, I think it might make sense to add a relation_to_key function and/or to change cmp_to_key so it's directly usable with groupby. Then, it should be possible to make groupby_relation into a 3-line wrapper around groupby, in which case I think it might be better as a recipe (and submitted to more_itertools on PyPI) than to add it to itertools itself.
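One way to sketch the wrapper idea is with a stateful key function that returns a run number. Since the keys are plain integers, it does not matter which key groupby compares against or in which order. The names `relation_to_key` and `groupby_relation` are borrowed from the message above; neither exists in itertools, and this is not the approach from the blog post.

```python
from itertools import groupby


def relation_to_key(relation):
    """Hypothetical adapter: turn a two-argument relation into a
    stateful key function that numbers runs of related neighbours."""
    state = {"started": False, "previous": None, "run": 0}

    def key(item):
        # Compare against the most recent item, not the first of the run.
        if state["started"] and not relation(state["previous"], item):
            state["run"] += 1
        state["started"] = True
        state["previous"] = item
        return state["run"]

    return key


def groupby_relation(iterable, relation):
    """Yield one group iterator per run of consecutive related items."""
    for _, group in groupby(iterable, key=relation_to_key(relation)):
        yield group


events = [1, 2, 3, 7, 8, 20]
runs = [list(g) for g in groupby_relation(events, lambda a, b: b - a <= 2)]
```

The relation here is deliberately not symmetric, which works because the key function always calls it as `relation(previous, current)`. A fresh key function is created per call, so the wrapper can be used on several iterables without any reset.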
It would also be useful to create a function that makes it easy to create relation functions (like attrgetter does for keys):

    ls = "aaabcdddefgjklm"
    groupby(ls, relation=difference(3, key=ord))
    # [['a', 'a', 'a', 'b', 'c', 'd', 'd', 'd', 'e', 'f', 'g'], ['j', 'k', 'l', 'm']]
I think in this case the function won't return a key-group tuple, but just a group iterable.
The key actually can be useful here. You can use it as a label for the "column". Especially if you've written your key function so it keeps track of both the first and most recent values, instead of just the most recent, so you can label it "a-_", where that _ is the current value at any given point, and the last value once you've consumed the group iterator. Sure, you _could_ recover that information from the group itself if you need it, but isn't it even easier to discard it if you don't need it?
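A rough sketch of such a labelling key follows. It assumes the key function hands back the same mutable label object for every item in a run. `RunLabel`, `labelled_key` and `near` are hypothetical names chosen for the example.

```python
from itertools import groupby


class RunLabel:
    """Key object for one run: equal to other labels of the same run,
    and printable as 'first-last'."""

    def __init__(self, run, first):
        self.run = run
        self.first = first
        self.last = first

    def __eq__(self, other):
        return isinstance(other, RunLabel) and self.run == other.run

    def __str__(self):
        return f"{self.first}-{self.last}"


def labelled_key(relation):
    """Hypothetical key factory: reuse one RunLabel per run and keep
    its `last` attribute pointing at the most recent item."""
    current = None

    def key(item):
        nonlocal current
        if current is None or not relation(current.last, item):
            run = 0 if current is None else current.run + 1
            current = RunLabel(run, item)
        else:
            current.last = item
        return current

    return key


def near(a, b):
    # Assumed relation: code points differ by less than 3.
    return abs(ord(a) - ord(b)) < 3


labels = []
for label, group in groupby("aaabcdddefgjklm", key=labelled_key(near)):
    items = "".join(group)  # consuming the group updates label.last
    labels.append((str(label), items))
```

While a group is only partly consumed, `str(label)` shows the first value and the value reached so far. Once the group iterator is exhausted it shows the first and last values of the run.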
This is already very useful for me, to group event objects in a list if they are close enough in time.
I wrote most of what I had in mind here: https://github.com/tomirendo/Grouper
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
participants (3): Andrew Barnert, Stephen J. Turnbull, Yotam Vaknin