A way to re-organize a list

Alex Martelli aleax at mac.com
Thu Jul 19 17:22:31 CEST 2007


beginner <zyzhu2000 at gmail.com> wrote:

> Hi Everyone,
> 
> I have a simple list reconstruction problem, but I don't really know
> how to do it.
> 
> I have a list that looks like this:
> 
> l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
> "b", 2), ("B", "a", 1), ("B", "b", 1)]
> 
> What I want to do is to reorganize it in groups, first by the middle
> element of the tuple, and then by the first element. I'd like the
> output look like this:
> 
> out=[
>    [    #group by first element "A"
>           [("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
> second element "a"
>           [ ("A", "b", 1), ("A", "b", 2)], #group by second element
> "b"
>    ],
>    [   #group by first element "B"
>           [("B", "a", 1)],
>           [("B", "b", 1)]
>    ]
> ]
> 
> 
> All the solutions I came up with are difficult to read and even harder
> to go back and change, and I just feel are too complicated for such a
> simple problem. I am wondering if anyone here has some insight.
> 
> If there is a 'functional' way to do this, it would be even greater.

If it's already sorted as in your example, itertools.groupby may be what
you want.  To pick a key, you can use operator.itemgetter, which builds
functions picking a specified N-th item of a sequence.

Consider...:

>>> l
[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3), ('A', 'b', 1), ('A', 'b',
2), ('B', 'a', 1), ('B', 'b', 1)]
>>> itm=operator.itemgetter
>>> gb=itertools.groupby
>>> list(gb(l, itm(0)))
[('A', <itertools._grouper object at 0x4d110>), ('B',
<itertools._grouper object at 0x4d120>)]

the items in the sequence groupby returns are (key, subsequence) pairs;
you don't care about the key, so here's a first step (grouping by first
item only):

>>> [list(ss) for k, ss in gb(l, itm(0))]
[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3), ('A', 'b', 1), ('A', 'b',
2)], [('B', 'a', 1), ('B', 'b', 1)]]

However, you want a second level of grouping -- so, instead of the plain
list(ss), you need to apply this concept again:

>>> [[list(sss) for k,sss in gb(ss,itm(1))] for k, ss in gb(l, itm(0))]
[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)], [('A', 'b', 1), ('A',
'b', 2)]], [[('B', 'a', 1)], [('B', 'b', 1)]]]

...and there you are.

May be more readable with a little auxiliary function to capture what I
just called "this concept" with a readable name:

>>> def group(s, i): return [list(ss) for k, ss in gb(s, itm(i))]
... 
>>> [group(ss,1) for ss in group(l,0)]
[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)], [('A', 'b', 1), ('A',
'b', 2)]], [[('B', 'a', 1)], [('B', 'b', 1)]]]

This does one more "list(...)" step, but if your lists aren't huge the
readability may be worth the small slow-down.


Alex



More information about the Python-list mailing list