[Tutor] reducing lists within list to their set of unique values

Steven D'Aprano steve at pearwood.info
Tue May 21 01:45:17 CEST 2013


On 21/05/13 08:49, Treder, Robert wrote:
> Hi python folks,
>
> I have a list of lists that looks something like this:
>
> tst = [ [], ['test'], ['t1', 't2'], ['t1', 't1', 't2'] ]
>
> I want to change the empty sets to a blank string, i.e., '' and the lists with repeat values to the unique set of values. So I have done the following:
>
>>>> for t in tst:
> 	if len(t) == 0:
> 		tst.__setitem__(tst.index(t), '')
> 	else:
> 		tst.__setitem__(tst.index(t), set(t))


As a general rule, if you are writing double-underscore special methods like __setitem__ directly, you're doing it wrong. (There are exceptions, but consider them "for experts".)

So instead of tst.__setitem__(a, b) you should write tst[a] = b.
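
A small sketch of that equivalence, using the list from your question. The name `other` is just something I made up for the demonstration; it is a shallow copy, so both spellings start from the same data:

```python
tst = [[], ['test'], ['t1', 't2'], ['t1', 't1', 't2']]
other = list(tst)  # shallow copy of the outer list

tst.__setitem__(0, '')  # calling the special method directly: works, but avoid it
other[0] = ''           # the normal spelling; Python calls __setitem__ for you

print(tst == other)  # True
```

Both lines do the same thing. The second is what other Python programmers expect to read.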

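But that's still the wrong way to do this! Each call to tst.index searches the list from the beginning, so you're doing a lot of extra work. You won't notice for a short list like the example above, but for a long list, this will get really, really slow, because the total work grows roughly with the square of the length of the list.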

The way to do this is to keep track of the index as you walk over the list, and not recalculate it by searching the list:


for index, item in enumerate(tst):
    if item == []:
        item = ""
    else:
        item = list(set(item))
    tst[index] = item


Notice that I call set() to get the unique values, then list() again to turn it back into a list. This does the job you want, but it is not guaranteed to keep the order:

py> L = ['b', 'd', 'c', 'a', 'b']
py> list(set(L))
['a', 'c', 'b', 'd']


If keeping the order is important, you cannot use set, and you'll need another way to extract only the unique values. Ask if you need help on that.
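
One common way to do it, sketched here on the assumption that the items are hashable (strings are), is to remember what you have already seen while walking the list. The function name `unique_in_order` is made up for illustration. The set is only used for fast membership tests; the order of the result comes from the list:

```python
def unique_in_order(items):
    """Return the unique items, keeping the order of first appearance."""
    seen = set()   # items already met (requires hashable items)
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


L = ['b', 'd', 'c', 'a', 'b']
print(unique_in_order(L))  # ['b', 'd', 'c', 'a']
```

If the items are not hashable (lists, for example), you can drop the set and test `item not in result` instead, at the cost of a slower search.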



> What I get in return is
>
>>>> tst
> ['', set(['test']), set(['t2', 't1']), set(['t2', 't1'])]
>
> The empty list is fine but the other lists seem to be expressions rather than values. What do I need to do to simply get the values back like the following?
>
> ['', ['test'], ['t2', 't1'], ['t2', 't1']]


They are values. It is just that they are *sets* rather than *lists*. When printed, lists have a nice compact representation using square brackets [], but in Python 2 sets do not. Python 3 prints sets in a more compact form, using curly braces:


# Python 2:
set(['a', 'c', 'b', 'd'])

# Python 3:
{'d', 'b', 'c', 'a'}


Notice that the order of the items is not guaranteed, but apart from that, the two versions are the same despite the difference in print representation.
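
You can check this yourself with a quick sketch like the following. The names `a` and `b` are just for illustration, and the curly-brace literal needs Python 2.7 or later:

```python
a = set(['a', 'c', 'b', 'd'])  # built from a list, as Python 2 prints it
b = {'d', 'b', 'c', 'a'}       # set literal, as Python 3 prints it

print(a == b)      # True: same members, order does not matter
print(sorted(a))   # ['a', 'b', 'c', 'd']
print(len(a))      # 4
```

If you want a list back, call list() on the set, or sorted() if you would also like a predictable order.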



-- 
Steven

