[Tutor] reducing lists within list to their set of unique values

Treder, Robert Robert.Treder at morganstanley.com
Tue May 21 15:31:08 CEST 2013


> Message: 6
> Date: Tue, 21 May 2013 09:45:17 +1000
> From: Steven D'Aprano <steve at pearwood.info>
> To: tutor at python.org
> Subject: Re: [Tutor] reducing lists within list to their set of unique
>	values
> Message-ID: <519AB58D.9020206 at pearwood.info>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 21/05/13 08:49, Treder, Robert wrote:
>> Hi python folks,
>>
>> I have a list of lists that looks something like this:
>>
>> tst = [ [], ['test'], ['t1', 't2'], ['t1', 't1', 't2'] ]
>>
>> I want to change the empty lists to a blank string, i.e., '' and the lists with repeat values to the unique set of values. So I have done the following:
>>
>>>>> for t in tst:
>> 	if len(t) == 0:
>> 		tst.__setitem__(tst.index(t), '')
>> 	else:
>> 		tst.__setitem__(tst.index(t), set(t))
>
>
> As a general rule, if you are writing double-underscore special methods like __setitem__ directly, you're doing it wrong. (There are
> exceptions, but consider them "for experts".)
>
> So instead of tst.__setitem__(a, b) you should write tst[a] = b.
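A minimal sketch of the equivalence, using a small made-up list: both spellings end up calling list.__setitem__, but the subscript form is the one to write.

```python
tst = [[], ['test']]

# Calling the special method directly works, but is not idiomatic.
tst.__setitem__(0, '')

# Subscript assignment invokes list.__setitem__ for you.
tst[1] = {'test'}

print(tst)  # → ['', {'test'}]
```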
>
> But that's still the wrong way to do this! You're doing a lot of extra work with the calls to tst.index, since each call searches the list from the beginning. You won't notice for a short list like the example above, but for a long list, this will get really, really slow.
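To see the cost, here is a rough Python 3 sketch that times the original index-based loop against one that tracks the position with enumerate(). The function names and test data are made up for illustration, both functions store set(item) as the original code does, and absolute timings vary by machine, so no numbers are shown.

```python
import timeit


def dedupe_with_index(tst):
    # Original approach: list.index scans from the start on every call,
    # so the whole loop does work roughly proportional to len(tst) ** 2.
    for t in tst:
        if len(t) == 0:
            tst[tst.index(t)] = ''
        else:
            tst[tst.index(t)] = set(t)
    return tst


def dedupe_with_enumerate(tst):
    # enumerate() hands us the position directly, so no searching is needed.
    for index, item in enumerate(tst):
        tst[index] = '' if item == [] else set(item)
    return tst


def make_data(n=1000):
    # Hypothetical input: one empty list plus n distinct two-element lists.
    return [[]] + [[i, i] for i in range(n)]


# Both versions must produce the same result before comparing speed.
assert dedupe_with_index(make_data()) == dedupe_with_enumerate(make_data())

for func in (dedupe_with_index, dedupe_with_enumerate):
    seconds = timeit.timeit(lambda: func(make_data()), number=3)
    print(f"{func.__name__}: {seconds:.4f} s")
```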
>
> The way to do this is to keep track of the index as you walk over the list, and not recalculate it by searching the list:
>
>
> for index, item in enumerate(tst):
>     if item == []:
>         item = ""
>     else:
>         item = list(set(item))
>     tst[index] = item
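If you don't need to modify the list in place, the same transformation can also be written as a list comprehension that builds a new list. This is an alternative spelling, not the loop above; the inner order still comes from set(), so it may differ between runs.

```python
tst = [[], ['test'], ['t1', 't2'], ['t1', 't1', 't2']]

# Build a new list: '' for empty inner lists, de-duplicated lists otherwise.
result = ['' if item == [] else list(set(item)) for item in tst]
print(result)

# To update the original list object in place instead (so any other
# references to it see the change), assign to a full slice:
tst[:] = result
```

The slice assignment matters only if other names refer to the same list; otherwise rebinding tst = result is enough.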
>
>
> Notice that I call set() to get the unique values, then list() again to turn it back into a list. This does the job you want, but it is not 
> guaranteed to keep the order:
>
> py> L = ['b', 'd', 'c', 'a', 'b']
> py> list(set(L))
> ['a', 'c', 'b', 'd']
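If the original order doesn't matter but you want a predictable one, sorting the set is a simple option. This assumes the items can be compared with each other, as strings can.

```python
L = ['b', 'd', 'c', 'a', 'b']

# set() removes duplicates; sorted() returns a new list in ascending order.
unique_sorted = sorted(set(L))
print(unique_sorted)  # → ['a', 'b', 'c', 'd']
```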
>
>
> If keeping the order is important, you cannot use set, and you'll need another way to extract only the unique values. Ask if you need help on 
> that.

Thanks, Steven. Very helpful. It looks like the order is only changed within the inner lists that set() is applied to, not in the outer list, since the outer list order is controlled by the index. For this application I don't care about the order of the inner lists. However, there are other applications where that will be important. Can you please describe the alternative method for extracting the unique values that maintains order?

Thanks, 
Bob
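One common way to keep the order, sketched here in Python 3 (the helper name unique_in_order is made up for this example): remember which items have already been seen in a set, and keep only the first occurrence of each. It assumes the items are hashable, as strings are.

```python
def unique_in_order(items):
    """Return the unique items of *items*, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # set membership tests are fast on average
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(['b', 'd', 'c', 'a', 'b']))  # → ['b', 'd', 'c', 'a']

# Applied to the original list of lists ('' for empty inner lists):
tst = [[], ['test'], ['t1', 't2'], ['t1', 't1', 't2']]
tst = [unique_in_order(t) if t else '' for t in tst]
print(tst)  # → ['', ['test'], ['t1', 't2'], ['t1', 't2']]

# On Python 3.7+, dicts keep insertion order, so this also works:
print(list(dict.fromkeys(['b', 'd', 'c', 'a', 'b'])))  # → ['b', 'd', 'c', 'a']
```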

>
>
>
>> What I get in return is
>>
>>>>> tst
>> ['', set(['test']), set(['t2', 't1']), set(['t2', 't1'])]
>>
>> The empty list is fine but the other lists seem to be expressions rather than values. What do I need to do to simply get the values back 
>> like the following?
>>
>> ['', ['test'], ['t2', 't1'], ['t2', 't1']]
>
>
> They are values. It is just that they are *sets* rather than *lists*. When printed, lists have a nice compact representation using square 
> brackets [], but unfortunately sets do not. However, if you upgrade to Python 3, sets are displayed a little more nicely:
>
>
> # Python 2:
> set(['a', 'c', 'b', 'd'])
> 
> # Python 3
> {'d', 'b', 'c', 'a'}
>
>
> Notice that the order of the items is not guaranteed, but apart from that, the two versions are the same despite the difference in print 
> representation.
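A quick Python 3 check of that point: the printed order is arbitrary, but equality doesn't depend on it, and sorted() turns a set back into a list if that's the form you want. An empty set still prints as set(), because {} means an empty dict.

```python
s = set(['a', 'c', 'b', 'd'])

# Sets compare equal regardless of the order in which they are printed.
print(s == {'d', 'b', 'c', 'a'})  # → True

# Converting back to a list: sorted() gives a predictable order.
print(sorted(s))  # → ['a', 'b', 'c', 'd']

# {} is an empty dict, so an empty set is displayed as set().
print(repr(set()))  # → set()
print(type({}).__name__)  # → dict
```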
>
> -- 
> Steven





