Sorting by item_in_another_list

Cameron Walsh cameron.walsh at gmail.com
Thu Oct 26 08:48:03 CEST 2006


Paul McGuire wrote:
> "J. Clifford Dyer" <jcd at sdf.lonestar.org> wrote in message 
> news:eho2kq$lqc$1 at aioe.server.aioe.org...
>> ZeD wrote:
>>> Paul Rubin wrote:
>>>
>>>>> A = [0,1,2,3,4,5,6,7,8,9,10]
>>>>> B = [2,3,7,8]
>>>>>
>>>>> desired_result = [2,3,7,8,0,1,4,5,6,9,10]
>>>> How about:
>>>>
>>>>   desired_result = B + sorted(x for x in A if x not in B)
>>> this. is. cool.
>>>
>> Cool, yes, but I'm not entirely sure it does what the OP wanted.  Partly 
>> because I'm not entirely sure what the OP wanted.  Counter example:
>>
>> Given these variables:
>>
>> A = [0,1,2,3,4,5,6,8,9,10]  # Note 7 is missing
>> B = [2,3,7,8]
>>
>> which of the following should the function yield?
>>
> 
> From the original post:
> 
> "I have two lists, A and B, such that B is a subset of A."
> 
> So this is not a case that needs to be supported.
> 
> I envisioned something like the OP had a sequence of items to start with, 
> did a random sampling from the list, and wanted to move the sampled items to 
> the front of the list.
> 
> -- Paul
> 
> 
> 

The sequence of items I had was a list of files to be processed in a 
particular order.  I noted that the order in which the files were placed 
into the list might not match the desired order of processing, and that 
the filenames might not be sortable by filename into this desired order 
(such are the vagaries of human file naming.)

So yes, Paul is correct.

However, Cliff is also correct in that the solution provided is not what 
I (the OP) wanted.  The solution provided is half right, in that it 
places the elements also in B before the elements only in A, but it 
sorts the remaining elements only in A instead of keeping the original 
order.

A more correct solution (at least in terms of answering this particular 
problem) would be:

desired_result = B + list(x for x in A if x not in B)
rather than
desired_result = B + sorted(x for x in A if x not in B)

This solution is also guaranteed to work if for some reason the sort 
algorithm cannot be guaranteed to be stable in the future or on some 
other implementation of Python.


Which brings me to the question, would this solution:

B = set(B)
A = B + list(x for x in A if x not in B)

be faster than this solution:

B = set(B)
A.sort(key=B.__contains__, reverse=True)


My guess is yes, since while the __contains__ method is only run once 
for each object, the list still does not require sorting.

Thanks everyone for all your helpful replies,

Cameron.



More information about the Python-list mailing list