newbie: unique problem

Brian van den Broek bvande at po-box.mcgill.ca
Thu Mar 17 21:47:48 CET 2005


Leeds, Mark said unto the world upon 2005-03-17 14:08:
> I have a function uniqueList that is below :
> 
>  
> 
> def uniqueList(origList):
>
>  
>
>     nodups = {}
>
>     for temp in origList:
>
>         nodups[temp] = None
>
>     return nodups.keys()
> 
>  
> 
> When used in the following context :
> 
>  
> 
> industryList = uniqueList(jpbarradata[group])
> 
>  
> 
> where jpbarradata[group] might look like
> 
>  
> 
> ["AAA BC","BBB KK","CCC TD","AAA KP","CCC TD"]
> 
>  
> 
> ,the function works in the sense that it would return
> 
>  
> 
> ["AAA BC","BBB KK","CCC TD","AAA KP"]
> 
>  
> 
> because CCC TD is duplicated.
> 
>  
> 
> But, I also want it to get rid of the AAA KP because
> 
> there are two AAA's even though the last two letters
> 
> are different. It doesn't matter to me which one
> 
> is gotten rid of but I don't know how to change
> 
> the function to handle this ? I have a feeling
> 
> it's not that hard though ? Thanks.

Hi Mark,

please turn off the HTML formatting when posting. It makes your 
message quite a lot bigger than need be, and, in this case anyway, 
makes the plain-text version double-spaced (as above) and thus a bit 
nasty to read. Thanks.

For the question:

Is order in your output important? If so, I wouldn't use a dictionary 
to store the unique items. I see why you did it, but since 
dictionaries don't have order, your output might get permuted.
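For reference, here is a minimal sketch of how the dictionary approach could be adapted to the prefix problem by keying on the first n characters instead of the whole string. The name `unique_by_prefix_dict` is hypothetical. It keeps the first string seen for each prefix, but it only preserves input order on Python 3.7 and later, where dicts are guaranteed to remember insertion order; on the Python 2.x interpreters of this thread's era the output could be permuted, as described above.

```python
def unique_by_prefix_dict(orig_list, n):
    """Return one element per distinct n-character prefix.

    Hypothetical helper: the dict maps each prefix to the first element
    that had it. setdefault leaves an existing entry untouched, so later
    elements sharing a prefix are ignored.
    """
    by_prefix = {}
    for member in orig_list:
        by_prefix.setdefault(member[:n], member)
    # Output order follows first insertion only on Python 3.7+.
    return list(by_prefix.values())


data = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD"]
print(unique_by_prefix_dict(data, 3))  # ['AAA BC', 'BBB KK', 'CCC TD']
```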

How about this (don't take the naming as a model!):

def unique_up_to_n_char(orig_list, n):
    '''-> list of elements where each is unique up to the first n chars.
    '''

    # Needs Python 2.4 or later for the built-in set type. You could use
    # a list, too, but membership tests on a list slow down as it grows.
    seen_leading_chars = set()

    output_list = []
    for member in orig_list:
        prefix = member[:n]  # the part that must be unique
        if prefix not in seen_leading_chars:
            seen_leading_chars.add(prefix)
            output_list.append(member)
    return output_list

test_list = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD", "AAB KP"]

print(unique_up_to_n_char(test_list, 3))
print(unique_up_to_n_char(test_list, 2))

which produces:
['AAA BC', 'BBB KK', 'CCC TD', 'AAB KP']
['AAA BC', 'BBB KK', 'CCC TD']

There may be better ways still, but this is general and preserves order.
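One way to generalise further is to pass a key function instead of a fixed prefix length, so that "the same" can mean, for example, "same first whitespace-separated field" when the codes are not all the same width. This is a sketch in the spirit of the `unique_everseen` recipe from the `itertools` documentation; the name `unique_by_key` and its `key` parameter are illustrative, not a library API.

```python
def unique_by_key(iterable, key):
    """Yield items whose key(item) has not been seen before, in input order.

    Illustrative helper: the same seen-set technique as
    unique_up_to_n_char, but the caller decides what counts as equal.
    """
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


test_list = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD", "AAB KP"]

# Equivalent to unique_up_to_n_char(test_list, 3):
print(list(unique_by_key(test_list, lambda s: s[:3])))
# Compare on the first space-separated field (assumes no empty strings):
print(list(unique_by_key(test_list, lambda s: s.split()[0])))
```

Because it is a generator, it also works lazily on long inputs; wrap it in `list()` when a list is needed.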

Best,

Brian vdB



